Data Engineering

Forum Posts

by UmaMahesh1 • Honored Contributor III

12-01-2022 11:26:31 AM

8835 Views
2 replies
15 kudos

Resolved! Pyspark dataframe column comparison

I have a string column whose values are elements concatenated with hyphens. Suppose 3 values from that column look like this:
Row 1 - A-B-C-D-E-F
Row 2 - A-B-G-C-D-E-F
Row 3 - A-B-G-D-E-F
I want to compare 2 consecutive rows and create a column ...

Latest Reply

NhatHoang
Valued Contributor II

12-02-2022 8:03:13 PM

15 kudos

Hi, I think you can follow these steps:
1. Use a window function to create a new column by shifting; your df will then look like this:
id value lag
1 A-B-C-D-E-F null
2 A-B-G-C-D-E-F A-B-C-D-E-F
3 A-B-G-D-E-F ...
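
The shift-and-compare idea in step 1 can be sketched in plain Python so it runs without a cluster. In PySpark the shift would typically be `lag` over a window ordered by id. The helper names `with_lag` and `added_elements` are hypothetical, and because the question is cut off before it says what the new column should hold, the comparison shown (elements that appear in a row but not in the previous one) is only an assumption.

```python
def with_lag(rows):
    """Pair each (id, value) row with the previous row's value (None for the first row)."""
    ordered = sorted(rows, key=lambda r: r[0])
    previous = [None] + [value for _, value in ordered[:-1]]
    return [(row_id, value, lag) for (row_id, value), lag in zip(ordered, previous)]


def added_elements(value, lag):
    """Assumed comparison: elements present in this row but not in the previous one."""
    if lag is None:
        return []
    seen = set(lag.split("-"))
    return [element for element in value.split("-") if element not in seen]


# The three sample values from the question.
rows = [(1, "A-B-C-D-E-F"), (2, "A-B-G-C-D-E-F"), (3, "A-B-G-D-E-F")]

shifted = with_lag(rows)
compared = [
    (row_id, value, lag, added_elements(value, lag))
    for row_id, value, lag in shifted
]
```

Any other row-to-row comparison can replace `added_elements` once the shifted column exists.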

1 More Replies

by cozos • New Contributor III

11-30-2022 9:06:46 PM

8326 Views
5 replies
5 kudos

What does "ScalaDriverLocal: User Code Compile error" mean?

22/11/30 01:45:31 WARN ScalaDriverLocal: loadLibraries: Libraries failed to be installed: Set()
22/11/30 01:50:14 INFO Utils: resolved command to be run: WrappedArray(getconf, PAGESIZE)
22/11/30 01:50:15 WARN ScalaDriverLocal: User Code Compile err...

Latest Reply

cozos
New Contributor III

12-01-2022 1:35:53 PM

5 kudos

Hi @Werner Stinckens, thanks for the help. Unfortunately I don't think it's so simple - I do have a JAR that I submitted as a Databricks JAR task, and the JAR does have the org.apache.beam class. I guess what I'm trying to understand is what does Scal...

4 More Replies

by vr • Valued Contributor

11-26-2022 4:26:24 PM

20569 Views
11 replies
9 kudos

Why is execution too fast?

I have a table whose full scan takes ~20 minutes on my cluster. The table has a "Time" TIMESTAMP column and a "day" DATE column. The latter is computed (manually) as "Time" truncated to day and is used for partitioning. I query the table using predicate ...

Latest Reply

UmaMahesh1
Honored Contributor III

11-27-2022 6:40:45 AM

9 kudos

Hi @Vladimir Ryabtsev, because you are creating a Delta table, I think you are seeing a performance improvement because of dynamic partition pruning. According to the documentation, "Partition pruning can take place at query compilation time wh...
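
To illustrate why a predicate on the partition column can make a query much faster than a full scan, here is a rough plain-Python sketch. It is not Spark code and only shows the general idea of skipping whole partitions. The data, the `scan` helper, and the chosen dates are all made up for illustration.

```python
from datetime import date

# Hypothetical layout: one list of rows per partition, keyed by the "day" value.
partitions = {
    date(2022, 11, 24): [("2022-11-24 08:00:00", 1.0), ("2022-11-24 17:30:00", 2.0)],
    date(2022, 11, 25): [("2022-11-25 09:15:00", 3.0)],
    date(2022, 11, 26): [("2022-11-26 12:00:00", 4.0), ("2022-11-26 23:59:00", 5.0)],
}


def scan(partitions, day=None):
    """Return (rows, partitions_read); a literal day predicate skips non-matching partitions."""
    if day is None:
        selected = list(partitions)  # full scan: every partition is read
    else:
        selected = [d for d in partitions if d == day]  # pruned: only matching partitions
    rows = [row for d in selected for row in partitions[d]]
    return rows, len(selected)


all_rows, full_reads = scan(partitions)
one_day_rows, pruned_reads = scan(partitions, day=date(2022, 11, 26))
```

Comparing `full_reads` with `pruned_reads` shows how much of the table is never touched when the filter is on the partition column.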

10 More Replies

by jd1 • New Contributor II

11-10-2022 6:56:33 AM

2118 Views
1 reply
3 kudos

Hello, When working in a python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path ...

Hello, when working in a Python notebook and using tab-complete to navigate the file system, I find that pressing enter on a partially completed path will add the full path to the cell in the notebook. This is annoying behaviour, since you end up with...

Latest Reply

UmaMahesh1
Honored Contributor III

12-02-2022 12:35:43 PM

3 kudos

Someone heard you. In the experimental Monaco editor, I found that this particular issue does not appear.

by stinodego • New Contributor III

11-18-2022 12:48:49 AM

6985 Views
8 replies
19 kudos

Python job run error messages are unreadable

This has been going on for some time now; all errors look like this (note the weird `[0;34m` marks everywhere). How can we fix this? We're not doing anything crazy; this is just the latest runtime with pretty much the simplest possible hello world pro...
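
The `[0;34m` marks look like ANSI colour escape sequences being shown as raw text instead of being rendered. As a workaround for reading a copied error message (not a fix for the job output itself), a minimal sketch that strips them could look like this. The `strip_ansi` helper and the sample line are made up for illustration.

```python
import re

# Matches SGR colour/style sequences: ESC, "[", digits and semicolons, then "m".
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text):
    """Remove ANSI colour codes so a traceback reads as plain text."""
    return ANSI_SGR.sub("", text)


# Made-up sample line, not taken from the original post.
raw = "\x1b[0;34mTraceback\x1b[0m (most recent call last): \x1b[0;31mValueError\x1b[0m"
clean = strip_ansi(raw)
```

This only handles colour and style codes; other escape sequences, such as cursor movement, would need a broader pattern.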

Latest Reply

VaibB
Contributor

12-02-2022 12:03:34 PM

19 kudos

Have you tried detaching and reattaching the notebook, or restarting the cluster? Did you check that you are not importing any specific library? Someone else with the right access might have installed a library with "install to all clusters" checked.

7 More Replies

by cmilligan • Contributor II

12-02-2022 10:32:10 AM

11048 Views
2 replies
6 kudos

Resolved! How to go up two folders using relative path in %run?

I want to store a notebook with functions two folders up from the current notebook. I know that I can start the path with ../ to go up one folder, but when I've tried .../ it won't go up two folders. Is there a way to do this?

Latest Reply

VaibB
Contributor

12-02-2022 11:40:41 AM

6 kudos

To access a notebook one folder up, use ../notebook_2. To go 2 folders up and access (say, notebook "secret"), use ../../secret
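
To see why ../../ works and .../ does not, here is a small sketch using Python's `posixpath`. It assumes %run resolves relative paths the way ordinary POSIX paths resolve, which matches the answer above. The folder `/Users/me/project/etl/jobs` and the `resolve` helper are made up for illustration.

```python
import posixpath


def resolve(current_folder, relative):
    """Resolve a relative notebook path against the folder of the calling notebook."""
    return posixpath.normpath(posixpath.join(current_folder, relative))


# Hypothetical workspace folder that holds the calling notebook.
here = "/Users/me/project/etl/jobs"

one_up = resolve(here, "../notebook_2")
two_up = resolve(here, "../../secret")
three_dots = resolve(here, ".../secret")  # "..." is an ordinary name, not a parent reference
```

Each `..` segment climbs exactly one level, so two levels need `../../`, while `...` is treated as a folder literally named "...".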

1 More Replies

by Smitha1 • Databricks Partner

12-02-2022 10:26:47 AM

1601 Views
1 reply
6 kudos

Just a shout out to Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh you al...

Just a shout out to the Databricks Support team and customers! @Joseph Kambourakis @Nadia Elsayed @Vidula Khanna @Jose Gonzalez @Harshjot Singh you are all a fabulous bunch of teams and very helpful. Thanks very much for your responses when asked. Happy...

Latest Reply

Harshjot
Contributor III

12-02-2022 10:51:58 AM

6 kudos

@Smitha Nelapati so happy to see that the issue is resolved

by vr • Valued Contributor

11-26-2022 4:03:04 PM

9694 Views
5 replies
6 kudos

Resolved! How to avoid trimming in EXPLAIN?

I am looking at the EXPLAIN EXTENDED plan for a statement. In the == Physical Plan == section, I go down to the FileScan node and see a lot of ellipses, like
+- FileScan parquet schema.table[Time#8459,TagName#8460,Value#8461,Quality#8462,day#8...

Latest Reply

SS2
Valued Contributor

11-29-2022 11:09:23 AM

6 kudos

I also faced the same issue.

4 More Replies

by Retko • Contributor

12-01-2022 5:04:08 AM

26291 Views
5 replies
8 kudos

Databricks notebook sometimes takes too long to run a query (even on an empty table)

Hi, sometimes I notice that running a query takes too long, even for simple queries, and the next time I run the same query it runs much faster. I have a cluster running (DBR 10.4 LTS • 5 workers) and it constantly has several workers. An example of a query is s...

Latest Reply

j_afanador
Contributor II

12-02-2022 10:03:48 AM

8 kudos

Probably the cluster is always in use and the query always lands in the processing queue, or the cluster auto-stops each time after you use it.

4 More Replies

by Nayan7276 • Databricks Partner

11-30-2022 12:03:40 AM

4208 Views
4 replies
26 kudos

First post on databricks community

Hello guys! This is my first Databricks community post. Looking forward to contributing from my end.

Latest Reply

Diva
Contributor

12-02-2022 9:32:42 AM

26 kudos

Welcome to the community.

3 More Replies

by augustin • New Contributor II

11-02-2022 8:16:05 AM

7575 Views
5 replies
5 kudos

Mount an unencrypted AWS EFS in AWS Databricks

Hi, I want to mount an unencrypted AWS EFS in AWS Databricks. When I do:
mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport fs-abcdef.efs.region.amazonaws.com:/ /mnt/efs-uncrypted
I get this error:
mount.nfs4: moun...

Latest Reply

Andrei_Radulesc
Contributor III

12-02-2022 8:46:43 AM

5 kudos

"To support NFS under LXC, some of the apparmor protections need to be lifted." (see https://theorangeone.net/posts/mount-nfs-inside-lxc/)

4 More Replies

by sqlshep • New Contributor III

11-05-2022 5:47:26 PM

6334 Views
3 replies
1 kudos

Hello, I have a dashboard using map markers that has been working for the last few days. Suddenly I am getting an error in the dashboard and in the rendered map on the query.

Latest Reply

sqlshep
New Contributor III

12-02-2022 5:50:10 AM

1 kudos

It's broken again. I am seeing this several times a week, and it is offline for hours at a time.

2 More Replies

by hitesh1 • New Contributor III

08-17-2022 3:08:40 PM

10924 Views
1 reply
5 kudos

java.util.NoSuchElementException: key not found

Hello, we are using Azure Databricks with a Standard DS14_V2 cluster with Runtime 9.1 LTS, Spark 3.1.2 and Scala 2.12, and are facing the below issue frequently when running our ETL pipeline. As part of the operation that is failing there are several joins...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-02-2022 1:53:14 AM

5 kudos

Hey man, please use these configurations in your cluster and it will work:
spark.sql.storeAssignmentPolicy LEGACY
spark.sql.parquet.binaryAsString true
spark.speculation false
spark.sql.legacy.timeParserPolicy LEGACY
If it won't work, let me know what problem...

by Jack • New Contributor II

05-02-2022 6:43:59 AM

10203 Views
1 reply
1 kudos

Python: Generate new dfs from a list of dataframes using for loop

I have a list of dataframes (for this example 2) and want to apply a for-loop to the list of frames to generate 2 new dataframes. To start, here is my starting dataframe called df_final:
First, I create 2 dataframes: df2_b2c_fast, df2_b2b_fast:
for x i...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-02-2022 1:44:45 AM

1 kudos

thanks

by isaac_gritz • Databricks Employee

08-22-2022 11:29:14 PM

3218 Views
1 reply
6 kudos

Databricks Security Review

Conducting a security review or vendor assessment of Databricks and looking to learn more about our security features, compliance information, and privacy policies? You can find the latest on Databricks security features, architecture, compliance and ...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

12-02-2022 1:43:19 AM

6 kudos

thanks man
