Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Thefan
by New Contributor II
  • 2232 Views
  • 0 replies
  • 1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing. Current code is this: @dlt...
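One way to sketch this without Koalas is a plain-PySpark alternative to `dropna(axis=1, how='all')` inside a DLT table function. This is a sketch, not confirmed against the poster's pipeline: `source_table` is a hypothetical upstream dataset, and the code only runs inside a Databricks DLT pipeline.

```python
# Sketch of a DLT table that drops all-null columns before writing.
# `source_table` is a placeholder; runs only inside a DLT pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="cleaned_table")
def cleaned_table():
    df = dlt.read("source_table")  # hypothetical upstream dataset
    # F.count() ignores nulls, so this counts non-null values per column.
    counts = df.select(
        [F.count(F.col(c)).alias(c) for c in df.columns]
    ).first()
    # Keep only columns that have at least one non-null value.
    keep = [c for c in df.columns if counts[c] > 0]
    return df.select(*keep)
```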

shawncao
by New Contributor II
  • 4680 Views
  • 0 replies
  • 0 kudos

REST api to execute SQL query and read output

Hi there, I'm using these two APIs to execute SQL statements and read the output back when they finish. However, it seems to always return only 1000 rows even though I need all the results (millions of rows). Is there a solution for this? Execute SQL: htt...
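A sketch of one approach: the Databricks SQL Statement Execution API (`/api/2.0/sql/statements`) returns large results as a chain of chunks, each carrying a `next_chunk_index`. The endpoint and field names below are assumptions to verify against the workspace's API version; the chunk-walking helper itself is generic.

```python
# Sketch: page through a large SQL result chunk by chunk instead of
# stopping at the first page. Endpoint shape modeled on the Databricks
# SQL Statement Execution API; verify against your workspace.
import json
import urllib.request

def fetch_chunk(host, token, statement_id, chunk_index):
    """Fetch one result chunk; returns the decoded JSON payload."""
    url = (f"https://{host}/api/2.0/sql/statements/"
           f"{statement_id}/result/chunks/{chunk_index}")
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def iter_all_rows(get_chunk):
    """Walk the chunk chain until next_chunk_index is absent."""
    index = 0
    while index is not None:
        chunk = get_chunk(index)
        yield from chunk.get("data_array", [])
        index = chunk.get("next_chunk_index")
```

`iter_all_rows` takes any fetcher, so it can be wired to `fetch_chunk` with a host and token, or stubbed out for testing.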

Jackie
by New Contributor II
  • 7264 Views
  • 3 replies
  • 6 kudos

Resolved! speed up a for loop in python (azure databrick)

Code example:
# a list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]
# copy all files above to this folder
dest_path = "/dbfs/mnt/..."
for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)
I am runni...
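Since file copies are I/O-bound, a thread pool is one way to speed up a loop like the one above before reaching for ADF. A minimal sketch assuming plain file paths and `shutil` (on Databricks, `dbutils.fs.cp` inside the same pool pattern works similarly):

```python
# Sketch: run the copies concurrently; copies are I/O-bound, so
# threads overlap the waiting. Paths are placeholders.
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_all(list_files_path, dest_path, max_workers=8):
    """Copy every file to dest_path concurrently; returns new paths."""
    def copy_one(src):
        return shutil.copy(src, dest_path)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(copy_one, list_files_path))
```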

Latest Reply
Hemant
Valued Contributor II
  • 6 kudos

@Jackie Chan, what's the data size you want to copy? If it's large, then use ADF (Azure Data Factory).

2 More Replies
818674
by New Contributor III
  • 13415 Views
  • 10 replies
  • 8 kudos

Resolved! How to perform a cross-check for data in multiple columns in same table?

I am trying to check whether a certain datapoint exists in multiple locations. This is what my table looks like: I am checking whether the same datapoint is in two locations. The idea is that this datapoint should exist in BOTH locations, and be counte...
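A sketch of one way to express this in Spark SQL, assuming a hypothetical table `records(datapoint, location)` and two location values; the real table and column names from the attached screenshot may differ.

```sql
-- Count a datapoint only when it appears in BOTH locations.
SELECT datapoint
FROM records
WHERE location IN ('loc_a', 'loc_b')
GROUP BY datapoint
HAVING COUNT(DISTINCT location) = 2;
```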

[Attachment: Table Examples of Results for Cross-Checking]
Latest Reply
818674
New Contributor III
  • 8 kudos

Hi, thank you very much for following up. I no longer need assistance with this issue.

9 More Replies
deisou
by New Contributor
  • 5399 Views
  • 4 replies
  • 2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large delta table that I would like to back up and I am wondering what is the best practice for backing it up. The goal is so that if there is any accidental corruption or data loss either at the Azure blob storage level or within Databricks...
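One commonly cited option is Delta `DEEP CLONE`, which copies both data files and metadata to a separate location and can be re-run incrementally. A sketch with placeholder table and storage names:

```sql
-- Deep clone to a different storage account/container so the backup
-- survives loss of the primary. Names below are placeholders.
CREATE TABLE IF NOT EXISTS backup_db.my_table_backup
DEEP CLONE prod_db.my_table
LOCATION 'abfss://backups@mybackupacct.dfs.core.windows.net/my_table_backup';
```

Re-running the same statement refreshes the clone, copying only what changed since the last run.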

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @deisou, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

3 More Replies
rgrosskopf
by New Contributor II
  • 1631 Views
  • 0 replies
  • 1 kudos

How to use Databricks Feature Store for time series forecasts?

I've seen the Databricks documentation on time series here. I'm using forecasts as a feature and those forecasts have both an as-of timestamp (when the forecast was generated) and a time step label (timestamp indicating the time of the forecasted obs...
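A sketch of the point-in-time pattern the Feature Store docs describe: `timestamp_lookup_key` joins each label row to the latest feature row whose timestamp is at or before the label's timestamp. All table, column, and label names below are placeholders, and this only runs in a Databricks ML runtime.

```python
# Sketch, assuming the Databricks Feature Store point-in-time API.
from databricks.feature_store import FeatureLookup, FeatureStoreClient

def build_training_set(labels_df):
    """labels_df must carry series_id, forecast_as_of, and target."""
    fs = FeatureStoreClient()
    lookups = [
        FeatureLookup(
            table_name="ml.forecast_features",      # placeholder table
            lookup_key=["series_id"],
            timestamp_lookup_key="forecast_as_of",  # as-of join key
        )
    ]
    return fs.create_training_set(
        df=labels_df, feature_lookups=lookups, label="target"
    )
```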

Kyle
by New Contributor II
  • 27006 Views
  • 5 replies
  • 4 kudos

Resolved! What's the best way to manage multiple versions of the same datasets?

We have use cases that require multiple versions of the same datasets to be available. For example, we have a knowledge graph made of entities and relations, and we have multiple versions of the knowledge graph distinguished by schema names ri...
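One option to sketch, assuming the datasets are Delta tables: time travel lets one table serve several versions, and a view can pin a stable alias to a chosen version instead of duplicating schemas. Table names and version numbers below are placeholders.

```sql
-- Read a pinned historical version of one table.
SELECT * FROM kg.entities VERSION AS OF 42;
SELECT * FROM kg.entities TIMESTAMP AS OF '2022-04-01';

-- Publish a stable alias for that version as a view.
CREATE OR REPLACE VIEW kg.entities_v42 AS
SELECT * FROM kg.entities VERSION AS OF 42;
```

Note that time travel only reaches back as far as the table's retention settings allow; long-lived versions may still warrant a clone.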

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Kyle Gao, hope you are doing well. Thank you for posting your query. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Cheers!

4 More Replies
Darshana_Ganesh
by New Contributor II
  • 4268 Views
  • 4 replies
  • 2 kudos

Resolved! Post upgrading the Azure Databricks cluster from 8.3 (includes Apache Spark 3.1.1, Scala 2.12) to 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I am getting an intermittent error.

The error is as below and is intermittent: the same code throws the issue on run 3, doesn't throw it on run 4, then throws it again on run 5. An error occurred while calling o1509.getCause. Trace: py4j.security.Py4JSecur...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey @Darshana Ganesh, just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

3 More Replies
Development
by New Contributor III
  • 7426 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi all, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have a Delta table with around 135 columns that is also PARTITIONED BY. We are trying to load about 15 million rows, but it is not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple, I found the root cause: it's serialization. We are using a UDF to derive a column on the dataframe; when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...
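The usual fix direction for this root cause is to replace the Python UDF with built-in Spark SQL expressions, which stay inside the JVM and avoid per-row serialization. A sketch with hypothetical column names, since the thread doesn't show the actual UDF:

```python
# Sketch: derive the column with built-in expressions instead of a UDF.
from pyspark.sql import functions as F

def add_charged(df):
    """Hypothetical derivation; 'amount' and 'rate' are placeholders."""
    # UDF version to avoid: F.udf(lambda amt, rate: amt * rate, "double")
    return df.withColumn("charged", F.col("amount") * F.col("rate"))
```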

4 More Replies
manasa
by Databricks Partner
  • 6553 Views
  • 7 replies
  • 2 kudos

Resolved! Recursive view error while using spark 3.2.0 version

This happens while creating a temp view using the code block below: latest_data.createOrReplaceGlobalTempView("e_test"). Ideally this command should replace the view if e_test already exists; instead it throws "Recursive view `global_temp`.`e_test` detecte...

Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

Hi, @Manasa​, could you please check SPARK-38318 and use Spark 3.1.2, Spark 3.2.2, or Spark 3.3.0 to allow cyclic reference?

6 More Replies
sophia1
by New Contributor
  • 1255 Views
  • 0 replies
  • 0 kudos

back pain 26

Hundreds of thousands of people in the United States suffer from back pain at some point in their lives. Because of this, you don't have to suffer greatly. The advice in this article can assist you in lessening the daily agony that you experience. Pa...

AmanSehgal
by Honored Contributor III
  • 9319 Views
  • 1 replies
  • 11 kudos

Resolved! How to merge all the columns into one column as JSON?

I have a task to transform a dataframe: collect all the columns in a row and embed them into a JSON string as a new column. Source and target DataFrame examples were attached as images.

Latest Reply
AmanSehgal
Honored Contributor III
  • 11 kudos

I was able to do this by converting the df to an RDD and then applying a map function to it: rdd_1 = df.rdd.map(lambda row: (row['ID'], row.asDict())) ...
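An alternative sketch that avoids dropping to RDDs: `to_json(struct(*))` packs every column of a row into one JSON string column and stays in the DataFrame API. The output column name is a placeholder.

```python
# Sketch: serialize each whole row as a JSON string column.
from pyspark.sql import functions as F

def with_json_column(df, out_col="json_payload"):
    """Add a column containing the row's columns serialized as JSON."""
    return df.withColumn(out_col, F.to_json(F.struct(*df.columns)))
```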

Emiel_Smeenk
by New Contributor III
  • 18485 Views
  • 5 replies
  • 8 kudos

Resolved! Databricks Runtime 10.4 LTS - AnalysisException: No such struct field id in 0, 1 after upgrading

Hello, we are working to migrate to Databricks Runtime 10.4 LTS from 9.1 LTS, but we're running into weird behavioral issues. Our existing code works up until runtime 10.3; in 10.4 it stopped working. Problem: we have a nested JSON file that we are fl...

Latest Reply
Emiel_Smeenk
New Contributor III
  • 8 kudos

It seems like the issue was miraculously resolved. I did not make any code changes but everything is now running as expected. Maybe the latest runtime 10.4 fix released on April 19th also resolved this issue unintentionally.

4 More Replies
nickg
by New Contributor III
  • 6972 Views
  • 6 replies
  • 3 kudos

Resolved! I am looking to use the pivot function with Spark SQL (not Python)

Hello. I am trying to use the PIVOT function for email addresses. This is what I have so far: SELECT fname, lname, awUniqueID, Email1, Email2 FROM xxxxxxxx PIVOT (count(Email) AS Test FOR Email IN (1 AS Email1, 2 AS Email2)). I get everyth...

Latest Reply
nickg
New Contributor III
  • 3 kudos

Source data:
fname lname awUniqueID Email
John Smith 22 jsmith@gmail.com
JODI JONES 22 jsmith@live.com
Desired output:
fname lname awUniqueID Em...
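A sketch of why the original query returns no emails: PIVOT compares the FOR column against the IN list literally, so matching the raw Email strings against 1 and 2 never succeeds. Ranking the emails per ID first and pivoting on that rank is one fix; the table name is a placeholder.

```sql
SELECT *
FROM (
  SELECT fname, lname, awUniqueID, Email,
         ROW_NUMBER() OVER (PARTITION BY awUniqueID ORDER BY Email) AS rn
  FROM contacts
)
PIVOT (
  MAX(Email) FOR rn IN (1 AS Email1, 2 AS Email2)
);
```

Note the outer grouping includes fname and lname, so rows sharing an awUniqueID but differing in name will still pivot onto separate rows.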

5 More Replies