Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826992666
by Valued Contributor
  • 1492 Views
  • 1 reply
  • 0 kudos

Resolved! When should I turn on multi-cluster load balancing on SQL Endpoints?

I see the option to enable multi-cluster load balancing when creating a SQL Endpoint, but I don't know if I should be using it or not. How do I know when I should enable it?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

It is best to enable multi-cluster load balancing on SQL endpoints when many users will be running queries concurrently. Load balancing helps isolate the queries and ensure the best performance for all users. If you only have a few users running...
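As a rough sketch, enabling load balancing amounts to setting max_num_clusters above min_num_clusters when creating the endpoint. The snippet below assumes the SQL endpoints REST API path (/api/2.0/sql/endpoints); the host, token, and sizing values are placeholders:

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder

    # max_num_clusters > min_num_clusters turns on multi-cluster load
    # balancing: the endpoint adds clusters as concurrent query load grows.
    payload = {
        "name": "analytics-endpoint",
        "cluster_size": "Medium",
        "min_num_clusters": 1,
        "max_num_clusters": 4,
        "auto_stop_mins": 30,
    }

    resp = requests.post(
        f"{HOST}/api/2.0/sql/endpoints",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json())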

User16856693631
by New Contributor II
  • 4849 Views
  • 1 reply
  • 0 kudos
Latest Reply
User16856693631
New Contributor II
  • 0 kudos

Yes, you can. Databricks maintains a history of your job runs for up to 60 days. If you need to preserve job runs, Databricks recommends that you export results before they expire. For more information, see https://docs.databricks.com/jobs.html#export...
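A minimal sketch of such an export using the Jobs API runs/export endpoint; the workspace URL, token, and run id are placeholders:

    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder

    # Export the rendered views of a finished run before the
    # 60-day retention window expires.
    resp = requests.get(
        f"{HOST}/api/2.0/jobs/runs/export",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={"run_id": 12345, "views_to_export": "ALL"},  # hypothetical run id
    )
    resp.raise_for_status()

    # Each exported view is returned as HTML content.
    for view in resp.json().get("views", []):
        with open(f"{view['name']}.html", "w") as f:
            f.write(view["content"])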

User16826992666
by Valued Contributor
  • 1197 Views
  • 1 reply
  • 0 kudos

Resolved! How much space does the metadata for a Delta table take up?

If you have a lot of transactions in a table it seems like the Delta log keeping track of all those transactions would get pretty large. Does the size of the metadata become a problem over time?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, the size of the metadata can become a problem over time, though because of storage costs rather than performance. Delta performance will not degrade due to the size of the metadata, but your cloud storage bill can increase. By default Delta h...
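If log storage becomes a concern, the retention window can be tightened per table. A minimal sketch, assuming a hypothetical table name (the delta.logRetentionDuration property defaults to 30 days):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Shorten how long transaction-log entries are retained; log files
    # older than this interval are removed at checkpoint cleanup time.
    spark.sql("""
        ALTER TABLE my_delta_table
        SET TBLPROPERTIES ('delta.logRetentionDuration' = 'interval 7 days')
    """)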

Anonymous
by Not applicable
  • 925 Views
  • 1 reply
  • 0 kudos

Resolved! Delta Sharing internally?

If we don't have any datasets to be shared with external companies, does that mean Delta Sharing is not relevant for our org? Is there any use case for using it internally?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Delta Sharing can be done both externally and internally. One use case for sharing internally would be two separate business units that want to share data with each other without exposing their entire Lakehouse to the other unit.
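On the consuming side, reading an internal share looks the same as reading an external one. A minimal sketch with the delta-sharing Python client; the profile path and share/schema/table names are hypothetical:

    import delta_sharing

    # Profile file distributed by the providing business unit.
    profile = "/dbfs/FileStore/shares/finance.share"   # hypothetical path
    table_url = profile + "#sales_share.curated.orders"

    # Load the shared table without any access to the provider's Lakehouse.
    df = delta_sharing.load_as_pandas(table_url)
    print(df.head())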

User16830818524
by New Contributor II
  • 935 Views
  • 1 reply
  • 0 kudos

Is it possible to read a Delta table directly using Koalas?

Can I read a Delta table directly using Koalas or do I need to read using Spark and then convert the Spark dataframe to a Koalas dataframe?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Yes, you can use the "read_delta" function; see the Koalas documentation for details.
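A minimal sketch, with a hypothetical table path:

    import databricks.koalas as ks

    # Read the Delta table straight into a Koalas DataFrame -- no need to
    # go through a Spark DataFrame and convert afterwards.
    kdf = ks.read_delta("/mnt/data/events")   # hypothetical path
    print(kdf.head())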

sajith_appukutt
by Honored Contributor II
  • 1229 Views
  • 1 reply
  • 2 kudos

Resolved! Unable to get mlflow central model registry to work with dbconnect.

I'm working on setting up tooling to allow team members to easily register and load models from a central MLflow model registry via dbconnect. However, after following the instructions in the public docs, I'm hitting this error: raise _NoDbutilsError mlfl...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 2 kudos

You could monkey-patch MLflow's _get_dbutils() with something similar to the snippet below to get this working while connecting from dbconnect.
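A reconstruction of the truncated snippet; the module hosting _get_dbutils (mlflow.utils.databricks_utils here) is an assumption based on MLflow's internals at the time:

    from pyspark.sql import SparkSession
    from pyspark.dbutils import DBUtils
    import mlflow.utils.databricks_utils

    spark = SparkSession.builder.getOrCreate()

    # monkey-patch MLflow's _get_dbutils() so it resolves DBUtils through
    # the dbconnect SparkSession instead of raising _NoDbutilsError
    def _get_dbutils():
        return DBUtils(spark)

    mlflow.utils.databricks_utils._get_dbutils = _get_dbutils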

aladda
by Honored Contributor II
  • 937 Views
  • 1 reply
  • 0 kudos
Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 0 kudos

Generally, interactive clusters and jobs are better suited for data engineering and transformations as they support more than just SQL. However, if you are using pure SQL, then endpoints can be used for data transformations. All of the Spark SQL fun...
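As an illustration of the pure-SQL case, a transformation like the one below uses only Spark SQL constructs and so could run on an endpoint as well as on a cluster; the table names are hypothetical:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # A pure-SQL transformation: aggregate raw orders into a daily
    # revenue table. Nothing here requires Python or Scala.
    spark.sql("""
        CREATE OR REPLACE TABLE daily_revenue AS
        SELECT order_date, SUM(amount) AS revenue
        FROM raw_orders
        GROUP BY order_date
    """)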

aladda
by Honored Contributor II
  • 888 Views
  • 1 reply
  • 0 kudos

Resolved! Does the Jobs API allow executing an older version of a Notebook using version history?

I see the revision_timestamp parameter on NotebookTask: https://docs.databricks.com/dev-tools/api/latest/jobs.html#jobsnotebooktask. An example of how to invoke it would be helpful.

Latest Reply
aladda
Honored Contributor II
  • 0 kudos

You can use the Databricks built-in version control feature, coupled with the NotebookTask Jobs API, to specify a specific version of the notebook based on the timestamp of the save, defined in Unix timestamp format.
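A reconstruction of the truncated curl command; the instance URL, cluster id, notebook path, and timestamp are placeholders:

    curl -n -X POST -H 'Content-Type: application/json' \
      https://<databricks-instance>/api/2.0/jobs/runs/submit \
      -d '{
            "run_name": "notebook-at-revision",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {
              "notebook_path": "/Users/me@example.com/my-notebook",
              "revision_timestamp": 1625060460
            }
          }'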

User16826992666
by Valued Contributor
  • 1197 Views
  • 1 reply
  • 0 kudos

How do I know if the number of files are causing performance issues?

I have read and heard that having too many small files can cause performance problems when reading large data sets. But how do I know if that is an issue I am facing?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

The Databricks SQL endpoint has a query history section which provides additional information to debug and tune queries. One such metric under execution details is the number of files read. For ETL/data science workloads, you could use the Spark UI of the ...
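Another quick check is DESCRIBE DETAIL on the Delta table itself; a high numFiles relative to sizeInBytes points to a small-file problem. A minimal sketch with a hypothetical table name:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # DESCRIBE DETAIL reports file-level statistics for a Delta table.
    detail = spark.sql("DESCRIBE DETAIL my_delta_table")
    detail.select("numFiles", "sizeInBytes").show()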

User16765131552
by Contributor III
  • 1914 Views
  • 1 reply
  • 1 kudos

Displaying Spark job progress in a dashboard

In Databricks, is there a way to display the Spark job progress in a dashboard? I have a simple dashboard that displays a table, but the main Spark job behind it takes 15 minutes to run. Is there a way to show the Spark job progress bar in a dashboard?

Latest Reply
Ryan_Chynoweth
Esteemed Contributor
  • 1 kudos

The best way to do so would be to collect data about the job run using the REST API (the runs get endpoint), which returns the run's current state and related metadata. You may need to use other endpoints to get the job or run ids in order to get the correct in...
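A minimal polling sketch against the runs get endpoint; the workspace URL, token, and run id are placeholders:

    import time
    import requests

    HOST = "https://<your-workspace>.cloud.databricks.com"   # placeholder
    TOKEN = "<personal-access-token>"                        # placeholder

    # Poll the run state and surface it wherever the dashboard can read it.
    while True:
        resp = requests.get(
            f"{HOST}/api/2.0/jobs/runs/get",
            headers={"Authorization": f"Bearer {TOKEN}"},
            params={"run_id": 12345},   # hypothetical run id
        )
        resp.raise_for_status()
        state = resp.json()["state"]
        print(state["life_cycle_state"], state.get("state_message", ""))
        if state["life_cycle_state"] in ("TERMINATED", "SKIPPED", "INTERNAL_ERROR"):
            break
        time.sleep(30)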

User16826992666
by Valued Contributor
  • 1644 Views
  • 1 reply
  • 0 kudos

Resolved! When running a Merge, if records from the table are deleted are the underlying files that contain the records deleted as well?

I know I have the option to delete rows from a Delta table when running a merge. But I'm confused about how that would actually affect the files that contain the deleted records. Are those files deleted, or are they rewritten, or what?

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Delta implements MERGE by physically rewriting existing files. It is implemented in two steps: first, perform an inner join between the target table and the source table to select all files that have matches; second, perform an outer join between the selected files in t...
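For instance, a merge that deletes matched rows rewrites the affected files without those rows and leaves the originals unreferenced; a minimal sketch with hypothetical table names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Matched rows are dropped by rewriting the files that contain them;
    # the old files remain on storage until vacuumed.
    spark.sql("""
        MERGE INTO target t
        USING deletions s
        ON t.id = s.id
        WHEN MATCHED THEN DELETE
    """)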

User16826992666
by Valued Contributor
  • 1176 Views
  • 1 reply
  • 0 kudos

Resolved! Are Delta tables able to support GDPR compliance?

I know that when deletes are made from a Delta table the underlying files are not actually removed. For compliance reasons I need to be able to truly delete the records. How can I know which files need to be removed, and is there a way to remove them ot...

Latest Reply
sajith_appukutt
Honored Contributor II
  • 0 kudos

Here is a document explaining best practices for GDPR and CCPA compliance using Delta Lake. Specifically, on cleaning up stale data, you can use the VACUUM command to remove files that are no longer referenced by a Delta table and are older than a s...
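A minimal sketch of that cleanup step, with a hypothetical table name and a 7-day retention window:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    # Physically remove files that the Delta log no longer references and
    # that are older than the retention window, completing the hard delete.
    spark.sql("VACUUM my_delta_table RETAIN 168 HOURS")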

