Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Thefan
by New Contributor II
  • 2232 Views
  • 0 replies
  • 1 kudos

Koalas dropna in DLT

Greetings! I've been trying out DLT for a few days, but I'm running into an unexpected issue when trying to use Koalas dropna in my pipeline. My goal is to drop all columns that contain only null/NA values before writing. Current code is this: @dlt...
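One way to sketch this without Koalas is a plain-PySpark alternative to `dropna(axis=1, how='all')` inside a DLT table function. This is a sketch, not confirmed against the poster's pipeline: `source_table` is a hypothetical upstream dataset, and the code only runs inside a Databricks DLT pipeline.

```python
# Sketch of a DLT table that drops all-null columns before writing.
# `source_table` is a placeholder; runs only inside a DLT pipeline.
import dlt
from pyspark.sql import functions as F

@dlt.table(name="cleaned_table")
def cleaned_table():
    df = dlt.read("source_table")  # hypothetical upstream dataset
    # F.count() ignores nulls, so this counts non-null values per column.
    counts = df.select(
        [F.count(F.col(c)).alias(c) for c in df.columns]
    ).first()
    # Keep only columns that have at least one non-null value.
    keep = [c for c in df.columns if counts[c] > 0]
    return df.select(*keep)
```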

shawncao
by New Contributor II
  • 4680 Views
  • 0 replies
  • 0 kudos

REST api to execute SQL query and read output

Hi there, I'm using these two APIs to execute SQL statements and read the output back when they finish. However, it seems to always return only 1000 rows even though I need all the results (millions of rows). Is there a solution for this? Execute SQL: htt...
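A sketch of one approach: the Databricks SQL Statement Execution API (`/api/2.0/sql/statements`) returns large results as a chain of chunks, each carrying a `next_chunk_index`. The endpoint and field names below are assumptions to verify against the workspace's API version; the chunk-walking helper itself is generic.

```python
# Sketch: page through a large SQL result chunk by chunk instead of
# stopping at the first page. Endpoint shape modeled on the Databricks
# SQL Statement Execution API; verify against your workspace.
import json
import urllib.request

def fetch_chunk(host, token, statement_id, chunk_index):
    """Fetch one result chunk; returns the decoded JSON payload."""
    url = (f"https://{host}/api/2.0/sql/statements/"
           f"{statement_id}/result/chunks/{chunk_index}")
    req = urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def iter_all_rows(get_chunk):
    """Walk the chunk chain until next_chunk_index is absent."""
    index = 0
    while index is not None:
        chunk = get_chunk(index)
        yield from chunk.get("data_array", [])
        index = chunk.get("next_chunk_index")
```

`iter_all_rows` takes any fetcher, so it can be wired to `fetch_chunk` with a host and token, or stubbed out for testing.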

Jackie
by New Contributor II
  • 7264 Views
  • 3 replies
  • 6 kudos

Resolved! speed up a for loop in python (azure databrick)

Code example:
# a list of file paths
list_files_path = ["/dbfs/mnt/...", ..., "/dbfs/mnt/..."]
# copy all files above to this folder
dest_path = "/dbfs/mnt/..."
for file_path in list_files_path:
    # copy function
    copy_file(file_path, dest_path)
I am runni...
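Since file copies are I/O-bound, a thread pool is one way to speed up a loop like the one above before reaching for ADF. A minimal sketch assuming plain file paths and `shutil` (on Databricks, `dbutils.fs.cp` inside the same pool pattern works similarly):

```python
# Sketch: run the copies concurrently; copies are I/O-bound, so
# threads overlap the waiting. Paths are placeholders.
import shutil
from concurrent.futures import ThreadPoolExecutor

def copy_all(list_files_path, dest_path, max_workers=8):
    """Copy every file to dest_path concurrently; returns new paths."""
    def copy_one(src):
        return shutil.copy(src, dest_path)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(copy_one, list_files_path))
```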

Latest Reply
Hemant
Valued Contributor II
  • 6 kudos

@Jackie Chan, what's the data size you want to copy? If it's large, then use ADF (Azure Data Factory).

2 More Replies
818674
by New Contributor III
  • 13415 Views
  • 10 replies
  • 8 kudos

Resolved! How to perform a cross-check for data in multiple columns in same table?

I am trying to check whether a certain datapoint exists in multiple locations. This is what my table looks like: I am checking whether the same datapoint is in two locations. The idea is that this datapoint should exist in BOTH locations, and be counte...
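A sketch of one way to express this in Spark SQL, assuming a hypothetical table `records(datapoint, location)` and two location values; the real table and column names from the attached screenshot may differ.

```sql
-- Count a datapoint only when it appears in BOTH locations.
SELECT datapoint
FROM records
WHERE location IN ('loc_a', 'loc_b')
GROUP BY datapoint
HAVING COUNT(DISTINCT location) = 2;
```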

[Attachment: Table Examples of Results for Cross-Checking]
Latest Reply
818674
New Contributor III
  • 8 kudos

Hi, thank you very much for following up. I no longer need assistance with this issue.

9 More Replies
deisou
by New Contributor
  • 5399 Views
  • 4 replies
  • 2 kudos

Resolved! What is the best strategy for backing up a large Databricks Delta table that is stored in Azure blob storage?

I have a large delta table that I would like to back up and I am wondering what is the best practice for backing it up. The goal is so that if there is any accidental corruption or data loss either at the Azure blob storage level or within Databricks...
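One commonly cited option is Delta `DEEP CLONE`, which copies both data files and metadata to a separate location and can be re-run incrementally. A sketch with placeholder table and storage names:

```sql
-- Deep clone to a different storage account/container so the backup
-- survives loss of the primary. Names below are placeholders.
CREATE TABLE IF NOT EXISTS backup_db.my_table_backup
DEEP CLONE prod_db.my_table
LOCATION 'abfss://backups@mybackupacct.dfs.core.windows.net/my_table_backup';
```

Re-running the same statement refreshes the clone, copying only what changed since the last run.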

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hi @deisou, just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark the answer as best? If not, please tell us so we can help you. Cheers!

3 More Replies
rgrosskopf
by New Contributor II
  • 1631 Views
  • 0 replies
  • 1 kudos

How to use Databricks Feature Store for time series forecasts?

I've seen the Databricks documentation on time series here. I'm using forecasts as a feature and those forecasts have both an as-of timestamp (when the forecast was generated) and a time step label (timestamp indicating the time of the forecasted obs...
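A sketch of the point-in-time pattern the Feature Store docs describe: `timestamp_lookup_key` joins each label row to the latest feature row whose timestamp is at or before the label's timestamp. All table, column, and label names below are placeholders, and this only runs in a Databricks ML runtime.

```python
# Sketch, assuming the Databricks Feature Store point-in-time API.
from databricks.feature_store import FeatureLookup, FeatureStoreClient

def build_training_set(labels_df):
    """labels_df must carry series_id, forecast_as_of, and target."""
    fs = FeatureStoreClient()
    lookups = [
        FeatureLookup(
            table_name="ml.forecast_features",      # placeholder table
            lookup_key=["series_id"],
            timestamp_lookup_key="forecast_as_of",  # as-of join key
        )
    ]
    return fs.create_training_set(
        df=labels_df, feature_lookups=lookups, label="target"
    )
```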

Kyle
by New Contributor II
  • 27006 Views
  • 5 replies
  • 4 kudos

Resolved! What's the best way to manage multiple versions of the same datasets?

We have use cases that require multiple versions of the same datasets to be available. For example, we have a knowledge graph made of entities and relations, and we have multiple versions of the knowledge graph distinguished by schema names ri...
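One option to sketch, assuming the datasets are Delta tables: time travel lets one table serve several versions, and a view can pin a stable alias to a chosen version instead of duplicating schemas. Table names and version numbers below are placeholders.

```sql
-- Read a pinned historical version of one table.
SELECT * FROM kg.entities VERSION AS OF 42;
SELECT * FROM kg.entities TIMESTAMP AS OF '2022-04-01';

-- Publish a stable alias for that version as a view.
CREATE OR REPLACE VIEW kg.entities_v42 AS
SELECT * FROM kg.entities VERSION AS OF 42;
```

Note that time travel only reaches back as far as the table's retention settings allow; long-lived versions may still warrant a clone.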

Latest Reply
Anonymous
Not applicable
  • 4 kudos

Hey there @Kyle Gao, hope you are doing well. Thank you for posting your query. Just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Cheers!

4 More Replies
Darshana_Ganesh
by New Contributor II
  • 4268 Views
  • 4 replies
  • 2 kudos

Resolved! Post upgrading the Azure Databricks cluster from 8.3 (includes Apache Spark 3.1.1, Scala 2.12) to 9.1 LTS (includes Apache Spark 3.1.2, Scala 2.12), I am getting an intermittent error.

The error is as below and is intermittent: the same code throws the issue on run 3, doesn't throw it on run 4, then throws it again on run 5. An error occurred while calling o1509.getCause. Trace: py4j.security.Py4JSecur...

Latest Reply
Anonymous
Not applicable
  • 2 kudos

Hey @Darshana Ganesh, just wanted to check in if you were able to resolve your issue or do you need more help? We'd love to hear from you. Thanks!

3 More Replies
Development
by New Contributor III
  • 7426 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi all, we are facing one unusual issue while loading data into a Delta table using Spark SQL. We have a Delta table with around 135 columns that is also PARTITIONED BY. We are trying to load about 15 million rows, but it is not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple, I found the root cause: it's serialization. We are using a UDF to derive a column on the dataframe; when we try to load data into the Delta table or write data into a Parquet file, we face a serialization issue ...
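The usual fix direction for this root cause is to replace the Python UDF with built-in Spark SQL expressions, which stay inside the JVM and avoid per-row serialization. A sketch with hypothetical column names, since the thread doesn't show the actual UDF:

```python
# Sketch: derive the column with built-in expressions instead of a UDF.
from pyspark.sql import functions as F

def add_charged(df):
    """Hypothetical derivation; 'amount' and 'rate' are placeholders."""
    # UDF version to avoid: F.udf(lambda amt, rate: amt * rate, "double")
    return df.withColumn("charged", F.col("amount") * F.col("rate"))
```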

4 More Replies
manasa
by Databricks Partner
  • 6553 Views
  • 7 replies
  • 2 kudos

Resolved! Recursive view error while using spark 3.2.0 version

This happens while creating a temp view using the code block below: latest_data.createOrReplaceGlobalTempView("e_test"). Ideally this command should replace the view if e_test already exists; instead it throws "Recursive view `global_temp`.`e_test` detecte...

Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

Hi, @Manasa​, could you please check SPARK-38318 and use Spark 3.1.2, Spark 3.2.2, or Spark 3.3.0 to allow cyclic reference?

6 More Replies
sophia1
by New Contributor
  • 1255 Views
  • 0 replies
  • 0 kudos

back pain 26

Hundreds of thousands of people in the United States suffer from back pain at some point in their lives. Because of this, you don't have to suffer greatly. The advice in this article can assist you in lessening the daily agony that you experience. Pa...

AmanSehgal
by Honored Contributor III
  • 9319 Views
  • 1 replies
  • 11 kudos

Resolved! How to merge all the columns into one column as JSON?

I have a task to transform a dataframe: collect all the columns in a row and embed them into a JSON string as a new column. Source and target DataFrame examples were attached as images.

Latest Reply
AmanSehgal
Honored Contributor III
  • 11 kudos

I was able to do this by converting the df to an RDD and then applying a map function to it: rdd_1 = df.rdd.map(lambda row: (row['ID'], row.asDict())) ...
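An alternative sketch that avoids dropping to RDDs: `to_json(struct(*))` packs every column of a row into one JSON string column and stays in the DataFrame API. The output column name is a placeholder.

```python
# Sketch: serialize each whole row as a JSON string column.
from pyspark.sql import functions as F

def with_json_column(df, out_col="json_payload"):
    """Add a column containing the row's columns serialized as JSON."""
    return df.withColumn(out_col, F.to_json(F.struct(*df.columns)))
```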

Emiel_Smeenk
by New Contributor III
  • 18485 Views
  • 5 replies
  • 8 kudos

Resolved! Databricks Runtime 10.4 LTS - AnalysisException: No such struct field id in 0, 1 after upgrading

Hello, we are working to migrate to Databricks Runtime 10.4 LTS from 9.1 LTS, but we're running into weird behavioral issues. Our existing code works up until runtime 10.3; in 10.4 it stopped working. Problem: we have a nested JSON file that we are fl...

Latest Reply
Emiel_Smeenk
New Contributor III
  • 8 kudos

It seems like the issue was miraculously resolved. I did not make any code changes but everything is now running as expected. Maybe the latest runtime 10.4 fix released on April 19th also resolved this issue unintentionally.

4 More Replies
nickg
by New Contributor III
  • 6972 Views
  • 6 replies
  • 3 kudos

Resolved! I am looking to use the pivot function with Spark SQL (not Python)

Hello. I am trying to use the PIVOT function for email addresses. This is what I have so far: SELECT fname, lname, awUniqueID, Email1, Email2 FROM xxxxxxxx PIVOT (count(Email) AS Test FOR Email IN (1 AS Email1, 2 AS Email2)). I get everyth...

Latest Reply
nickg
New Contributor III
  • 3 kudos

Source data:
fname lname awUniqueID Email
John Smith 22 jsmith@gmail.com
JODI JONES 22 jsmith@live.com
Desired output:
fname lname awUniqueID Em...
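A sketch of why the original query returns no emails: PIVOT compares the FOR column against the IN list literally, so matching the raw Email strings against 1 and 2 never succeeds. Ranking the emails per ID first and pivoting on that rank is one fix; the table name is a placeholder.

```sql
SELECT *
FROM (
  SELECT fname, lname, awUniqueID, Email,
         ROW_NUMBER() OVER (PARTITION BY awUniqueID ORDER BY Email) AS rn
  FROM contacts
)
PIVOT (
  MAX(Email) FOR rn IN (1 AS Email1, 2 AS Email2)
);
```

Note the outer grouping includes fname and lname, so rows sharing an awUniqueID but differing in name will still pivot onto separate rows.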

5 More Replies