Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Development
by New Contributor III
  • 6046 Views
  • 5 replies
  • 5 kudos

Delta Table with 130 columns taking time

Hi All, we are facing an unusual issue while loading data into a Delta table using Spark SQL. We have one Delta table with around 135 columns, and it is also PARTITIONED BY a column. We are trying to load around 15 million records into it, but it's not loading ...

Latest Reply
Development
New Contributor III
  • 5 kudos

@Kaniz Fatma @Parker Temple I found the root cause: it's serialization. We are using a UDF to derive a column on the DataFrame, and when we try to load data into the Delta table or write data into a Parquet file we run into the serialization issue ....

4 More Replies
manasa
by Contributor
  • 5396 Views
  • 7 replies
  • 2 kudos

Resolved! Recursive view error while using spark 3.2.0 version

This happens while creating a temp view using the code below:
latest_data.createOrReplaceGlobalTempView("e_test")
Ideally this command should replace the view if e_test already exists; instead it throws "Recursive view `global_temp`.`e_test` detecte...

Latest Reply
shan_chandra
Databricks Employee
  • 2 kudos

Hi, @Manasa, could you please check SPARK-38318 and use Spark 3.1.2, Spark 3.2.2, or Spark 3.3.0, which allow the cyclic reference?

6 More Replies
AmanSehgal
by Honored Contributor III
  • 8042 Views
  • 1 replies
  • 11 kudos

Resolved! How to merge all the columns into one column as JSON?

I have a task to transform a DataFrame: collect all the columns of a row and embed them into a JSON string as a single column. (The source and target DataFrames were shown as images in the original post.)

Latest Reply
AmanSehgal
Honored Contributor III
  • 11 kudos

I was able to do this by converting the df to an RDD and then applying a map function to it:
rdd_1 = df.rdd.map(lambda row: (row['ID'], row.asDict()))
...

Emiel_Smeenk
by New Contributor III
  • 15630 Views
  • 5 replies
  • 8 kudos

Resolved! Databricks Runtime 10.4 LTS - AnalysisException: No such struct field id in 0, 1 after upgrading

Hello, we are working to migrate to Databricks Runtime 10.4 LTS from 9.1 LTS, but we're running into weird behavioral issues. Our existing code works up until runtime 10.3, and in 10.4 it stopped working. Problem: we have a nested JSON file that we are fl...

Latest Reply
Emiel_Smeenk
New Contributor III
  • 8 kudos

It seems like the issue was miraculously resolved. I did not make any code changes but everything is now running as expected. Maybe the latest runtime 10.4 fix released on April 19th also resolved this issue unintentionally.

4 More Replies
nickg
by New Contributor III
  • 5403 Views
  • 6 replies
  • 3 kudos

Resolved! I am looking to use the pivot function with Spark SQL (not Python)

Hello. I am trying to use the Pivot function for email addresses. This is what I have so far:
SELECT fname, lname, awUniqueID, Email1, Email2
FROM xxxxxxxx
PIVOT (
    count(Email) AS Test
    FOR Email
    IN (1 AS Email1, 2 AS Email2)
)
I get everyth...

Latest Reply
nickg
New Contributor III
  • 3 kudos

Source data:
fname lname awUniqueID Email
John  Smith 22 jsmith@gmail.com
JODI  JONES 22 jsmith@live.com
Desired output:
fname lname awUniqueID Em...

5 More Replies
HarshaK
by New Contributor III
  • 16186 Views
  • 4 replies
  • 6 kudos

Resolved! Partition By () on Delta Files

Hi All, I am trying to use partitionBy() on a Delta file in PySpark, using the command:
df.write.format("delta").mode("overwrite").option("overwriteSchema","true").partitionBy("Partition Column").save("Partition file path")
-- It doesn't seem to w...

Latest Reply
Anonymous
Not applicable
  • 6 kudos

Hey @Harsha kriplani, hope you are well. Thank you for posting here. It is awesome that you found a solution. Would you like to mark Hubert's answer as best? It would be really helpful for the other members too. Cheers!

3 More Replies
Manoj
by Contributor II
  • 2038 Views
  • 2 replies
  • 5 kudos

Resolved! Do job clusters help jobs that are fighting for resources on an all-purpose cluster?

Hi Team, do job clusters help jobs that are fighting for resources on an all-purpose cluster? The drawback I see with job clusters is that a cluster is created every time the job starts; it's taking 2 minutes to spin up the cluster. Instead of...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@Manoj Kumar Rayalla, in the job settings you can choose to use an all-purpose cluster (that feature was added recently). You can also use an instance pool to limit job cluster start time (though it can still take a moment).

1 More Replies
LorenRD
by Contributor
  • 10927 Views
  • 9 replies
  • 13 kudos

Resolved! Is it possible to connect Databricks SQL with AWS Redshift DB?

I would like to know if it's possible to connect the Databricks SQL module not just to the internal metastore DB and the tables from the Data Science and Engineering module, but also to an AWS Redshift DB, to run queries and create alerts.

Latest Reply
LorenRD
Contributor
  • 13 kudos

Hi @Kaniz Fatma, I contacted customer support about this issue; they told me that this feature is not implemented yet but it's on the roadmap with no ETA. It would be great if you could ping me back when it's possible to access Redshift tables from SQ...

8 More Replies
gazzyjuruj
by Contributor II
  • 2091 Views
  • 1 replies
  • 4 kudos

Resolved! databricks_error_message: time out placing nodes

Hi, today I'm receiving this error:
databricks_error_message: Timed out while placing nodes.
What should be done to fix it?

Latest Reply
User16764241763
Honored Contributor
  • 4 kudos

Hello @Ghazanfar Uruj, this can happen for a bunch of reasons. Could you please file a support case with details if the issue still persists?

AmanSehgal
by Honored Contributor III
  • 4330 Views
  • 2 replies
  • 10 kudos

Migrating data from delta lake to RDS MySQL and ElasticSearch

There are mechanisms (like DMS) to get data from RDS into the delta lake and store it in Parquet format, but is it possible to do the reverse in AWS? I want to send data from the data lake to MySQL RDS tables in batch mode. And the next step is to send th...

Latest Reply
AmanSehgal
Honored Contributor III
  • 10 kudos

@Kaniz Fatma and @Hubert Dudek - writing to MySQL RDS is relatively simple. I'm still finding ways to export data into Elasticsearch.

1 More Replies
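As the reply notes, the MySQL side is the simpler half. A hedged sketch of a batch JDBC write; every endpoint, database, table name, and credential below is a placeholder, and the MySQL JDBC driver must be installed on the cluster for the write to actually run.

```python
# Placeholder connection details - none of these are real endpoints.
jdbc_options = {
    "url": "jdbc:mysql://my-rds-endpoint.amazonaws.com:3306/mydb",
    "dbtable": "target_table",
    "user": "db_user",
    "password": "db_password",
    "driver": "com.mysql.cj.jdbc.Driver",
}


def write_to_mysql(df):
    """Append a Spark DataFrame to the MySQL table over JDBC.

    Requires the MySQL JDBC driver on the cluster and network access
    from the workers to the RDS instance.
    """
    (
        df.write.format("jdbc")
          .options(**jdbc_options)
          .mode("append")
          .save()
    )
```

For the Elasticsearch half of the question, the elasticsearch-hadoop connector (format "org.elasticsearch.spark.sql") is the usual route, but that is outside this sketch.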
Michael_Galli
by Contributor III
  • 12668 Views
  • 4 replies
  • 8 kudos

Resolved! Monitoring Azure Databricks in an Azure Log Analytics Workspace

Does anyone have experience with the mspnp/spark-monitoring library? Is this best practice, or are there better ways to monitor a Databricks cluster?

Latest Reply
User16764241763
Honored Contributor
  • 8 kudos

@Michael Galli, I don't think you can monitor metrics captured by mspnp/spark-monitoring in Datadog; there is a service called Azure Log Analytics workspace where these logs are available for querying. You can also check out the below if you are interest...

3 More Replies
kjoth
by Contributor II
  • 1473 Views
  • 0 replies
  • 0 kudos

Unmanaged Table - Newly added data directories are not reflected in the table

We have created an unmanaged table with partitions on a DBFS location, using SQL. Example:
%sql CREATE TABLE EnterpriseDailyTrafficSummarytest (EnterpriseID STRING, ServiceLocationID STRING, ReportDate STRING) USING parquet PARTITIONED BY (ReportDate)...

Daba
by New Contributor III
  • 5955 Views
  • 3 replies
  • 5 kudos

Resolved! DLT + Auto Loader: where are the schema and checkpoint hidden?

Hi, I'm exploring the DLT with Auto Loader feature and wondering where the schema and checkpoint are hidden. I want to wipe these two to reset/reinitialize the flow, but unlike with the "regular" Auto Loader, the checkpoint and schema folders are not there. Thank...

Latest Reply
Hubert-Dudek
Esteemed Contributor III
  • 5 kudos

@Alexander Plepler, there is a storage option in the pipeline settings - a path to a DBFS directory for storing checkpoints and tables created by the pipeline. Additionally, the Delta table is registered in the metastore, so the table schema is there.

2 More Replies
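To make the reply above concrete: the pipeline settings JSON has a top-level storage field, and DLT keeps its checkpoint and schema state under that path rather than in a user-chosen Auto Loader checkpoint folder. A hedged fragment of such settings (every id, name, and path here is a placeholder):

```json
{
  "id": "<pipeline-id>",
  "name": "my_dlt_pipeline",
  "storage": "dbfs:/pipelines/my_dlt_pipeline",
  "libraries": [
    { "notebook": { "path": "/Repos/me/dlt_notebook" } }
  ]
}
```

Resetting is usually done with a full-refresh run, or by pointing the pipeline at a fresh storage location, rather than by deleting files under that path by hand.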
