Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hello everyone! I currently have a DLT pipeline that loads several Delta Live tables (both streaming tables and materialized views). The end table of my DLT pipeline is a materialized view called "silver.my_view". In a later step I need to join/union/merg...
I have to divide a dataframe into multiple smaller dataframes based on values in columns like gender and state; the end goal is to pick random samples from each dataframe. I am trying to implement a sample as explained below. I am quite new to th...
@raela I also have a similar use case. I am writing data to different Databricks tables based on a column value, but I am getting an insufficient disk space error and the driver is getting killed. I suspect the df.select(colName).distinct().collect() step is taki...
Hi, I'm doing a merge into my Delta table, which has an IDENTITY column: Id BIGINT GENERATED ALWAYS AS IDENTITY. The inserted data has every other number in the id column, like this: Is this expected behavior? Is there any workaround to make the number increase by 1?
Hi everyone, I'm trying to merge two Delta tables that each contain more than 200 million records. These tables are properly optimized, but upon running the job, it takes a long time to execute and the memory spills are huge (1 TB-3 TB) rec...
I have source data with multiple rows and columns; one of the columns is city. I want to get the unique cities into another table by streaming data from the source table, so I am trying to use MERGE INTO and foreachBatch with my merge function. My merge condition is: On so...
Hello: In my Hive Metastore, I have 35 tables in a database that I want to export to Excel. I need help with a query that can loop over the tables and export them to Excel one table at a time. Any help is appreciated. Thanks in advance for your kind help.
Unable to start the cluster in AWS-hosted Databricks because of the below reason:
{
  "reason": {
    "code": "BOOTSTRAP_TIMEOUT",
    "parameters": {
      "databricks_error_message": "[id: InstanceId(i-0634ee9c2d420edc8), status: INSTANCE_INITIALIZIN...
Hi, Sahha:
Thanks for contacting Databricks Support.
This is a common type of error, indicating that bootstrap failed due to a misconfigured data plane network. Databricks requested EC2 instances for a new cluster, but encountered a long ...
Hi Experts, recently our team noticed that when we use Apache Sedona to create a parquet file in GeoParquet format, the geo metadata is not created inside the parquet file. But if we turn off the Photon setting, everything works as ex...
Hi @Madhur,
The difference between Auto Optimize set on Spark Session and the one set on Delta Table lies in their scope and precedence.
Auto Optimize on Spark Session will apply to all Delta tables in the current session. It is a global configuratio...
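Concretely, the two scopes can be set like this (a sketch using the Delta auto-optimize property names; `my_table` is a placeholder):

```python
# Session scope: affects every Delta write issued from this session only.
spark.conf.set("spark.databricks.delta.optimizeWrite.enabled", "true")
spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")

# Table scope: stored in the table's properties, so it applies to all
# writers of my_table regardless of their session settings.
spark.sql("""
    ALTER TABLE my_table SET TBLPROPERTIES (
        'delta.autoOptimize.optimizeWrite' = 'true',
        'delta.autoOptimize.autoCompact'   = 'true'
    )
""")
```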
OperationalError: 250003: Failed to get the response. Hanging? method: get, url: https://cdodataplatform.east-us-2.privatelink.snowflakecomputing.com:443/queries/01ae7ab6-0c04-e4bd-011c-e60552f6cf63/result?request_guid=315c25b7-f17d-4123-a2e5-6d82605...
I find it quite hard to understand the Spark UI for my PySpark pipelines. For example, when one writes `spark.read.table("sometable").show()`, it shows: I learned that the `DataFrame` API may actually spawn jobs before running the actual job. In the example ab...
Hello everyone! I've been working with the Databricks platform for a few months now, and I have a suggestion/proposal regarding the Workflows UI. First, let me explain what I find not so ideal. Let's say we have a job with three Notebook Tas...
Hi everyone, I am running a job task using an Asset Bundle. The bundle has been validated and deployed according to: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/bundles/work-tasks
Part of the databricks.yml bundle:
name: etldatabricks
resourc...
I am trying to read 30 XML files and create a dataframe of the data of each node, but it takes a lot of time (approximately 8 minutes) to process those files. What can I do to optimize the Databricks notebook? I append the data to a Databricks Delta table.
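One common cause is reading the files one at a time in a Python loop and appending per file. Reading all files in a single call lets Spark parallelize the parsing across them, and a single append avoids 30 small Delta transactions. A sketch, assuming an XML data source is available on the cluster (the spark-xml library or the native Databricks XML reader); the `rowTag` value, path glob, and table name are placeholders:

```python
# Read all 30 files in one call; Spark distributes the parsing.
df = (spark.read.format("xml")
      .option("rowTag", "record")      # placeholder row element name
      .load("/mnt/raw/xml/*.xml"))     # placeholder path glob

# One append instead of one transaction per file.
df.write.format("delta").mode("append").saveAsTable("my_db.xml_data")
```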