Data Engineering

Forum Posts

by sonalitotade • New Contributor II

01-18-2023 7:58:25 AM

981 Views
2 replies
0 kudos

Capture events such as Start, Stop and Terminate of cluster.

Hi, I am using Databricks with AWS. I need to capture events such as Start, Stop and Terminate of a cluster and perform some other action based on the events that happened on the cluster. Is there a way I can achieve this in Databricks?

Latest Reply

sonalitotade
New Contributor II

01-31-2023 1:06:33 AM

0 kudos

Hi Daniel, thanks for the response. I would like to know if we can capture the event logs as shown in the image below when an event occurs on the cluster.
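One possible way to pull these events programmatically is the Clusters API events endpoint (POST /api/2.0/clusters/events). The sketch below only builds the request and parses a response; it does not send anything. The workspace URL, token and cluster ID are placeholders, the helper names are made up for illustration, and the choice of event types (STARTING, RUNNING, RESTARTING, TERMINATING) is an assumption about which ones map to start, stop and terminate.

```python
import json
import urllib.request

# Assumed mapping of "start / stop / terminate" onto API event types.
LIFECYCLE_EVENT_TYPES = ["STARTING", "RUNNING", "RESTARTING", "TERMINATING"]


def build_events_request(host, token, cluster_id,
                         event_types=LIFECYCLE_EVENT_TYPES, limit=50):
    """Build (but do not send) a POST request for the cluster events endpoint."""
    payload = {
        "cluster_id": cluster_id,
        "event_types": list(event_types),
        "order": "DESC",   # newest events first
        "limit": limit,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/clusters/events",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def lifecycle_events(response_body):
    """Extract (timestamp, type) pairs from a decoded JSON response body."""
    return [(e["timestamp"], e["type"]) for e in response_body.get("events", [])]


# Placeholder values; replace with a real workspace URL, token and cluster ID.
req = build_events_request(
    "https://example.cloud.databricks.com", "<token>", "<cluster-id>"
)
# To send it: with urllib.request.urlopen(req) as resp: body = json.load(resp)
```

A scheduled job could poll this endpoint and trigger the follow-up action whenever a new event of interest shows up in the result of lifecycle_events.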

1 More Replies

by KVNARK • Honored Contributor II

01-30-2023 7:56:46 PM

6552 Views
2 replies
5 kudos

Resolved! pyspark optimizations and best practices

What can we implement to get the best possible optimization, and what are the best practices for using PySpark end to end?

Latest Reply

daniel_sahal
Esteemed Contributor

01-30-2023 11:55:41 PM

5 kudos

@KVNARK This video is cool: https://www.youtube.com/watch?v=daXEp4HmS-E

1 More Replies

by Gandham • New Contributor II

01-28-2023 10:30:02 AM

2001 Views
3 replies
1 kudos

Maven Libraries are failing on restarting the cluster.

I have installed the Maven library "com.databricks:spark-xml_2.12:0.16.0" on a cluster. The installation was successful. But when I restart the cluster, even this successful installation shows as failed. This happens with all Maven libraries. Here is th...

Latest Reply

Aviral-Bhardwaj
Esteemed Contributor III

01-30-2023 7:37:26 PM

1 kudos

It is an intermittent issue; we also faced it earlier. Try upgrading the DBR version.

2 More Replies

by Therdpong • New Contributor II

01-18-2023 8:22:41 AM

1005 Views
2 replies
0 kudos

How to check which job clusters had to expand disk.

We would like to know how to check which job clusters had to expand disk.

Latest Reply

jose_gonzalez
Moderator

01-30-2023 2:40:04 PM

0 kudos

You can check the cluster's event log. Type "disk" in the search box and you will see all the related events there.

1 More Replies

by SS2 • Valued Contributor

11-29-2022 12:06:54 PM

1110 Views
2 replies
1 kudos

Spark out of memory error.

Sometimes in Databricks you can see an out-of-memory error. In that case, you can change the cluster size as required to resolve the issue.

Latest Reply

jose_gonzalez
Moderator

01-30-2023 4:38:22 PM

1 kudos

Hi @S S, could you provide more details on your issue? For example, error stack traces, code snippets, etc. We will be able to help you if you share more details.

1 More Replies

by rocky5 • New Contributor III

11-30-2022 1:14:24 AM

1243 Views
1 reply
2 kudos

Cannot create delta live table

I created a simple Delta Live Table definition, something like:
CREATE OR REFRESH STREAMING LIVE TABLE customers_silver
AS SELECT * FROM STREAM(LIVE.customers_bronze)
But I am getting an error when running a pipeline:
com.databricks.sql.transaction.tahoe.De...

Latest Reply

jose_gonzalez
Moderator

01-30-2023 4:26:13 PM

2 kudos

You might need to execute the following on your tables to avoid this error message:
ALTER TABLE <table_name> SET TBLPROPERTIES (
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
Docs https...

by BL • New Contributor III

01-14-2023 4:09:25 AM

2772 Views
4 replies
3 kudos

Error reading in Parquet file

I am trying to read a .parquet file from an ADLS Gen2 location in Azure Databricks, but I am facing the error below:
spark.read.parquet("abfss://............/..._2023-01-14T08:01:29.8549884Z.parquet")
org.apache.spark.SparkException: Job aborted due to stag...

Latest Reply

jose_gonzalez
Moderator

01-30-2023 2:51:18 PM

3 kudos

Can you access the executor logs? When your cluster is up and running, you can access the executors' logs. For example, the error shows:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent ...

3 More Replies

by cmilligan • Contributor II

01-18-2023 8:57:36 AM

1115 Views
3 replies
0 kudos

Undescriptive error when trying to insert overwrite into a table

I have a query that I'm trying to insert overwrite into a table. In an effort to speed up the query I added a range join hint. After adding it I started getting the error below. I can get around this though by creating a temporary view of the ...

Latest Reply

jose_gonzalez
Moderator

01-30-2023 2:11:55 PM

0 kudos

Could you share your code and the full error stack trace please? Check the driver logs for the full stack trace.

2 More Replies

by jagac • New Contributor

01-23-2023 1:09:46 PM

700 Views
2 replies
0 kudos

Cannot log into Community Edition.

Hi there, I recently made an account on the Community Edition and cannot seem to log in. The error says the following:
Invalid email address or password
Note: Emails/usernames are case-sensitive
So I tried to reset my password and still could not log in. I ...

Latest Reply

Anonymous
Not applicable

01-23-2023 10:21:02 PM

0 kudos

Hi @jagac petrovic Thank you for reaching out, and we’re sorry to hear about this log-in issue! We have this Community Edition login troubleshooting post on Community. Please take a look, and follow the troubleshooting steps. If the steps do not res...

1 More Replies

by User16835756816 • Valued Contributor

01-23-2023 3:55:06 PM

1505 Views
3 replies
1 kudos

How can I optimize my data pipeline?

Delta Lake provides optimizations that can help you accelerate your data lake operations. Here’s how you can improve query speed by optimizing the layout of data in storage. There are two ways you can optimize your data pipeline: 1) Notebook Optimizat...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-24-2023 10:40:50 AM

1 kudos

Some tips from me: Look for data skew; some partitions can be huge and some small because of incorrect partitioning. You can use the Spark UI to check that, but also debug your code a bit (call getNumPartitions()); SQL especially can divide data unequally to parti...
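As a rough illustration of the skew check mentioned above, the sketch below compares the largest partition against the average partition size. It is plain Python working on a list of per-partition row counts; in PySpark such a list could be collected with something like df.rdd.glom().map(len).collect(), which is expensive on large data. The function name and the threshold of 2.0 are hypothetical choices, not values from the reply.

```python
from statistics import mean


def skew_ratio(partition_sizes):
    """Return largest partition size divided by the mean partition size.

    A ratio near 1.0 means evenly sized partitions; a large ratio means
    one partition holds far more rows than the average.
    """
    if not partition_sizes or max(partition_sizes) == 0:
        return 0.0
    return max(partition_sizes) / mean(partition_sizes)


# Hypothetical row counts per partition.
balanced = [100, 100, 100, 100]
skewed = [10, 10, 10, 970]

SKEW_THRESHOLD = 2.0  # arbitrary cut-off chosen for this example

for name, sizes in (("balanced", balanced), ("skewed", skewed)):
    ratio = skew_ratio(sizes)
    flag = "skewed" if ratio > SKEW_THRESHOLD else "ok"
    print(f"{name}: {len(sizes)} partitions, ratio {ratio:.2f} -> {flag}")
```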

2 More Replies

by Arun_Kumar • New Contributor II

12-01-2022 1:50:45 AM

1503 Views
4 replies
1 kudos

List of Databricks tables created by a user

Hi team, could you please clarify the points below?
1. How can we get the list of tables created by a user in a particular workspace?
2. How can we get the list of tables created by a user from multiple workspaces? ( Same user has access to 10 workspace...

Latest Reply

ashish1
New Contributor III

01-30-2023 1:26:46 PM

1 kudos

Hi Arun, hope your query is answered. Please select the best answer or let us know if you have any further questions.

3 More Replies

by AndriusVitkausk • New Contributor III

12-07-2022 5:31:55 AM

792 Views
1 replies
0 kudos

Reading multi-dimensional json files

So I've been having some issues reading a JSON file that's been provided to the business with another nesting layer. Instead of the JSON being an 'array of objects' -> [ {} ,{} ,{} ], it's an 'array of arrays of objects' -> [ [ {}, {} ,{} ], [ {} ,{}...

Latest Reply

ashish1
New Contributor III

01-30-2023 1:20:09 PM

0 kudos

You can use the explode function to flatten the array to rows. Can you post a simple example of your data?
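To make the flattening step concrete, here is a minimal plain-Python sketch of removing one nesting level from an 'array of arrays of objects'. The sample data and the function name are invented for illustration. In PySpark, the flatten and explode functions in pyspark.sql.functions play a similar role on array columns.

```python
import json
from itertools import chain

# Hypothetical sample shaped like the file in the question:
# an array of arrays of objects.
raw = '[[{"id": 1}, {"id": 2}], [{"id": 3}]]'


def flatten_one_level(nested):
    """Turn [[a, b], [c]] into [a, b, c] without touching the objects."""
    return list(chain.from_iterable(nested))


rows = flatten_one_level(json.loads(raw))
for row in rows:
    print(row)
```

Each element of rows is then a single object, which is the 'array of objects' shape that readers usually expect.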

by LavaLiah_85929 • New Contributor II

12-05-2022 12:33:03 PM

607 Views
2 replies
0 kudos

"desc history" shows versions older than the default logRetentionDuration of 30 days

I have a CDC-enabled table where no data changes were made since July 28. Then updates started occurring from November 22 onwards. The first checkpoint occurred on Nov 28. Based on the corresponding timestamp of checkpoint and log files, it looks lik...

Latest Reply

shyam_9
Valued Contributor

01-30-2023 12:04:42 PM

0 kudos

Hi @Laval Liahkim, could you please try running VACUUM with a 30-day retention? Please confirm when you last ran the command with the 30-day retention period. Also, when did you create this table, and do you see that old data files were deleted? Also, when disk...

1 More Replies

by espenol • New Contributor III

12-05-2022 6:08:58 AM

1442 Views
3 replies
0 kudos

How to debug Workflow Jobs timing out and DLT pipelines running forever?

So I'm the designated data engineer for a proof of concept we're running. I'm working with one infrastructure guy who's setting up everything in Terraform (company policy). He's got the setup down for Databricks so we can configure clusters and run n...

Latest Reply

shan_chandra
Honored Contributor III

01-30-2023 12:12:32 PM

0 kudos

@Espen Solvang - Just thought of checking with you, could you please let us know if you require further assistance on this?

2 More Replies

by nimble • New Contributor

12-06-2022 3:35:35 AM

1181 Views
2 replies
0 kudos

How can I run a streaming query on a new table with tbl property: change data feed enabled?

In Databricks on AWS, I am trying to run a streaming query (trigger=Once) with delta.enableChangeDataFeed=true in the table definition as instructed, but this always fails with:
ERROR: Some streams terminated before this command could finish! com.d...

Latest Reply

swethaNandan
New Contributor III

01-30-2023 12:03:40 PM

0 kudos

Hi @daniel e, can you try running a SELECT on the table changes from version 0 and see if you get output?
SELECT * FROM table_changes('tableName', 0)
Also, please share the streaming query that you are running.

1 More Replies
