Data Engineering

Forum Posts

by Asterol • New Contributor III

01-20-2023 2:50:06 AM

3966 Views
4 replies
5 kudos

Data Engineer Associate and Professional title holders count

How many people hold the title of certified Databricks Data Engineer Associate/Professional right now? Is there any place I can check the global certificate count?

Latest Reply

sher
Valued Contributor II

01-22-2023 4:40:57 AM

5 kudos

check here: https://credentials.databricks.com/collection/da21363e-5c7d-410a-b144-dd07d3e22942?_ga=2.163643839.1823848454.1674389186-2106443313.1667211405&_gac=1.49521364.1672812437.CjwKCAiAwc-dBhA7EiwAxPRylBN9S-JeQ8779ec3GXJYBQPfnu_qkv5l_MKO1u4jw2w-...

by Ogi • New Contributor II

01-04-2023 2:57:08 AM

3986 Views
3 replies
1 kudos

Resolved! Azure CosmosDB change feed ingestion via DLT

Is there a way to ingest Azure CosmosDB data via Delta Live Tables? If I use regular workflows it works well, but with DLT I'm not able to set CosmosDB Connector on a cluster.

Latest Reply

Ogi
New Contributor II

02-01-2023 5:31:42 AM

1 kudos

Thanks a lot! Just wanted to double-check whether this natively exists.

by andrew0117 • Contributor

01-19-2023 8:37:31 PM

3414 Views
2 replies
0 kudos

depth of view exceeds the maximum view resolution depth (100).

I got this error after updating a view. How can I increase the value of spark.sql.view.maxNestedViewDepth to work around this? Thanks!

Latest Reply

Debayan
Databricks Employee

01-24-2023 12:12:56 AM

0 kudos

Hi, could you please confirm whether you are showing the view (https://docs.databricks.com/sql/language-manual/sql-ref-syntax-aux-show-views.html)? Also, it would be helpful if you could post a screenshot of the error.
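
For the question above, here is a minimal sketch of raising the limit for the current session. The key is the Spark SQL setting spark.sql.view.maxNestedViewDepth named in the question; the helper name and the value 200 are illustrative assumptions, and `spark` is assumed to be the session that a Databricks notebook provides.

```python
# Spark SQL setting that caps how deeply views may reference other views.
CONF_KEY = "spark.sql.view.maxNestedViewDepth"


def nested_view_depth_statement(depth: int) -> str:
    """Return a SQL SET statement for the nested-view limit (hypothetical helper)."""
    if depth <= 0:
        raise ValueError("depth must be a positive integer")
    return f"SET {CONF_KEY} = {depth}"


# In a notebook, where `spark` already exists (assumption):
#   spark.sql(nested_view_depth_statement(200))
# or, equivalently:
#   spark.conf.set(CONF_KEY, "200")
print(nested_view_depth_statement(200))
```

If the error appeared right after the view was updated, it may also be worth checking whether the view now refers to itself, directly or through another view. Raising the limit would not help in that case.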

by User16765131552 • Contributor III

06-24-2021 11:16:49 AM

2144 Views
1 reply
3 kudos

Delta Sharing Costs

When Delta Sharing is enabled and a link is shared, I understand that the data transfer happens directly and not through the sharing server. I'm curious how costs are calculated. Is the entity making the share available charged for data egress and ...

Latest Reply

Databricks_love
New Contributor II

02-01-2023 2:48:16 AM

3 kudos

Any news?

by blackcoffeeAR • Contributor

01-26-2023 5:24:27 AM

5246 Views
5 replies
2 kudos

Cannot install com.microsoft.azure.kusto:kusto-spark

Hello, I'm trying to install/update the library com.microsoft.azure.kusto:kusto-spark_3.0_2.12:3.1.x. Tried to install with Maven central repository and using Terraform. It was working previously and now the installation always ends with error: │ Error: c...

Latest Reply

phisolani
New Contributor II

02-01-2023 1:41:46 AM

2 kudos

I have the same problem with a slightly different version of the connector (a change in the minor version). I have a job that runs every hour, and this started to happen from the 23rd of January onwards. The error does indeed say the same: Ru...

by Dipesh • New Contributor II

01-31-2023 6:27:07 AM

5964 Views
4 replies
2 kudos

Pausing a scheduled Azure Databricks job after failure

Hi All, I have a job/workflow scheduled in Databricks to run every hour. How can I configure my job to pause whenever a job run fails? (Pause the job/workflow on first failure.) I would want to prevent triggering multiple runs due to the scheduled/...

Latest Reply

Dipesh
New Contributor II

01-31-2023 8:53:45 PM

2 kudos

Hi @Hubert Dudek, thank you for your suggestion. I understand that we can use the Jobs API to change the pause_status of a job on errors, but sometimes we observed that the workflow/job fails due to cluster issues (while the job clusters are getting creat...
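
The Jobs API approach mentioned here could look roughly like the sketch below, which only builds the request body. It assumes the body is sent as a POST to the Jobs API 2.1 `jobs/update` endpoint, which is worth verifying against the API reference; the job id and cron expression are made up for illustration. The whole schedule is resent because the sketch assumes that an update replaces the `schedule` object as a unit.

```python
import json


def build_pause_payload(job_id: int, schedule: dict) -> dict:
    """Build a jobs/update request body that keeps the schedule but pauses it."""
    paused = dict(schedule)            # copy so the caller's dict is untouched
    paused["pause_status"] = "PAUSED"  # "UNPAUSED" would resume the schedule
    return {"job_id": job_id, "new_settings": {"schedule": paused}}


# Hypothetical job id and hourly schedule, for illustration only.
current_schedule = {
    "quartz_cron_expression": "0 0 * * * ?",
    "timezone_id": "UTC",
    "pause_status": "UNPAUSED",
}
payload = build_pause_payload(123, current_schedule)
print(json.dumps(payload, indent=2))
```

Because a run that fails during cluster creation never executes any notebook code, a call like this would have to be issued from outside the failing run, for example from a separate monitoring job.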

by User16783853906 • Contributor III

06-07-2021 2:14:43 PM

3105 Views
1 reply
1 kudos

Understanding file retention with Vacuum

I have seen a few instances where users reported that they run OPTIMIZE for the past week's worth of data and follow it with VACUUM with RETAIN of 168 HOURS (for example), yet the old files aren't being deleted: "VACUUM is not removing old files from the tab...

Latest Reply

Priyanka_Biswas
Databricks Employee

01-31-2023 7:46:06 PM

1 kudos

Hello @Venkatesh Kottapalli, VACUUM removes all files from the table directory that are not managed by Delta, as well as data files that are no longer in the latest state of the transaction log for the table and are older than a retention threshold. ...

by User16826992666 • Valued Contributor

06-25-2021 12:20:16 PM

1910 Views
1 reply
3 kudos

When developing Delta Live Tables, is there a way to see the query history?

I am not sure where I can look currently to see how my DLT queries are performing. How can I investigate the query plan for past DLT runs?

Latest Reply

Priyanka_Biswas
Databricks Employee

01-31-2023 7:25:22 PM

3 kudos

Hello @Trevor Bishop, you can check the query plan in the Spark UI, SQL tab. You would need to select the past run from the dropdown and click on Spark UI. Additionally, an event log is created and maintained for every Delta Live Tables pipeline. The event ...

by databicky • Contributor II

01-31-2023 5:28:45 AM

2584 Views
2 replies
1 kudos

How to get the status of a notebook from a different notebook

I want to run two notebooks. If the count is not equal to zero, I first want to trigger the first notebook and check whether that notebook succeeded; until it succeeds, the caller needs to wait (sleep). If it succeeded, then ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-31-2023 11:07:18 AM

1 kudos

You can use dbutils.notebook.run() to execute a notebook from another notebook if conditions are met in your custom logic; you can also use dbutils.jobs.taskValues to pass values between notebooks https://docs.databricks.com/workflows/jobs/how-to-sha...
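
The wait-until-success logic from the question could be sketched as follows. This assumes `dbutils.notebook.run(path, timeout_seconds)` raises an exception when the child notebook fails; the helper name, the notebook paths, and the `count` variable are hypothetical. The notebook call is passed in as a plain callable so that the retry loop can be tried outside Databricks.

```python
import time
from typing import Callable


def run_until_success(
    run_notebook: Callable[[], str],
    max_attempts: int = 5,
    wait_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call run_notebook until it returns without raising, sleeping between attempts."""
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return run_notebook()  # the child notebook's exit value
        except Exception as err:   # a failed run is assumed to surface as an exception
            last_error = err
            if attempt < max_attempts:
                sleep(wait_seconds)
    raise RuntimeError(
        f"notebook did not succeed after {max_attempts} attempts"
    ) from last_error


# In a Databricks notebook (paths and `count` are hypothetical):
#   if count != 0:
#       run_until_success(lambda: dbutils.notebook.run("/path/to/first", 3600))
#       dbutils.notebook.run("/path/to/second", 3600)
```

Capping the number of attempts keeps a permanently failing notebook from blocking the caller forever.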

by Dipesh • New Contributor II

01-31-2023 6:19:58 AM

3006 Views
1 reply
1 kudos

Resolved! Bulk updating Delta tables in Databricks

Hi All, I have some data in a Delta table with multiple columns, and each record has a unique identifier. I want to update some columns as per the new values coming in for each of these unique records. However, updating one record at a time is taking a lot...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-31-2023 11:12:31 AM

1 kudos

Yes, by using the MERGE statement.
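
As a rough illustration of that suggestion, the sketch below builds a single MERGE statement that updates all matching records in one pass instead of one UPDATE per record. The table and column names are invented, and the incoming values are assumed to be available as a table or temporary view. Identifiers are interpolated without escaping, so only trusted names should be passed in.

```python
from typing import Sequence


def build_merge_sql(target: str, source: str, key: str, columns: Sequence[str]) -> str:
    """Build a MERGE that updates `columns` on every target row whose key matches."""
    assignments = ", ".join(f"t.{c} = s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} AS t\n"
        f"USING {source} AS s\n"
        f"ON t.{key} = s.{key}\n"
        f"WHEN MATCHED THEN UPDATE SET {assignments}"
    )


# Hypothetical names: `customers` is the Delta table, `customer_updates` holds new values.
sql = build_merge_sql("customers", "customer_updates", "id", ["email", "city"])
print(sql)
# In a notebook (assumption): spark.sql(sql)
```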

by venkat09 • New Contributor III

01-31-2023 10:01:51 AM

1690 Views
1 reply
1 kudos

Schema Evolution - Auto Loader for Avro format is not working as expected

* Reading Avro files from s3 and then writing to the delta table
* Ingested sample data of 10 files, which contain four columns, and it infers the schema automatically as expected
* Introducing a new file which contains a new column [foo] along wi...

Latest Reply

venkat09
New Contributor III

01-31-2023 11:06:32 AM

1 kudos

I am attaching the sample code notebook that helps to reproduce the issue.

by KuldeepChitraka • New Contributor III

01-31-2023 8:08:58 AM

2380 Views
3 replies
6 kudos

Performance Issue : Create DELTA table from 2 TB PARQUET file

We are trying to create a DELTA table (CTAS statement) from a 2 TB PARQUET file and it's taking a huge amount of time, around 12 hrs. Is this normal? What are the options to tune/optimize this? Are we doing anything wrong? Cluster : Interactive/30 Cores / 320 GB ...

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-31-2023 10:58:05 AM

6 kudos

Please use COPY INTO (first create an empty Delta table) or CONVERT TO DELTA instead of CTAS. It will be much faster, and the process will be auto-optimized.
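
The two alternatives named in this reply might be written as below. This is a sketch only: the path and table name are placeholders, and the COPY INTO variant assumes the target Delta table was already created with a schema matching the Parquet files.

```python
def convert_to_delta_sql(parquet_path: str) -> str:
    """In-place conversion: adds a Delta transaction log over the existing Parquet files."""
    return f"CONVERT TO DELTA parquet.`{parquet_path}`"


def copy_into_sql(target_table: str, parquet_path: str) -> str:
    """Load Parquet files into an existing Delta table (assumed to have a matching schema)."""
    return (
        f"COPY INTO {target_table}\n"
        f"FROM '{parquet_path}'\n"
        f"FILEFORMAT = PARQUET"
    )


# Placeholder path and table name, for illustration only.
print(convert_to_delta_sql("/mnt/raw/events"))
print(copy_into_sql("events_delta", "/mnt/raw/events"))
# In a notebook (assumption): spark.sql(convert_to_delta_sql("/mnt/raw/events"))
```

CONVERT TO DELTA leaves the data files where they are, whereas COPY INTO writes new files into the target table, so the choice depends on whether the original Parquet location should become the Delta table.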

by mriccardi • New Contributor II

12-01-2022 11:12:26 AM

4449 Views
1 reply
0 kudos

Structured Streaming Checkpoint corrupted.

Hello, we are experiencing an error with one of our Structured Streaming jobs: the checkpoint gets corrupted and we are unable to continue with the execution. I've checked the errors and this happens when it triggers an autocompact,...

Latest Reply

jose_gonzalez
Databricks Employee

01-31-2023 9:14:11 AM

0 kudos

Hi @Martin Riccardi, could you share the following please:
1) What is your source?
2) What is your sink?
3) Could you share your readStream() and writeStream() code?
4) The full error stack trace
5) Did you stop and re-run your query after weeks of not being acti...

by Sameer_876675 • New Contributor III

12-07-2022 4:22:17 AM

6903 Views
3 replies
2 kudos

How to efficiently process a 100GiB JSON nested file and store it in Delta?

Hi, I'm a fairly new user and I am using Azure Databricks to process a ~1000GiB JSON nested file containing insurance policy data. I uploaded the JSON file to Azure Data Lake Gen2 storage and read the JSON file into a dataframe. df=spark.read.option("...

Latest Reply

Annapurna_Hiriy
Databricks Employee

01-31-2023 8:20:49 AM

2 kudos

Hi Sameer, please refer to the following documents on how to work with nested JSON:
https://docs.databricks.com/optimizations/semi-structured.html
https://learn.microsoft.com/en-us/azure/databricks/kb/_static/notebooks/scala/nested-json-to-dataframe.html

by pramalin • New Contributor

01-30-2023 10:33:47 AM

4105 Views
3 replies
2 kudos

How to perform an inner join using withColumn

Latest Reply

shan_chandra
Databricks Employee

01-31-2023 7:55:15 AM

2 kudos

@prudhvi ramalingam - Please refer to the example code below.

import org.apache.spark.sql.functions.expr

val person = Seq( (0, "Bill Chambers", 0, Seq(100)), (1, "Matei Zaharia", 1, Seq(500, 250, 100)), (2, "Michael Armbrust", 1, Seq(250,...
