Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

561064
by New Contributor II
  • 9567 Views
  • 2 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions with a total size of ~6 GB, 4 million rows, and 270 columns. I used the command below: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
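A minimal sketch of one common workaround (the paths here are hypothetical): coalesce(1) can pull the whole upstream read into a single task, whereas repartition(1) adds one shuffle but keeps the read parallel, and writing many part files and concatenating them afterwards avoids the single writer entirely.

```python
df = spark.read.format("delta").load("/path/to/delta_table")  # hypothetical path

# Option 1: keep upstream parallelism; pay one shuffle for a single output file.
(df.repartition(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv("/tmp/export_single"))

# Option 2: write part files in parallel, then merge them outside Spark
# (e.g. `cat part-*.csv > out.csv`, stripping headers first).
(df.write.mode("overwrite")
   .option("header", "false")
   .csv("/tmp/export_parts"))
```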

Latest Reply
Dribka
New Contributor III
  • 0 kudos

That's a very interesting task in front of you... let me know how you solve it!

1 More Replies
BenLambert
by Contributor
  • 14334 Views
  • 4 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:
id: string
array_field: array
    element: struct
        field1: string
        field2: string
        array_field2: array
            element: struct
                nested_field: stri...

Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
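A minimal sketch of the fix that behaviour suggests, with field names taken from the post and a hypothetical input path: declare the full nested schema up front, so every field you later explode was actually parsed.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# If array_field2 were missing from this schema, its data would never be
# parsed and would silently "disappear" after the explode.
schema = StructType([
    StructField("id", StringType()),
    StructField("array_field", ArrayType(StructType([
        StructField("field1", StringType()),
        StructField("field2", StringType()),
        StructField("array_field2", ArrayType(StructType([
            StructField("nested_field", StringType()),
        ]))),
    ]))),
])

df = spark.read.schema(schema).json("/path/to/input.json")  # hypothetical path
exploded = (df
    .select("id", F.explode("array_field").alias("elem"))
    .select("id", "elem.field1", "elem.field2",
            F.explode_outer("elem.array_field2").alias("nested"))
    .select("id", "field1", "field2", "nested.nested_field"))
```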

3 More Replies
afk
by Databricks Partner
  • 6503 Views
  • 2 replies
  • 2 kudos

Change data feed from target tables of APPLY CHANGES

Up until yesterday I was (sort of) able to read changes from target tables of APPLY CHANGES operations (either through table_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...
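For reference, a minimal sketch of the two read paths the post mentions (table name and starting version are hypothetical):

```python
# DataFrame reader: change data feed with the CDF meta columns.
changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 2)
    .table("target_table"))

changes.select("_change_type", "_commit_version", "_commit_timestamp").show()

# SQL equivalent:
spark.sql("SELECT * FROM table_changes('target_table', 2)").show()
```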

ElaPG
by New Contributor III
  • 2935 Views
  • 1 reply
  • 0 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info, "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in the pipeline logic, or each pipeline run?

sher
by Valued Contributor II
  • 7513 Views
  • 1 reply
  • 0 kudos

How to resolve column names saved in UUID format in the S3 path

Our managed Databricks tables are stored in S3 by default. While I am reading that S3 path directly, I am getting the column name as a UUID. E.g.: the column name is ID in the Databricks table, while checking the S3 path, the column name looks like COL- b400af61-9tha-4565-...

Data Engineering
deltatable
managedtables
Latest Reply
sher
Valued Contributor II
  • 0 kudos

Hi @Retired_mod, thank you for your reply, but the issue is I am not able to map ID with COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I used a DESCRIBE TABLE EXTENDED table_name query to get the list of UUID column names, and for the real column name fettin...
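A minimal sketch of what is likely happening, assuming the UUID names come from Delta column mapping (delta.columnMapping.mode = 'name'): the Parquet files on S3 carry physical UUID column names, and only the Delta log maps them back to logical names, so the mapping is recovered by reading through Delta rather than reading the Parquet files directly.

```python
# Reading the raw Parquet exposes the physical (UUID) column names:
raw = spark.read.parquet("s3://bucket/path/to/table")  # hypothetical path
raw.printSchema()  # COL-b400af61-..., etc.

# Reading the same path through the Delta log resolves the logical names:
tbl = spark.read.format("delta").load("s3://bucket/path/to/table")
tbl.printSchema()  # ID, etc.
```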

rt-slowth
by Contributor
  • 3300 Views
  • 2 replies
  • 1 kudos

How to call a table created with create_table using dlt in a separate notebook?

I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is read with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...
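A minimal sketch of that pattern, with hypothetical schema/table names and Redshift connection details: once the DLT pipeline has materialized the table, any other notebook can read it by name and hand it to a regular batch writer.

```python
df = spark.read.table("my_schema.my_dlt_table")  # table produced by the DLT pipeline

(df.write
   .format("jdbc")  # or a dedicated Redshift connector
   .option("url", "jdbc:redshift://host:5439/db")
   .option("dbtable", "public.target_table")
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())
```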

alejandrofm
by Valued Contributor
  • 11477 Views
  • 10 replies
  • 15 kudos

All-purpose clusters not remembering custom tags

Hi, we have several clusters used with notebooks; we don't delete them, just start and stop them according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among the "...

Latest Reply
Dribka
New Contributor III
  • 15 kudos

@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...
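One way to make the tag stick is to write it into the cluster spec itself while the cluster is terminated. A minimal sketch using the Clusters REST API, with hypothetical workspace URL, token, cluster ID, and tag value (clusters/edit replaces the spec, so the current spec is fetched first and sent back with the tag added):

```python
import requests

host = "https://<workspace>.cloud.databricks.com"
token = "<personal-access-token>"
cluster_id = "<cluster-id>"
auth = {"Authorization": f"Bearer {token}"}

# Fetch the current cluster spec.
spec = requests.get(f"{host}/api/2.0/clusters/get",
                    headers=auth, params={"cluster_id": cluster_id}).json()

# Add the custom tag to whatever tags already exist.
tags = spec.get("custom_tags", {})
tags["team"] = "data-eng"  # hypothetical tag

# Push the edited spec back (edit requires the full spec, not a patch).
requests.post(f"{host}/api/2.0/clusters/edit", headers=auth, json={
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "custom_tags": tags,
})
```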

9 More Replies
Daniel20
by New Contributor
  • 1796 Views
  • 0 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
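A minimal sketch of one common workaround, since Spark cannot express truly recursive schemas: explode the children array level by level up to a chosen maximum depth, unioning each level into one flat list of plan nodes. The column names follow the SparkListenerSQLExecutionStart event; the depth limit and the selected fields are assumptions.

```python
from pyspark.sql import functions as F

def flatten_plan(events, max_depth=10):
    # Start from the root plan node of each event.
    level = events.select(F.col("sparkPlanInfo").alias("node"))
    flat = level.select("node.nodeName", "node.simpleString")
    for _ in range(max_depth):
        # The inferred schema only nests as deep as the data does;
        # stop once this level no longer has a children field.
        if "children" not in level.schema["node"].dataType.fieldNames():
            break
        level = level.select(F.explode("node.children").alias("node"))
        flat = flat.unionByName(
            level.select("node.nodeName", "node.simpleString"))
    return flat
```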

804082
by New Contributor III
  • 3521 Views
  • 4 replies
  • 1 kudos

Resolved! "Your workspace is hosted on infrastructure that cannot support serverless compute."

Hello, I wanted to try out Lakehouse Monitoring, but I receive the following message during setup: "Your workspace is hosted on infrastructure that cannot support serverless compute." I meet all the requirements outlined in the documentation. My workspace ...

Latest Reply
SSundaram
Databricks Partner
  • 1 kudos

Lakehouse Monitoring: this feature is in Public Preview in the following regions: eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2, ap-southeast-2. Not all workspaces in the regions listed are supported. If you see the error "Your workspace is ...

3 More Replies
Wayne
by New Contributor III
  • 31754 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

Arnold_Souza
by New Contributor III
  • 8233 Views
  • 1 reply
  • 0 kudos

Delta Live Tables consuming different files from the same path are combining the schema

Summary: I am using Delta Live Tables to create a pipeline in Databricks, and I am facing a problem where the schemas of different files placed in the same folder in a data lake get merged, even though I am using file patterns to separate the data inge...

Data Engineering
cloud_files
Databricks SQL
Delta Live Tables
read_files
Latest Reply
Arnold_Souza
New Contributor III
  • 0 kudos

Found a solution: never use 'fileNamePattern', '*file_1*'. Instead, put the pattern directly into the path: "abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*"
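A minimal sketch of that fix in a DLT/Auto Loader table definition (container, storage account, and file format are hypothetical): with the glob in the load path itself, each table only ever sees its own files, so the schemas are never merged.

```python
import dlt

@dlt.table(name="file_1_bronze")
def file_1_bronze():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("abfss://<container>@<storage_account>.dfs.core.windows.net"
              "/path/to/folder/*file_1*"))
```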

bzh
by New Contributor
  • 4928 Views
  • 3 replies
  • 0 kudos

Question: Delta Live Tables, multiple streaming sources to a single target

We are trying to write multiple sources to the same target table using DLT, but are getting the errors below. Not sure what we are missing here in the code... File /databricks/spark/python/dlt/api.py:817, in apply_changes(target, source, keys, sequence...
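A minimal sketch of one workaround, with hypothetical source/target/column names: apply_changes takes a single source, so union the streams into one intermediate view and run apply_changes once against that view.

```python
import dlt
from pyspark.sql import functions as F

@dlt.view
def combined_source():
    # Union the two streaming sources into one view with a common schema.
    a = spark.readStream.table("source_a")
    b = spark.readStream.table("source_b")
    return a.unionByName(b)

dlt.create_streaming_table("target")

dlt.apply_changes(
    target="target",
    source="combined_source",
    keys=["id"],
    sequence_by=F.col("event_ts"),
)
```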

Latest Reply
nag_kanchan
New Contributor III
  • 0 kudos

The solution did not work for me. It was throwing an error stating: raise Py4JError( py4j.protocol.Py4JError: An error occurred while calling o434.readStream. Trace: py4j.Py4JException: Method readStream([class java.util.ArrayList]) does not exist.A...

2 More Replies
Faisal
by Contributor
  • 3079 Views
  • 1 reply
  • 0 kudos

DLT - how to log number of rows read and written

Hi @Retired_mod, how do I log the number of rows read and written in a DLT pipeline? I want to store it in audit tables after the pipeline update completes. Can you give me sample query code?

Latest Reply
Faisal
Contributor
  • 0 kudos

Thanks @Retired_mod, but I asked how to log the number of rows read/written via a Delta Live Tables (DLT) pipeline, not a Delta Lake table, and the solution you gave relates to a Data Factory pipeline, which is not what I need.
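A minimal sketch of one approach, assuming the pipeline's event log is read from its storage location (the event-log path and audit table name are hypothetical): flow_progress events in the DLT event log carry row counts in their details JSON.

```python
# Load the pipeline's event log (a Delta table under the pipeline storage).
event_log = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events")
event_log.createOrReplaceTempView("event_log")

audit = spark.sql("""
    SELECT timestamp,
           origin.flow_name,
           details:flow_progress.metrics.num_output_rows::long AS rows_written
    FROM event_log
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.metrics.num_output_rows IS NOT NULL
""")

# Append the counts to an audit table after each pipeline update.
audit.write.mode("append").saveAsTable("audit.dlt_row_counts")
```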
