With the newest version of databricks-connect, I cannot configure the extra JARs I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').\
config('spark.jars.packages','org.apache.spark:spark-avro_...
Hi @Lazloo, In the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible.
Let’s adapt your previous approach to the latest version.
Adding JARs to a Databricks cluster:
If you want to add JAR f...
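A minimal sketch of that route, assuming Databricks Connect v2 (DatabricksSession) and the Databricks SDK for Python; the cluster ID, Maven coordinate, and path below are placeholders, not values from this thread:

```python
# Sketch only: assumes databricks-connect >= 13 (DatabricksSession) and the
# databricks-sdk package; cluster ID and Maven coordinate are placeholders.
from databricks.connect import DatabricksSession
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.compute import Library, MavenLibrary

# With Databricks Connect v2 the code runs against a remote cluster, so extra
# JARs are installed on that cluster as libraries rather than passed through
# spark.jars.packages on the builder.
w = WorkspaceClient()
w.libraries.install(
    cluster_id="<your-cluster-id>",  # placeholder
    libraries=[Library(maven=MavenLibrary(coordinates="org.apache.spark:spark-avro_2.12:3.5.0"))],
)

spark = DatabricksSession.builder.getOrCreate()
df = spark.read.format("avro").load("/path/to/data")  # placeholder path
```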
Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON with two keys: 1st key: score, 2n...
Hi @TiagoMag, It seems you’ve encountered an issue while working on your first DLT pipeline. Let’s dive into the problem and find a solution!
The error message you’re encountering states: “Detected a data update (for exam...
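As an illustration only (the second JSON key is truncated above, so "label" below is a purely hypothetical name), parsing an array-of-JSON column like the one described usually looks something like this:

```python
# Sketch only: assumes "my_column" holds a JSON array string with keys
# "score" and a second key (called "label" here purely as a placeholder).
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType, DoubleType

spark = SparkSession.builder.getOrCreate()

schema = ArrayType(StructType([
    StructField("score", DoubleType()),
    StructField("label", StringType()),  # hypothetical second key
]))

df = spark.createDataFrame(
    [('[{"score": 0.9, "label": "a"}, {"score": 0.1, "label": "b"}]',)],
    ["my_column"],
)

parsed = (
    df.withColumn("parsed", F.from_json("my_column", schema))
      .withColumn("item", F.explode("parsed"))
      .select(F.col("item.score"), F.col("item.label"))
)
parsed.show()
```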
Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise error message is posted below. I have a repo structure as follows:
analytics_pipelines
│ ├── __init__.py
│ ├── coordinate_transformation.py
│ ├── d...
Hi, yes, I discovered three working possibilities:
1. Define the pandas functions as inline functions, as pointed out above (see the sketch after this list).
2. Define the pandas function in the same script that is imported as a "library" in the DLT config (libraries:
   - notebook:
       path: ./pipeline...
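For the first option, a rough sketch of what the inline variant can look like inside a DLT pipeline (table and column names are placeholders; spark is assumed to be available, as it is in DLT notebooks):

```python
# Sketch only: the "inline function" option, so the workers do not need to
# import the repo module. Table and column names are placeholders.
import dlt
import pandas as pd
from pyspark.sql import functions as F
from pyspark.sql.types import DoubleType

@F.pandas_udf(DoubleType())
def transform_coordinates(x: pd.Series) -> pd.Series:
    # Inline the logic from analytics_pipelines.coordinate_transformation here,
    # instead of importing it, so it ships with the UDF itself.
    return x * 2.0  # placeholder transformation

@dlt.table
def transformed():
    return spark.read.table("source_table").withColumn(  # placeholder source
        "x_transformed", transform_coordinates("x")
    )
```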
The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions, with a total size of ~6 GB, 4 million rows and 270 columns. I used the command below: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
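Not from this thread, but as a hedged sketch of the usual alternatives: coalesce(1) funnels every row through a single write task, so letting Spark write several files in parallel is typically much faster (paths below are placeholders):

```python
# Sketch only: paths are placeholders.

# Option 1: let Spark write many CSV parts in parallel, then combine the parts
# downstream (e.g. with a cloud-storage copy/concat step) if one file is required.
df.write.option("header", "true").csv("s3://bucket/export/parts/")

# Option 2: if a small number of files is acceptable, repartition instead of
# coalescing to 1 so several tasks still share the write.
df.repartition(8).write.option("header", "true").csv("s3://bucket/export/8-files/")
```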
I have a dataframe with a schema similar to the following:
id: string
array_field: array
    element: struct
        field1: string
        field2: string
        array_field2: array
            element: struct
                nested_field: stri...
It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
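A small sketch of that behaviour (names are illustrative): with the default PERMISSIVE mode a value that doesn't fit the supplied schema just becomes null, while FAILFAST raises an error instead:

```python
# Sketch only: demonstrates how fields that don't match the supplied schema are
# silently nulled out when parsing JSON, and how FAILFAST surfaces the mismatch.
from pyspark.sql import SparkSession, functions as F
from pyspark.sql.types import StructType, StructField, IntegerType

spark = SparkSession.builder.getOrCreate()

schema = StructType([StructField("field1", IntegerType())])  # expects an int
df = spark.createDataFrame([('{"field1": "not-an-int"}',)], ["raw"])

# Default (PERMISSIVE): the mismatched value just becomes null.
df.select(F.from_json("raw", schema).alias("parsed")).show()

# FAILFAST raises an error instead of silently dropping the data.
df.select(
    F.from_json("raw", schema, {"mode": "FAILFAST"}).alias("parsed")
).show()
```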
Up until yesterday I was (sort of) able to read changes from target tables of apply changes operations (either through tables_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...
Hi @afk, It seems you’ve been navigating the intricacies of Databricks Delta Live Tables and Change Data Capture (CDC). Let’s unravel this together!
Change Data Capture (CDC):
CDC is a process that identifies and captures incremental changes (data ...
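For reference, reading the change feed of a target table typically looks like the sketch below (table name and starting version are placeholders; the SQL counterpart is the table_changes() function):

```python
# Sketch only: reads the change data feed of a Delta table; the table name and
# starting version are placeholders.
changes = (
    spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 0)
    .table("catalog.schema.my_target_table")
)
changes.select("_change_type", "_commit_version", "_commit_timestamp").show()
```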
Hi! Regarding this info: "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in pipeline logic, or each pipeline run?
I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is read with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...
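As a hedged sketch of what that second notebook can look like (assuming the built-in Databricks Redshift connector; all connection values and table names below are placeholders):

```python
# Sketch only: assumes the Databricks Redshift connector; all connection
# values and table names are placeholders.
df = spark.read.table("my_schema.my_dlt_table")  # table produced by the DLT pipeline

(
    df.write.format("redshift")
    .option("url", "jdbc:redshift://host:5439/db?user=...&password=...")
    .option("dbtable", "public.target_table")
    .option("tempdir", "s3a://bucket/redshift-temp/")
    .option("forward_spark_s3_credentials", "true")
    .mode("overwrite")
    .save()
)
```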
Our managed Databricks tables are stored in S3 by default. While reading that S3 path directly, I am getting the column name as a UUID. E.g.: the column name is ID in the Databricks table, while checking the S3 path the column name looks like COL- b400af61-9tha-4565-...
Hi @Kaniz, thank you for your reply, but the issue is I am not able to map ID with COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I use DESCRIBE TABLE EXTENDED table_name as a query to get the list of UUID column names, and for the real column name, fetching from...
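Those COL-&lt;uuid&gt; names look like Delta column mapping (delta.columnMapping.mode = 'name'), in which case the logical-to-physical mapping is recorded in the table schema metadata inside the _delta_log. A rough sketch of pulling that mapping out (the path is a placeholder, and checkpointed logs are ignored for brevity):

```python
# Sketch only: reads the JSON commits in _delta_log and extracts each logical
# column's physical name from the schema metadata; path is a placeholder.
import json
from pyspark.sql import functions as F

log_df = (
    spark.read.json("s3://bucket/path/to/table/_delta_log/*.json")
         .withColumn("file", F.input_file_name())
)

# Take the schema from the most recent commit that carried a metaData action.
schema_str = (
    log_df.where("metaData IS NOT NULL")
          .orderBy("file")
          .select("metaData.schemaString")
          .collect()[-1][0]
)

mapping = {
    f["name"]: f.get("metadata", {}).get("delta.columnMapping.physicalName")
    for f in json.loads(schema_str)["fields"]
}
print(mapping)  # e.g. {"ID": "col-b400af61-...", ...}
```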
This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
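One way to get each plan node out as a row, side-stepping the array-of-struct intermediate, is to descend level by level with explode and union the levels. A sketch that assumes the event log was read with spark.read.json and that sparkPlanInfo has nodeName, simpleString and children fields (the path is a placeholder):

```python
# Sketch only: walks the recursive sparkPlanInfo struct level by level.
from pyspark.sql import functions as F

def flatten_plan(events_df, max_depth=20):
    levels = []
    current = events_df.select(F.col("sparkPlanInfo").alias("node"))
    for depth in range(max_depth):
        if not current.head(1):                      # nothing left at this depth
            break
        levels.append(current.select(
            F.lit(depth).alias("depth"),
            F.col("node.nodeName").alias("nodeName"),
            F.col("node.simpleString").alias("simpleString"),
        ))
        # JSON schema inference only materialises `children` as deep as the data
        # goes, so stop descending once the field is no longer present.
        if "children" not in current.select("node.*").columns:
            break
        current = current.select(F.explode("node.children").alias("node"))
    out = levels[0]
    for lvl in levels[1:]:
        out = out.unionByName(lvl)
    return out

# Usage: placeholder path to the Spark event log read as JSON.
events = spark.read.json("/path/to/eventlog")
plan_nodes = flatten_plan(events.where(F.col("Event").contains("SparkListenerSQLExecutionStart")))
plan_nodes.show(truncate=False)
```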
I know in the old version of Dashboards we have the "Created at:" field, and in the new Lakeview Dashboards we have the "Last Modified:" field. I'm searching for a field that allows the client to quickly identify the latest data update timestamp for a Dashboard.
Hello everyone, I am trying to get information from Power BI semantic models via the XMLA endpoint using PySpark in Databricks. Can someone help me with that? Thanks
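XMLA itself needs an ADOMD-style client, which PySpark does not provide. One commonly used workaround (a sketch, not something from this thread) is the Power BI REST executeQueries endpoint, which runs a DAX query over plain HTTPS from a notebook; the dataset ID, token and query below are placeholders:

```python
# Sketch only: queries a Power BI semantic model via the REST "executeQueries"
# endpoint instead of XMLA; dataset ID, token and DAX query are placeholders.
import requests

dataset_id = "<dataset-id>"
token = "<azure-ad-access-token>"   # token issued for the Power BI resource

resp = requests.post(
    f"https://api.powerbi.com/v1.0/myorg/datasets/{dataset_id}/executeQueries",
    headers={"Authorization": f"Bearer {token}"},
    json={"queries": [{"query": "EVALUATE TOPN(10, 'MyTable')"}]},
)
resp.raise_for_status()
rows = resp.json()["results"][0]["tables"][0]["rows"]

df = spark.createDataFrame(rows)   # turn the result rows into a Spark DataFrame
df.show()
```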
Hi, on Nov 29th I attempted the Databricks Certified Data Engineer Associate exam for the 1st time; unfortunately I ended up with a failing grade. The passing grade was 70%, and I received 64.00%. I am planning to reattempt the exam. Could you kindly give me a...
Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).
Hi, we have several clusters used with notebooks; we don't delete them, just start/stop according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among the "...
@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...
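One way to make the tag survive restarts is to write it into the cluster's own definition rather than applying it out of band; a rough sketch with the Clusters REST API (host, token and cluster ID are placeholders, and clusters/edit requires resending the full spec):

```python
# Sketch only: persists a custom tag by writing it into the cluster's own
# definition via the Clusters API, so restarts keep it. Host, token and
# cluster ID are placeholders.
import requests

host = "https://<workspace>.cloud.databricks.com"
headers = {"Authorization": "Bearer <personal-access-token>"}
cluster_id = "<cluster-id>"

# Fetch the current cluster spec ...
spec = requests.get(
    f"{host}/api/2.0/clusters/get",
    headers=headers,
    params={"cluster_id": cluster_id},
).json()

# ... merge in the custom tag ...
tags = spec.get("custom_tags", {})
tags["team"] = "analytics"          # placeholder tag

# ... and push the full definition back; adjust fields if the cluster uses
# autoscaling or instance pools instead of a fixed worker count.
edit_body = {
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "autotermination_minutes": spec.get("autotermination_minutes", 60),
    "custom_tags": tags,
}
requests.post(f"{host}/api/2.0/clusters/edit", headers=headers, json=edit_body).raise_for_status()
```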