Data Engineering

Forum Posts

Lazloo
by New Contributor III
  • 414 Views
  • 1 reply
  • 0 kudos

Using spark jars using databricks-connect>=13.0

With the newest version of databricks-connect, I cannot configure the extra jars I want to use. In the older version, I did that via spark = SparkSession.builder.appName('DataFrame').\ config('spark.jars.packages','org.apache.spark:spark-avro_...
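The older builder-based approach mentioned above can be sketched as follows (the package coordinate and version below are illustrative, not taken from the post):

```python
# Sketch of passing extra JAR packages through the Spark config, as was
# done with databricks-connect < 13. Package coordinate is illustrative.
extra_packages = ["org.apache.spark:spark-avro_2.12:3.4.0"]
conf = {"spark.jars.packages": ",".join(extra_packages)}

# With pyspark available, the config would be applied like this:
# from pyspark.sql import SparkSession
# spark = (SparkSession.builder.appName("DataFrame")
#          .config("spark.jars.packages", conf["spark.jars.packages"])
#          .getOrCreate())
```

In Databricks Connect 13+ sessions are typically created via DatabricksSession.builder instead, and JARs are usually attached as cluster libraries rather than through session config.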

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Lazloo, In the newer versions of Databricks Connect, configuring additional JARs for your Spark session is still possible. Let’s adapt your previous approach to the latest version. Adding JARs to a Databricks cluster: If you want to add JAR f...

TiagoMag
by New Contributor III
  • 1328 Views
  • 2 replies
  • 3 kudos

Resolved! DLT pipeline evolution schema error

Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON with two keys: 1st key: score, 2n...

Latest Reply
Kaniz
Community Manager
  • 3 kudos

Hi @TiagoMag, It seems you’ve encountered an issue with your DLT pipeline while working on your first data lake table. Let’s dive into the problem and find a solution! The error message you’re encountering states: “Detected a data update (for exam...

1 More Replies
david3
by New Contributor III
  • 1533 Views
  • 4 replies
  • 3 kudos

Resolved! delta live table udf not known when defined in python module

Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise message is posted below. I have a repo structure as follows:

analytics_pipelines
│
├── __init__.py
├── coordinate_transformation.py
├── d...

Latest Reply
david3
New Contributor III
  • 3 kudos

Hi, yes, I discovered three working possibilities: 1) define the pandas functions as inline functions, as pointed out above; 2) define the pandas function in the same script that is imported as "library" in the DLT config (libraries: - notebook: path: ./pipeline...

3 More Replies
561064
by New Contributor II
  • 2125 Views
  • 5 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions with a total size of ~6 GB, 4 million rows and 270 columns. Used the below command: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
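A common way to avoid the single-task bottleneck of coalesce(1) is to let Spark write part files in parallel and concatenate them afterwards; a minimal sketch of the merge step (function name and paths hypothetical):

```python
import glob
import os
import shutil
import tempfile

def merge_csv_parts(parts_dir, dest, header=True):
    """Concatenate Spark part-files into a single CSV, keeping one header row."""
    parts = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    with open(dest, "w") as out:
        for i, path in enumerate(parts):
            with open(path) as f:
                if header and i > 0:
                    next(f, None)  # skip the repeated header line
                shutil.copyfileobj(f, out)

# Tiny demo with two fake part files standing in for Spark's parallel output.
d = tempfile.mkdtemp()
with open(os.path.join(d, "part-0000.csv"), "w") as f:
    f.write("id,val\n1,a\n")
with open(os.path.join(d, "part-0001.csv"), "w") as f:
    f.write("id,val\n2,b\n")
merge_csv_parts(d, os.path.join(d, "merged.csv"))
merged = open(os.path.join(d, "merged.csv")).read()
```

The parallel write (df.write.csv("...") without coalesce) keeps all 66 partitions busy; only the cheap file concatenation runs serially.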

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

4 More Replies
BenLambert
by Contributor
  • 3129 Views
  • 5 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:

id: string
array_field: array
    element: struct
        field1: string
        field2: string
        array_field2: array
            element: struct
                nested_field: stri...

Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
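The silent-drop behaviour can be illustrated outside Spark with plain Python (data and field names hypothetical): only fields present in the declared schema survive the explode-style flattening.

```python
# Hypothetical illustration: explode turns each array element into its own
# row, and fields missing from the declared schema are silently dropped.
schema_fields = {"field1", "field2"}
row = {"id": "a",
       "array_field": [{"field1": "x", "field2": "y", "not_in_schema": "lost"}]}
exploded = [
    {"id": row["id"], **{k: v for k, v in elem.items() if k in schema_fields}}
    for elem in row["array_field"]
]
# "not_in_schema" does not appear in any output row.
```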

4 More Replies
afk
by New Contributor III
  • 1610 Views
  • 4 replies
  • 1 kudos

Change data feed from target tables of APPLY CHANGES

Up until yesterday I was (sort of) able to read changes from target tables of apply changes operations (either through tables_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Hi @afk, It seems you’ve been navigating the intricacies of Databricks Delta Live Tables and Change Data Capture (CDC). Let’s unravel this together! Change Data Capture (CDC): CDC is a process that identifies and captures incremental changes (data ...

3 More Replies
ElaPG
by New Contributor III
  • 868 Views
  • 3 replies
  • 1 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in pipeline logic, or each pipeline run?

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

2 More Replies
rt-slowth
by Contributor
  • 1130 Views
  • 5 replies
  • 1 kudos

Resolved! How to call a table created with create_table using dlt in a separate notebook?

I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is read with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...

Latest Reply
Kaniz
Community Manager
  • 1 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

4 More Replies
sher
by Valued Contributor II
  • 2155 Views
  • 2 replies
  • 1 kudos

How to resolve the column name in s3 path saved as UUID format

Our managed Databricks tables are stored in S3 by default. While reading that S3 path directly, I am getting the column name as a UUID. E.g.: column name ID in the Databricks table; while checking the S3 path, the column name looks like COL- b400af61-9tha-4565-...

Data Engineering
deltatable
managedtables
Latest Reply
sher
Valued Contributor II
  • 1 kudos

Hi @Kaniz, thank you for your reply, but the issue is I am not able to map ID with COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I use DESCRIBE TABLE EXTENDED table_name as a query to get the list of UUID column names, and for the real column name fetching from...
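The UUID-style names come from Delta column mapping, which stores each logical column's physical name in the field metadata; a sketch of rebuilding the logical-to-physical mapping from the schema JSON (the schema string below is hypothetical and shortened, not the real output):

```python
import json

# Hypothetical, shortened schema JSON of the kind exposed through
# DESCRIBE TABLE EXTENDED / the Delta transaction log when column
# mapping is enabled on the table.
schema_json = """
{"fields": [
  {"name": "ID",
   "metadata": {"delta.columnMapping.physicalName": "COL-b400af61"}}
]}
"""
mapping = {
    f["name"]: f["metadata"].get("delta.columnMapping.physicalName", f["name"])
    for f in json.loads(schema_json)["fields"]
}
# mapping now maps logical names (ID) to the physical names seen in S3.
```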

1 More Replies
Wayne
by New Contributor III
  • 7140 Views
  • 2 replies
  • 2 kudos

Resolved! How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on event SparkListenerSQLExecutionStart. How to flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
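The recursive flattening itself can be sketched in plain Python (a dict-based stand-in for the sparkPlanInfo struct): walk the children array and collect every node into one flat list, after which exploding becomes a simple per-row operation.

```python
def flatten_plan(node):
    """Collect a node and all of its descendants (reached via the
    recursive 'children' array) into a flat list, dropping the
    recursive field itself from each emitted record."""
    flat = [{k: v for k, v in node.items() if k != "children"}]
    for child in node.get("children", []):
        flat.extend(flatten_plan(child))
    return flat

# Hypothetical miniature of a sparkPlanInfo tree.
plan = {"nodeName": "Project",
        "children": [{"nodeName": "Scan", "children": []}]}
nodes = flatten_plan(plan)
```

In Spark the same idea is usually expressed as a recursive UDF or repeated inline-explode passes, since SQL alone cannot recurse to arbitrary depth.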

Latest Reply
Kaniz
Community Manager
  • 2 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
jurodriguezt
by New Contributor
  • 2456 Views
  • 2 replies
  • 0 kudos

How to know the most recent date a Data on a Dashboard was updated.

I know in the old version of Dashboards we have this "Created at:", and in the new Lake View Dashboards we have the "Last Modified:". I'm searching for a field that allows the client to quickly identify the latest data update timestamp for a Dashboard.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
Daniel20
by New Contributor
  • 428 Views
  • 2 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on event SparkListenerSQLExecutionStart. How to flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
rodrigocms
by New Contributor
  • 979 Views
  • 2 replies
  • 0 kudos

Get information from Power BI via XMLA

Hello everyone, I am trying to get information from Power BI semantic models via the XMLA endpoint, using PySpark in Databricks. Can someone help me with that? Thanks.

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your question in our community! We are happy to assist you. To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question? This...

1 More Replies
Umamaheswari_12
by New Contributor
  • 483 Views
  • 1 reply
  • 0 kudos

Resolved! Request for reattempt voucher. Databricks Certified Data Engineer Associate exam

Hi, On Nov 29th I attempted the Databricks Certified Data Engineer Associate exam for the 1st time; unfortunately I ended up with a failing grade. The passing grade was 70%, and I received 64.00%. I am planning to reattempt the exam. Could you kindly give me a...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Thank you for posting your concern on Community! To expedite your request, please list your concerns on our ticketing portal. Our support staff would be able to act faster on the resolution (our standard resolution time is 24-48 hours).

alejandrofm
by Valued Contributor
  • 3194 Views
  • 10 replies
  • 15 kudos

All-purpose clusters not remembering custom tags

Hi, we have several clusters used with notebooks; we don't delete them, just start-stop according to the "minutes of inactivity" set. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among them "...

Latest Reply
Dribka
New Contributor III
  • 15 kudos

@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...

9 More Replies