Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

561064
by New Contributor II
  • 9567 Views
  • 2 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions with a total size of ~6 GB, 4 million rows, and 270 columns. I used the command below: df.coalesce(1).write.csv("path"). What are my options to reduce the time?
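A minimal sketch of one common workaround (the paths here are hypothetical): coalesce(1) can pull the whole upstream read into a single task, whereas repartition(1) adds one shuffle but keeps the read parallel, and writing many part files and concatenating them afterwards avoids the single writer entirely.

```python
df = spark.read.format("delta").load("/path/to/delta_table")  # hypothetical path

# Option 1: keep upstream parallelism; pay one shuffle for a single output file.
(df.repartition(1)
   .write.mode("overwrite")
   .option("header", "true")
   .csv("/tmp/export_single"))

# Option 2: write part files in parallel, then merge them outside Spark
# (e.g. `cat part-*.csv > out.csv`, stripping headers first).
(df.write.mode("overwrite")
   .option("header", "false")
   .csv("/tmp/export_parts"))
```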

Latest Reply
Dribka
New Contributor III
  • 0 kudos

That's a very interesting task in front of you... let me know how you solve it!

1 More Replies
BenLambert
by Contributor
  • 14334 Views
  • 4 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:
id: string
array_field: array
    element: struct
        field1: string
        field2: string
        array_field2: array
            element: struct
                nested_field: stri...

Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not really nice default behaviour.
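A minimal sketch of the fix that behaviour suggests, with field names taken from the post and a hypothetical input path: declare the full nested schema up front, so every field you later explode was actually parsed.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# If array_field2 were missing from this schema, its data would never be
# parsed and would silently "disappear" after the explode.
schema = StructType([
    StructField("id", StringType()),
    StructField("array_field", ArrayType(StructType([
        StructField("field1", StringType()),
        StructField("field2", StringType()),
        StructField("array_field2", ArrayType(StructType([
            StructField("nested_field", StringType()),
        ]))),
    ]))),
])

df = spark.read.schema(schema).json("/path/to/input.json")  # hypothetical path
exploded = (df
    .select("id", F.explode("array_field").alias("elem"))
    .select("id", "elem.field1", "elem.field2",
            F.explode_outer("elem.array_field2").alias("nested"))
    .select("id", "field1", "field2", "nested.nested_field"))
```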

3 More Replies
afk
by Databricks Partner
  • 6503 Views
  • 2 replies
  • 2 kudos

Change data feed from target tables of APPLY CHANGES

Up until yesterday I was (sort of) able to read changes from target tables of APPLY CHANGES operations (either through table_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...
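For reference, a minimal sketch of the two read paths the post mentions (table name and starting version are hypothetical):

```python
# DataFrame reader: change data feed with the CDF meta columns.
changes = (spark.read.format("delta")
    .option("readChangeFeed", "true")
    .option("startingVersion", 2)
    .table("target_table"))

changes.select("_change_type", "_commit_version", "_commit_timestamp").show()

# SQL equivalent:
spark.sql("SELECT * FROM table_changes('target_table', 2)").show()
```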

ElaPG
by New Contributor III
  • 2935 Views
  • 1 reply
  • 0 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info, "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in the pipeline logic, or each pipeline run?

sher
by Valued Contributor II
  • 7513 Views
  • 1 reply
  • 0 kudos

How to resolve column names saved in UUID format in the S3 path

Our managed Databricks tables are stored in S3 by default. While I am reading that S3 path directly, I am getting the column name as a UUID. E.g.: the column name is ID in the Databricks table, while checking the S3 path, the column name looks like COL- b400af61-9tha-4565-...

Data Engineering
deltatable
managedtables
Latest Reply
sher
Valued Contributor II
  • 0 kudos

Hi @Retired_mod, thank you for your reply, but the issue is I am not able to map ID with COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I used a DESCRIBE TABLE EXTENDED table_name query to get the list of UUID column names, and for the real column name fettin...
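A minimal sketch of what is likely happening, assuming the UUID names come from Delta column mapping (delta.columnMapping.mode = 'name'): the Parquet files on S3 carry physical UUID column names, and only the Delta log maps them back to logical names, so the mapping is recovered by reading through Delta rather than reading the Parquet files directly.

```python
# Reading the raw Parquet exposes the physical (UUID) column names:
raw = spark.read.parquet("s3://bucket/path/to/table")  # hypothetical path
raw.printSchema()  # COL-b400af61-..., etc.

# Reading the same path through the Delta log resolves the logical names:
tbl = spark.read.format("delta").load("s3://bucket/path/to/table")
tbl.printSchema()  # ID, etc.
```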

rt-slowth
by Contributor
  • 3300 Views
  • 2 replies
  • 1 kudos

How to call a table created with create_table using dlt in a separate notebook?

I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is read with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...
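A minimal sketch of that pattern, with hypothetical schema/table names and Redshift connection details: once the DLT pipeline has materialized the table, any other notebook can read it by name and hand it to a regular batch writer.

```python
df = spark.read.table("my_schema.my_dlt_table")  # table produced by the DLT pipeline

(df.write
   .format("jdbc")  # or a dedicated Redshift connector
   .option("url", "jdbc:redshift://host:5439/db")
   .option("dbtable", "public.target_table")
   .option("user", "<user>")
   .option("password", "<password>")
   .mode("append")
   .save())
```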

alejandrofm
by Valued Contributor
  • 11477 Views
  • 10 replies
  • 15 kudos

All-purpose clusters not remembering custom tags

Hi, we have several clusters used with notebooks; we don't delete them, just start and stop them according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among the "...

Latest Reply
Dribka
New Contributor III
  • 15 kudos

@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...
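One way to make the tag stick is to write it into the cluster spec itself while the cluster is terminated. A minimal sketch using the Clusters REST API, with hypothetical workspace URL, token, cluster ID, and tag value (clusters/edit replaces the spec, so the current spec is fetched first and sent back with the tag added):

```python
import requests

host = "https://<workspace>.cloud.databricks.com"
token = "<personal-access-token>"
cluster_id = "<cluster-id>"
auth = {"Authorization": f"Bearer {token}"}

# Fetch the current cluster spec.
spec = requests.get(f"{host}/api/2.0/clusters/get",
                    headers=auth, params={"cluster_id": cluster_id}).json()

# Add the custom tag to whatever tags already exist.
tags = spec.get("custom_tags", {})
tags["team"] = "data-eng"  # hypothetical tag

# Push the edited spec back (edit requires the full spec, not a patch).
requests.post(f"{host}/api/2.0/clusters/edit", headers=auth, json={
    "cluster_id": cluster_id,
    "cluster_name": spec["cluster_name"],
    "spark_version": spec["spark_version"],
    "node_type_id": spec["node_type_id"],
    "num_workers": spec.get("num_workers", 0),
    "custom_tags": tags,
})
```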

9 More Replies
Daniel20
by New Contributor
  • 1796 Views
  • 0 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
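A minimal sketch of one common workaround, since Spark cannot express truly recursive schemas: explode the children array level by level up to a chosen maximum depth, unioning each level into one flat list of plan nodes. The column names follow the SparkListenerSQLExecutionStart event; the depth limit and the selected fields are assumptions.

```python
from pyspark.sql import functions as F

def flatten_plan(events, max_depth=10):
    # Start from the root plan node of each event.
    level = events.select(F.col("sparkPlanInfo").alias("node"))
    flat = level.select("node.nodeName", "node.simpleString")
    for _ in range(max_depth):
        # The inferred schema only nests as deep as the data does;
        # stop once this level no longer has a children field.
        if "children" not in level.schema["node"].dataType.fieldNames():
            break
        level = level.select(F.explode("node.children").alias("node"))
        flat = flat.unionByName(
            level.select("node.nodeName", "node.simpleString"))
    return flat
```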

804082
by New Contributor III
  • 3521 Views
  • 4 replies
  • 1 kudos

Resolved! "Your workspace is hosted on infrastructure that cannot support serverless compute."

Hello, I wanted to try out Lakehouse Monitoring, but I receive the following message during setup: "Your workspace is hosted on infrastructure that cannot support serverless compute." I meet all the requirements outlined in the documentation. My workspace ...

Latest Reply
SSundaram
Databricks Partner
  • 1 kudos

Lakehouse Monitoring: this feature is in Public Preview in the following regions: eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2, ap-southeast-2. Not all workspaces in the regions listed are supported. If you see the error "Your workspace is ...

3 More Replies
Wayne
by New Contributor III
  • 31754 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How do I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

Arnold_Souza
by New Contributor III
  • 8233 Views
  • 1 reply
  • 0 kudos

Delta Live Tables consuming different files from the same path are combining the schema

Summary: I am using Delta Live Tables to create a pipeline in Databricks, and I am facing a problem where the schemas of different files placed in the same folder in a data lake get merged, even though I am using file patterns to separate the data inge...

Data Engineering
cloud_files
Databricks SQL
Delta Live Tables
read_files
Latest Reply
Arnold_Souza
New Contributor III
  • 0 kudos

Found a solution: never use 'fileNamePattern', '*file_1*'. Instead, put the pattern directly into the path: "abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*"
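A minimal sketch of that fix in a DLT/Auto Loader table definition (container, storage account, and file format are hypothetical): with the glob in the load path itself, each table only ever sees its own files, so the schemas are never merged.

```python
import dlt

@dlt.table(name="file_1_bronze")
def file_1_bronze():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("abfss://<container>@<storage_account>.dfs.core.windows.net"
              "/path/to/folder/*file_1*"))
```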

bzh
by New Contributor
  • 4928 Views
  • 3 replies
  • 0 kudos

Question: Delta Live Tables, multiple streaming sources to a single target

We are trying to write multiple sources to the same target table using DLT, but are getting the errors below. Not sure what we are missing here in the code... File /databricks/spark/python/dlt/api.py:817, in apply_changes(target, source, keys, sequence...
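A minimal sketch of one workaround, with hypothetical source/target/column names: apply_changes takes a single source, so union the streams into one intermediate view and run apply_changes once against that view.

```python
import dlt
from pyspark.sql import functions as F

@dlt.view
def combined_source():
    # Union the two streaming sources into one view with a common schema.
    a = spark.readStream.table("source_a")
    b = spark.readStream.table("source_b")
    return a.unionByName(b)

dlt.create_streaming_table("target")

dlt.apply_changes(
    target="target",
    source="combined_source",
    keys=["id"],
    sequence_by=F.col("event_ts"),
)
```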

Latest Reply
nag_kanchan
New Contributor III
  • 0 kudos

The solution did not work for me. It was throwing an error stating: raise Py4JError( py4j.protocol.Py4JError: An error occurred while calling o434.readStream. Trace: py4j.Py4JException: Method readStream([class java.util.ArrayList]) does not exist.A...

2 More Replies
Faisal
by Contributor
  • 3079 Views
  • 1 reply
  • 0 kudos

DLT - how to log number of rows read and written

Hi @Retired_mod, how do I log the number of rows read and written in a DLT pipeline? I want to store it in audit tables after the pipeline update completes. Can you give me sample query code?

Latest Reply
Faisal
Contributor
  • 0 kudos

Thanks @Retired_mod, but I asked how to log the number of rows read/written via a Delta Live Tables (DLT) pipeline, not a Delta Lake table, and the solution you gave relates to a Data Factory pipeline, which is not what I need.
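A minimal sketch of one approach, assuming the pipeline's event log is read from its storage location (the event-log path and audit table name are hypothetical): flow_progress events in the DLT event log carry row counts in their details JSON.

```python
# Load the pipeline's event log (a Delta table under the pipeline storage).
event_log = spark.read.format("delta").load(
    "dbfs:/pipelines/<pipeline-id>/system/events")
event_log.createOrReplaceTempView("event_log")

audit = spark.sql("""
    SELECT timestamp,
           origin.flow_name,
           details:flow_progress.metrics.num_output_rows::long AS rows_written
    FROM event_log
    WHERE event_type = 'flow_progress'
      AND details:flow_progress.metrics.num_output_rows IS NOT NULL
""")

# Append the counts to an audit table after each pipeline update.
audit.write.mode("append").saveAsTable("audit.dlt_row_counts")
```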
