Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

TiagoMag
by New Contributor III
  • 9140 Views
  • 1 replies
  • 2 kudos

Resolved! DLT pipeline evolution schema error

Hello everyone, I am currently working on my first DLT pipeline, and I stumbled on a problem which I am struggling to solve. I am working on several tables where I have a column called "my_column" with an array of JSON with two keys: 1 key: score, 2n...

  • 9140 Views
  • 1 replies
  • 2 kudos
david3
by New Contributor III
  • 3140 Views
  • 4 replies
  • 3 kudos

Resolved! delta live table udf not known when defined in python module

Hi, I have the problem that my "module" is not known when used in a user-defined function. The precise message is posted below. I have a repo structure as follows:
analytics_pipelines
│
├── __init__.py
│
├── coordinate_transformation.py
│
├── d...

  • 3140 Views
  • 4 replies
  • 3 kudos
Latest Reply
david3
New Contributor III
  • 3 kudos

Hi, yes, I discovered three working possibilities: (1) Define the pandas functions as inline functions, as pointed out above. (2) Define the pandas function in the same script that is imported as a "library" in the DLT config ( libraries: - notebook: path: ./pipeline...

  • 3 kudos
3 More Replies
561064
by New Contributor II
  • 5716 Views
  • 2 replies
  • 0 kudos

Exporting delta table to one CSV

The process to export a Delta table is taking ~2 hrs. The Delta table has 66 partitions with a total size of ~6 GB, 4 million rows, and 270 columns. I used the command below:
df.coalesce(1).write.csv("path")
What are my options to reduce the time?
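One option that is often tried for this kind of export, sketched here under assumptions rather than taken from the thread: `coalesce(1)` makes a single task write the whole file, so the write can instead be left parallel (many part files) and the parts concatenated afterwards. The helper below is plain Python and only covers the concatenation step. The name `merge_csv_parts`, the directory layout, and the `part-*.csv` naming are assumptions, and it presumes the parts are reachable as local files and that no header value contains an embedded newline.

```python
import glob
import os
import shutil


def merge_csv_parts(parts_dir: str, output_path: str, has_header: bool = True) -> int:
    """Concatenate CSV part files into one file and return how many parts were merged.

    Assumed Spark side (not run here): df.write.option("header", True).csv(parts_dir)
    """
    # Sort so the merged file follows the part numbering.
    part_files = sorted(glob.glob(os.path.join(parts_dir, "part-*.csv")))
    with open(output_path, "wb") as out:
        for index, part in enumerate(part_files):
            with open(part, "rb") as src:
                if has_header and index > 0:
                    src.readline()  # drop the header repeated in every later part
                shutil.copyfileobj(src, out)
    return len(part_files)
```

Whether this is faster in practice depends on where the two hours are actually spent, so it is worth timing the parallel write and the merge separately.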

  • 5716 Views
  • 2 replies
  • 0 kudos
Latest Reply
Dribka
New Contributor III
  • 0 kudos

A very interesting task in front of you.... let me know how you solve it!

  • 0 kudos
1 More Replies
BenLambert
by Contributor
  • 10482 Views
  • 4 replies
  • 0 kudos

Resolved! Explode is giving unexpected results.

I have a dataframe with a schema similar to the following:
id: string
array_field: array
   element: struct
          field1: string
          field2: string
          array_field2: array
               element: struct
                     nested_field: stri...

  • 10482 Views
  • 4 replies
  • 0 kudos
Latest Reply
BenLambert
Contributor
  • 0 kudos

It turns out that if the exploded fields don't match the schema that was defined when reading the JSON in the first place, all the data that doesn't match is silently dropped. This is not a nice default behaviour.

  • 0 kudos
3 More Replies
afk
by New Contributor III
  • 4390 Views
  • 2 replies
  • 2 kudos

Change data feed from target tables of APPLY CHANGES

Up until yesterday I was (sort of) able to read changes from target tables of APPLY CHANGES operations (either through table_changes() or using readChangeFeed). I say sort of because the meta columns (_change_type, _commit_version, _commit_timestamp...

  • 4390 Views
  • 2 replies
  • 2 kudos
1 More Replies
ElaPG
by New Contributor III
  • 2124 Views
  • 1 replies
  • 0 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info: "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in pipeline logic, or each pipeline run?

  • 2124 Views
  • 1 replies
  • 0 kudos
sher
by Valued Contributor II
  • 6832 Views
  • 1 replies
  • 0 kudos

How to resolve the column name in s3 path saved as UUID format

Our managed Databricks tables are stored in S3 by default. When I read that S3 path directly, the column name shows up as a UUID. E.g., the column is named ID in the Databricks table, but when checking the S3 path, the column name looks like COL- b400af61-9tha-4565-...

Data Engineering
deltatable
managedtables
  • 6832 Views
  • 1 replies
  • 0 kudos
Latest Reply
sher
Valued Contributor II
  • 0 kudos

Hi @Retired_mod, thank you for your reply, but the issue is that I am not able to map ID to COL- b400af61-9tha-4565-89c4-d6ba43f948b7. I use the query DESCRIBE TABLE EXTENDED table_name to get the list of UUID column names, and for real column name fettin...
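A possible way to build that mapping, assuming the table uses Delta column mapping (the thread does not confirm this): the Delta transaction log's metaData action carries a `schemaString`, and each field in it records its physical name under the metadata key `delta.columnMapping.physicalName`. The sketch below parses such a string in plain Python. The function name `physical_to_logical` and the sample schema (including its UUID) are made up for illustration, and only top-level fields are handled.

```python
import json


def physical_to_logical(schema_string: str) -> dict:
    """Map physical (UUID-style) column names to logical names for top-level fields."""
    schema = json.loads(schema_string)
    mapping = {}
    for field in schema.get("fields", []):
        physical = field.get("metadata", {}).get("delta.columnMapping.physicalName")
        if physical is not None:
            mapping[physical] = field["name"]
    return mapping


# Hypothetical schemaString, shaped like the one in a _delta_log commit file.
example_schema = json.dumps({
    "type": "struct",
    "fields": [{
        "name": "ID",
        "type": "string",
        "nullable": True,
        "metadata": {
            "delta.columnMapping.id": 1,
            "delta.columnMapping.physicalName": "col-00000000-0000-0000-0000-000000000000",
        },
    }],
})

print(physical_to_logical(example_schema))
```

Reading the table through Delta (rather than reading the Parquet files in S3 directly) normally applies this mapping automatically, so the lookup is only needed when the raw files are inspected.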

  • 0 kudos
rt-slowth
by Contributor
  • 2444 Views
  • 2 replies
  • 1 kudos

How to call a table created with create_table using dlt in a separate notebook?

I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is called via spark.read.table("{schema}.{table}"). This way, I can import[MATERIALI...

  • 2444 Views
  • 2 replies
  • 1 kudos
1 More Replies
alejandrofm
by Valued Contributor
  • 6970 Views
  • 10 replies
  • 15 kudos

All-purpose clusters not remembering custom tags

Hi, we have several clusters used with notebooks. We don't delete them, just start and stop them according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among the "...

  • 6970 Views
  • 10 replies
  • 15 kudos
Latest Reply
Dribka
New Contributor III
  • 15 kudos

@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...

  • 15 kudos
9 More Replies
Daniel20
by New Contributor
  • 1140 Views
  • 0 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How can I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...
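A minimal sketch of the flattening step in plain Python, assuming the event has already been parsed from JSON into nested dicts. The function name `flatten_plan` and the added `depth` key are illustrative choices, not something from the post. Recursion handles whatever nesting depth the plan happens to have.

```python
def flatten_plan(node: dict, depth: int = 0) -> list:
    """Return the node and all of its descendants as a flat list in pre-order.

    Each returned dict is a copy of the node without its "children" key,
    plus a "depth" key recording how deeply it was nested.
    """
    flat_node = {key: value for key, value in node.items() if key != "children"}
    flat_node["depth"] = depth
    result = [flat_node]
    for child in node.get("children", []):
        result.extend(flatten_plan(child, depth + 1))
    return result


# Made-up plan with the same recursive shape as sparkPlanInfo.
plan = {
    "nodeName": "Project",
    "children": [
        {"nodeName": "Filter", "children": [{"nodeName": "Scan", "children": []}]},
    ],
}

for row in flatten_plan(plan):
    print(row["depth"], row["nodeName"])
```

The resulting list has one entry per plan node, which is the shape that can then be exploded into rows.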

  • 1140 Views
  • 0 replies
  • 0 kudos
804082
by New Contributor III
  • 2194 Views
  • 4 replies
  • 1 kudos

Resolved! "Your workspace is hosted on infrastructure that cannot support serverless compute."

Hello, I wanted to try out Lakehouse Monitoring, but I receive the following message during setup: "Your workspace is hosted on infrastructure that cannot support serverless compute." I meet all requirements outlined in the documentation. My workspace ...

  • 2194 Views
  • 4 replies
  • 1 kudos
Latest Reply
SSundaram
Contributor
  • 1 kudos

Lakehouse Monitoring: This feature is in Public Preview in the following regions: eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2, ap-southeast-2. Not all workspaces in the regions listed are supported. If you see the error “Your workspace is ...

  • 1 kudos
3 More Replies
Wayne
by New Contributor III
  • 30032 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How can I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

  • 30032 Views
  • 0 replies
  • 0 kudos
