Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

ElaPG
by New Contributor III
  • 2153 Views
  • 1 reply
  • 0 kudos

DLT concurrent pipeline updates.

Hi! Regarding this info, "An Azure Databricks workspace is limited to 100 concurrent pipeline updates." (Release 2023.16 - Azure Databricks | Microsoft Learn), what is considered an update? Changes in pipeline logic, or each pipeline run?

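For context, as the docs use the term, an "update" is a single run of the pipeline: each triggered run (or each long-running refresh in continuous mode) is one update, so the limit is about how many pipelines can be running at once, not about edits to pipeline logic. A minimal sketch of starting one update via the documented Pipelines REST API; the host, token, and pipeline ID below are placeholders:

```python
# Start one DLT pipeline update via the REST API. Each successful call
# creates one "update" (one run), which counts toward the workspace's
# 100-concurrent-updates limit.
import requests

HOST = "https://<your-workspace>.azuredatabricks.net"  # placeholder
TOKEN = "<personal-access-token>"                      # placeholder
PIPELINE_ID = "<pipeline-id>"                          # placeholder

resp = requests.post(
    f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}/updates",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"full_refresh": False},  # an incremental run is still one update
)
resp.raise_for_status()
print(resp.json()["update_id"])  # every run gets its own update_id
```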
sher
by Valued Contributor II
  • 6859 Views
  • 1 reply
  • 0 kudos

How to resolve the column name in s3 path saved as UUID format

Our managed Databricks tables are stored in S3 by default. While I am reading that S3 path directly, I am getting the column name as a UUID. E.g.: the column name is ID in the Databricks table, but while checking the S3 path, the column name looks like COL-b400af61-9tha-4565-...

Data Engineering
deltatable
managedtables
Latest Reply
sher
Valued Contributor II
  • 0 kudos

Hi @Retired_mod, thank you for your reply, but the issue is I am not able to map ID with COL-b400af61-9tha-4565-89c4-d6ba43f948b7. I use DESCRIBE TABLE EXTENDED table_name as a query to get the list of UUID column names, and for the real column name fettin...

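A minimal sketch of what is likely going on, assuming the table has Delta column mapping enabled (delta.columnMapping.mode = 'name'): in that mode the physical Parquet columns are named "col-<uuid>", and only the Delta transaction log knows the logical names. Reading the same S3 path as Delta rather than raw Parquet resolves them; the path below is a placeholder.

```python
# Physical vs. logical column names under Delta column mapping.
s3_path = "s3://<bucket>/<path-to-managed-table>"  # hypothetical path

# Raw Parquet read: exposes physical "col-<uuid>" names
# (and may also pick up stale, already-removed data files).
spark.read.parquet(s3_path).printSchema()

# Delta read: the transaction log maps physical names
# back to logical ones such as ID.
spark.read.format("delta").load(s3_path).printSchema()
```

Reading a managed table's storage path directly is discouraged anyway; going through the metastore table name sidesteps the mapping problem entirely.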
rt-slowth
by Contributor
  • 2453 Views
  • 2 replies
  • 1 kudos

How to call a table created with create_table using dlt in a separate notebook?

I created a separate pipeline notebook to generate the table via DLT, and a separate notebook to write the entire output to Redshift at the end. The table created via DLT is called with spark.read.table("{schema}.{table}"). This way, I can import [MATERIALI...

1 More Replies
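A minimal sketch of the pattern the question describes: the DLT pipeline publishes its tables to a target schema, and a separate, non-DLT notebook reads the materialized result by name and pushes it to Redshift. The schema/table names and connection options below are placeholders.

```python
# Read a table materialized by a DLT pipeline from an ordinary notebook,
# then write it out with Databricks' Redshift connector.
df = spark.read.table("my_schema.my_dlt_table")  # hypothetical target schema

(df.write
   .format("redshift")  # plain JDBC also works
   .option("url", "jdbc:redshift://<host>:5439/<db>?user=<u>&password=<p>")
   .option("dbtable", "public.my_dlt_table")
   .option("tempdir", "s3a://<bucket>/tmp/redshift/")
   .option("forward_spark_s3_credentials", "true")
   .mode("overwrite")
   .save())
```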
alejandrofm
by Valued Contributor
  • 7027 Views
  • 10 replies
  • 15 kudos

All-purpose clusters not remembering custom tags

Hi, we have several clusters used with notebooks; we don't delete them, just start and stop them according to the "minutes of inactivity" setting. I'm trying to set a custom tag, so I wait until the cluster shuts down, add a tag, check that the tag is among them "...

Latest Reply
Dribka
New Contributor III
  • 15 kudos

@alejandrofm the behavior you're describing, where the custom tag disappears after the cluster restarts, might be related to the cluster configuration or the specific settings of your Databricks environment. To troubleshoot this, ensure that the cust...

9 More Replies
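For tags to survive restarts they have to live in the cluster configuration itself. A minimal sketch, using the Clusters REST API: fetch the current spec, add the tag, and write the spec back via clusters/edit (which replaces the configuration, so only editable fields are sent). Host, token, and cluster ID are placeholders.

```python
# Persist a custom tag by editing the cluster configuration.
import requests

HOST = "https://<workspace>.cloud.databricks.com"  # placeholder
HEADERS = {"Authorization": "Bearer <token>"}      # placeholder

spec = requests.get(
    f"{HOST}/api/2.0/clusters/get",
    headers=HEADERS,
    params={"cluster_id": "<cluster-id>"},
).json()

spec.setdefault("custom_tags", {})["team"] = "data-eng"  # the tag to persist

# clusters/edit expects a full cluster spec; strip read-only state fields.
editable = {k: spec[k] for k in (
    "cluster_id", "cluster_name", "spark_version", "node_type_id",
    "autotermination_minutes", "num_workers", "autoscale", "custom_tags",
) if k in spec}
requests.post(f"{HOST}/api/2.0/clusters/edit", headers=HEADERS, json=editable)
```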
Daniel20
by New Contributor
  • 1144 Views
  • 0 replies
  • 0 kudos

Flattening a Nested Recursive JSON Structure into a Struct List

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How can I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

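One sketch of an approach, using plain-Python recursion rather than Spark struct operations: JSON schema inference gives each nesting level a slightly different struct type, which makes explode/union across levels awkward, so it is simpler to parse each event as JSON, walk the recursive children arrays, and emit one row per plan node. The event log path and the selected fields are placeholders.

```python
# Flatten the recursive sparkPlanInfo struct into one row per plan node.
import json
from pyspark.sql import Row

def walk(node, depth=0):
    # Yield this node, then recurse into its children (depth-first).
    yield Row(depth=depth,
              nodeName=node.get("nodeName"),
              simpleString=node.get("simpleString"))
    for child in node.get("children", []):
        yield from walk(child, depth + 1)

raw = spark.read.text("/path/to/eventlog")  # hypothetical event log path
plans = (raw.rdd
         .map(lambda r: json.loads(r.value))
         .filter(lambda e: e.get("Event", "")
                 .endswith("SparkListenerSQLExecutionStart"))
         .flatMap(lambda e: walk(e["sparkPlanInfo"])))
spark.createDataFrame(plans).show(truncate=False)
```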
804082
by New Contributor III
  • 2207 Views
  • 4 replies
  • 1 kudos

Resolved! "Your workspace is hosted on infrastructure that cannot support serverless compute."

Hello, I wanted to try out Lakehouse Monitoring, but I receive the following message during setup: "Your workspace is hosted on infrastructure that cannot support serverless compute." I meet all requirements outlined in the documentation. My workspace ...

Latest Reply
SSundaram
Contributor
  • 1 kudos

Lakehouse Monitoring: this feature is in Public Preview in the following regions: eu-central-1, eu-west-1, us-east-1, us-east-2, us-west-2, ap-southeast-2. Not all workspaces in the regions listed are supported. If you see the error "Your workspace is ...

3 More Replies
Wayne
by New Contributor III
  • 30103 Views
  • 0 replies
  • 0 kudos

How to flatten a nested recursive JSON struct to a list of struct

This is from the Spark event log, on the event SparkListenerSQLExecutionStart. How can I flatten the sparkPlanInfo struct into an array of the same struct, then later explode it? Note that the element children is an array containing the parent struct, and the lev...

Arnold_Souza
by New Contributor III
  • 6014 Views
  • 1 reply
  • 0 kudos

Delta Live Tables consuming different files from the same path are combining the schema

Summary: I am using Delta Live Tables to create a pipeline in Databricks, and I am facing a problem of merging the schema of different files that are placed in the same folder in a data lake, even though I am using file patterns to separate the data inge...

Data Engineering
cloud_files
Databricks SQL
Delta Live Tables
read_files
Latest Reply
Arnold_Souza
New Contributor III
  • 0 kudos

Found a solution: never use 'fileNamePattern', '*file_1*'. Instead, put the pattern directly into the path: "abfss://<container>@<storage_account>.dfs.core.windows.net/path/to/folder/*file_1*"

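A minimal sketch of that fix in a DLT pipeline: embed the file-name glob in the Auto Loader path itself rather than in a separate pattern option, so each table only ever sees (and infers its schema from) its own files. The container, account, folder, and table names are placeholders.

```python
# One DLT table per file pattern, with the glob baked into the load path.
import dlt

@dlt.table(name="source_file_1")
def source_file_1():
    return (spark.readStream
            .format("cloudFiles")
            .option("cloudFiles.format", "csv")
            .load("abfss://<container>@<account>.dfs.core.windows.net"
                  "/path/to/folder/*file_1*"))
```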
sgannavaram
by New Contributor III
  • 2706 Views
  • 2 replies
  • 1 kudos

How to connect to IBM MQ from Databricks notebook?

We are trying to connect to IBM MQ and post a message to MQ, which is eventually consumed by a mainframe application. What IBM MQ client .jars / libraries should be installed in the cluster? If you have any sample code for connectivity, that would be helpful.

Latest Reply
Saleem
New Contributor II
  • 1 kudos

Kindly update if you are able to connect to MQ from Databricks. I am working on the same but with no luck, as I'm unable to install the pymqi library on the cluster; it shows an error that the MQ library could not be found.

1 More Replies
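A minimal sketch of posting a message with pymqi, assuming the native IBM MQ client libraries are installed on every cluster node first (for example via an init script that unpacks the MQ redistributable client): pymqi is only a wrapper around those libraries, and the "MQ library could not be found" error above is what it raises without them. All connection details below are placeholders.

```python
# Post one message to an IBM MQ queue with pymqi.
import pymqi

queue_manager = "QM1"               # placeholder
channel = "DEV.APP.SVRCONN"         # placeholder
conn_info = "mq.example.com(1414)"  # placeholder host(port)

qmgr = pymqi.connect(queue_manager, channel, conn_info)
queue = pymqi.Queue(qmgr, "DEV.QUEUE.1")  # placeholder queue name
queue.put(b"hello from Databricks")
queue.close()
qmgr.disconnect()
```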
bzh
by New Contributor
  • 3159 Views
  • 3 replies
  • 0 kudos

Question: Delta Live Table, multiple streaming sources to the single target

We are trying to write multiple sources to the same target table using DLT, but are getting the below errors. Not sure what we are missing here in the code. ...File /databricks/spark/python/dlt/api.py:817, in apply_changes(target, source, keys, sequence...

Latest Reply
nag_kanchan
New Contributor III
  • 0 kudos

The solution did not work for me. It was throwing an error stating: raise Py4JError( py4j.protocol.Py4JError: An error occurred while calling o434.readStream. Trace: py4j.Py4JException: Method readStream([class java.util.ArrayList]) does not exist.A...

2 More Replies
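A minimal sketch of one supported pattern for fanning multiple streaming sources into a single DLT target: create the streaming table once, then attach one @dlt.append_flow per source. (apply_changes, by contrast, takes exactly one source per target.) The source and target table names below are placeholders.

```python
# Multiple streaming sources appending into one DLT streaming table.
import dlt

dlt.create_streaming_table("combined_target")

@dlt.append_flow(target="combined_target")
def from_source_a():
    return spark.readStream.table("source_a")

@dlt.append_flow(target="combined_target")
def from_source_b():
    return spark.readStream.table("source_b")
```

If CDC apply_changes semantics are actually needed, one workaround is to union the sources into a single view and run apply_changes once against that view.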
Faisal
by Contributor
  • 1510 Views
  • 1 reply
  • 0 kudos

DLT - how to log number of rows read and written

Hi @Retired_mod - how do I log the number of rows read and written in a DLT pipeline? I want to store it in audit tables after the pipeline update completes. Can you give me sample query code?

Latest Reply
Faisal
Contributor
  • 0 kudos

Thanks @Retired_mod, but I asked how to log the number of rows read/written via a Delta Live Tables (DLT) pipeline, not a Delta Lake table, and the solution you gave is related to a Data Factory pipeline, which is not what I need.

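A minimal sketch of one way to get those numbers, assuming the pipeline's event log is readable as a Delta table under the pipeline storage location ("<storage-location>/system/events" is the conventional path): row counts appear on flow_progress events under details:flow_progress.metrics.num_output_rows, and the result can be written to an audit table after the update completes.

```python
# Pull per-flow row counts from the DLT event log.
event_log = spark.read.format("delta").load("<storage-location>/system/events")
event_log.createOrReplaceTempView("dlt_events")

audit = spark.sql("""
  SELECT timestamp,
         origin.flow_name                                AS flow_name,
         details:flow_progress.metrics.num_output_rows   AS num_output_rows
  FROM dlt_events
  WHERE event_type = 'flow_progress'
    AND details:flow_progress.metrics.num_output_rows IS NOT NULL
""")
audit.write.mode("append").saveAsTable("audit.dlt_row_counts")  # placeholder
```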
AFox
by Contributor
  • 6134 Views
  • 3 replies
  • 3 kudos

databricks-connect: PandasUDFs importing local packages: ModuleNotFoundError

databricks-connect==14.1.0. Related to other posts: https://community.databricks.com/t5/data-engineering/modulenotfounderror-serializationerror-when-executing-over/td-p/14301 and https://stackoverflow.com/questions/59322622/how-to-use-a-udf-defined-in-a-sub-...

Latest Reply
AFox
Contributor
  • 3 kudos

There is a way to do this! spark.addArtifact(src_zip_path, pyfile=True). Some things of note: this only works on single-user (non-shared) clusters; src_zip_path must be a posix-path-style string (i.e. forward slashes) even on Windows (drop C: and replace t...

2 More Replies
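A minimal sketch of the fix from that reply: zip the local package and ship it to the (single-user) cluster with spark.addArtifact, so pandas UDF workers can import it over databricks-connect. The package and directory names below are placeholders.

```python
# Ship a local package to the cluster for use inside pandas UDFs.
import shutil
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# Zip src/my_pkg into my_pkg.zip (the archive root contains my_pkg/).
zip_path = shutil.make_archive("my_pkg", "zip", root_dir="src",
                               base_dir="my_pkg")

# Posix-style separators are required, even on Windows (per the reply,
# the drive letter must also be dropped there).
spark.addArtifact(zip_path.replace("\\", "/"), pyfile=True)

# Workers can now `import my_pkg` inside pandas UDFs.
```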

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won't want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group