Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sandy311
by New Contributor III
  • 645 Views
  • 2 replies
  • 1 kudos

Private package installation using DAB on job cluster

I'm using a .whl job to upload and install a package on a job cluster using DAB. However, I'm facing an issue with a private package from Azure Artifacts. It works fine when running in CI or Azure DevOps pipelines because the PAT token is available a...
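A minimal sketch of one way to make the PAT available on the job cluster at runtime, assuming a Databricks secret scope (here called "artifacts") holds the Azure DevOps token; the organization, feed, and package names are placeholders:

import subprocess
import sys

# Hypothetical secret scope/key holding the Azure DevOps PAT (assumption, not from the thread).
pat = dbutils.secrets.get(scope="artifacts", key="azure-devops-pat")

# Azure Artifacts PyPI-compatible index; any username works when a PAT is supplied.
index_url = f"https://build:{pat}@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/"

# Install the private package into the job cluster's environment at the start of the task.
subprocess.check_call([sys.executable, "-m", "pip", "install", "my-private-package",
                       "--index-url", index_url])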

Latest Reply
sandy311
New Contributor III
  • 1 kudos

Hi @Brahmareddy, thanks for the details. Could you please provide examples if possible?

1 More Replies
satyam-verma
by New Contributor
  • 1146 Views
  • 2 replies
  • 0 kudos

Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?

I’m transitioning from all-purpose clusters to job compute to optimize costs. Previously, we reused an existing_cluster_id in the job configuration to reduce total job runtime. My use case: a parent job triggers multiple child jobs sequentially. I want ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi satyam-verma, how are you doing today? As per my understanding, switching from all-purpose clusters to job compute can definitely help with cost optimization. In your case, where a parent job triggers multiple child jobs, it makes sense to want to...
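For reference, a sketch of the Jobs API 2.1 shape in which tasks of a single job share one cluster via job_cluster_key; separate jobs cannot share a job cluster, so one option is to fold the child steps into the parent job as tasks. Names, runtime version, and sizes below are placeholders:

# Tasks within ONE job can reuse the same job cluster by referencing its job_cluster_key.
job_spec = {
    "name": "parent-with-children",
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "child_1",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/child_1"},
        },
        {
            "task_key": "child_2",
            "depends_on": [{"task_key": "child_1"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/child_2"},
        },
    ],
}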

1 More Replies
ankit001mittal
by New Contributor III
  • 1160 Views
  • 5 replies
  • 3 kudos

Is there a way to find DLT job and task records in system.lakeflow.job_task_run_timeline?

Hi Guys,We're working on the monetization product and are trying to understand how much costs are coming from our jobs and DLT and all purpose interactive sessions? and are currently exploring the system.lakeflow.job_task_run_timeline table to find t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Hi Ankit, Thanks for your follow-up. The pipeline_events table you're referring to is part of the system tables for Delta Live Tables (DLT), and if it’s not showing up in your workspace, it’s likely because system tables haven't been enabled yet in y...
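As a rough illustration of the kind of cost attribution the original question is after, once the billing and lakeflow system schemas are enabled, a sketch against system.billing.usage (the 30-day window is arbitrary):

# Aggregate DBU usage per job and per DLT pipeline from the billing system table.
usage_by_workload = spark.sql("""
    SELECT
        usage_metadata.job_id,
        usage_metadata.dlt_pipeline_id,
        sku_name,
        SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY ALL
""")
usage_by_workload.show()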

4 More Replies
noorbasha534
by Valued Contributor
  • 589 Views
  • 1 reply
  • 2 kudos

Event subscription on Databricks Delta tables

Dear all, we are maintaining a global/enterprise data platform for a customer. We would like to capture events based on data streaming happening on Databricks-based Delta tables. (Data streams run for at least 15 hrs a day; so, events should be generated b...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi noorbasha534, how are you doing today? As per my understanding, what you're looking to build is a really powerful and smart setup—kind of like a data-driven event notification system for streaming Delta tables. While there’s no out-of-the-box feat...
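A minimal sketch of one way to approximate such an event feed with Structured Streaming, assuming a source Delta table and a placeholder notify step (the table name, checkpoint path, and notification target are assumptions):

# Fire a notification per micro-batch that contains new rows from the streaming Delta table.
def notify(batch_df, batch_id):
    n = batch_df.count()
    if n > 0:
        # Placeholder: publish to a webhook, queue, or alerting system instead of printing.
        print(f"batch {batch_id}: {n} new rows arrived")

(spark.readStream
      .table("catalog.schema.source_delta_table")
      .writeStream
      .foreachBatch(notify)
      .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/events_notify")
      .start())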

subhas
by New Contributor II
  • 759 Views
  • 1 reply
  • 1 kudos

Auto Loader bringing NULL Records

Hi, I am using Auto Loader to fetch some records stored in two files. Please see my code below. It fetches records from the two files correctly and then starts fetching NULL records. I attach option("cleanSource", ) to readStream, but it is ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi subhas, how are you doing today? As per my understanding, it looks like the issue is happening because you're using /FileStore, which isn’t fully supported by Auto Loader’s cleanSource option. Even though the code looks mostly fine, Auto Loader expe...
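For comparison, a minimal Auto Loader read from a Unity Catalog volume with an explicit schema, which is the usual way to avoid all-NULL rows when inference does not match the files; the path, file format, and schema below are placeholders:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Explicit schema so rows are parsed instead of coming back as NULLs.
schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .schema(schema)
          .load("/Volumes/catalog/schema/raw/input/"))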

cmathieu
by New Contributor III
  • 613 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE command on heavily nested table OOM error

I'm trying to run the OPTIMIZE command on a table with less than 2000 rows, but it is causing an out of memory issue. The problem seems to come from the fact that it is a heavily nested table in staging between a json file and flattened table. The ta...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi cmathieu, how are you doing today? As per my understanding, yeah, this sounds like one of those cases where the data volume is small, but the complexity of the schema is what’s causing the trouble. If your table has deeply nested JSON structures, ...
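One knob that is often suggested for this situation is lowering the OPTIMIZE target file size so each rewrite task materializes less of the nested data at once; a sketch, with the table name as a placeholder and the 64 MB value picked arbitrarily:

# Smaller target files during OPTIMIZE reduce per-task memory pressure on wide, nested schemas.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 64 * 1024 * 1024)  # 64 MB
spark.sql("OPTIMIZE catalog.schema.nested_staging_table")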

abelian-grape
by New Contributor III
  • 1061 Views
  • 3 replies
  • 1 kudos

Build a streaming table on top of a Snowflake table

Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi abelian-grape, how are you doing today? As per my understanding, right now it’s not possible to create a streaming table directly on top of a Snowflake table that’s accessed through Lakehouse Federation in Databricks. Lakehouse Federation allows ...
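A sketch of the usual workaround: a scheduled incremental batch copy from the federated Snowflake table into a native Delta table, keyed on a watermark column. The catalog, table, and column names are placeholders:

# Find the newest watermark already loaded into the Delta copy.
last_loaded = spark.sql(
    "SELECT coalesce(max(updated_at), timestamp'1970-01-01') AS w FROM catalog.schema.items_delta"
).collect()[0]["w"]

# Pull only newer rows from the federated Snowflake table and append them.
new_rows = spark.sql(f"""
    SELECT * FROM snowflake_catalog.public.items
    WHERE updated_at > '{last_loaded}'
""")
new_rows.write.mode("append").saveAsTable("catalog.schema.items_delta")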

2 More Replies
RajNath
by New Contributor II
  • 4781 Views
  • 2 replies
  • 1 kudos

Cost of using Delta Sharing with Unity Catalog

I am new to Databricks Delta Sharing. In the case of Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize for la...

Latest Reply
noorbasha534
Valued Contributor
  • 1 kudos

@RajNath I am also looking for information around this. As far as I understood, it uses provider-side compute. Did you get the same info?...

1 More Replies
Arvind007
by New Contributor II
  • 1355 Views
  • 3 replies
  • 1 kudos

Resolved! Issue while reading external iceberg table from GCS path using spark SQL

df = spark.sql("select * from bqms_table;"); df.show()
ENV: DBRT 16.3 (includes Apache Spark 3.5.2, Scala 2.12), org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1
Py4JJavaError: An error occurred while calling o471.showString. : org.apache.spar...

Latest Reply
Arvind007
New Contributor II
  • 1 kudos

I tried the given solutions, but it seems the issue still persists. I would appreciate it if this could be resolved by Databricks soon, for better integration between GCP and Databricks.

2 More Replies
abelian-grape
by New Contributor III
  • 987 Views
  • 2 replies
  • 0 kudos

Triggering Downstream Workflow in Databricks from New Inserts in Snowflake

Hi Databricks experts, I have a table in Snowflake that tracks newly added items, and a downstream data processing workflow that needs to be triggered whenever new items are added. I'm currently using Lakehouse Federation to query the Snowflake tables...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi abelian-grape, great question! Since you're using Lakehouse Federation to access the Snowflake table, and Databricks can't directly stream from or listen to inserts in Snowflake, the best approach is to use an interval-based polling mechanism in Da...
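A sketch of that polling idea, assuming a small state table holding the last processed watermark and the databricks-sdk package for kicking off the downstream job; table names and the job ID are placeholders:

from databricks.sdk import WorkspaceClient

# Latest watermark visible in the federated Snowflake table.
latest = spark.sql(
    "SELECT max(inserted_at) AS w FROM snowflake_catalog.public.new_items"
).collect()[0]["w"]

# Last watermark this poller already handled.
last_seen = spark.sql(
    "SELECT max(watermark) AS w FROM catalog.schema.poll_state"
).collect()[0]["w"]

if latest is not None and (last_seen is None or latest > last_seen):
    WorkspaceClient().jobs.run_now(job_id=123456789)  # placeholder downstream workflow ID
    spark.sql(f"INSERT INTO catalog.schema.poll_state VALUES (timestamp'{latest}')")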

1 More Replies
ChsAIkrishna
by Contributor
  • 3094 Views
  • 1 reply
  • 1 kudos

VNet gateway issues on Power BI connection

Team, we are getting frequent VNet gateway failures on a Power BI dataset using DAX (simple DAX, not complex), and upon rerun it works. Is there any permanent fix for this? Error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error...

Latest Reply
F_Goudarzi
New Contributor III
  • 1 kudos

Similar issue. Any solution?

Nyarish
by Contributor
  • 21568 Views
  • 18 replies
  • 18 kudos

Resolved! How to connect Neo4j aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I want to connect Neo4j Aura to my cluster. Thank you.
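For context, a minimal read from Neo4j Aura with the Neo4j Spark connector, assuming the connector library is installed on the cluster and credentials live in a secret scope; Aura requires the encrypted neo4j+s:// scheme, and the URI, scope, and label below are placeholders:

# Read nodes with a given label from Neo4j Aura over an encrypted connection.
df = (spark.read
          .format("org.neo4j.spark.DataSource")
          .option("url", "neo4j+s://<dbid>.databases.neo4j.io")
          .option("authentication.basic.username", "neo4j")
          .option("authentication.basic.password", dbutils.secrets.get("neo4j", "password"))
          .option("labels", "Person")
          .load())
df.show()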

Latest Reply
saab123
New Contributor II
  • 18 kudos

I have added the init script on cluster start-up in Spark -> config -> init scripts, but my cluster won't start after this. Cluster-scoped init script /Volumes/xxx/xxx/neo4j/neo4j-init.sh failed: Script exit status is non-zero. Could you please help me or...

17 More Replies
jeremy98
by Honored Contributor
  • 970 Views
  • 3 replies
  • 0 kudos

Differences between notebooks run interactively and notebooks run inside a job

Hello Community, I'm facing an issue with a job that runs a notebook task. When I run the same join condition through the job pipeline, it produces different results compared to running the notebook interactively (outside the job). Why might this be ha...

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Hi, thanks for your question! What I'm doing is essentially loading a table from PostgreSQL using a Spark JDBC connection, and also reading the corresponding table from Databricks. I then perform delete, update, and insert operations by comparing the ...
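A sketch of how that compare-and-apply step can be collapsed into a single deterministic MERGE with Delta Lake's Python API; the connection details, secret scope, and key column are placeholders:

from delta.tables import DeltaTable

# Read the PostgreSQL source over JDBC (credentials assumed to be in a secret scope).
pg_df = (spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://<host>:5432/<db>")
             .option("dbtable", "public.source_table")
             .option("user", dbutils.secrets.get("pg", "user"))
             .option("password", dbutils.secrets.get("pg", "password"))
             .load())

# Apply inserts, updates, and deletes against the Delta table in one operation.
target = DeltaTable.forName(spark, "catalog.schema.target_table")
(target.alias("t")
       .merge(pg_df.alias("s"), "t.id = s.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .whenNotMatchedBySourceDelete()
       .execute())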

2 More Replies
bhargavabasava
by New Contributor II
  • 1745 Views
  • 3 replies
  • 0 kudos

Resolved! Job compute is taking longer even after using pool

Hi team, we created a workflow and attached it to a job cluster (which is configured to use a compute pool). When we run the pipeline, it takes up to 5 minutes to go into the clusterReady state, and this is adding latency to our use case. Even with subsequen...

Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @bhargavabasava, Job Cluster + Compute Pools: long startup times. If you’re using job clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start tim...
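A sketch of the cluster spec side of that advice: point the job cluster at a pool that keeps warm instances (min_idle_instances > 0) so provisioning mostly reuses idle VMs; the IDs, runtime version, and sizes are placeholders:

# Job cluster definition that draws workers (and optionally the driver) from a warm pool.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "instance_pool_id": "<pool-id>",           # pool created with min_idle_instances > 0
    "driver_instance_pool_id": "<pool-id>",    # driver can come from a pool as well
    "num_workers": 2,
}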

2 More Replies
dbernabeuplx
by New Contributor II
  • 1343 Views
  • 5 replies
  • 0 kudos

Resolved! How to delete/empty notebook output

I need to clear cell output in Databricks notebooks using dbutils or the API. As for my requirements, I need to clear it for data security reasons. That is, given a notebook's PATH, I would like to be able to clear all its outputs, as is done through...

Labels: Data Engineering, API, Data, issue, Notebooks
Latest Reply
srinum89
New Contributor III
  • 0 kudos

For a programmatic approach, you can also clear each cell's output individually using the IPython package. Unfortunately, you need to do this in each and every cell:
from IPython.display import clear_output
clear_output(wait=True)
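For the whole-notebook case asked about above, one possible approach (a sketch, assuming the databricks-sdk package and a placeholder path) is to export the notebook in SOURCE format, which carries only code, and re-import it over itself so every cell output is dropped:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, Language

w = WorkspaceClient()
path = "/Workspace/Users/me@example.com/my_notebook"  # placeholder notebook path

# SOURCE export returns base64-encoded code with no cell results attached.
exported = w.workspace.export(path, format=ExportFormat.SOURCE)

# Re-importing the same source over the original path keeps the code but clears all outputs.
w.workspace.import_(path, content=exported.content, format=ImportFormat.SOURCE,
                    language=Language.PYTHON, overwrite=True)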

4 More Replies
