Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

sandy311
by New Contributor III
  • 645 Views
  • 2 replies
  • 1 kudos

Private package installation using DAB on job cluster

I'm using a .whl job to upload and install a package on a job cluster using DAB. However, I'm facing an issue with a private package from Azure Artifacts. It works fine when running in CI or Azure DevOps pipelines because the PAT token is available a...
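A minimal sketch of one way to make the PAT available on the job cluster at runtime, assuming a Databricks secret scope (here called "artifacts") holds the Azure DevOps token; the organization, feed, and package names are placeholders:

import subprocess
import sys

# Hypothetical secret scope/key holding the Azure DevOps PAT (assumption, not from the thread).
pat = dbutils.secrets.get(scope="artifacts", key="azure-devops-pat")

# Azure Artifacts PyPI-compatible index; any username works when a PAT is supplied.
index_url = f"https://build:{pat}@pkgs.dev.azure.com/<org>/_packaging/<feed>/pypi/simple/"

# Install the private package into the job cluster's environment at the start of the task.
subprocess.check_call([sys.executable, "-m", "pip", "install", "my-private-package",
                       "--index-url", index_url])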

Latest Reply
sandy311
New Contributor III
  • 1 kudos

Hi @Brahmareddy, thanks for the details. Could you please provide examples if possible?

1 More Replies
satyam-verma
by New Contributor
  • 1146 Views
  • 2 replies
  • 0 kudos

Switching from All-Purpose to Job Compute – How to Reuse Cluster in Parent/Child Jobs?

I’m transitioning from all-purpose clusters to job compute to optimize costs. Previously, we reused an existing_cluster_id in the job configuration to reduce total job runtime. My use case: a parent job triggers multiple child jobs sequentially. I want ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi satyam-verma, how are you doing today? As per my understanding, switching from all-purpose clusters to job compute can definitely help with cost optimization. In your case, where a parent job triggers multiple child jobs, it makes sense to want to...
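For reference, a sketch of the Jobs API 2.1 shape in which tasks of a single job share one cluster via job_cluster_key; separate jobs cannot share a job cluster, so one option is to fold the child steps into the parent job as tasks. Names, runtime version, and sizes below are placeholders:

# Tasks within ONE job can reuse the same job cluster by referencing its job_cluster_key.
job_spec = {
    "name": "parent-with-children",
    "job_clusters": [
        {
            "job_cluster_key": "shared_cluster",
            "new_cluster": {
                "spark_version": "15.4.x-scala2.12",
                "node_type_id": "Standard_DS3_v2",
                "num_workers": 2,
            },
        }
    ],
    "tasks": [
        {
            "task_key": "child_1",
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/child_1"},
        },
        {
            "task_key": "child_2",
            "depends_on": [{"task_key": "child_1"}],
            "job_cluster_key": "shared_cluster",
            "notebook_task": {"notebook_path": "/Jobs/child_2"},
        },
    ],
}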

1 More Replies
ankit001mittal
by New Contributor III
  • 1160 Views
  • 5 replies
  • 3 kudos

Is there a way to find DLT job and task records in system.lakeflow.job_task_run_timeline?

Hi Guys,We're working on the monetization product and are trying to understand how much costs are coming from our jobs and DLT and all purpose interactive sessions? and are currently exploring the system.lakeflow.job_task_run_timeline table to find t...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 3 kudos

Hi Ankit, Thanks for your follow-up. The pipeline_events table you're referring to is part of the system tables for Delta Live Tables (DLT), and if it’s not showing up in your workspace, it’s likely because system tables haven't been enabled yet in y...
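As a rough illustration of the kind of cost attribution the original question is after, once the billing and lakeflow system schemas are enabled, a sketch against system.billing.usage (the 30-day window is arbitrary):

# Aggregate DBU usage per job and per DLT pipeline from the billing system table.
usage_by_workload = spark.sql("""
    SELECT
        usage_metadata.job_id,
        usage_metadata.dlt_pipeline_id,
        sku_name,
        SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_date >= date_sub(current_date(), 30)
    GROUP BY ALL
""")
usage_by_workload.show()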

4 More Replies
noorbasha534
by Valued Contributor
  • 589 Views
  • 1 reply
  • 2 kudos

Event subscription on Databricks Delta tables

Dear all, we are maintaining a global/enterprise data platform for a customer. We would like to capture events based on data streaming happening on Databricks-based Delta tables. (Data streams run for at least 15 hrs a day; so, events should be generated b...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi noorbasha534, how are you doing today? As per my understanding, what you're looking to build is a really powerful and smart setup—kind of like a data-driven event notification system for streaming Delta tables. While there’s no out-of-the-box feat...
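A minimal sketch of one way to approximate such an event feed with Structured Streaming, assuming a source Delta table and a placeholder notify step (the table name, checkpoint path, and notification target are assumptions):

# Fire a notification per micro-batch that contains new rows from the streaming Delta table.
def notify(batch_df, batch_id):
    n = batch_df.count()
    if n > 0:
        # Placeholder: publish to a webhook, queue, or alerting system instead of printing.
        print(f"batch {batch_id}: {n} new rows arrived")

(spark.readStream
      .table("catalog.schema.source_delta_table")
      .writeStream
      .foreachBatch(notify)
      .option("checkpointLocation", "/Volumes/catalog/schema/checkpoints/events_notify")
      .start())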

subhas
by New Contributor II
  • 759 Views
  • 1 reply
  • 1 kudos

Auto Loader bringing NULL Records

Hi, I am using Auto Loader to fetch some records stored in two files. Please see my code below. It fetches records from the two files correctly and then starts fetching NULL records. I attach option("cleanSource", ) to readStream, but it is ...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi subhas, how are you doing today? As per my understanding, it looks like the issue is happening because you're using /FileStore, which isn’t fully supported by Auto Loader’s cleanSource option. Even though the code looks mostly fine, Auto Loader expe...
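For comparison, a minimal Auto Loader read from a Unity Catalog volume with an explicit schema, which is the usual way to avoid all-NULL rows when inference does not match the files; the path, file format, and schema below are placeholders:

from pyspark.sql.types import IntegerType, StringType, StructField, StructType

# Explicit schema so rows are parsed instead of coming back as NULLs.
schema = StructType([
    StructField("id", IntegerType()),
    StructField("name", StringType()),
])

df = (spark.readStream
          .format("cloudFiles")
          .option("cloudFiles.format", "csv")
          .schema(schema)
          .load("/Volumes/catalog/schema/raw/input/"))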

cmathieu
by New Contributor III
  • 613 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE command on heavily nested table OOM error

I'm trying to run the OPTIMIZE command on a table with less than 2000 rows, but it is causing an out of memory issue. The problem seems to come from the fact that it is a heavily nested table in staging between a json file and flattened table. The ta...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 2 kudos

Hi cmathieu, how are you doing today? As per my understanding, yeah, this sounds like one of those cases where the data volume is small, but the complexity of the schema is what’s causing the trouble. If your table has deeply nested JSON structures, ...
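One knob that is often suggested for this situation is lowering the OPTIMIZE target file size so each rewrite task materializes less of the nested data at once; a sketch, with the table name as a placeholder and the 64 MB value picked arbitrarily:

# Smaller target files during OPTIMIZE reduce per-task memory pressure on wide, nested schemas.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 64 * 1024 * 1024)  # 64 MB
spark.sql("OPTIMIZE catalog.schema.nested_staging_table")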

abelian-grape
by New Contributor III
  • 1061 Views
  • 3 replies
  • 1 kudos

Build a streaming table on top of a Snowflake table

Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi abelian-grape, how are you doing today? As per my understanding, right now it’s not possible to create a streaming table directly on top of a Snowflake table that’s accessed through Lakehouse Federation in Databricks. Lakehouse Federation allows ...
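A sketch of the usual workaround: a scheduled incremental batch copy from the federated Snowflake table into a native Delta table, keyed on a watermark column. The catalog, table, and column names are placeholders:

# Find the newest watermark already loaded into the Delta copy.
last_loaded = spark.sql(
    "SELECT coalesce(max(updated_at), timestamp'1970-01-01') AS w FROM catalog.schema.items_delta"
).collect()[0]["w"]

# Pull only newer rows from the federated Snowflake table and append them.
new_rows = spark.sql(f"""
    SELECT * FROM snowflake_catalog.public.items
    WHERE updated_at > '{last_loaded}'
""")
new_rows.write.mode("append").saveAsTable("catalog.schema.items_delta")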

2 More Replies
RajNath
by New Contributor II
  • 4781 Views
  • 2 replies
  • 1 kudos

Cost of using Delta Sharing with Unity Catalog

I am new to Databricks Delta Sharing. In the case of Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize for la...

Latest Reply
noorbasha534
Valued Contributor
  • 1 kudos

@RajNath I am also looking for information around this. As far as I understood, it uses provider-side compute. Did you get the same info?...

1 More Replies
Arvind007
by New Contributor II
  • 1355 Views
  • 3 replies
  • 1 kudos

Resolved! Issue while reading external iceberg table from GCS path using spark SQL

df = spark.sql("select * from bqms_table;"); df.show()
ENV: DBRT 16.3 (includes Apache Spark 3.5.2, Scala 2.12), org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1
Py4JJavaError: An error occurred while calling o471.showString. : org.apache.spar...

Latest Reply
Arvind007
New Contributor II
  • 1 kudos

I tried the given solutions, but it seems the issue still persists. I would appreciate it if this could be resolved by Databricks soon, for better integration between GCP and Databricks.

2 More Replies
abelian-grape
by New Contributor III
  • 987 Views
  • 2 replies
  • 0 kudos

Triggering Downstream Workflow in Databricks from New Inserts in Snowflake

Hi Databricks experts, I have a table in Snowflake that tracks newly added items, and a downstream data processing workflow that needs to be triggered whenever new items are added. I'm currently using Lakehouse Federation to query the Snowflake tables...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi abelian-grape, great question! Since you're using Lakehouse Federation to access the Snowflake table, and Databricks can't directly stream from or listen to inserts in Snowflake, the best approach is to use an interval-based polling mechanism in Da...
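A sketch of that polling idea, assuming a small state table holding the last processed watermark and the databricks-sdk package for kicking off the downstream job; table names and the job ID are placeholders:

from databricks.sdk import WorkspaceClient

# Latest watermark visible in the federated Snowflake table.
latest = spark.sql(
    "SELECT max(inserted_at) AS w FROM snowflake_catalog.public.new_items"
).collect()[0]["w"]

# Last watermark this poller already handled.
last_seen = spark.sql(
    "SELECT max(watermark) AS w FROM catalog.schema.poll_state"
).collect()[0]["w"]

if latest is not None and (last_seen is None or latest > last_seen):
    WorkspaceClient().jobs.run_now(job_id=123456789)  # placeholder downstream workflow ID
    spark.sql(f"INSERT INTO catalog.schema.poll_state VALUES (timestamp'{latest}')")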

1 More Replies
ChsAIkrishna
by Contributor
  • 3094 Views
  • 1 reply
  • 1 kudos

VNet gateway issues on Power BI connection

Team, we are getting frequent VNet gateway failures on a Power BI dataset using DAX (simple DAX, not complex), and upon rerun it works. Is there any permanent fix for this? Error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error...

Latest Reply
F_Goudarzi
New Contributor III
  • 1 kudos

Similar issue. Any solution?

Nyarish
by Contributor
  • 21568 Views
  • 18 replies
  • 18 kudos

Resolved! How to connect Neo4j aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I want to connect Neo4j Aura to my cluster. Thank you.
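For context, a minimal read from Neo4j Aura with the Neo4j Spark connector, assuming the connector library is installed on the cluster and credentials live in a secret scope; Aura requires the encrypted neo4j+s:// scheme, and the URI, scope, and label below are placeholders:

# Read nodes with a given label from Neo4j Aura over an encrypted connection.
df = (spark.read
          .format("org.neo4j.spark.DataSource")
          .option("url", "neo4j+s://<dbid>.databases.neo4j.io")
          .option("authentication.basic.username", "neo4j")
          .option("authentication.basic.password", dbutils.secrets.get("neo4j", "password"))
          .option("labels", "Person")
          .load())
df.show()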

Latest Reply
saab123
New Contributor II
  • 18 kudos

I have added the init script on cluster start-up in Spark -> config -> init scripts, but my cluster won't start after this. Cluster-scoped init script /Volumes/xxx/xxx/neo4j/neo4j-init.sh failed: Script exit status is non-zero. Could you please help me or...

17 More Replies
jeremy98
by Honored Contributor
  • 970 Views
  • 3 replies
  • 0 kudos

Differences between notebooks run interactively and notebooks run inside a job

Hello Community, I'm facing an issue with a job that runs a notebook task. When I run the same join condition through the job pipeline, it produces different results compared to running the notebook interactively (outside the job). Why might this be ha...

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Hi, thanks for your question! What I'm doing is essentially loading a table from PostgreSQL using a Spark JDBC connection, and also reading the corresponding table from Databricks. I then perform delete, update, and insert operations by comparing the ...
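A sketch of how that compare-and-apply step can be collapsed into a single deterministic MERGE with Delta Lake's Python API; the connection details, secret scope, and key column are placeholders:

from delta.tables import DeltaTable

# Read the PostgreSQL source over JDBC (credentials assumed to be in a secret scope).
pg_df = (spark.read.format("jdbc")
             .option("url", "jdbc:postgresql://<host>:5432/<db>")
             .option("dbtable", "public.source_table")
             .option("user", dbutils.secrets.get("pg", "user"))
             .option("password", dbutils.secrets.get("pg", "password"))
             .load())

# Apply inserts, updates, and deletes against the Delta table in one operation.
target = DeltaTable.forName(spark, "catalog.schema.target_table")
(target.alias("t")
       .merge(pg_df.alias("s"), "t.id = s.id")
       .whenMatchedUpdateAll()
       .whenNotMatchedInsertAll()
       .whenNotMatchedBySourceDelete()
       .execute())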

2 More Replies
bhargavabasava
by New Contributor II
  • 1745 Views
  • 3 replies
  • 0 kudos

Resolved! Job compute is taking longer even after using pool

Hi team, we created a workflow and attached it to a job cluster (which is configured to use a compute pool). When we run the pipeline, it takes up to 5 minutes to go into the clusterReady state, and this is adding latency to our use case. Even with subsequen...

Latest Reply
Isi
Honored Contributor II
  • 0 kudos

Hey @bhargavabasava, Job Cluster + Compute Pools: long startup times. If you’re using job clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start tim...
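A sketch of the cluster spec side of that advice: point the job cluster at a pool that keeps warm instances (min_idle_instances > 0) so provisioning mostly reuses idle VMs; the IDs, runtime version, and sizes are placeholders:

# Job cluster definition that draws workers (and optionally the driver) from a warm pool.
new_cluster = {
    "spark_version": "15.4.x-scala2.12",
    "instance_pool_id": "<pool-id>",           # pool created with min_idle_instances > 0
    "driver_instance_pool_id": "<pool-id>",    # driver can come from a pool as well
    "num_workers": 2,
}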

2 More Replies
dbernabeuplx
by New Contributor II
  • 1343 Views
  • 5 replies
  • 0 kudos

Resolved! How to delete/empty notebook output

I need to clear cell output in Databricks notebooks using dbutils or the API. As for my requirements, I need to clear it for data security reasons. That is, given a notebook's PATH, I would like to be able to clear all its outputs, as is done through...

Labels: Data Engineering, API, Data, issue, Notebooks
Latest Reply
srinum89
New Contributor III
  • 0 kudos

For a programmatic approach, you can also clear each cell's output individually using the IPython package. Unfortunately, you need to do this in each and every cell:
from IPython.display import clear_output
clear_output(wait=True)
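For the whole-notebook case asked about above, one possible approach (a sketch, assuming the databricks-sdk package and a placeholder path) is to export the notebook in SOURCE format, which carries only code, and re-import it over itself so every cell output is dropped:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat, ImportFormat, Language

w = WorkspaceClient()
path = "/Workspace/Users/me@example.com/my_notebook"  # placeholder notebook path

# SOURCE export returns base64-encoded code with no cell results attached.
exported = w.workspace.export(path, format=ExportFormat.SOURCE)

# Re-importing the same source over the original path keeps the code but clears all outputs.
w.workspace.import_(path, content=exported.content, format=ImportFormat.SOURCE,
                    language=Language.PYTHON, overwrite=True)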

4 More Replies
