Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

noorbasha534
by Valued Contributor
  • 504 Views
  • 1 reply
  • 2 kudos

Event subscription in Databricks Delta tables

Dear all, We are maintaining a global/enterprise data platform for a customer. We would like to capture events based on the data streaming happening on Databricks-based Delta tables. (Data streams run for at least 15 hrs a day; so, events should be generated b...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Hi noorbasha534, how are you doing today? As per my understanding, what you're looking to build is a really powerful and smart setup, kind of like a data-driven event notification system for streaming Delta tables. While there’s no out-of-the-box feat...

  • 2 kudos
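A minimal sketch of the kind of event hook described above, using Structured Streaming's foreachBatch; the table name, endpoint, and checkpoint path are placeholder assumptions, not details from the thread:

import requests  # assumed to be available on the cluster

def notify(batch_df, batch_id):
    # Emit one event per micro-batch that actually contains new rows.
    n = batch_df.count()
    if n > 0:
        requests.post(
            "https://example.com/events",  # hypothetical notification endpoint
            json={"table": "catalog.schema.events_src", "batch_id": batch_id, "rows": n},
        )

(spark.readStream
      .table("catalog.schema.events_src")  # hypothetical streaming Delta table
      .writeStream
      .foreachBatch(notify)
      .option("checkpointLocation", "/Volumes/main/ops/_chk/events_src")  # hypothetical path
      .start())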
subhas
by New Contributor II
  • 697 Views
  • 1 reply
  • 1 kudos

Auto Loader bringing NULL Records

Hi, I am using Auto Loader to fetch some records stored in two files. Please see my code below. It fetches records from the two files correctly and then starts fetching NULL records. I attach option("cleanSource",    ) to readStream, but it is ...

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi subhas, how are you doing today? As per my understanding, it looks like the issue is happening because you're using /FileStore, which isn’t fully supported by Auto Loader's cleanSource option. Even though the code looks mostly fine, Auto Loader expe...

  • 1 kudos
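A minimal sketch of the same read pointed at a Unity Catalog Volume instead of /FileStore, as the reply suggests; the paths, source format, cleanSource value, and target table here are illustrative assumptions:

df = (spark.readStream
           .format("cloudFiles")
           .option("cloudFiles.format", "csv")  # assumed source format
           .option("cloudFiles.schemaLocation", "/Volumes/main/raw/_schema/orders")  # hypothetical path
           .option("cleanSource", "DELETE")  # value assumed; check the docs for the supported modes
           .load("/Volumes/main/raw/orders"))  # hypothetical Volume path instead of /FileStore

(df.writeStream
   .option("checkpointLocation", "/Volumes/main/raw/_chk/orders")  # hypothetical path
   .trigger(availableNow=True)
   .toTable("main.raw.orders_bronze"))  # hypothetical target table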
cmathieu
by New Contributor III
  • 511 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE command on heavily nested table OOM error

I'm trying to run the OPTIMIZE command on a table with fewer than 2000 rows, but it is causing an out-of-memory issue. The problem seems to come from the fact that it is a heavily nested table in staging between a JSON file and a flattened table. The ta...

Latest Reply
Brahmareddy
Honored Contributor III
  • 2 kudos

Hi cmathieu, how are you doing today? As per my understanding, yeah, this sounds like one of those cases where the data volume is small, but the complexity of the schema is what’s causing the trouble. If your table has deeply nested JSON structures, ...

  • 2 kudos
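One knob worth trying in that situation is to shrink the OPTIMIZE target file size so each rewrite task handles less data at once; a minimal sketch, where the 32 MB value and the table name are assumptions, not details from the thread:

# Lower the target file size for OPTIMIZE (default is ~1 GB) to reduce memory pressure
# when compacting wide, deeply nested rows.
spark.conf.set("spark.databricks.delta.optimize.maxFileSize", 32 * 1024 * 1024)
spark.sql("OPTIMIZE catalog.schema.staging_nested")  # hypothetical table name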
abelian-grape
by New Contributor III
  • 918 Views
  • 3 replies
  • 1 kudos

Build a streaming table on top of a Snowflake table

Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?

Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi abelian-grape, how are you doing today? As per my understanding, right now it’s not possible to create a streaming table directly on top of a Snowflake table that’s accessed through Lakehouse Federation in Databricks. Lakehouse Federation allows ...

  • 1 kudos
2 More Replies
RajNath
by New Contributor II
  • 4579 Views
  • 2 replies
  • 1 kudos

Cost of using Delta Sharing with Unity Catalog

I am new to Databricks Delta Sharing. In the case of Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize for la...

Latest Reply
noorbasha534
Valued Contributor
  • 1 kudos

@RajNath I am also looking for information around this. As far as I understand, it uses provider-side compute. Did you get the same info?...

  • 1 kudos
1 More Replies
Arvind007
by New Contributor II
  • 1069 Views
  • 3 replies
  • 1 kudos

Resolved! Issue while reading external iceberg table from GCS path using spark SQL

df = spark.sql("select * from bqms_table;"); df.show()
ENV - DBRT 16.3 (includes Apache Spark 3.5.2, Scala 2.12)
org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1
Py4JJavaError: An error occurred while calling o471.showString. : org.apache.spar...

Latest Reply
Arvind007
New Contributor II
  • 1 kudos

I tried the given solutions, but it seems the issue still persists. I would appreciate it if this could be resolved by Databricks soon for better integration between GCP and Databricks.

  • 1 kudos
2 More Replies
abelian-grape
by New Contributor III
  • 838 Views
  • 2 replies
  • 0 kudos

Triggering Downstream Workflow in Databricks from New Inserts in Snowflake

Hi Databricks experts, I have a table in Snowflake that tracks newly added items, and a downstream data processing workflow that needs to be triggered whenever new items are added. I'm currently using Lakehouse Federation to query the Snowflake tables...

Latest Reply
Brahmareddy
Honored Contributor III
  • 0 kudos

Hi abelian-grape, great question! Since you're using Lakehouse Federation to access the Snowflake table, and Databricks can't directly stream from or listen to inserts in Snowflake, the best approach is to use an interval-based polling mechanism in Da...

  • 0 kudos
1 More Replies
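A compact sketch of that polling pattern; every name here (the federated Snowflake table, the watermark table, and the timestamp column) is a hypothetical placeholder:

from pyspark.sql import functions as F

# Read the high-water mark left by the previous poll.
last = (spark.table("main.ops.snowflake_watermark")  # hypothetical watermark table
             .agg(F.max("last_seen_ts"))
             .first()[0])

items = spark.table("snowflake_fed.db.items")  # hypothetical federated Snowflake table
new_items = items.filter(F.col("created_at") > F.lit(last)) if last is not None else items

if new_items.limit(1).count() > 0:
    # New rows arrived since the last poll: run the downstream processing here
    # (inline, or by calling the Jobs API), then advance the watermark.
    new_max = new_items.agg(F.max("created_at")).first()[0]
    (spark.createDataFrame([(new_max,)], "last_seen_ts timestamp")
          .write.mode("overwrite").saveAsTable("main.ops.snowflake_watermark"))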
ChsAIkrishna
by Contributor
  • 2434 Views
  • 1 reply
  • 1 kudos

VNet gateway issues on Power BI connection

Team, we are getting frequent VNet gateway failures on a Power BI dataset using DAX (simple DAX, not complex), and upon rerun it works. Is there any permanent fix for this? Error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error...

Latest Reply
F_Goudarzi
New Contributor III
  • 1 kudos

Similar issue. Any solution?

  • 1 kudos
Nyarish
by Contributor
  • 20395 Views
  • 18 replies
  • 18 kudos

Resolved! How to connect Neo4j aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I try to establish a connection from Neo4j Aura to my cluster. Thank you.

Latest Reply
saab123
New Contributor II
  • 18 kudos

I have added the init script on cluster startup in Spark -> config -> init scripts, but my cluster won't start after this. Cluster-scoped init script /Volumes/xxx/xxx/neo4j/neo4j-init.sh failed: Script exit status is non-zero. Could you please help me or...

  • 18 kudos
17 More Replies
jeremy98
by Honored Contributor
  • 712 Views
  • 3 replies
  • 0 kudos

Differences between notebooks and notebooks that run inside a job

Hello Community, I'm facing an issue with a job that runs a notebook task. When I run the same join condition through the job pipeline, it produces different results compared to running the notebook interactively (outside the job). Why might this be ha...

Latest Reply
jeremy98
Honored Contributor
  • 0 kudos

Hi, thanks for your question! What I'm doing is essentially loading a table from PostgreSQL using a Spark JDBC connection, and also reading the corresponding table from Databricks. I then perform delete, update, and insert operations by comparing the ...

  • 0 kudos
2 More Replies
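For reference, a minimal sketch of that load-and-reconcile pattern; the connection details, table names, and key column are placeholders, not details from the thread:

pg = (spark.read.format("jdbc")
           .option("url", "jdbc:postgresql://host:5432/db")  # hypothetical connection
           .option("dbtable", "public.customers")            # hypothetical source table
           .option("user", "user").option("password", "***")
           .load())

pg.createOrReplaceTempView("pg_customers")

# Reconcile the Postgres snapshot into the Delta table in one MERGE
# (update + insert + delete of rows that disappeared at the source).
spark.sql("""
  MERGE INTO main.silver.customers AS t    -- hypothetical Delta target
  USING pg_customers AS s
  ON t.id = s.id                           -- hypothetical key column
  WHEN MATCHED THEN UPDATE SET *
  WHEN NOT MATCHED THEN INSERT *
  WHEN NOT MATCHED BY SOURCE THEN DELETE
""")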
bhargavabasava
by New Contributor II
  • 1323 Views
  • 3 replies
  • 0 kudos

Resolved! Job compute is taking longer even after using pool

Hi team, we created a workflow and attached it to a job cluster (which is configured to use a compute pool). When we run the pipeline, it takes up to 5 minutes to go into the clusterReady state, and this is adding latency to our use case. Even with subsequen...

Latest Reply
Isi
Contributor III
  • 0 kudos

Hey @bhargavabasava,
Job Cluster + Compute Pools: Long Startup Times
If you’re using job clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start tim...

  • 0 kudos
2 More Replies
dbernabeuplx
by New Contributor II
  • 1122 Views
  • 5 replies
  • 0 kudos

Resolved! How to delete/empty notebook output

I need to clear cell output in Databricks notebooks using dbutils or the API. Per my requirements, I need to clear it for data security reasons. That is, given a notebook's PATH, I would like to be able to clear all its outputs, as is done through...

Data Engineering
API
Data
issue
Notebooks
Latest Reply
srinum89
New Contributor III
  • 0 kudos

For a programmatic approach, you can also clear each cell's output individually using the IPython package. Unfortunately, you need to do this in each and every cell:

from IPython.display import clear_output
clear_output(wait=True)

  • 0 kudos
4 More Replies
amitkamthane
by New Contributor II
  • 935 Views
  • 3 replies
  • 0 kudos

Resolved! Delete files from databricks Volumes based on trigger

Hi, I noticed there's a file arrival trigger option in workflows, but I can't see a delete trigger option. However, let's say I want to delete files from the Databricks volume based on this trigger, and also remove the corresponding records from the bron...

Latest Reply
BigRoux
Databricks Employee
  • 0 kudos

Currently, Databricks doesn’t offer a built-in file deletion trigger mechanism similar to the file arrival trigger. The file arrival trigger only monitors for new files being added to a location, not for files being deleted.

  • 0 kudos
2 More Replies
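As a workaround, a scheduled job can handle the cleanup itself. A minimal sketch, where the Volume path, the ".processed.json" marker, the bronze table, and its source_file column are all illustrative assumptions:

volume_dir = "/Volumes/main/landing/files"  # hypothetical Volume path

for f in dbutils.fs.ls(volume_dir):
    if f.name.endswith(".processed.json"):  # hypothetical "safe to delete" convention
        # Remove the file and the bronze rows that were loaded from it.
        dbutils.fs.rm(f.path)
        spark.sql(f"DELETE FROM main.bronze.events WHERE source_file = '{f.path}'")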
Ambesh
by New Contributor III
  • 16244 Views
  • 8 replies
  • 1 kudos

Reading external Iceberg table

Hi all, I am trying to read an external Iceberg table. A separate Spark SQL script creates my Iceberg table, and now I need to read the Iceberg tables (created outside of Databricks) from my Databricks notebook. Could someone tell me the approach for ...

Latest Reply
Sash
New Contributor II
  • 1 kudos

Hi, I'm facing the same problem. However, when I set the access mode to "No isolation shared", I lose access to the external location where the Iceberg table resides. Is there a way to force Spark to NOT use a catalog even when in the "Standard (formerly ...

  • 1 kudos
7 More Replies
nielsehlers
by New Contributor
  • 705 Views
  • 1 reply
  • 1 kudos

from_utc_time gives strange results

I don't understand why from_utc_time(col("original_time"), "Europe/Berlin") changes the timestamp instead of just setting the timezone. That's a non-intuitive behaviour.
spark.conf.set("spark.sql.session.timeZone", "UTC")
from pyspark.sql import Row...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @nielsehlers! Just to clarify, PySpark's from_utc_timestamp converts a UTC timestamp to the specified timezone (in this case it's Europe/Berlin), adjusting the actual timestamp value rather than just setting timezone metadata. This happens beca...

  • 1 kudos
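A small, hedged illustration of that behaviour (the sample timestamp is made up; the point is that the value shifts rather than gaining timezone metadata):

from pyspark.sql import functions as F

df = (spark.createDataFrame([("2024-01-01 12:00:00",)], ["original_time"])
           .withColumn("original_time", F.col("original_time").cast("timestamp")))

df.select(
    "original_time",
    F.from_utc_timestamp("original_time", "Europe/Berlin").alias("berlin_time"),
).show(truncate=False)
# With the session timezone set to UTC, 12:00 UTC is shown as 13:00 for Berlin (UTC+1 in winter):
# the timestamp value itself shifts; no timezone is attached to it.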
