Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

cmathieu
by New Contributor III
  • 764 Views
  • 1 reply
  • 2 kudos

Resolved! OPTIMIZE command on heavily nested table OOM error

I'm trying to run the OPTIMIZE command on a table with less than 2000 rows, but it is causing an out of memory issue. The problem seems to come from the fact that it is a heavily nested table in staging between a json file and flattened table. The ta...

Latest Reply
Brahmareddy
Esteemed Contributor

Hi cmathieu, how are you doing today? As per my understanding, yeah, this sounds like one of those cases where the data volume is small, but the complexity of the schema is what's causing the trouble. If your table has deeply nested JSON structures, ...
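
The rest of the reply is cut off, but the usual remedy for this class of problem is to flatten the deeply nested structs into a staging table before running OPTIMIZE. A minimal sketch, with hypothetical table and column names:

    # Project only the leaf fields you need instead of carrying the full struct.
    from pyspark.sql import functions as F

    df = spark.table("staging.nested_events")
    flat = df.select(
        F.col("id"),
        F.col("payload.user.name").alias("user_name"),
        F.col("payload.metrics.score").alias("score"),
    )
    flat.write.mode("overwrite").saveAsTable("staging.flattened_events")

    # OPTIMIZE the flat table; the deeply nested original is what hit the OOM.
    spark.sql("OPTIMIZE staging.flattened_events")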

abelian-grape
by New Contributor III
  • 1300 Views
  • 3 replies
  • 1 kudos

Build a streaming table on top of a Snowflake table

Is it possible to create a streaming table on top of a Snowflake table accessible via Lakehouse Federation?

Latest Reply
Brahmareddy
Esteemed Contributor

Hi abelian-grape, how are you doing today? As per my understanding, right now it's not possible to create a streaming table directly on top of a Snowflake table that's accessed through Lakehouse Federation in Databricks. Lakehouse Federation allows ...
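
The reply is truncated, but since streaming isn't supported over federated tables, the common fallback is a scheduled batch read of the federated table instead. A minimal sketch, with hypothetical catalog and table names:

    # Batch-read the federated Snowflake table and append a snapshot locally;
    # run this from a scheduled job to approximate streaming.
    df = spark.read.table("snowflake_catalog.sales.orders")
    df.write.mode("append").saveAsTable("bronze.orders_snapshot")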

2 More Replies
RajNath
by New Contributor II
  • 4986 Views
  • 2 replies
  • 1 kudos

Cost of using Delta Sharing with Unity Catalog

I am new to Databricks Delta Sharing. With Delta Sharing, I don't see any cluster running. I tried looking for documentation, but the only hint I got is that it uses a Delta Sharing server. What is the cost of it, and how do I configure and optimize for la...

Latest Reply
noorbasha534
Valued Contributor II

@RajNath I am also looking for information around this. As far as I understand, it uses provider-side compute. Did you get the same info?...

1 More Reply
Arvind007
by New Contributor II
  • 1750 Views
  • 3 replies
  • 1 kudos

Resolved! Issue while reading external iceberg table from GCS path using spark SQL

df = spark.sql("select * from bqms_table;"); df.show()
ENV - DBRT 16.3 (includes Apache Spark 3.5.2, Scala 2.12), org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.1
Py4JJavaError: An error occurred while calling o471.showString. : org.apache.spar...

Latest Reply
Arvind007
New Contributor II

I tried the given solutions, but it seems the issue still persists. I would appreciate it if Databricks could resolve this soon, for better integration between GCP and Databricks.
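
For anyone hitting the same error, one configuration that is often required when reading Iceberg tables from GCS outside a managed catalog is registering an Iceberg catalog in the Spark config. The keys below come from the Apache Iceberg Spark docs; the catalog name and warehouse path are hypothetical, and on Databricks these settings typically belong in the cluster's Spark config rather than in the notebook:

    # Register a Hadoop-type Iceberg catalog over a GCS warehouse (illustrative).
    spark.conf.set("spark.sql.catalog.gcs_iceberg", "org.apache.iceberg.spark.SparkCatalog")
    spark.conf.set("spark.sql.catalog.gcs_iceberg.type", "hadoop")
    spark.conf.set("spark.sql.catalog.gcs_iceberg.warehouse", "gs://my-bucket/warehouse")

    df = spark.sql("SELECT * FROM gcs_iceberg.db.bqms_table")
    df.show()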

2 More Replies
abelian-grape
by New Contributor III
  • 1169 Views
  • 2 replies
  • 0 kudos

Triggering Downstream Workflow in Databricks from New Inserts in Snowflake

Hi Databricks experts, I have a table in Snowflake that tracks newly added items, and a downstream data processing workflow that needs to be triggered whenever new items are added. I'm currently using Lakehouse Federation to query the Snowflake tables...

Latest Reply
Brahmareddy
Esteemed Contributor

Hi abelian-grape, great question! Since you're using Lakehouse Federation to access the Snowflake table, and Databricks can't directly stream from or listen to inserts in Snowflake, the best approach is to use an interval-based polling mechanism in Da...
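
The reply is truncated, but an interval-based poll usually keeps a high-water mark and compares it against the federated table on each scheduled run. A rough sketch, with hypothetical table and column names:

    from pyspark.sql import functions as F

    # High-water mark recorded by the previous poll (assumes a watermark table).
    last_seen = spark.table("bronze.items_watermark").agg(F.max("max_created_at")).first()[0]

    # Pull only rows newer than the watermark from the federated Snowflake table.
    new_items = (spark.table("snowflake_catalog.inventory.items")
                 .where(F.col("created_at") > F.lit(last_seen)))

    if not new_items.isEmpty():
        new_items.write.mode("append").saveAsTable("bronze.items")
        # ...then update the watermark and kick off the downstream workflow.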

1 More Reply
ChsAIkrishna
by Contributor
  • 4024 Views
  • 1 reply
  • 1 kudos

VNet gateway issues on Power BI connection

Team, we are getting frequent VNet gateway failures on a Power BI dataset using DAX (simple DAX, not complex), and upon rerun it works. Is there any permanent fix for this? Error: {"error":{"code":"DM_GWPipeline_Gateway_MashupDataAccessError","pbi.error...

Latest Reply
F_Goudarzi
New Contributor III

Similar issue. Any solution?

Nyarish
by Contributor
  • 23862 Views
  • 18 replies
  • 18 kudos

Resolved! How to connect Neo4j Aura to a cluster

Please help resolve this error: org.neo4j.driver.exceptions.SecurityException: Failed to establish secured connection with the server. This occurs when I try to connect Neo4j Aura to my cluster. Thank you.

Latest Reply
saab123
New Contributor II

I have added the init script on cluster start-up in Spark -> config -> init.scripts, but my cluster won't start after this. Cluster-scoped init script /Volumes/xxx/xxx/neo4j/neo4j-init.sh failed: Script exit status is non-zero. Could you please help me or...
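
Separate from the init-script question, connecting to Aura is often done without any init script via the Neo4j Spark connector, whose neo4j+s:// scheme handles the TLS handshake the SecurityException complains about. A rough sketch, assuming the connector library is installed on the cluster; the URI, secret scope, and label are placeholders:

    # Read nodes labelled Person from Neo4j Aura (illustrative values).
    df = (spark.read.format("org.neo4j.spark.DataSource")
          .option("url", "neo4j+s://<your-instance>.databases.neo4j.io")
          .option("authentication.basic.username", "neo4j")
          .option("authentication.basic.password", dbutils.secrets.get("my_scope", "neo4j_password"))
          .option("labels", "Person")
          .load())
    df.show()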

17 More Replies
jeremy98
by Honored Contributor
  • 1414 Views
  • 3 replies
  • 0 kudos

Differences between notebooks and notebooks that run inside a job

Hello Community, I'm facing an issue with a job that runs a notebook task. When I run the same join condition through the job pipeline, it produces different results compared to running the notebook interactively (outside the job). Why might this be ha...

Latest Reply
jeremy98
Honored Contributor

Hi, thanks for your question! What I'm doing is essentially loading a table from PostgreSQL using a Spark JDBC connection, and also reading the corresponding table from Databricks. I then perform delete, update, and insert operations by comparing the ...
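
For readers following along, the pattern described above is commonly written as a JDBC read followed by a MERGE, as in this sketch (connection details, secret scope, and table names are placeholders; WHEN NOT MATCHED BY SOURCE needs a recent DBR):

    # Read the source table from PostgreSQL over JDBC.
    pg = (spark.read.format("jdbc")
          .option("url", "jdbc:postgresql://<host>:5432/<db>")
          .option("dbtable", "public.customers")
          .option("user", "app_user")
          .option("password", dbutils.secrets.get("my_scope", "pg_password"))
          .load())

    pg.createOrReplaceTempView("pg_customers")

    # Reconcile the Databricks table in one atomic MERGE.
    spark.sql("""
        MERGE INTO main.silver.customers AS t
        USING pg_customers AS s
        ON t.id = s.id
        WHEN MATCHED THEN UPDATE SET *
        WHEN NOT MATCHED THEN INSERT *
        WHEN NOT MATCHED BY SOURCE THEN DELETE
    """)

One thing worth checking in the original issue: if the interactive run and the job run read the JDBC source at different times, the inputs themselves can differ, which would explain differing join results.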

2 More Replies
bhargavabasava
by New Contributor III
  • 2350 Views
  • 3 replies
  • 0 kudos

Resolved! Job compute is taking longer even after using pool

Hi team, we created a workflow and attached it to a job cluster (which is configured to use a compute pool). When we run the pipeline, it takes up to 5 minutes to reach the clusterReady state, and this adds latency to our use case. Even with subsequen...

Latest Reply
Isi
Honored Contributor III

Hey @bhargavabasava, Job Cluster + Compute Pools: Long Startup Times. If you're using Job Clusters backed by compute pools, the initial delay (~5 minutes) is usually due to cluster provisioning. While compute pools are designed to reduce cold start tim...
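
For reference, a job cluster draws from a pool via the instance_pool_id field of the Jobs API new_cluster spec (values below are placeholders), and a pool only removes cold starts when its min_idle_instances setting keeps warm capacity available:

    # Minimal job-cluster spec backed by an instance pool (illustrative values).
    new_cluster = {
        "spark_version": "15.4.x-scala2.12",
        "instance_pool_id": "pool-0123456789abcdef",
        "num_workers": 2,
    }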

2 More Replies
dbernabeuplx
by New Contributor II
  • 2016 Views
  • 5 replies
  • 0 kudos

Resolved! How to delete/empty notebook output

I need to clear cell output in Databricks notebooks using dbutils or the API. As for my requirements, I need to clear it for data security reasons. That is, given a notebook's PATH, I would like to be able to clear all its outputs, as is done through...

Labels: Data Engineering, API, Data, issue, Notebooks
Latest Reply
srinum89
New Contributor III

For a programmatic approach, you can also clear each cell's output individually using the IPython package. Unfortunately, you need to do this in each and every cell.

    from IPython.display import clear_output
    clear_output(wait=True)
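
For the path-based requirement in the original question, one approach that I believe works is round-tripping the notebook through the Workspace API, since exporting in SOURCE format drops cell outputs; the host, token, and notebook path below are placeholders:

    import requests

    host = "https://<workspace-url>"
    headers = {"Authorization": "Bearer <token>"}
    path = "/Users/<user>/my_notebook"

    # Export in SOURCE format: the returned base64 content carries no outputs.
    exported = requests.get(f"{host}/api/2.0/workspace/export",
                            headers=headers,
                            params={"path": path, "format": "SOURCE"}).json()["content"]

    # Re-import over the original notebook, which leaves it output-free.
    requests.post(f"{host}/api/2.0/workspace/import",
                  headers=headers,
                  json={"path": path, "format": "SOURCE", "language": "PYTHON",
                        "content": exported, "overwrite": True})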

4 More Replies
amitkamthane
by New Contributor II
  • 1389 Views
  • 3 replies
  • 0 kudos

Resolved! Delete files from databricks Volumes based on trigger

Hi, I noticed there's a file arrival trigger option in workflows, but I can't see a delete trigger option. However, let's say I want to delete files from the Databricks volume based on this trigger, and also remove the corresponding records from the bron...

Latest Reply
Louis_Frolio
Databricks Employee

Currently, Databricks doesn’t offer a built-in file deletion trigger mechanism similar to the file arrival trigger. The file arrival trigger only monitors for new files being added to a location, not for files being deleted.
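
In the absence of a delete trigger, a scheduled cleanup job is the usual substitute: list the Volume, delete the files, and remove the matching bronze rows in the same pass. A rough sketch with placeholder paths, table, and column names:

    # Scheduled cleanup: remove processed files and their bronze records together.
    volume_dir = "/Volumes/main/raw/landing"

    for f in dbutils.fs.ls(volume_dir):
        if f.name.endswith(".json"):
            dbutils.fs.rm(f.path)
            spark.sql(f"DELETE FROM main.bronze.events WHERE source_file = '{f.name}'")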

2 More Replies
Ambesh
by New Contributor III
  • 18980 Views
  • 8 replies
  • 1 kudos

Reading external Iceberg table

Hi all, I am trying to read an external Iceberg table. A separate Spark SQL script creates my Iceberg table, and now I need to read the Iceberg tables (created outside of Databricks) from my Databricks notebook. Could someone tell me the approach for ...

Latest Reply
Sash
New Contributor II

Hi, I'm facing the same problem. However, when I set the access mode to "No isolation shared" I lose access to the external location where the Iceberg table resides. Is there a way to force Spark to NOT use a catalog even when in the "Standard (formerly ...

7 More Replies
nielsehlers
by New Contributor
  • 1012 Views
  • 1 reply
  • 1 kudos

from_utc_time gives strange results

I don't understand why from_utc_time(col("original_time"), "Europe/Berlin") changes the timestamp instead of just setting the timezone. That's a non-intuitive behaviour.

    spark.conf.set("spark.sql.session.timeZone", "UTC")
    from pyspark.sql import Row...

Latest Reply
Advika
Databricks Employee

Hello @nielsehlers! Just to clarify, PySpark's from_utc_timestamp converts a UTC timestamp to the specified timezone (in this case it's Europe/Berlin), adjusting the actual timestamp value rather than just setting timezone metadata. This happens beca...
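
A tiny repro of that behaviour, with illustrative values:

    from pyspark.sql import functions as F

    spark.conf.set("spark.sql.session.timeZone", "UTC")
    df = spark.createDataFrame([("2024-01-15 12:00:00",)], ["original_time"])
    df.select(F.from_utc_timestamp("original_time", "Europe/Berlin")).show()
    # Prints 2024-01-15 13:00:00 -- the value shifts by +1h (Berlin in January)
    # rather than keeping 12:00 and attaching timezone metadata.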

al_rammos
by New Contributor II
  • 1588 Views
  • 2 replies
  • 0 kudos

DROP VIEW IF EXISTS Failing on Dynamically Generated Temporary View in Databricks 15.4 LTS

Hello everyone, I'm experiencing a very strange issue with temporary views in Databricks 15.4 LTS that did not occur in 13.3. I have a workflow where I create a temporary view, run a query against it, and then drop it using a DROP VIEW IF EXISTS comma...

Latest Reply
Alberto_Umana
Databricks Employee

Hi @al_rammos, thanks for your detailed comments and replication of the issue. There have been known issues in recent DBR versions where dynamically created temporary views are not properly resolved during certain operations due to incorrect sess...
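
The reply is cut off, but while a platform-side fix lands, one workaround is to keep the view's whole lifecycle inside one SparkSession object so no cross-session resolution is involved; table and view names below are placeholders:

    # Create, use, and drop the temp view through the same session.
    df = spark.table("main.silver.orders")
    df.createOrReplaceTempView("tmp_orders")

    n = spark.sql("SELECT count(*) AS n FROM tmp_orders").first()["n"]

    spark.catalog.dropTempView("tmp_orders")  # returns False if the view was absent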

1 More Reply
Volker
by Contributor
  • 1750 Views
  • 4 replies
  • 1 kudos

Retention Period for Parquet Data in e.g. S3 After Dropping a Managed Delta Table

Hey community, I have a question regarding the data retention policy for managed Delta tables stored, e.g., in Amazon S3. Specifically: when a managed Delta table is dropped, what is the retention period for the underlying Parquet data files in S3 befor...

Latest Reply
Volker
Contributor

Thanks for the resources! So, to adjust how long Parquet files are stored in the S3 bucket after I drop a table, I would need to adjust delta.logRetentionDuration, right? And since dropping a Delta table marks the files for deletion after 7 days, I...
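
For reference, both Delta retention knobs are table properties set while the table still exists; the table name and intervals below are placeholders, and to my knowledge the post-DROP grace period for managed tables is governed by Unity Catalog itself rather than by these properties:

    # Tune log retention and how long VACUUM keeps removed data files.
    spark.sql("""
        ALTER TABLE main.analytics.events SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 30 days',
            'delta.deletedFileRetentionDuration' = 'interval 14 days'
        )
    """)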

3 More Replies
