Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

eyalholzmann
by New Contributor
  • 49 Views
  • 2 replies
  • 0 kudos

Does VACUUM on Delta Lake also clean Iceberg metadata when using Iceberg Uniform feature?

I'm working with Delta tables using the Iceberg Uniform feature to enable Iceberg-compatible reads. I'm trying to understand how metadata cleanup works in this setup. Specifically, does the VACUUM operation—which removes old Delta Lake metadata based ...

Latest Reply
eyalholzmann
New Contributor
  • 0 kudos

Which actions should be used to clean up and maintain Iceberg metadata? expireSnapshots: is it recommended to delete old snapshots using the same retention period as the Delta table? deleteOrphanFiles: this deletes unreferenced Iceberg metadata as well...

1 More Replies
SumitB14
by Visitor
  • 8 Views
  • 0 replies
  • 0 kudos

Databricks Nested Json Flattening

Hi Databricks Community, I am facing an issue while exploding nested JSON data. In the content column, I have dynamic nested JSON, and I am using the below approach to parse and explode it: from pyspark.sql import SparkSession, from pyspark.sql.functions ...
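Since the post's code is truncated, here is a pure-Python sketch of the flattening logic (in PySpark the same effect comes from `from_json`, `explode`, and selecting `struct.*`; the sample record below is illustrative, not from the post):

```python
def flatten(record, prefix=""):
    """Flatten one nested JSON record: nested dicts become dotted
    column names, and each list element becomes its own output row
    (the same effect as explode in PySpark)."""
    rows = [{}]
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            expanded = flatten(value, prefix=name + ".")
        elif isinstance(value, list):
            expanded = []
            for item in value:
                if isinstance(item, dict):
                    expanded.extend(flatten(item, prefix=name + "."))
                else:
                    expanded.append({name: item})
        else:
            expanded = [{name: value}]
        # cross-join the rows accumulated so far with the new columns
        rows = [dict(r, **e) for r in rows for e in expanded]
    return rows

rows = flatten({"id": 1, "tags": ["a", "b"], "meta": {"x": 2}})
# two rows: one per element of "tags", each carrying id and meta.x
```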

minhhung0507
by Valued Contributor
  • 1732 Views
  • 5 replies
  • 1 kudos

DeltaFileNotFoundException: [DELTA_TRUNCATED_TRANSACTION_LOG] Error in Streaming Table with Minimal

Dear Databricks Experts, I am encountering a recurring issue while working with Delta streaming tables in my system. The error message is as follows: com.databricks.sql.transaction.tahoe.DeltaFileNotFoundException: [DELTA_TRUNCATED_TRANSACTION_LOG] gs...

Latest Reply
gbrueckl
Contributor II
  • 1 kudos

I would assume it is trying to read v899 because you read up until v898 in the last streaming batch and stored the state in the streaming checkpoint. Now, if you run the code again and continue the stream, it tries to pick up from the first versi...

4 More Replies
GANAPATI_HEGDE
by New Contributor III
  • 59 Views
  • 2 replies
  • 0 kudos

Unable to configure custom compute for DLT pipeline

I am trying to configure a custom cluster for the pipeline as shown above; however, DLT keeps using the small cluster as usual. How can I resolve this?

Latest Reply
GANAPATI_HEGDE
New Contributor III
  • 0 kudos

I updated my CLI and deployed the job, but I still don't see the cluster updates in the pipeline.

1 More Replies
lecarusin
by Visitor
  • 33 Views
  • 1 reply
  • 1 kudos

Help regarding a python notebook and s3 file structure

Hello all, I am new to this forum, so please forgive me if I am posting in the wrong location (I'd appreciate it if mods move the post or tell me where to post). I am looking for help optimizing some Python code I have. This Python notebook...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @lecarusin, you can absolutely make Databricks read only the dates you care about. The trick is to constrain the input paths (so Spark lists only those folders) instead of reading the whole directory. Build the exact S3 prefixes for your da...
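The path-pruning approach can be sketched in plain Python; the bucket name and the `<root>/<YYYY-MM-DD>/` folder layout below are assumptions for illustration:

```python
from datetime import date, timedelta

def daily_prefixes(bucket, root, start, end):
    """One S3 prefix per day in [start, end], so Spark only lists
    those folders instead of scanning the whole directory tree."""
    return [
        f"s3://{bucket}/{root}/{(start + timedelta(days=d)).isoformat()}/"
        for d in range((end - start).days + 1)
    ]

paths = daily_prefixes("my-bucket", "events", date(2025, 1, 1), date(2025, 1, 3))
# pass the explicit list instead of the top-level directory:
# df = spark.read.json(paths)
```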

sparmar
by New Contributor
  • 3546 Views
  • 1 reply
  • 0 kudos

I am Getting SSLError(SSLEOFError) error while triggering Azure DevOps pipeline from Databricks

While triggering an Azure DevOps pipeline from Databricks, I am getting the below error: An error occurred: HTTPSConnectionPool(host='dev.azure.com', port=443): Max retries exceeded with url: /XXX-devops/XXXDevOps/_apis/pipelines/20250224.1/runs?api-version...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re seeing (SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:1147)')) while triggering the Azure DevOps pipeline from Databricks indicates an issue with the SSL/TLS handshake, not the firewall or certificate itself. This is ...
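Because `SSLEOFError` is usually transient, wrapping the trigger call in retries with exponential backoff is a common mitigation. This is a generic sketch, not Databricks-specific: `trigger_devops_pipeline` is a placeholder, and `ssl.SSLEOFError` is a subclass of `OSError`:

```python
import time

def with_retries(fn, attempts=5, backoff=1.0, retry_on=(OSError,)):
    """Call fn(), retrying transient errors (ssl.SSLEOFError is an
    OSError subclass) with exponential backoff between attempts."""
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the original error
            time.sleep(backoff * (2 ** attempt))

# usage sketch (names are placeholders):
# run = with_retries(lambda: trigger_devops_pipeline(url, payload))
```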

Amit_Dass_Chmp
by New Contributor III
  • 3028 Views
  • 1 reply
  • 0 kudos

Query on Databricks Arc: ARC will not work on 13.x or greater runtime

I have a question about Databricks Arc. Is this statement true? "Databricks Runtime requirements for implementing Arc: ARC requires Databricks ML Runtime 12.2 LTS. ARC will not work on 13.x or greater runtimes."

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The statement is true: Databricks Arc requires the Databricks ML Runtime 12.2 LTS and will not work on 13.x or greater runtimes. This requirement is confirmed by multiple Databricks Community discussions and documentation, which specifically state th...

j_h_robinson
by New Contributor II
  • 3102 Views
  • 1 reply
  • 0 kudos

GitHub CI/CD Best Practices

Using GitHub, what are some best-practice CI/CD approaches to use specifically with the silver and gold medallion layers? We want to create the bronze, silver, and gold layers in Databricks notebooks. Also, is using notebooks in production a "best pra...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

For Databricks projects using the medallion architecture (bronze, silver, gold layers), effective CI/CD strategies on GitHub include strict version control, environment isolation, automated testing and deployments, and careful notebook management—all...

SObiero
by New Contributor
  • 3327 Views
  • 1 reply
  • 0 kudos

Passing Microsoft MFA Auth from Databricks to MSSQL Managed Instance in a Databricks FastAPI App

I have a Databricks App built using FastAPI. Users access this App after authenticating with Microsoft MFA on Databricks Azure Cloud. The App connects to an MSSQL Managed Instance (MI) that also supports Microsoft MFA. I want the authenticated user's ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

It is not possible in Databricks to seamlessly pass each authenticated user's Azure/MS identity from a web app running on Databricks to MSSQL MI for per-user MFA authentication, in the way your development code does. This limitation stems from how id...

kanikeom
by New Contributor II
  • 3613 Views
  • 2 replies
  • 2 kudos

Asset Bundle API update issues

I was working on a proof of concept (POC) using the asset bundle. My job configuration in the .yml file worked yesterday, but it threw an error today during a demo to the team. The error was likely due to an update to the Databricks API. After some t...

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

Unexpected breaking changes to APIs—especially from cloud platforms like Databricks—can disrupt projects and demos. Proactively anticipating and rapidly adapting to such updates requires a combination of monitoring, process improvements, and technica...

1 More Replies
jeremy98
by Honored Contributor
  • 3323 Views
  • 2 replies
  • 0 kudos

if else condition task doubt

Hi community, can the if/else condition task not be used as a real if condition? It seems that if the condition evaluates to False, the entire job stops. Is that the intended behaviour?

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

In Databricks workflows, the "if-else" condition and depends_on logic do not behave exactly like standard programming if-else statements. If a task depends on another task's outcome and that outcome does not match (for example, the condition is false...

1 More Replies
Carl_B
by New Contributor II
  • 3761 Views
  • 1 reply
  • 0 kudos

ImportError: cannot import name 'override' from 'typing_extensions'

Hello, I'm facing an ImportError when trying to run my OpenAI-based summarization script in Databricks. The error message is: ImportError: cannot import name 'override' from 'typing_extensions' (/databricks/python/lib/python3.10/site-packages/typing_extensions.py)...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This error is caused by a version mismatch between the OpenAI Python package and the typing_extensions library in your Databricks environment. The 'override' symbol is relatively new and only exists in typing_extensions version 4.5.0 and above; some ...
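Upgrading the library (e.g. `%pip install --upgrade typing_extensions`, then restarting the Python process) is the usual fix; as a defensive sketch, code can also fall back to a no-op decorator when `override` is unavailable:

```python
try:
    # present in typing_extensions >= 4.5.0 (and in typing on Python 3.12+)
    from typing_extensions import override
except ImportError:
    def override(method):
        """No-op stand-in so the code still runs on older environments."""
        return method

class Base:
    def run(self):
        return "base"

class Child(Base):
    @override  # documents (and, on 4.5+, lets type checkers verify) the intent
    def run(self):
        return "child"
```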

SQLBob
by New Contributor II
  • 3506 Views
  • 2 replies
  • 0 kudos

Unity Catalog Python UDF to Send Messages to MS Teams

Good Morning All - This didn't seem like such a daunting task until I tried it. Of course, it's my very first function in Unity Catalog. Attached are images of both the UDF and example usage I created to send messages via the Python requests library ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You're encountering a common limitation when trying to use an external HTTP request (like the Python requests library) inside a Unity Catalog UDF in Databricks. While your code is correct for a regular notebook environment, Unity Catalog UDFs (and, s...
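A common workaround is to run the HTTP call driver-side (in a notebook cell or job task) rather than inside the UC UDF. A minimal stdlib sketch, where the webhook URL and the simple `{"text": ...}` payload shape are assumptions:

```python
import json
import urllib.request

def teams_message_request(webhook_url, text):
    """Build a POST request for a Teams incoming-webhook URL
    (hypothetical URL; payload shape assumed)."""
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# from a notebook cell, not from inside the UDF:
# urllib.request.urlopen(teams_message_request(url, "pipeline finished"))
```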

1 More Replies
