Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

pokus
by New Contributor III
  • 9525 Views
  • 3 replies
  • 2 kudos

Resolved! use DeltaLog class in databricks cluster

I need to use the DeltaLog class in my code to get the AddFiles dataset. I have to keep the implemented code in a repo and run it on a Databricks cluster. Some docs say to use the org.apache.spark.sql.delta.DeltaLog class, but it seems Databricks gets rid of ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

Hi @pokus, you don't need to access it via reflection. You can access DeltaLog with spark._jvm: Unity Catalog and Delta Lake tables expose their metadata and transaction log via the JVM backend. Using spark._jvm, you can interact with DeltaLog. Thanks!

  • 2 kudos
2 More Replies
Nasd_
by New Contributor II
  • 1011 Views
  • 3 replies
  • 2 kudos

Resolved! Accessing DeltaLog and OptimisticTransaction from PySpark

Hi community, I'm exploring ways to perform low-level, programmatic operations on Delta tables directly from a PySpark environment. The standard delta.tables.DeltaTable Python API is excellent for high-level DML, but it seems to abstract away the core ...

Latest Reply
NandiniN
Databricks Employee
  • 2 kudos

To access the Databricks pre-installed package, use spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog. org.apache.spark.sql.delta.DeltaLog would be the OSS jar's class name.
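A minimal sketch of the class-name distinction described in this reply, assuming a PySpark session whose `_jvm` gateway is reachable. The helper names `delta_log_class_name` and `resolve_jvm_class` are hypothetical; only the two fully qualified class names come from the thread.

```python
from functools import reduce

# Fully qualified class names quoted in the thread.
DATABRICKS_DELTA_LOG = "com.databricks.sql.transaction.tahoe.DeltaLog"
OSS_DELTA_LOG = "org.apache.spark.sql.delta.DeltaLog"


def delta_log_class_name(on_databricks: bool) -> str:
    """Pick the DeltaLog class name for the runtime in use (hypothetical helper)."""
    return DATABRICKS_DELTA_LOG if on_databricks else OSS_DELTA_LOG


def resolve_jvm_class(jvm, qualified_name: str):
    """Walk a py4j-style JVM view one attribute at a time (jvm.com.databricks...)."""
    return reduce(getattr, qualified_name.split("."), jvm)
```

On a cluster, `resolve_jvm_class(spark._jvm, delta_log_class_name(True))` should be equivalent to writing `spark._jvm.com.databricks.sql.transaction.tahoe.DeltaLog` by hand. The OSS class exposes `DeltaLog.forTable(spark, path)`; whether the Databricks class offers the same method on a given runtime is an assumption to verify before relying on it.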

  • 2 kudos
2 More Replies
Nasd_
by New Contributor II
  • 2180 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to load org.apache.spark.sql.delta classes from JVM pyspark

Hello, I'm working on Databricks with a cluster running Runtime 16.4, which includes Spark 3.5.2 and Scala 2.12. For a specific need, I want to implement my own custom way of writing to Delta tables by manually managing Delta transactions from PySpark....

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hi @Nasd_, I believe you are trying to use OSS jars on DBR (inferred from the class package org.apache.spark.sql.delta.DeltaLog). The error ModuleNotFoundError: No module named 'delta.exceptions.captured'; 'delta.exceptions' is not a package can be...

  • 1 kudos
LeoRickli
by New Contributor II
  • 964 Views
  • 2 replies
  • 0 kudos

Databricks Asset Bundles deploy fails but works in the GUI with the same parameters

I'm running into an issue with databricks bundle deploy when using job clusters. When I run databricks bundle deploy on a new workspace or after destroying previous resources, the deployment fails with the error: Error: cannot update job: At l...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hello @LeoRickli  Are you setting apply_policy_default_values? https://docs.databricks.com/en/administration-guide/clusters/policies.html#:~:text=Default%20values%20don't%20automatically,not%20needed%20for%20fixed%20policies. After you update a polic...

  • 0 kudos
1 More Replies
benesq
by New Contributor
  • 1687 Views
  • 1 reply
  • 1 kudos

Resolved! JDBC driver uses Unsafe API, which will be completely deprecated in a future release of Java

Using JDBC driver (2.7.3) in OpenJDK 24 gives the following warning: WARNING: A terminally deprecated method in sun.misc.Unsafe has been called WARNING: sun.misc.Unsafe::arrayBaseOffset has been called by com.databricks.client.jdbc42.internal.apache.a...

Latest Reply
NandiniN
Databricks Employee
  • 1 kudos

Hey @benesq, JDBC driver 2.7.4 (https://www.databricks.com/spark/jdbc-drivers-download) should be used with Java Runtime Environment (JRE) 8.0, 11.0 or 21.0. As mentioned in the installation doc, "Each machine where you use the Databricks JDBC Dri...

  • 1 kudos
AlbertWang
by Valued Contributor
  • 4116 Views
  • 7 replies
  • 3 kudos

Resolved! Azure Databricks Unity Catalog - cannot access managed volume in notebook

We have set up Azure Databricks with Unity Catalog (Metastore). Used Managed Identity (Databricks Access Connector) for connection from workspace(s) to ADLS Gen2. ADLS Gen2 storage account has Storage Blob Data Contributor and Storage Queue Data Contrib...

Latest Reply
fifata
New Contributor II
  • 3 kudos

@AlbertWang @VAMSaha22 Since you want private connectivity I assume you have a vnet and a PE associated with the gen2 account. That PE needs to have a sub-resource of type dfs when the storage account is gen2/hierarchical namespace. You might want to...

  • 3 kudos
6 More Replies
Mildred
by New Contributor
  • 1941 Views
  • 1 reply
  • 0 kudos

Resolved! Parameter "expand_tasks" on List job runs request seems not to be working (Databricks API)

I'm setting it as True, but it doesn't return the cluster_instance info. Here is the function I'm using: def get_job_runs(job_id): """ Fetches job runs for a specific job from Databricks Jobs API. """ headers = { "Authorization...

Latest Reply
Krishna_S
Databricks Employee
  • 0 kudos

Hi @Mildred, the way you passed the data for the expand_tasks parameter is wrong. data = { "job_id": job_id, "expand_tasks": "true" } It should not be passed as a Python boolean value, but as the string "true" or "false". Once you do that will...
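A short sketch of why the string form matters, using only the standard library. When a Python `True` is URL-encoded it becomes `True` with a capital T, not the lowercase `true` the reply asks for. The host below is a placeholder, and the `/api/2.1/jobs/runs/list` path is an assumption about which Jobs API version is in use; `runs_list_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

# Placeholder workspace URL (assumption); replace with your own.
HOST = "https://example.cloud.databricks.com"


def runs_list_url(job_id: int, expand_tasks: bool) -> str:
    """Build a List job runs URL with expand_tasks sent as "true" or "false"."""
    params = {
        "job_id": job_id,
        # str(True) is "True" in Python, so convert to lowercase explicitly.
        "expand_tasks": "true" if expand_tasks else "false",
    }
    return f"{HOST}/api/2.1/jobs/runs/list?{urlencode(params)}"


# Passing the boolean directly yields a capitalized value in the query string.
print(urlencode({"expand_tasks": True}))  # expand_tasks=True
print(runs_list_url(123, True))
# https://example.cloud.databricks.com/api/2.1/jobs/runs/list?job_id=123&expand_tasks=true
```

The same explicit conversion applies when the parameters are handed to an HTTP client instead of being encoded by hand.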

  • 0 kudos
DiskoSuperStar
by New Contributor
  • 207 Views
  • 1 reply
  • 0 kudos

DLT Flow Redeclaration Error After Service Upgrade

Hi, our Delta Live Tables (Lakeflow Declarative Pipelines) pipeline started failing after the Sep 30 / Oct 1 service upgrade with the following error: AnalysisException: Cannot have multiple queries named `<table_name>_realtime_flow` for `<table_name>...

Latest Reply
saurabh18cs
Honored Contributor II
  • 0 kudos

Hi @DiskoSuperStar, it seems you've run into a recently enforced change in Databricks DLT/Lakeflow: multiple flows (append or otherwise) targeting the same table must have unique names. Actually, your code looks correct. Check if your table_info ...

  • 0 kudos
Gvnreddy
by New Contributor II
  • 411 Views
  • 3 replies
  • 4 kudos

Need help learning Scala

Hi enthusiasts, I recently joined a company that develops its Databricks notebooks in Scala. Previously I worked with PySpark, which was very easy for me. I have 3 years of experience in DE. I need help to wh...

Latest Reply
-werners-
Esteemed Contributor III
  • 4 kudos

fwiw: you do not have to be a Scala wiz to work on Spark in Scala. Older Spark articles are often about Scala in Spark (before Python took over). You will notice it is a lot like PySpark, but way way better: typing, immutability, things like leftfold ...

  • 4 kudos
2 More Replies
john77
by New Contributor II
  • 576 Views
  • 5 replies
  • 1 kudos

Why ETL Pipelines and Jobs

I notice that ETL Pipelines let you run declarative SQL syntax such as DLT tables, but you can do the same with Jobs if you use SQL as your task type. So why and when should you use ETL Pipelines?

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi @john77, SQL task type: simple, one-off SQL operations or batch jobs, plus cases where you need to orchestrate a mix of notebooks, Python/Scala code, and SQL in a single workflow. Lakeflow Declarative Pipelines: complex, production ETL jobs that require lineage, mon...

  • 1 kudos
4 More Replies
DivyaKumar
by New Contributor
  • 176 Views
  • 1 reply
  • 0 kudos

Databricks to Dataverse migration via ADF copy data

Hi team, I need to load data from Databricks Delta tables to Dataverse tables, and I have one unique ID column which I am ensuring via mapping. Its data type is GUID in Dataverse and string in the Delta table. I ensured that the column holds unique values. Sinc...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

That is not a valid GUID. Dataverse will check this. http://guid.us/test/guid
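One way to catch such values before the copy runs is to check each string locally. This is a minimal sketch using Python's standard `uuid` module; the helper name `is_canonical_guid` is hypothetical, and the assumption that Dataverse expects the hyphenated 8-4-4-4-12 form is mine, not something stated in the thread.

```python
import uuid


def is_canonical_guid(value: str) -> bool:
    """Return True if value is a GUID in hyphenated 8-4-4-4-12 hexadecimal form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts braces and hyphen-less input; require the canonical form.
    return str(parsed) == value.lower()
```

Running this over the string column before loading gives a list of rows to repair, instead of discovering them one failed copy at a time.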

  • 0 kudos
noorbasha534
by Valued Contributor II
  • 523 Views
  • 6 replies
  • 4 kudos

Resolved! Figure out stale tables/folders being loaded by auto-loader

Hello all, we have a pipeline which uses Auto Loader to load data from cloud object storage (ADLS) to a Delta table. We use directory listing at the moment, and there are around 20,000 folders to be verified in ADLS every 30 minutes to check for new data...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 4 kudos

@Krishna_S I didn't know about file detection modes, that's very cool! @noorbasha534 according to the documentation, there is a piece about RocksDB: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/#how-does-auto-loader-...

  • 4 kudos
5 More Replies
billfoster
by New Contributor II
  • 25860 Views
  • 10 replies
  • 7 kudos

How can I learn Databricks

I am currently enrolled in a data engineering boot camp. We go over various technologies: Azure, PySpark, Airflow, Hadoop, NoSQL, SQL, Python. But not something like Databricks. I am in contact with lots of recent graduates who landed a job. Almo...

Latest Reply
SprunkiRetake
New Contributor II
  • 7 kudos

Yes, I often refer to the helpful tutorials at https://www.youtube.com/c/AdvancingAnalytics?reload=9&app=desktop Sprunki Retake

  • 7 kudos
9 More Replies
devagya
by New Contributor
  • 1102 Views
  • 3 replies
  • 1 kudos

Infor Data Lake to Databricks

I'm working on a project which involves moving data from Infor to Databricks. Infor is somewhat of an enterprise solution. I could not find many resources on this. I could not even find a free trial option on their site. If anyone has experience w...

Latest Reply
Shirlzz
New Contributor II
  • 1 kudos

I specialise in data migration with Infor. What is your question: how to connect Databricks to the Infor data lake through the data fabric?

  • 1 kudos
2 More Replies
leireroman
by New Contributor III
  • 3044 Views
  • 2 replies
  • 2 kudos

Resolved! DBR 16.4 LTS - Spark 3.5.2 is not compatible with Delta Lake 3.3.1

I'm migrating to Databricks Runtime 16.4 LTS, which is using Spark 3.5.2 and Delta Lake 3.3.1 according to the documentation: Databricks Runtime 16.4 LTS - Azure Databricks | Microsoft Learn. I've upgraded my conda environment to use those versions, bu...

Latest Reply
SamAdams
Contributor
  • 2 kudos

@leireroman I encountered the same issue and used an override (such as a pip constraints.txt file or a PDM resolution override specification) to make sure my local development environment matched the runtime.

  • 2 kudos
1 More Replies
