Data Engineering

Forum Posts

by code_vibe • New Contributor

03-08-2025 1:41:56 AM

705 Views
1 reply
0 kudos

Delta lake federated table not working as expected

I’m facing an issue while working with federated Redshift tables in Databricks, and I’m hoping someone here can help me out. I have a source table (material) in Redshift that I’m querying through Delta Lake federation in Databricks. When I run the ...

Data Engineering

Latest Reply

Brahmareddy
Esteemed Contributor

03-10-2025 9:03:54 PM

0 kudos

Hi code_vibe, how are you doing today? As I understand it, the issue might be that predicate pushdown isn't happening when you query the federated Redshift table in Databricks. Predicate pushdown helps filter data at the source (Red...

by Jorge3 • New Contributor III

03-06-2025 7:17:42 AM

1023 Views
1 reply
0 kudos

Too many small files in the "landing area"

Hello everyone, I’m currently working on a setup where my unprocessed real-time data arrives as .json files in Azure Data Lake Storage (ADLS). Every x minutes, I use Databricks Auto Loader to pick up the new data, run my ETL transformations, and store ...

Data Engineering

Latest Reply

koji_kawamura
Databricks Employee

03-10-2025 8:03:36 PM

0 kudos

Hi @Jorge3, since you mentioned the "cloudFiles.useNotifications" option, I assume you know Auto Loader's file detection modes. File notification mode should be the best solution for your situation. Have you tried it already and encountered an issue? If so, please let us kn...
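For reference, a minimal sketch of the Auto Loader options the reply refers to, assuming JSON files landing in ADLS. The helper name `autoloader_options` and the example paths are hypothetical; `cloudFiles.format`, `cloudFiles.useNotifications`, and `cloudFiles.schemaLocation` are Auto Loader options. On Azure, file notification mode may also need credential or queue options that are not shown here.

```python
def autoloader_options(schema_location: str, use_notifications: bool = True) -> dict:
    """Build Auto Loader (cloudFiles) options for JSON files landing in ADLS.

    Hypothetical helper: it only assembles the option map and never touches Spark.
    """
    return {
        "cloudFiles.format": "json",
        # File notification mode reacts to storage events instead of
        # listing the whole landing directory on every run.
        "cloudFiles.useNotifications": "true" if use_notifications else "false",
        # Where Auto Loader keeps the inferred schema between runs.
        "cloudFiles.schemaLocation": schema_location,
    }


# Usage on Databricks (not executed here; `spark` and the paths are assumptions):
# df = (spark.readStream.format("cloudFiles")
#       .options(**autoloader_options("abfss://meta@<account>.dfs.core.windows.net/_schemas/events"))
#       .load("abfss://landing@<account>.dfs.core.windows.net/events/"))
```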

by Kayla • Valued Contributor II

03-05-2025 11:38:51 AM

1013 Views
4 replies
3 kudos

Unity Catalog "Sync" Question

I'm having a little trouble fully following the documentation on the SYNC command. I have a table in hive_metastore that still needs to be updated daily for the next few months, but I also need to define a view in Unity Catalog based on tha...

Data Engineering

Latest Reply

Nivethan_Venkat
Contributor III

03-05-2025 3:25:15 PM

3 kudos

Hi @Kayla, the SYNC command syncs your Hive EXTERNAL table to your Unity Catalog namespace. If the table is external, the UC table will stay in sync with your external location. If it is a Hive managed table, you can't use the SYNC command to have your mana...
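A minimal sketch of the statements involved, assuming an external Hive table `hive_metastore.sales.orders` and a Unity Catalog target `main.sales.orders` (both hypothetical names). `SYNC TABLE <target> FROM <source>` with an optional trailing `DRY RUN` is Databricks SQL syntax; the helpers below only build the SQL text, and the view body is illustrative.

```python
def sync_table_sql(target: str, source: str, dry_run: bool = True) -> str:
    """Build a SYNC statement that upgrades a Hive external table into Unity Catalog."""
    sql = f"SYNC TABLE {target} FROM {source}"
    # DRY RUN reports what would happen without creating or updating anything.
    if dry_run:
        sql += " DRY RUN"
    return sql


def uc_view_sql(view: str, table: str) -> str:
    """Build a Unity Catalog view over the synced table (illustrative SELECT *)."""
    return f"CREATE OR REPLACE VIEW {view} AS SELECT * FROM {table}"


# Usage on Databricks (not executed here; names are hypothetical):
# spark.sql(sync_table_sql("main.sales.orders", "hive_metastore.sales.orders", dry_run=False))
# spark.sql(uc_view_sql("main.sales.orders_v", "main.sales.orders"))
```

Running the dry run first is a cheap way to check whether a given table is eligible before anything is created.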

by the_dude • New Contributor II

03-07-2025 1:21:40 PM

2572 Views
1 reply
0 kudos

Impossible to have multiple versions of the same Python package installed

Hello, we package our Spark jobs and utilities in a custom package to be used in wheel tasks in Databricks. In my opinion, having several versions of this job (say "production" and "dev") run on the same cluster against different versions of this custo...

Data Engineering

Latest Reply

the_dude
New Contributor II

03-10-2025 9:17:50 AM

0 kudos

If someone comes across this post: according to the documentation, library/package installation can be notebook-scoped. To overcome the limitation described in the initial post, we are instead experimenting with notebook tasks whose only respons...

by Phani1 • Valued Contributor II

03-10-2025 5:31:16 AM

553 Views
1 reply
0 kudos

Reading Multiple Data Formats

Hi all, I'm looking to develop generic code that can read multiple data formats, such as Parquet, Delta, and Iceberg, and save them as Delta. Can you provide some insights or guidance on how to achieve this? Regards, Phani

Data Engineering

Latest Reply

Erika_Fonseca
Databricks Employee

03-10-2025 6:53:25 AM

0 kudos

Take a look at these two projects that follow a metadata-driven approach: Lakehouse Engine and DLT Meta.
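The metadata-driven idea behind both projects can be sketched in a few lines. This is not the API of Lakehouse Engine or DLT Meta; the metadata keys (`format`, `location`, `target`) and function names are hypothetical, and Iceberg sources are assumed to be registered in a catalog so they can be read by table name rather than by path.

```python
SUPPORTED_FORMATS = {"parquet", "delta", "iceberg"}


def plan_ingestion(sources: list) -> list:
    """Validate metadata rows and turn them into (format, location, target) steps."""
    steps = []
    for src in sources:
        fmt = src["format"].lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {fmt}")
        steps.append((fmt, src["location"], src["target"]))
    return steps


def run_ingestion(spark, steps) -> None:
    """Read each source with Spark and save it as a Delta table (needs a live SparkSession)."""
    for fmt, location, target in steps:
        if fmt == "iceberg":
            # Assumption: the Iceberg table is registered in a catalog.
            df = spark.table(location)
        else:
            df = spark.read.format(fmt).load(location)
        df.write.format("delta").mode("overwrite").saveAsTable(target)
```

Keeping validation (`plan_ingestion`) separate from execution (`run_ingestion`) lets the metadata be checked without a cluster; adding a format then means extending the metadata and, if needed, one branch in the reader.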

by zmsoft • Contributor

02-06-2025 10:25:53 PM

785 Views
1 reply
0 kudos

How to copy a file from a UC volume to an external location folder

Hi there, how do I copy a file from a UC volume to an external location folder? Thanks & Regards, zmsoft

Data Engineering

Latest Reply

Advika_
Databricks Employee

03-10-2025 6:27:20 AM

0 kudos

Hello @zmsoft! To copy a file from a UC volume to an external location, you can use: dbutils.fs.cp("UC_volume_path", "external_location_path"). Ensure the external location is preconfigured in Unity Catalog and you have the necessary permission...
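A slightly fuller sketch of the reply's `dbutils.fs.cp` call, assuming hypothetical catalog, schema, volume, and storage-account names. Unity Catalog volume paths follow the `/Volumes/<catalog>/<schema>/<volume>/...` layout; the helper only builds that path, and `dbutils` exists only inside a Databricks notebook or job.

```python
import posixpath


def volume_path(catalog: str, schema: str, volume: str, *parts: str) -> str:
    """Build a Unity Catalog volume path: /Volumes/<catalog>/<schema>/<volume>/<parts...>."""
    return posixpath.join("/Volumes", catalog, schema, volume, *parts)


# Hypothetical names; replace with your own volume and external location URL.
src = volume_path("main", "landing", "raw_files", "2025", "report.csv")
dst = "abfss://exports@mystorageaccount.dfs.core.windows.net/reports/report.csv"

# On Databricks (not executed here); pass recurse=True only when copying a directory:
# dbutils.fs.cp(src, dst)
```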

by Eduard • New Contributor II

08-23-2023 1:30:35 AM

117638 Views
3 replies
1 kudos

Cluster xxxxxxx was terminated during the run.

Hello, I have a problem with the autoscaling of a cluster. Every time autoscaling is activated, I get this error. Does anyone have any idea why this could be? "Cluster xxxxxxx was terminated during the run (cluster state message: Lost communication ...

Data Engineering

Latest Reply

louisgarza
New Contributor II

03-10-2025 4:10:00 AM

1 kudos

Hello Databricks Community, the error message indicates that the driver node was lost, which can happen due to network issues or malfunctioning instances. Here are a few possible reasons and solutions: Instance Instability: If your cloud provider has u...

by Fatimah-Tariq • New Contributor III

03-06-2025 12:00:35 PM

841 Views
4 replies
0 kudos

Schema update Issue in DLT

I have a pipeline in Databricks with this flow: SQL Server (Source) -> Staging (Parquet) -> Bronze (DLT) -> Silver (DLT) -> Gold (DLT). The pipeline has been running smoothly for months, but recently there was a schema update at my source level and one o...

Data Engineering

Latest Reply

Fatimah-Tariq
New Contributor III

03-10-2025 2:00:30 AM

0 kudos

Hi @Alberto_Umana, is there any word on how to fix my data and bring all the records back to the pipeline schema?

by cpayne_vax • New Contributor III

01-17-2024 2:17:00 PM

24363 Views
15 replies
9 kudos

Resolved! Delta Live Tables: dynamic schema

Does anyone know if there's a way to specify an alternate Unity Catalog schema in a DLT workflow using the @dlt.table syntax? In my case, I’m looping through folders in Azure Data Lake Storage to ingest data. I’d like those folders to get created in different...

Data Engineering

Latest Reply

abhishek_02
New Contributor II

03-09-2025 10:30:35 PM

9 kudos

Hi @kuldeep-in, could you please tell me exactly where to disable the DPM enabled option? I was not able to locate it in the pipeline settings or the Databricks settings. Thank you
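For the original question, a common way to define one table per folder in a loop is a factory function, so each table definition captures its own folder rather than the loop variable's final value. The sketch below uses a hypothetical `register_table` decorator as a stand-in for `@dlt.table(name=...)` so it runs outside a pipeline; the folder names and the `main.<folder>.raw` naming scheme are also assumptions.

```python
REGISTRY: dict = {}


def register_table(name: str):
    """Hypothetical stand-in for @dlt.table(name=...): records the table definition."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


def define_table(folder: str, catalog: str = "main") -> None:
    """Factory: each call binds its own `folder`, avoiding late binding inside a loop."""
    @register_table(name=f"{catalog}.{folder}.raw")
    def load():
        # In a real pipeline this would return spark.readStream...load(<folder path>).
        return f"rows from {folder}"


for folder in ["sales", "finance"]:
    define_table(folder)
```

If the decorated function were defined directly in the loop body and read `folder` at call time, every table would see the last folder; the factory avoids that.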

by zmsoft • Contributor

03-09-2025 10:06:12 PM

665 Views
1 reply
0 kudos

Resolved! How do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data

Hi there, how do I use an Azure Databricks DLT pipeline to consume Azure Event Hubs data? Code: TOPIC = "myeventhub" KAFKA_BROKER = "" GROUP_ID = "group_dev" raw_kafka_events = (spark.readStream .format("kafka") .option("subscribe", EH_NAME) .opt...

Data Engineering

Latest Reply

ashraf1395
Honored Contributor

03-09-2025 10:28:15 PM

0 kudos

Hi there @zmsoft, have you had a look at this reference doc? https://docs.databricks.com/aws/en/dlt/event-hubs This might help.
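A minimal sketch of the Kafka options commonly used to read Azure Event Hubs through its Kafka-compatible endpoint (port 9093). The namespace, hub name, and connection string are placeholders, and the helper name is hypothetical. As excerpted, the snippet in the question defines `TOPIC` but subscribes to `EH_NAME`, and leaves `KAFKA_BROKER` empty; `subscribe` should receive the event hub name, and the bootstrap server should point at the namespace. The `kafkashaded.` prefix on the login module matches the shaded Kafka client shipped in Databricks runtimes.

```python
def eventhubs_kafka_options(namespace: str, event_hub: str, connection_string: str) -> dict:
    """Build Spark Kafka source options for an Event Hubs namespace's Kafka endpoint."""
    jaas = (
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";'
    )
    return {
        "kafka.bootstrap.servers": f"{namespace}.servicebus.windows.net:9093",
        "subscribe": event_hub,  # the event hub name acts as the Kafka topic
        "kafka.security.protocol": "SASL_SSL",
        "kafka.sasl.mechanism": "PLAIN",
        "kafka.sasl.jaas.config": jaas,
        "startingOffsets": "earliest",
    }


# Usage inside a DLT pipeline (not executed here; names are placeholders).
# Prefer reading the connection string from a secret scope, e.g.
# conn_str = dbutils.secrets.get(scope="<scope>", key="<key>")
# @dlt.table
# def raw_events():
#     return (spark.readStream.format("kafka")
#             .options(**eventhubs_kafka_options("my-namespace", "myeventhub", conn_str))
#             .load())
```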

by noorbasha534 • Valued Contributor

03-09-2025 4:36:59 PM

2276 Views
0 replies
0 kudos

Databricks Jobs Failure Notification to Azure DevOps as incident

Dear all, has anyone tried sending Databricks job failure notifications to Azure DevOps as incidents? I see webhooks as an out-of-the-box (OOTB) destination for jobs, and I am thinking of leveraging them. But I'd like to hear any success stories or other smart approaches...

Data Engineering

by Brianben • New Contributor III

03-03-2025 1:16:40 AM

547 Views
1 reply
0 kudos

Getting Errors when reading data from Excel InternalError: pip is not installed for /local_disk

Hi all, we have a daily Databricks job that downloads Excel files from SharePoint and reads them. The job worked fine until today (3 March). We are getting the following error message when running the code to read the Excel file: org.apache.spark.SparkExc...

Data Engineering

Latest Reply

Renu_
Valued Contributor II

03-09-2025 10:07:54 AM

0 kudos

I think the issue comes from installing Office365-REST-Python-Client using dbutils.library.installPyPI, which seems to create a conflicting Python environment for Spark executors. Since notebook-specific installs modify the environment dynamically, t...

by xx123 • New Contributor III

03-06-2025 10:23:40 AM

360 Views
1 reply
0 kudos

ETL Pipeline works fine, but when executed via Workflow it fails due to StorageAccessError

I have a fairly simple ETL pipeline that uses DLT. It streams data from an ADLS Gen2 storage account and creates a materialized view using two tables. It works fine when I execute it on its own, and the materialized view is properly refreshed. Now I wanted to add this as a task to...

Data Engineering

Latest Reply

Nivethan_Venkat
Contributor III

03-09-2025 8:34:57 AM

0 kudos

Hi @xx123, could you please provide more details on the cluster configuration? I suspect the cluster policy used when deploying the job in the workflow differs from the one used when you test it. Please try to use the same cluster policy ...

by Nalapriya • New Contributor II

02-11-2025 4:22:45 AM

1202 Views
3 replies
0 kudos

I have data in S3/Iceberg tables. How do I read it using Databricks Spark SQL?

I tried this method: df = spark.read.format("iceberg").load("s3-bucket-path") but got an error: Multiple sources found for iceberg (com.databricks.sql.transaction.tahoe.uniform.sources.IcebergBrowseOnlyDataSource, org.apache.iceberg.spark.source.Icebe...

Data Engineering

Latest Reply

Nalapriya
New Contributor II

02-20-2025 7:15:49 AM

0 kudos

Hi @Alberto_Umana, I tried the steps you provided, but I'm still not able to read the data in Iceberg format. Any other suggestions would be helpful.

by Srujanm01 • New Contributor III

03-03-2025 9:27:56 AM

3069 Views
1 replies
0 kudos

Databricks Managed RG Storage cost is High

Hi Community, how do I calculate the Databricks storage cost, and where can I see the data that is stored and charged in Databricks? I'm trying to understand the storage cost on a managed resource group, and I'm clueless about the data and where it is stored...

Data Engineering

Latest Reply

Brahmareddy
Esteemed Contributor

03-08-2025 2:06:16 PM

0 kudos

Hi, how are you doing today? To understand Databricks storage costs in Azure, you can check where your data is stored and how it’s being charged. Managed tables, DBFS files, and Unity Catalog volumes are usually stored in an Azure Data Lake Storage (A...

Databricks Community
