Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

TobyE
by New Contributor III
  • 3026 Views
  • 3 replies
  • 2 kudos

Resolved! Problem using dbutils to access local files

Greetings, total newbie here; please advise if this is an inappropriate forum. I'm trying to learn some basics. I have tried to copy a file from the local file system onto DBFS, and while it fails in Python, it succeeds on the command line. The error ...

Latest Reply
TobyE
New Contributor III
  • 2 kudos

Well, it was indeed a very fundamental misunderstanding. In the UI, I had not connected to my own compute cluster; it was still running "serverless". So, I guess it's not surprising the process didn't have any privileges, even though the terminal tha...

2 More Replies
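The resolution above (attach the notebook to a classic cluster, then copy with dbutils) can be sketched as follows. The file paths are hypothetical, and the `dbutils.fs.cp` call itself only runs on Databricks, so it is shown commented:

```python
# Sketch of copying a driver-local file to DBFS, assuming the notebook is
# attached to a classic (non-serverless) cluster. Paths are hypothetical.
local_uri = "file:/tmp/example.csv"        # driver-local file system
dbfs_uri = "dbfs:/FileStore/example.csv"   # DBFS target

# On an attached cluster, this one-liner performs the copy:
# dbutils.fs.cp(local_uri, dbfs_uri)

# The URI scheme is what tells dbutils which file system to address:
def uri_scheme(uri: str) -> str:
    return uri.split(":", 1)[0]

print(uri_scheme(local_uri), uri_scheme(dbfs_uri))
```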
seefoods
by Valued Contributor
  • 2240 Views
  • 3 replies
  • 0 kudos

Auto Loader task keeps running in batch mode

Hello everyone, I run a task in batch mode with Auto Loader and I enable the trigger option (availableNow=True). However, when my script finishes, the stream continues running. Does anyone know why this happens? Cordially

Latest Reply
seefoods
Valued Contributor
  • 0 kudos

This is my script. I enable these options when I read files from Volumes, before writing to a Delta table: reader_stream.option("cloudFiles.format", self.file_format).option("cloudFiles.schemaLocation", self.schema_location) ...

2 More Replies
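For readers hitting the same behaviour: with trigger(availableNow=True) the stream should process the existing backlog and then stop on its own, so a stream that keeps running usually means the trigger was never applied to the write. A minimal sketch follows; the option names come from the cloudFiles documentation, but the paths, format, and table name are assumptions:

```python
# Auto Loader options for a batch-style ("available now") ingest.
autoloader_options = {
    "cloudFiles.format": "json",                                    # assumed format
    "cloudFiles.schemaLocation": "/Volumes/main/default/_schemas",  # assumed path
}

# On Databricks, the stream would be wired up roughly like this:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/Volumes/main/default/landing"))
# (df.writeStream
#    .option("checkpointLocation", "/Volumes/main/default/_checkpoints/ingest")
#    .trigger(availableNow=True)   # drain the backlog, then terminate
#    .toTable("main.default.events"))

print(sorted(autoloader_options))
```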
Diogo_W
by New Contributor III
  • 8152 Views
  • 3 replies
  • 1 kudos

Resolved! Spark is not executing any tasks

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even SQL Warehouse. Even for very simple code it hangs forever. Has anyone ever faced something similar? Our infra is AWS.

Latest Reply
Diogo_W
New Contributor III
  • 1 kudos

Found the solution: Turned out to be an issue with the Security Groups. The internal security group communication was not open to all ports for TCP and UDP. After fixing that the jobs ran fine. Seems like we did require more workers too.

2 More Replies
Manoranjan
by New Contributor II
  • 859 Views
  • 2 replies
  • 0 kudos

Automate a Databricks batch that copies data from one file/database and puts it in another file/database

Hi, I have created a Databricks batch which copies data from a file/database table and puts it in some other file/database table. I want to automate the functional test cases of this Databricks batch. Can someone suggest, is there any tool available for t...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

Create a workflow for your batch load notebook and then schedule it.https://docs.databricks.com/gcp/en/jobs/jobs-quickstart

1 More Reply
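The suggestion above (wrap the batch notebook in a job and schedule it) can be sketched as a Jobs API payload; the notebook path, cron expression, and job name below are made-up examples:

```python
# Hypothetical payload for POST /api/2.1/jobs/create, scheduling a
# batch-copy notebook to run daily at 02:00 UTC.
job_payload = {
    "name": "batch-copy-job",
    "tasks": [
        {
            "task_key": "copy_data",
            "notebook_task": {"notebook_path": "/Workspace/Users/me/batch_copy"},
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # sec min hour day month weekday
        "timezone_id": "UTC",
    },
}

# send with: requests.post(f"{host}/api/2.1/jobs/create",
#                          headers={"Authorization": f"Bearer {token}"},
#                          json=job_payload)
print(job_payload["name"])
```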
Pavel_Soucek
by New Contributor II
  • 3048 Views
  • 4 replies
  • 0 kudos

Upload xlsx file with API

Hello, I would like to upload xlsx files from SharePoint to Databricks (AWS). I have files in my SharePoint folder and try to use Power Automate (e.g., if a new file is created, upload the file to Databricks) with a custom connector (where I defined the API /api/2.0/...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Pavel_Soucek, Excel files are a binary format. I think when you use decodeBase64() on the file content, you might be corrupting the file accidentally. I think Power Automate already returns the binary data in a format suitable for sending in ...

3 More Replies
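The encoding point in the reply can be demonstrated without Databricks at all: the DBFS put API expects the file's raw bytes base64-encoded exactly once, so any extra decode corrupts the upload. The byte string and upload path below are stand-ins:

```python
import base64

# Real xlsx files are zip archives, so they start with b"PK"; this is a stand-in.
raw_bytes = b"PK\x03\x04 pretend xlsx content"
encoded = base64.b64encode(raw_bytes).decode("ascii")

# hypothetical payload for POST /api/2.0/dbfs/put:
payload = {
    "path": "/FileStore/uploads/report.xlsx",
    "contents": encoded,
    "overwrite": True,
}

# A single decode must round-trip to the original bytes; applying an extra
# decodeBase64() to data that is already binary breaks this invariant.
assert base64.b64decode(payload["contents"]) == raw_bytes
```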
ruoyuqian
by New Contributor II
  • 9812 Views
  • 7 replies
  • 7 kudos

How to print out logs during DLT pipeline run

I'm trying to debug my pipeline in DLT, and during runtime I need some log info. How do I do a print('something') during a DLT run?

Latest Reply
User16871418122
Databricks Employee
  • 7 kudos

We can try emitting logs to stdout/stderr. The sample code below worked in a UC DLT cluster (dlt:16.4.0-delta-pipelines-photon-dlt-release-dp-2025.20-rc0-commit-fcedf0a-image-be34de2): import dlt; from pyspark.sql.functions import col; from utilities impor...

6 More Replies
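As the reply notes, anything a pipeline writes to stdout/stderr surfaces in the driver logs. A minimal, framework-agnostic sketch of that idea follows; the helper name is ours, and the @dlt.table usage is shown commented because it needs a pipeline runtime:

```python
import sys

# Tiny print-style logger: messages sent to stderr end up in the DLT
# pipeline's driver log output.
def log_event(stage: str, message: str) -> str:
    line = f"[{stage}] {message}"
    print(line, file=sys.stderr)
    return line

# Inside a pipeline you would call it from a table function, e.g.:
# @dlt.table
# def bronze_events():
#     log_event("bronze", "building bronze_events")
#     return spark.readStream.table("raw_events")
```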
lucami
by Contributor
  • 988 Views
  • 1 reply
  • 1 kudos

Notifications for Data Ingestion in Declarative Pipelines Using Auto Loader

Is there a way to add a notification or set a metric threshold if a declarative pipeline (based on Auto Loader) ingests no data? (I can see only max duration/backlog metrics in the workflow UI.)

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Yes, there are several ways to detect and get notified when your Auto Loader pipeline ingests no data. Here are the most effective approaches: 1. Streaming Backlog (Records) metric: the Streaming backlog (records) metric you see in the UI can actually he...

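One way to act on the metrics mentioned above is to inspect a streaming query's progress and raise an alert when a batch ingests zero rows. A sketch follows; the progress dict mirrors Structured Streaming's progress JSON, and the alert hook is a placeholder:

```python
# Decide whether a micro-batch ingested any data, based on the
# numInputRows field of a StreamingQuery progress report.
def ingested_rows(progress: dict) -> int:
    return int(progress.get("numInputRows", 0))

def should_alert(progress: dict) -> bool:
    return ingested_rows(progress) == 0

# On Databricks: progress = query.lastProgress, then e.g. call a webhook:
# if should_alert(progress): notify_team("no data ingested in last batch")
sample = {"batchId": 42, "numInputRows": 0}
print(should_alert(sample))
```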
lucami
by Contributor
  • 3519 Views
  • 2 replies
  • 2 kudos

Resolved! Access Azure storage with serverless compute

I would like to know how to connect to Azure Blob Storage in a Python job inside a workflow with serverless cluster. When working with a non-serverless cluster or with serverless in a declarative pipeline, I would typically set the Azure storage acco...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

Use the below code in your notebook. You cannot set Spark config on serverless, as there are no advanced options in the cluster settings: credential_id = dbutils.secrets.get(scope="{scope_name}", key="{app_id}") credential_key = dbutils.secrets.get(scope="{scope_name...

1 More Reply
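The reply's approach (read the service principal's credentials from a secret scope, then set the storage config per notebook) looks roughly like this. The account, tenant, and client values are placeholders; the fs.azure.* keys follow the ABFS OAuth documentation:

```python
# Build the per-notebook Spark conf entries for OAuth access to an
# Azure storage account (hypothetical account/tenant/client values).
def abfss_oauth_conf(account: str, client_id: str,
                     client_secret: str, tenant_id: str) -> dict:
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In the notebook (secrets fetched via dbutils.secrets.get):
# for key, value in abfss_oauth_conf("myacct", cid, secret, tid).items():
#     spark.conf.set(key, value)
```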
databicky
by Contributor II
  • 2096 Views
  • 2 replies
  • 2 kudos

How to edit or delete a post in this community after it is posted?

When trying to edit the post, I am not able to see the edit option there. @Retired_mod

Latest Reply
wamu
New Contributor II
  • 2 kudos

It’s actually pretty simple, just click the three dots on the top right of your post, and you’ll see options to edit or delete. Easy to miss at first, but once you see it, it’s straightforward.

1 More Reply
Murtaza-007-007
by Databricks Partner
  • 1534 Views
  • 5 replies
  • 0 kudos

How to import Class Room Setup Scripts -03.4

I am working toward the Databricks Data Engineering certificate, and during the course I tried to load the classroom scripts into my Databricks Community Edition and got the following error message. I am relatively new to Databricks.

Latest Reply
Murtaza-007-007
Databricks Partner
  • 0 kudos

@nayan_wylde, @szymon_dybczak: Even if I can only complete a few of the courses, that would be fine for me. Please share a step-by-step guide on how to import these libraries into a personal workspace and run the notebooks.

4 More Replies
joao_vnb
by New Contributor III
  • 68384 Views
  • 8 replies
  • 11 kudos

Resolved! Automate the Databricks workflow deployment

Hi everyone, do you know if it's possible to automate the Databricks workflow deployment through Azure DevOps (like what we do with the deployment of notebooks)?

Latest Reply
asingamaneni
New Contributor II
  • 11 kudos

Did you get a chance to try Brickflow? https://github.com/Nike-Inc/brickflow You can find the documentation here: https://engineering.nike.com/brickflow/v0.11.2/ Brickflow uses Databricks Asset Bundles (DAB) under the hood but provides a Pythonic w...

7 More Replies
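Whichever wrapper you choose, the underlying mechanism is a Databricks Asset Bundle that a CI pipeline (Azure DevOps included) deploys with the Databricks CLI (`databricks bundle deploy -t dev`). A minimal databricks.yml sketch, with made-up names, paths, and host URL:

```yaml
bundle:
  name: my_workflows

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: run_etl
          notebook_task:
            notebook_path: ./notebooks/etl.py

targets:
  dev:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```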
Parth2692
by Databricks Partner
  • 772 Views
  • 1 reply
  • 0 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: org.apache.spark.memory.SparkOutO

Hi everyone, I'm using a serverless cluster and encountering an issue where my code runs fine when executed cell-by-cell in a notebook but fails with a memory error when executed as a job. Interestingly, the same job runs successfully in our dev envi...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Parth2692! It’s possible that your dev and prod environments have different serverless configurations, which could explain the difference in behavior. You can try increasing the notebook memory by switching from Standard to High in the Environ...

divyansh8989
by New Contributor
  • 2896 Views
  • 1 reply
  • 0 kudos

Autoloader with availableNow=True and overwrite mode removes data in second micro-batch (DBR 16.3)

Hi everyone, I'm encountering an issue after upgrading to Databricks Runtime 16.3, while using Autoloader with the following configuration: trigger(availableNow=True) and outputMode("overwrite"). When a new file arrives, Autoloader processes it and writes the...

Latest Reply
ashesharyak
New Contributor II
  • 0 kudos

You've hit on a known behavioral change or subtle interaction in Databricks Runtime 16.3 with Autoloader, trigger(availableNow=True), and outputMode("overwrite"). This specific combination seems to be causing an unexpected second micro-batch that ove...

ChrisLawford_n1
by Contributor II
  • 4343 Views
  • 3 replies
  • 3 kudos

Resolved! How to use bundle substitutions in %pip install for Lakeflow Declarative Pipelines

Hello, when defining a Lakeflow Declarative Pipeline (DLT pipeline), I would like to allow the installation of a whl file to be dictated by the user running the pipeline. This will allow the notebook to have the pip installs at the top be agnostic of t...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Glad to help, and feel free to "Accept as Solution" to help others in the same boat. Cheers, Lou.

2 More Replies
jeremy98
by Honored Contributor
  • 1226 Views
  • 2 replies
  • 2 kudos

Resolved! How to deploy a DLT Pipeline?

Hi community, my team and I have been working on manually creating our first DLT pipeline. However, when we tried importing it into DABs, we encountered an issue in the dev workspace: we are unable to deploy the same DLT pipeline multiple times becaus...

Latest Reply
jeremy98
Honored Contributor
  • 2 kudos

Hello, thanks for your response! Duplicating the catalog for this does feel a bit unusual. I understand the reasoning behind it, though it's not the cleanest approach. Still, I suppose it's acceptable for a DEV workspace. Thanks again!

1 More Reply