Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

TobyE
by New Contributor III
  • 3026 Views
  • 3 replies
  • 2 kudos

Resolved! Problem using dbutils to access local files

Greetings, total newbie here; please advise if this is an inappropriate forum. I'm trying to learn some basics. I have tried to copy a file from the local file system onto DBFS, and while it fails in Python, it succeeds on the command line. The error ...

Latest Reply
TobyE
New Contributor III
  • 2 kudos

Well, it was indeed a very fundamental misunderstanding. In the UI, I had not connected to my own compute cluster; it was still running "serverless". So, I guess it's not surprising the process didn't have any privileges, even though the terminal tha...

2 More Replies
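The resolution above (attach the notebook to a classic cluster, then copy with dbutils) can be sketched as follows. The file paths are hypothetical, and the `dbutils.fs.cp` call itself only runs on Databricks, so it is shown commented:

```python
# Sketch of copying a driver-local file to DBFS, assuming the notebook is
# attached to a classic (non-serverless) cluster. Paths are hypothetical.
local_uri = "file:/tmp/example.csv"        # driver-local file system
dbfs_uri = "dbfs:/FileStore/example.csv"   # DBFS target

# On an attached cluster, this one-liner performs the copy:
# dbutils.fs.cp(local_uri, dbfs_uri)

# The URI scheme is what tells dbutils which file system to address:
def uri_scheme(uri: str) -> str:
    return uri.split(":", 1)[0]

print(uri_scheme(local_uri), uri_scheme(dbfs_uri))
```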
seefoods
by Valued Contributor
  • 2240 Views
  • 3 replies
  • 0 kudos

Auto Loader task keeps running in batch mode

Hello everyone, I run a task in batch mode with Auto Loader and I enable the trigger option (availableNow=True). However, when my script finishes, the stream continues running. Does anyone know why this happens? Cordially

Latest Reply
seefoods
Valued Contributor
  • 0 kudos

This is my script. I enable these options when I read files from Volumes, before writing to a Delta table: reader_stream.option("cloudFiles.format", self.file_format).option("cloudFiles.schemaLocation", self.schema_location) ...

2 More Replies
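For readers hitting the same behaviour: with trigger(availableNow=True) the stream should process the existing backlog and then stop on its own, so a stream that keeps running usually means the trigger was never applied to the write. A minimal sketch follows; the option names come from the cloudFiles documentation, but the paths, format, and table name are assumptions:

```python
# Auto Loader options for a batch-style ("available now") ingest.
autoloader_options = {
    "cloudFiles.format": "json",                                    # assumed format
    "cloudFiles.schemaLocation": "/Volumes/main/default/_schemas",  # assumed path
}

# On Databricks, the stream would be wired up roughly like this:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options)
#         .load("/Volumes/main/default/landing"))
# (df.writeStream
#    .option("checkpointLocation", "/Volumes/main/default/_checkpoints/ingest")
#    .trigger(availableNow=True)   # drain the backlog, then terminate
#    .toTable("main.default.events"))

print(sorted(autoloader_options))
```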
Diogo_W
by New Contributor III
  • 8152 Views
  • 3 replies
  • 1 kudos

Resolved! Spark is not executing any tasks

I have an issue where Spark is not submitting any tasks, on any workspace or cluster, even SQL Warehouse. Even for very simple code it hangs forever. Has anyone ever faced something similar? Our infra is AWS.

Latest Reply
Diogo_W
New Contributor III
  • 1 kudos

Found the solution: Turned out to be an issue with the Security Groups. The internal security group communication was not open to all ports for TCP and UDP. After fixing that the jobs ran fine. Seems like we did require more workers too.

2 More Replies
Manoranjan
by New Contributor II
  • 859 Views
  • 2 replies
  • 0 kudos

Automate a Databricks batch that copies data from one file/database and puts it in another file/database

Hi, I have created a Databricks batch which copies data from a file/database table and puts it in some other file/database table. I want to automate the functional test cases of this Databricks batch. Can someone suggest, is there any tool available for t...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

Create a workflow for your batch load notebook and then schedule it.https://docs.databricks.com/gcp/en/jobs/jobs-quickstart

1 More Reply
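The suggestion above (wrap the batch notebook in a job and schedule it) can be sketched as a Jobs API payload; the notebook path, cron expression, and job name below are made-up examples:

```python
# Hypothetical payload for POST /api/2.1/jobs/create, scheduling a
# batch-copy notebook to run daily at 02:00 UTC.
job_payload = {
    "name": "batch-copy-job",
    "tasks": [
        {
            "task_key": "copy_data",
            "notebook_task": {"notebook_path": "/Workspace/Users/me/batch_copy"},
        }
    ],
    "schedule": {
        "quartz_cron_expression": "0 0 2 * * ?",  # sec min hour day month weekday
        "timezone_id": "UTC",
    },
}

# send with: requests.post(f"{host}/api/2.1/jobs/create",
#                          headers={"Authorization": f"Bearer {token}"},
#                          json=job_payload)
print(job_payload["name"])
```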
Pavel_Soucek
by New Contributor II
  • 3048 Views
  • 4 replies
  • 0 kudos

Upload xlsx file with API

Hello, I would like to upload xlsx files from SharePoint to Databricks (AWS). I have files in my SharePoint folder and try to use Power Automate (e.g., if a new file is created, upload the file to Databricks) with a custom connector (where I defined the API /api/2.0/...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Pavel_Soucek, Excel files are a binary format. I think when you use decodeBase64() on the file content, you might be corrupting the file accidentally. I think Power Automate already returns the binary data in a format suitable for sending in ...

3 More Replies
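The encoding point in the reply can be demonstrated without Databricks at all: the DBFS put API expects the file's raw bytes base64-encoded exactly once, so any extra decode corrupts the upload. The byte string and upload path below are stand-ins:

```python
import base64

# Real xlsx files are zip archives, so they start with b"PK"; this is a stand-in.
raw_bytes = b"PK\x03\x04 pretend xlsx content"
encoded = base64.b64encode(raw_bytes).decode("ascii")

# hypothetical payload for POST /api/2.0/dbfs/put:
payload = {
    "path": "/FileStore/uploads/report.xlsx",
    "contents": encoded,
    "overwrite": True,
}

# A single decode must round-trip to the original bytes; applying an extra
# decodeBase64() to data that is already binary breaks this invariant.
assert base64.b64decode(payload["contents"]) == raw_bytes
```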
ruoyuqian
by New Contributor II
  • 9812 Views
  • 7 replies
  • 7 kudos

How to print out logs during DLT pipeline run

I'm trying to debug my pipeline in DLT, and during runtime I need some log info. How do I do a print('something') during a DLT run?

Latest Reply
User16871418122
Databricks Employee
  • 7 kudos

We can try emitting logs to stdout/stderr. The sample code below worked in a UC DLT cluster (dlt:16.4.0-delta-pipelines-photon-dlt-release-dp-2025.20-rc0-commit-fcedf0a-image-be34de2): import dlt; from pyspark.sql.functions import col; from utilities impor...

6 More Replies
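As the reply notes, anything a pipeline writes to stdout/stderr surfaces in the driver logs. A minimal, framework-agnostic sketch of that idea follows; the helper name is ours, and the @dlt.table usage is shown commented because it needs a pipeline runtime:

```python
import sys

# Tiny print-style logger: messages sent to stderr end up in the DLT
# pipeline's driver log output.
def log_event(stage: str, message: str) -> str:
    line = f"[{stage}] {message}"
    print(line, file=sys.stderr)
    return line

# Inside a pipeline you would call it from a table function, e.g.:
# @dlt.table
# def bronze_events():
#     log_event("bronze", "building bronze_events")
#     return spark.readStream.table("raw_events")
```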
lucami
by Contributor
  • 988 Views
  • 1 reply
  • 1 kudos

Notifications for Data Ingestion in Declarative Pipelines Using Auto Loader

Is there a way to add a notification or set a metric threshold if a declarative pipeline (based on Auto Loader) ingests no data? (I can see only max duration/backlog metrics in the workflow UI.)

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Yes, there are several ways to detect and get notified when your Auto Loader pipeline ingests no data. Here are the most effective approaches: 1. Streaming Backlog (Records) metric: the Streaming backlog (records) metric you see in the UI can actually he...

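One way to act on the metrics mentioned above is to inspect a streaming query's progress and raise an alert when a batch ingests zero rows. A sketch follows; the progress dict mirrors Structured Streaming's progress JSON, and the alert hook is a placeholder:

```python
# Decide whether a micro-batch ingested any data, based on the
# numInputRows field of a StreamingQuery progress report.
def ingested_rows(progress: dict) -> int:
    return int(progress.get("numInputRows", 0))

def should_alert(progress: dict) -> bool:
    return ingested_rows(progress) == 0

# On Databricks: progress = query.lastProgress, then e.g. call a webhook:
# if should_alert(progress): notify_team("no data ingested in last batch")
sample = {"batchId": 42, "numInputRows": 0}
print(should_alert(sample))
```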
lucami
by Contributor
  • 3519 Views
  • 2 replies
  • 2 kudos

Resolved! Access Azure storage with serverless compute

I would like to know how to connect to Azure Blob Storage in a Python job inside a workflow with serverless cluster. When working with a non-serverless cluster or with serverless in a declarative pipeline, I would typically set the Azure storage acco...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 2 kudos

Use the below code in your notebook. You cannot set Spark config on serverless, as there are no advanced options in the cluster settings: credential_id = dbutils.secrets.get(scope="{scope_name}", key="{app_id}") credential_key = dbutils.secrets.get(scope="{scope_name...

1 More Reply
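The reply's approach (read the service principal's credentials from a secret scope, then set the storage config per notebook) looks roughly like this. The account, tenant, and client values are placeholders; the fs.azure.* keys follow the ABFS OAuth documentation:

```python
# Build the per-notebook Spark conf entries for OAuth access to an
# Azure storage account (hypothetical account/tenant/client values).
def abfss_oauth_conf(account: str, client_id: str,
                     client_secret: str, tenant_id: str) -> dict:
    suffix = f"{account}.dfs.core.windows.net"
    return {
        f"fs.azure.account.auth.type.{suffix}": "OAuth",
        f"fs.azure.account.oauth.provider.type.{suffix}":
            "org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider",
        f"fs.azure.account.oauth2.client.id.{suffix}": client_id,
        f"fs.azure.account.oauth2.client.secret.{suffix}": client_secret,
        f"fs.azure.account.oauth2.client.endpoint.{suffix}":
            f"https://login.microsoftonline.com/{tenant_id}/oauth2/token",
    }

# In the notebook (secrets fetched via dbutils.secrets.get):
# for key, value in abfss_oauth_conf("myacct", cid, secret, tid).items():
#     spark.conf.set(key, value)
```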
databicky
by Contributor II
  • 2096 Views
  • 2 replies
  • 2 kudos

How to edit or delete a post in this community after it is posted?

When trying to edit the post, I am not able to see the edit option there. @Retired_mod

Latest Reply
wamu
New Contributor II
  • 2 kudos

It’s actually pretty simple, just click the three dots on the top right of your post, and you’ll see options to edit or delete. Easy to miss at first, but once you see it, it’s straightforward.

1 More Reply
Murtaza-007-007
by Databricks Partner
  • 1534 Views
  • 5 replies
  • 0 kudos

How to import Class Room Setup Scripts -03.4

I am working toward the Databricks Data Engineering certificate, and during the course I tried to load the classroom scripts into my Databricks Community Edition and got the following error message. I am relatively new to Databricks.

Latest Reply
Murtaza-007-007
Databricks Partner
  • 0 kudos

@nayan_wylde, @szymon_dybczak: Even if I can only complete a few of the courses, that would be fine for me. Please share a step-by-step guide on how to import these libraries into a personal workspace and run the notebooks.

4 More Replies
joao_vnb
by New Contributor III
  • 68384 Views
  • 8 replies
  • 11 kudos

Resolved! Automate the Databricks workflow deployment

Hi everyone, do you know if it's possible to automate the Databricks workflow deployment through Azure DevOps (like what we do with the deployment of notebooks)?

Latest Reply
asingamaneni
New Contributor II
  • 11 kudos

Did you get a chance to try Brickflow? https://github.com/Nike-Inc/brickflow You can find the documentation here: https://engineering.nike.com/brickflow/v0.11.2/ Brickflow uses Databricks Asset Bundles (DAB) under the hood but provides a Pythonic w...

7 More Replies
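Whichever wrapper you choose, the underlying mechanism is a Databricks Asset Bundle that a CI pipeline (Azure DevOps included) deploys with the Databricks CLI (`databricks bundle deploy -t dev`). A minimal databricks.yml sketch, with made-up names, paths, and host URL:

```yaml
bundle:
  name: my_workflows

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: run_etl
          notebook_task:
            notebook_path: ./notebooks/etl.py

targets:
  dev:
    workspace:
      host: https://adb-1234567890123456.7.azuredatabricks.net
```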
Parth2692
by Databricks Partner
  • 772 Views
  • 1 reply
  • 0 kudos

org.apache.spark.SparkException: Job aborted due to stage failure: org.apache.spark.memory.SparkOutO

Hi everyone, I'm using a serverless cluster and encountering an issue where my code runs fine when executed cell-by-cell in a notebook but fails with a memory error when executed as a job. Interestingly, the same job runs successfully in our dev envi...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Parth2692! It’s possible that your dev and prod environments have different serverless configurations, which could explain the difference in behavior. You can try increasing the notebook memory by switching from Standard to High in the Environ...

divyansh8989
by New Contributor
  • 2896 Views
  • 1 reply
  • 0 kudos

Autoloader with availableNow=True and overwrite mode removes data in second micro-batch (DBR 16.3)

Hi everyone, I'm encountering an issue after upgrading to Databricks Runtime 16.3, while using Autoloader with the following configuration: trigger(availableNow=True) and outputMode("overwrite"). When a new file arrives, Autoloader processes it and writes the...

Latest Reply
ashesharyak
New Contributor II
  • 0 kudos

You've hit on a known behavioral change or subtle interaction in Databricks Runtime 16.3 with Autoloader, trigger(availableNow=True), and outputMode("overwrite"). This specific combination seems to be causing an unexpected second micro-batch that ove...

ChrisLawford_n1
by Contributor II
  • 4343 Views
  • 3 replies
  • 3 kudos

Resolved! How to use bundle substitutions in %pip install for Lakeflow Declarative Pipelines

Hello, when defining a Lakeflow Declarative Pipeline (DLT pipeline), I would like to allow the installation of a whl file to be dictated by the user running the pipeline. This will allow the notebook to have the pip installs at the top be agnostic of t...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Glad to help, and feel free to "Accept as Solution" to help others in the same boat. Cheers, Lou.

2 More Replies
jeremy98
by Honored Contributor
  • 1226 Views
  • 2 replies
  • 2 kudos

Resolved! How to deploy a DLT Pipeline?

Hi community, my team and I have been working on manually creating our first DLT pipeline. However, when we tried importing it into DABs, we encountered an issue in the dev workspace: we are unable to deploy the same DLT pipeline multiple times becaus...

Latest Reply
jeremy98
Honored Contributor
  • 2 kudos

Hello, thanks for your response! Duplicating the catalog for this does feel a bit unusual. I understand the reasoning behind it, though it's not the cleanest approach. Still, I suppose it's acceptable for a DEV workspace. Thanks again!

1 More Reply