Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

jeremy98
by Honored Contributor
  • 2904 Views
  • 1 reply
  • 0 kudos

How to Initialize Sentry in All Notebooks Used in Jobs using __init__.py?

Hi Community, I'm looking to initialize Sentry in all notebooks that are used across multiple jobs. My goal is to capture exceptions using Sentry whenever a job runs a notebook. What’s the recommended approach for initializing Sentry packages in this c...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To consistently initialize Sentry in all notebooks for reliable exception tracking, experts recommend using a shared initialization approach that minimizes duplication and ensures setup for every job execution. Here’s a structured approach: Recommend...
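A minimal sketch of the shared-initialization approach the reply describes: a module (name and DSN handling are hypothetical) that every notebook imports and calls, guarded so repeated calls are no-ops. The `_init_fn` parameter is only a test hook; the real path uses `sentry_sdk.init`.

```python
# sentry_init.py -- hypothetical shared module, importable from every notebook.
import os

_initialized = False

def init_sentry(dsn=None, environment="prod", _init_fn=None):
    """Initialize Sentry once per Python process; safe to call from every notebook."""
    global _initialized
    if _initialized:
        return False  # already set up in this process
    if _init_fn is None:
        import sentry_sdk  # assumes sentry-sdk is installed on the cluster
        _init_fn = sentry_sdk.init
    _init_fn(
        dsn=dsn or os.environ.get("SENTRY_DSN"),  # hypothetical env var
        environment=environment,
        traces_sample_rate=0.0,  # capture exceptions only, no performance tracing
    )
    _initialized = True
    return True
```

Each notebook would then start with `from sentry_init import init_sentry; init_sentry()` (or a `%run` of the shared notebook), so every job run gets the same setup without duplicated configuration.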

DataP1
by New Contributor
  • 3986 Views
  • 3 replies
  • 1 kudos

Excel File from Databricks Not Auto-Adjusting Columns in Power Automate Email Attachment

Hi community, I've built an automation workflow using Databricks and Power Automate. The process runs a query in Databricks, exports the result to Excel, auto-adjusts the columns based on the header/content, and then Power Automate picks up the file a...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Yes, this is a common challenge when automating Excel file generation—the default export (especially from pandas or Databricks) does not auto-fit column widths, resulting in cramped columns when viewed or emailed. Auto-fitting columns typically requi...
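Since neither pandas' `to_excel` nor a plain export sets column widths, a common workaround is to compute widths from the data before writing. A minimal, stdlib-only sketch of that computation (applying the widths via openpyxl's `ws.column_dimensions[letter].width`, as the docstring notes, is an assumption about your export code):

```python
def fit_widths(rows, padding=2):
    """Compute per-column widths (in characters) from the longest value per column.

    With openpyxl you would then apply these before saving, e.g.:
        ws.column_dimensions[get_column_letter(i + 1)].width = width
    which approximates Excel's auto-fit for the emailed attachment.
    """
    widths = {}
    for row in rows:
        for i, value in enumerate(row):
            widths[i] = max(widths.get(i, 0), len(str(value)))
    return {i: w + padding for i, w in widths.items()}
```

Excel itself only re-fits columns on manual double-click; pre-computing widths like this is the usual way to get readable columns in a file generated headlessly.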

2 More Replies
sanutopia
by New Contributor
  • 3162 Views
  • 1 reply
  • 0 kudos

How to ingest data from SAP Data Services (ECC, IP, MDG, FLP, MRP) to Databricks Lakehouse on GCP ?

Hi Friends, My customer is using Databricks (as a GCP partner product). The ask is to ingest data from sources into the Databricks Lakehouse. Currently the customer has 3 types of sources: SAP (ECC, HANA), Oracle, and Kafka streams. What are the Databricks native...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Databricks on GCP offers several native ETL services and integration options to ingest data from SAP (ECC, HANA), Oracle, and Kafka Streams into the Lakehouse. Comparing Databricks-native solutions with GCP-native ETL like Data Fusion or Dataflow rev...
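For the Oracle leg specifically, a common Databricks-native pattern is a Spark JDBC read. A sketch of the option map (hostname, service, and table names are invented; in a real notebook the password would come from `dbutils.secrets.get`):

```python
def oracle_jdbc_options(host, port, service, user, password):
    """Build options for spark.read.format('jdbc') against an Oracle source.

    Usage in a notebook (not runnable here without a cluster):
        df = spark.read.format("jdbc").options(**opts).load()
    """
    return {
        "url": f"jdbc:oracle:thin:@//{host}:{port}/{service}",
        "driver": "oracle.jdbc.OracleDriver",  # requires the Oracle JDBC jar on the cluster
        "user": user,
        "password": password,
        # Hypothetical pushdown subquery; Spark wraps it as a derived table.
        "dbtable": "(SELECT * FROM sales.orders) src",
    }
```

Kafka streams would typically land via Structured Streaming's `kafka` source, and SAP via a partner connector or extraction layer, as the reply's comparison suggests.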

LeoGriffM
by New Contributor II
  • 3381 Views
  • 2 replies
  • 1 kudos

Zip archive with PowerShell "Error: The zip file may not be valid or may be an unsupported version."

We are trying to upload a ZIP archive to a Databricks workspace for faster and atomic uploads of artifacts. The expected behaviour is that we can run the following co...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

The error message "Error: The zip file may not be valid or may be an unsupported version" when importing a zip archive via the Databricks CLI is a known issue, especially with zip files created using PowerShell's Compress-Archive or [System.IO.Compre...
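Consistent with the reply, one workaround is to build the archive with a tool that writes forward-slash entry names, as the zip specification expects (PowerShell's `Compress-Archive` historically wrote backslash-separated entries, which some importers reject). A stdlib-only Python sketch:

```python
import os
import zipfile

def make_portable_zip(src_dir, zip_path):
    """Zip a directory tree with '/'-separated entry names for maximum compatibility."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full = os.path.join(root, name)
                # Normalize separators so entries read "sub/file.txt" even on Windows.
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                zf.write(full, arcname)
```

The resulting archive can then be uploaded with the Databricks CLI's workspace import commands; whether your specific CLI version accepts it should still be verified, since the reply describes this as a known issue.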

1 More Reply
sandy311
by New Contributor III
  • 3544 Views
  • 3 replies
  • 1 kudos

Install python packages on serverless compute in DLT pipelines (using asset bundles)

Has anyone figured out how to install packages on serverless compute using asset bundles, similar to how we handle it for jobs or job tasks? I didn’t see any direct option for this, apart from installing packages manually within a notebook. I tried ins...

Data Engineering
DLT Serverless
Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

Installing Python packages on Databricks serverless compute via asset bundles is possible, but there are some unique limitations and required configuration adjustments compared to traditional jobs or job tasks. The core methods to install packages fo...
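For serverless DLT pipelines, dependencies are declared on the pipeline's environment rather than as cluster libraries. A hedged sketch of what that can look like in a bundle's resources file (resource key, catalog/target, and package name are invented; the exact `environment` field shape should be checked against current asset-bundle docs, since this area has changed):

```yaml
resources:
  pipelines:
    my_dlt_pipeline:            # hypothetical resource key
      name: my_dlt_pipeline
      serverless: true
      catalog: main
      target: silver
      environment:
        dependencies:
          - my-internal-package==1.2.0   # hypothetical wheel/PyPI package
```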

2 More Replies
saicharandeepb
by Contributor
  • 2724 Views
  • 1 reply
  • 0 kudos

Implementing ADB Autoloader with Managed File Notification Mode for UC Ext Location (public preview)

Hi everyone, I'm planning to implement Azure Databricks Auto Loader using the Databricks-managed file notification mode for an external location registered in Unity Catalog. I understand this feature is currently in public preview, and I’d love to hea...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Yes, Azure Databricks Auto Loader with Databricks-managed file notification mode for external locations in Unity Catalog has been successfully implemented by users, especially since it entered public preview in 2025, and it's designed to make file di...
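A sketch of the option map such a stream would use. `cloudFiles.format`, `cloudFiles.schemaLocation`, and `cloudFiles.useNotifications` are documented Auto Loader options; the managed file-events flag is part of the public preview and its exact name should be verified against current docs (it is an assumption here), as is the volume path:

```python
def autoloader_options(source_format, schema_location, use_managed_events=True):
    """Build the cloudFiles option map for Auto Loader file-notification mode.

    Applied in a notebook as:
        spark.readStream.format("cloudFiles").options(**opts).load(path)
    """
    opts = {
        "cloudFiles.format": source_format,
        "cloudFiles.schemaLocation": schema_location,  # e.g. a UC volume path
        "cloudFiles.useNotifications": "true",
    }
    if use_managed_events:
        # Assumed preview option name -- verify in the Auto Loader docs.
        opts["cloudFiles.useManagedFileEvents"] = "true"
    return opts
```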

tbailey
by New Contributor II
  • 3246 Views
  • 3 replies
  • 1 kudos

DABs, policies and cluster pools

My scenario: a policy called 'Job Pool', which has the following overrides: "instance_pool_id": { "type": "unlimited", "hidden": true }, "driver_instance_pool_id": { "type": "unlimited", "hidden": true }. I have an asset bundle that sets a new cluster as...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You are experiencing validation errors assigning a driver to an on-demand pool and workers to a spot pool in your Databricks Asset Bundle (DAB) configuration because the 'spot_bid_max_price' attribute is being forced by policies—even when the pools a...
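As a sketch of the pool-backed job-cluster shape involved (resource key and variable names are hypothetical): when instance pools are referenced, node-type and spot-bid attributes come from the pool, so a policy that forces `spot_bid_max_price` conflicts with this configuration.

```yaml
resources:
  jobs:
    my_job:                                   # hypothetical resource key
      job_clusters:
        - job_cluster_key: pooled_cluster
          new_cluster:
            policy_id: ${var.job_pool_policy_id}
            driver_instance_pool_id: ${var.on_demand_pool_id}  # driver on on-demand pool
            instance_pool_id: ${var.spot_pool_id}              # workers on spot pool
            spark_version: 15.4.x-scala2.12
            num_workers: 2
            # node_type_id and cloud attributes such as spot_bid_max_price are
            # supplied by the pools and must not also be set (or forced) here.
```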

2 More Replies
pvalcheva
by New Contributor
  • 2416 Views
  • 1 reply
  • 0 kudos

Simba Spark Driver fails for big datasets in Excel

Hello, I am getting the following error when I want to extract data from Databricks via VBA code. The code for the connection is:
Option Explicit
Const adStateClosed = 0
Public CnAdo As New ADODB.Connection
Dim DSN_name As String
Dim WB As Workbook
Dim das...

[screenshot attachment: pvalcheva_0-1750755864726.png]
Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The code you provided for connecting to Databricks via VBA appears structurally sound, but the cause of the error you are experiencing could stem from several typical issues encountered when using ADODB with Databricks ODBC connections from Excel VBA...
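As a companion to the VBA snippet, here is a sketch (in Python, for illustration only) of the Simba Spark ODBC connection-string shape the ADODB connection ultimately needs, including `RowsFetchedPerBlock`, a driver setting often raised for large result sets; host, HTTP path, and token values are placeholders:

```python
def databricks_odbc_conn_string(server_hostname, http_path, token):
    """Assemble a Simba Spark ODBC connection string of the form used from VBA/ADODB."""
    parts = {
        "Driver": "Simba Spark ODBC Driver",
        "Host": server_hostname,
        "Port": "443",
        "HTTPPath": http_path,
        "SSL": "1",
        "ThriftTransport": "2",   # HTTP transport
        "AuthMech": "3",          # user/password auth; user is the literal 'token'
        "UID": "token",
        "PWD": token,
        "RowsFetchedPerBlock": "200000",  # larger fetch blocks for big datasets
    }
    return ";".join(f"{k}={v}" for k, v in parts.items())
```

The same key/value pairs can be pasted into the VBA `CnAdo.ConnectionString` or set on the DSN; whether `RowsFetchedPerBlock` resolves this particular failure depends on the underlying error in the screenshot.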

Gustavo_Az
by Contributor
  • 2723 Views
  • 2 replies
  • 1 kudos

Resolved! Doubt with range_join hints optimization, using INSERT INTO REPLACE WHERE

Hello, I'm optimizing a big notebook and have encountered many times the tip from Databricks that says "Unused range join hints". Reading the documentation for reference, I have been able to suppress that warning in almost all cells, but some of them rema...

[screenshot attachment: range_joins.JPG]
Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

There is no official documentation covering the use of range_join hints directly with the INSERT INTO ... REPLACE WHERE operation in Databricks—existing documentation around range joins focuses only on explicit joining operations, not on conditional ...
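Since the reply notes there is no official documentation for the combination, the following is only an illustrative sketch (table, column, and bin names are invented) of placing a range-join hint inside the SELECT that feeds a REPLACE WHERE insert:

```sql
-- Range-join hint on the SELECT feeding REPLACE WHERE (names illustrative).
INSERT INTO events_gold
REPLACE WHERE event_date >= '2024-01-01'
SELECT /*+ RANGE_JOIN(b, 60) */ a.*, b.label
FROM events_silver a
JOIN bins b
  ON a.ts >= b.start_ts AND a.ts < b.end_ts;
```

Whether the optimizer honors the hint in this position, or still reports it as unused, would need to be confirmed from the query plan.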

1 More Reply
ChrisLawford_n1
by Contributor
  • 2681 Views
  • 1 reply
  • 2 kudos

Update for databricks-dlt pip package

Hello, with the recent changes to Delta Live Tables, I was wondering when the Python stub will be updated to reflect the new methods that are available? Link to the PyPI repo: databricks-dlt · PyPI

Latest Reply
mark_ott
Databricks Employee
  • 2 kudos

The Python stub for Delta Live Tables (DLT), which helps with local development by providing API specs, docstring references, and type hints, is available as the databricks-dlt package on PyPI. However, this library only provides interfaces to the DL...

ChrisLawford_n1
by Contributor
  • 376 Views
  • 1 reply
  • 1 kudos

Network error on subsequent runs using serverless compute in DLT

Hello, when running on a serverless cluster in DLT, our notebook first tries to install some Python wheels onto the cluster. We have noticed that, when in development and running a pipeline many times over in a short space of time between runs, the pi...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

The error you’re seeing (“Network is unreachable” repeated during pip installs) on a DLT (Delta Live Table) serverless cluster, especially after the first successful run, is a common issue that appears to affect Databricks pipelines run repeatedly on...

abhirupa7
by New Contributor
  • 596 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Workflow

I have a query. I have multiple jobs (workflows) present in my workspace. Those jobs run regularly. Multiple tasks are present in those jobs. A few tasks have notebooks that contain for-each code. Now, when a job runs, that particular task executes the for ...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

To programmatically capture iteration-level information for tasks running inside a Databricks Workflow Job that uses the "for each" loop construct, you will primarily rely on the Databricks Jobs REST API (v2.1) and possibly the Databricks Python SDK....
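A sketch of the parsing side of that approach. The outer `tasks` list and `state.result_state` follow the documented Jobs API 2.1 `runs/get` response; the nested `iterations` list is a simplified assumption about the for-each payload shape and should be checked against the current API reference:

```python
def summarize_iterations(run_payload):
    """Extract per-iteration status rows from a (simplified) jobs/runs/get payload."""
    rows = []
    for task in run_payload.get("tasks", []):
        for it in task.get("iterations", []):
            rows.append({
                "task_key": task.get("task_key"),
                "iteration_run_id": it.get("run_id"),
                "result_state": it.get("state", {}).get("result_state"),
            })
    return rows
```

In practice you would fetch the payload with the Databricks Python SDK or an authenticated GET to `/api/2.1/jobs/runs/get` and feed the JSON into a function like this.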

1 More Reply
nefflev1
by New Contributor
  • 342 Views
  • 1 reply
  • 1 kudos

VS Code Python file execution

Hi Everyone, I'm using the Databricks VS Code Extension to develop and deploy Asset Bundles. Usually we work with Notebooks and use the "Run File as Workflow" function. Now I'm trying to use a pure Python file for a new use case and tried to use the "Up...

Latest Reply
mark_ott
Databricks Employee
  • 1 kudos

You're encountering a common issue when using the Databricks VS Code Extension's "Upload and Run File" with pure Python files, especially in a secure, VNet-injected Azure Databricks deployment. Here’s a direct summary of what’s happening and how you ...

Akshay_Petkar
by Valued Contributor
  • 345 Views
  • 2 replies
  • 0 kudos

%run notebook fails in Job mode with Py4JJavaError (None.get), but works in interactive notebook

Hi everyone, I’m facing an issue when executing a Databricks job where my notebook uses %run to include other notebooks. I have a final notebook added as a task in a job, and inside that notebook I use %run to call another notebook that contains all ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This issue with %run in Databricks notebooks—where everything works interactively in the UI, but fails in a job context with java.util.NoSuchElementException: None.get—is a relatively common pain point for users leveraging notebook modularization. Th...
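One commonly suggested workaround is to replace `%run` includes with ordinary Python imports, which do not depend on notebook-execution context and therefore behave the same in interactive and job runs. A sketch (the folder layout and `helpers` module name are hypothetical):

```python
import importlib
import sys

def load_helpers(repo_root):
    """Put the repo's source folder on sys.path and import the shared module.

    In a Databricks job you might derive repo_root from the bundle or
    workspace file path instead of hard-coding it.
    """
    src = f"{repo_root}/src"          # hypothetical layout: <repo>/src/helpers.py
    if src not in sys.path:
        sys.path.insert(0, src)
    return importlib.import_module("helpers")
```

Functions and constants then come from `helpers.<name>` rather than being injected into the calling notebook's globals, which is the part of `%run` that breaks under job execution in the reported error.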

1 More Reply
Anoora
by New Contributor II
  • 396 Views
  • 2 replies
  • 0 kudos

Scheduling and triggering jobs based on time and frequency precedence

I have a table in Databricks that stores job information, including fields such as job_name, job_id, frequency, scheduled_time, and last_run_time. I want to run a query every 10 minutes that checks this table and triggers a job if the scheduled_time i...

Data Engineering
data engineering
jobs
scheduling
Latest Reply
SamAdams
Contributor
  • 0 kudos

You could add a job with a schedule-based trigger that runs every 10 minutes. The task at the start of the job runs a SQL query against the job information table and uses the logic you described above to output a boolean value. Then feed that boolea...
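The due-check itself can be sketched as a small pure function (field semantics for `scheduled_time`, `last_run_time`, and `frequency` are assumed from the post); the boolean it yields is what the condition task would branch on:

```python
from datetime import datetime, timedelta

def should_trigger(now, scheduled_time, last_run_time, frequency_minutes):
    """True when the job is due: its scheduled time has passed and it has not
    already run within the current frequency window."""
    if now < scheduled_time:
        return False           # not yet scheduled
    if last_run_time is None:
        return True            # never run before
    return now - last_run_time >= timedelta(minutes=frequency_minutes)
```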

1 More Reply