Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

GauravS
by New Contributor III
  • 110 Views
  • 1 reply
  • 0 kudos

Ingesting Data from Event Hubs via Kafka API with Serverless Compute

Hi! I'm currently working on ingesting log data from Azure Event Hubs into Databricks. Initially, I was using a managed Databricks workspace, which couldn't access Event Hubs over a private endpoint. To resolve this, our DevOps team provisioned a VNet...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Serverless compute in Azure Databricks does not support accessing resources over private endpoints, such as Azure Event Hubs configured with a private endpoint. This is a known and frequently cited limitation in the Databricks documentation and commu...
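If classic compute inside the VNet is an option, one commonly used pattern (not from this thread; namespace, hub, and secret names below are placeholders) is to read Event Hubs through its Kafka-compatible endpoint:

```python
# Hedged sketch: reading Azure Event Hubs via its Kafka surface from a classic
# (VNet-injected) cluster. Namespace, hub, and secret scope names are placeholders.
connection_string = dbutils.secrets.get("my-scope", "eventhubs-connection-string")

raw = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "my-namespace.servicebus.windows.net:9093")
    .option("subscribe", "my-event-hub")  # the Event Hub name acts as the Kafka topic
    .option("kafka.security.protocol", "SASL_SSL")
    .option("kafka.sasl.mechanism", "PLAIN")
    .option(
        "kafka.sasl.jaas.config",
        "kafkashaded.org.apache.kafka.common.security.plain.PlainLoginModule required "
        f'username="$ConnectionString" password="{connection_string}";',
    )
    .option("startingOffsets", "earliest")
    .load()
)

logs = raw.selectExpr("CAST(value AS STRING) AS body", "timestamp")
```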

  • 0 kudos
ChrisLawford_n1
by Contributor
  • 47 Views
  • 1 reply
  • 0 kudos

Network error on subsequent runs using serverless compute in DLT

Hello, when running on a serverless cluster in DLT, our notebook first tries to install some Python wheels onto the cluster. We have noticed that, when in development and running a pipeline many times over in a short space of time between runs, the pi...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re seeing (“Network is unreachable” repeated during pip installs) on a DLT (Delta Live Table) serverless cluster, especially after the first successful run, is a common issue that appears to affect Databricks pipelines run repeatedly on...

abhirupa7
by New Contributor
  • 59 Views
  • 2 replies
  • 0 kudos

Databricks Workflow

I have a query. I have multiple jobs (workflows) present in my workspace. Those jobs run regularly, and multiple tasks are present in those jobs. A few tasks have notebooks that contain "for each" code in them. Now, when a job runs, that particular task executes the for ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

To programmatically capture iteration-level information for tasks running inside a Databricks Workflow Job that uses the "for each" loop construct, you will primarily rely on the Databricks Jobs REST API (v2.1) and possibly the Databricks Python SDK....
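A minimal sketch of that approach with the Databricks Python SDK, with a placeholder run ID; the exact fields that expose for-each iteration details may vary by Jobs API version:

```python
# Hedged sketch using the Databricks Python SDK to pull per-task details for a
# job run. The run_id is a placeholder, not from the thread.
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()  # credentials resolved from the environment / CLI profile

job_run = w.jobs.get_run(run_id=123456789)  # placeholder job run id

for task in job_run.tasks or []:
    result = task.state.result_state if task.state else None
    print(task.task_key, task.run_id, result)
    # For a for-each task, the iteration runs carry their own run_ids; fetching
    # those runs the same way is one path to iteration-level information.
```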

1 More Replies
nefflev1
by New Contributor
  • 53 Views
  • 1 reply
  • 0 kudos

VS Code Python file execution

Hi everyone, I'm using the Databricks VS Code Extension to develop and deploy Asset Bundles. Usually we work with notebooks and use the "Run File as Workflow" function. Now I'm trying to use a pure Python file for a new use case and tried to use the "Up...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

You're encountering a common issue when using the Databricks VS Code Extension's "Upload and Run File" with pure Python files, especially in a secure, VNet-injected Azure Databricks deployment. Here’s a direct summary of what’s happening and how you ...

Akshay_Petkar
by Valued Contributor
  • 38 Views
  • 2 replies
  • 0 kudos

%run notebook fails in Job mode with Py4JJavaError (None.get), but works in interactive notebook

Hi everyone, I'm facing an issue when executing a Databricks job where my notebook uses %run to include other notebooks. I have a final notebook added as a task in a job, and inside that notebook I use %run to call another notebook that contains all ...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This issue with %run in Databricks notebooks—where everything works interactively in the UI, but fails in a job context with java.util.NoSuchElementException: None.get—is a relatively common pain point for users leveraging notebook modularization. Th...
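One hedged workaround, not necessarily what this reply goes on to recommend: keep the shared logic in a workspace .py file and import it instead of chaining %run (the module and function names below are hypothetical):

```python
# Hedged workaround sketch (module and function names are hypothetical, not from
# the thread): keep shared setup in a workspace file, e.g. helpers/common.py,
# and import it. In Repos / workspace files the notebook's folder is normally on
# sys.path, so a plain import behaves the same interactively and in a job run.
from helpers.common import build_spark_config  # hypothetical module and function

config = build_spark_config(env="prod")

# Alternatively, dbutils.notebook.run("./shared_setup", 600) runs the child
# notebook as its own ephemeral run instead of inlining it with %run.
```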

1 More Replies
Sergecom
by New Contributor III
  • 37 Views
  • 1 reply
  • 0 kudos

Migrating from on-premises HDFS to Unity Catalog - Looking for advice on on-prem options

Hi, we're currently running a Databricks installation with an on-premises HDFS file system. As we're looking to adopt Unity Catalog, we've realized that our current HDFS setup has limited support and compatibility with Unity Catalog. Our requirement: W...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

Unity Catalog does not natively support HDFS, and its primary design assumes cloud object storage (such as S3, ADLS, or GCS) as the backing store for both managed and external tables. For organizations restricted to on-premises storage, the situation...

Anoora
by New Contributor II
  • 35 Views
  • 2 replies
  • 0 kudos

Scheduling and triggering jobs based on time and frequency precedence

I have a table in Databricks that stores job information, including fields such as job_name, job_id, frequency, scheduled_time, and last_run_time. I want to run a query every 10 minutes that checks this table and triggers a job if the scheduled_time i...

Data Engineering
data engineering
jobs
scheduling
Latest Reply
SamAdams
Contributor
  • 0 kudos

You could add a job with a schedule-based trigger that runs every 10 minutes. The task at the start of the job runs a SQL query against the job information table and uses the logic you described above to output a boolean value. Then feed that boolea...
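A minimal sketch of that first task, assuming the column names from the question; the control-table name and the exact precedence rule are placeholders:

```python
# Hedged sketch of the first task in the 10-minute job: find jobs that are due
# and publish a boolean for a downstream If/else condition task.
due_jobs = spark.sql("""
    SELECT job_id
    FROM job_control_table            -- hypothetical table name
    WHERE scheduled_time <= current_timestamp()
      AND (last_run_time IS NULL OR last_run_time < scheduled_time)
""").collect()

# Downstream tasks can read this task value to decide whether to trigger the job.
dbutils.jobs.taskValues.set(key="should_run", value=len(due_jobs) > 0)
```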

1 More Replies
aonurdemir
by Contributor
  • 42 Views
  • 1 reply
  • 1 kudos

Broken s3 file paths in File Notifications for auto loader

Suddenly, at 2025-10-23T14:12:48.409+00:00, file paths coming from the file notification queue started to be URL-encoded. As a result, our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @aonurdemir, Could you please re-run your pipeline now and check? This issue should be mitigated now. It is due to a recent internal bug that led to the unexpected handling of file paths with special characters. You should set ignoreMissingFile...
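A hedged Auto Loader sketch, assuming the truncated option above is the standard ignoreMissingFiles flag; bucket, format, and schema location are placeholders:

```python
# Hedged sketch: resume the Auto Loader stream while tolerating notifications
# whose paths no longer resolve (assumption: the option referenced in the reply
# is ignoreMissingFiles).
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")
    .option("ignoreMissingFiles", "true")  # skip files that cannot be found at read time
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/logs")
    .load("s3://my-bucket/logs/")
)
```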

EricCournarie
by New Contributor III
  • 20 Views
  • 2 replies
  • 0 kudos

Retrieving OBJECT values with the JDBC driver may lead to invalid JSON

Hello, using the JDBC driver, I try to retrieve values in the ResultSet for an OBJECT type. Sadly, it returns invalid JSON. Given the SQL: CREATE OR REPLACE TABLE main.eric.eric_complex_team (`id` INT, `nom` STRING, `infos` STRUCT<`age`: INT, `ville`: STRIN...

Latest Reply
EricCournarie
New Contributor III
  • 0 kudos

Hello, thanks for the quick response. Sadly, I do not have control over the SQL request, so there is no way for me to modify it ...

1 More Replies
DylanStout
by Contributor
  • 3029 Views
  • 1 reply
  • 0 kudos

Pyspark ML tools

Cluster policies are not letting us use PySpark ML tools. Issue details: We have clusters available in our Databricks environment and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @DylanStout, thanks for laying out the symptoms clearly. This is a classic clash between Safe Spark (shared/high-concurrency) protections and multi-threaded/driver-mutating code paths. What's happening: on clusters with the Shared/Safe Spark a...

AvneeshSingh
by New Contributor
  • 3224 Views
  • 1 reply
  • 0 kudos

Auto Loader Data Reprocess

Hi, if possible, can anyone please help me with some Auto Loader options? I have 2 open queries: (i) let's assume I am running some Auto Loader stream and my job fails; instead of resetting the whole checkpoint, I want to run the stream from a specified timest...

Data Engineering
autoloader
Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Have you reviewed the following doc already? Please let me know the specifics and we can go from there, but I'd start with this doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options
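For query (i), one option documented on that page is modifiedAfter; a hedged sketch follows (paths and the timestamp are placeholders, and the option takes effect when a stream starts, typically with a new checkpoint, rather than rewinding an existing one):

```python
# Hedged sketch: reprocess only files modified after a given timestamp using
# Auto Loader's modifiedAfter option. Paths, format, and timestamp are placeholders.
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("modifiedAfter", "2025-01-01 00:00:00.000000 UTC+0")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/reprocess")
    .load("s3://my-bucket/raw/")
)
```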

dhruvs2
by New Contributor
  • 74 Views
  • 3 replies
  • 4 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 4 kudos

Hi @dhruvs2. A Lakeflow Job consists of tasks. The tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a job to do this is your best bet. Then you can set up the dependencies as you require. I...
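A hedged sketch of that orchestrator job using the Databricks Python SDK, with placeholder job IDs and names:

```python
# Hedged sketch: an orchestrator job with two "run job" tasks and a third task
# that depends on both, mirroring the Airflow ExternalTaskSensor pattern.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service.jobs import Task, RunJobTask, TaskDependency

w = WorkspaceClient()

w.jobs.create(
    name="orchestrate_a_b_then_c",  # placeholder name
    tasks=[
        Task(task_key="job_a", run_job_task=RunJobTask(job_id=111)),  # placeholder IDs
        Task(task_key="job_b", run_job_task=RunJobTask(job_id=222)),
        Task(
            task_key="job_c",
            run_job_task=RunJobTask(job_id=333),
            depends_on=[TaskDependency(task_key="job_a"), TaskDependency(task_key="job_b")],
        ),
    ],
)
```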

2 More Replies
Adam_Borlase
by New Contributor III
  • 74 Views
  • 3 replies
  • 1 kudos

Error trying to edit Job Cluster via Databricks CLI

Good day all, after having issues with cloud resources allocated to Lakeflow jobs and Gateways, I am trying to apply a policy to the cluster that is allocated to the Job. I am very new to a lot of the Databricks platform and the administration, so all h...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

@Adam_Borlase, thanks, this is helpful context. The key is that the SQL Server connector's ingestion pipeline runs on serverless, while the ingestion "gateway" runs on classic compute in your cloud account, so vCPU family quotas can block gateway c...

2 More Replies
