Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Forum Posts

aonurdemir
by Contributor
  • 48 Views
  • 1 reply
  • 1 kudos

Broken S3 file paths in file notifications for Auto Loader

Suddenly, at "2025-10-23T14:12:48.409+00:00", file paths coming from the file notification queue started to be URL-encoded. As a result, our pipeline gets a file-not-found exception. I think something changed suddenly and broke the notification system. Here are th...

Latest Reply
K_Anudeep
Databricks Employee

Hello @aonurdemir, could you please re-run your pipeline now and check? This issue should be mitigated now. It was caused by a recent internal bug that led to unexpected handling of file paths with special characters. You should set ignoreMissingFile...

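A minimal sketch of the suggested mitigation, assuming the truncated option in the reply is Auto Loader's ignoreMissingFiles; bucket paths, format, and table names are placeholders:

# Placeholders throughout; ignoreMissingFiles is assumed from the truncated reply above.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")   # file-notification mode, as in the question
    .option("ignoreMissingFiles", "true")            # skip files the stream can no longer find
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
    .toTable("main.bronze.landing_events")
)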
EricCournarie
by New Contributor III
  • 21 Views
  • 2 replies
  • 0 kudos

Retrieving OBJECT values with the JDBC driver may lead to invalid JSON

Hello, using the JDBC driver, I try to retrieve values in the ResultSet for an OBJECT type. Sadly, it returns invalid JSON. Given the SQL: CREATE OR REPLACE TABLE main.eric.eric_complex_team (`id` INT, `nom` STRING, `infos` STRUCT<`age`: INT, `ville`: STRIN...

Latest Reply
EricCournarie
New Contributor III

Hello, thanks for the quick response. Sadly, I don't have control over the SQL request, so there's no way for me to modify it ...

1 More Reply
DylanStout
by Contributor
  • 3029 Views
  • 1 reply
  • 0 kudos

PySpark ML tools

Cluster policies are not letting us use PySpark ML tools. Issue details: We have clusters available in our Databricks environment, and our plan was to use functions and classes from "pyspark.ml" to process data and train our model in parallel across cores/n...

Latest Reply
Louis_Frolio
Databricks Employee

Hey @DylanStout, thanks for laying out the symptoms clearly. This is a classic clash between Safe Spark (shared/high-concurrency) protections and multi-threaded/driver-mutating code paths. What's happening: on clusters with the Shared/Safe Spark a...

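For context, a minimal pyspark.ml pipeline of the kind the question describes; the table and column names are hypothetical, and whether it runs on a shared (Safe Spark) cluster depends on the access mode and runtime version discussed in the reply above:

from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler

# `spark` is the ambient SparkSession in a Databricks notebook or job.
df = spark.table("main.demo.training_data")  # hypothetical table with numeric features and a 'label' column

assembler = VectorAssembler(inputCols=["f1", "f2", "f3"], outputCol="features")
lr = LogisticRegression(featuresCol="features", labelCol="label")

model = Pipeline(stages=[assembler, lr]).fit(df)
model.transform(df).select("label", "prediction").show(5)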
AvneeshSingh
by New Contributor
  • 3224 Views
  • 1 reply
  • 0 kudos

Auto Loader Data Reprocess

Hi, if possible, can anyone please help me with some Auto Loader options? I have 2 open queries: (i) Let's assume I am running an Auto Loader stream and my job fails; instead of resetting the whole checkpoint, I want to run the stream from a specified timest...

Data Engineering
autoloader
Latest Reply
AbhaySingh
Databricks Employee

Have you reviewed the following doc already? Please let me know the specifics and we can go from there, but I'd start with this doc: https://docs.databricks.com/aws/en/ingestion/cloud-object-storage/auto-loader/options

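On query (i), one hedged option from the Auto Loader options doc linked above is modifiedAfter, which limits ingestion to files modified after a given timestamp and is typically paired with a fresh checkpoint location. Paths and the timestamp are placeholders:

# Placeholders; modifiedAfter only applies to files discovered by this (new) stream.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")
    .option("modifiedAfter", "2025-10-01 00:00:00.000000 UTC+0")
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing_reprocess/")  # new checkpoint
    .toTable("main.bronze.landing_events")
)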
dhruvs2
by New Contributor
  • 75 Views
  • 3 replies
  • 4 kudos

How to trigger a Databricks job only after multiple other jobs have completed

We have a use case where Job C should start only after both Job A and Job B have successfully completed. In Airflow, we achieve this using an ExternalTaskSensor to set dependencies across different DAGs. Is there a way to configure something similar in...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II

Hi @dhruvs2. A Lakeflow Job consists of tasks. The tasks can be things like notebooks or other jobs. If you want to orchestrate many jobs, I'd agree that having a job to do this is your best bet. Then you can set up the dependencies as you require. I...

2 More Replies
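A hedged sketch of the wrapper-job approach described above, using the Databricks SDK for Python to create a job whose Run Job task for Job C depends on the tasks that run Jobs A and B; the job IDs and names are placeholders:

# Placeholders throughout; authentication comes from the standard SDK config
# (environment variables or ~/.databrickscfg).
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

w.jobs.create(
    name="orchestrate_a_b_then_c",
    tasks=[
        jobs.Task(task_key="run_job_a", run_job_task=jobs.RunJobTask(job_id=111)),
        jobs.Task(task_key="run_job_b", run_job_task=jobs.RunJobTask(job_id=222)),
        jobs.Task(
            task_key="run_job_c",
            run_job_task=jobs.RunJobTask(job_id=333),
            depends_on=[
                jobs.TaskDependency(task_key="run_job_a"),
                jobs.TaskDependency(task_key="run_job_b"),
            ],
        ),
    ],
)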
Adam_Borlase
by New Contributor III
  • 77 Views
  • 3 replies
  • 1 kudos

Error trying to edit Job Cluster via Databricks CLI

Good day all, after having issues with cloud resources allocated to Lakeflow jobs and gateways, I am trying to apply a policy to the cluster that is allocated to the job. I am very new to a lot of the Databricks platform and the administration, so all h...

Latest Reply
Louis_Frolio
Databricks Employee

@Adam_Borlase, thanks, this is helpful context. The key point is that the SQL Server connector's ingestion pipeline runs on serverless, while the ingestion "gateway" runs on classic compute in your cloud account, so vCPU family quotas can block gateway c...

2 More Replies
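A hedged sketch, under the assumption that the goal is to attach a cluster policy to an existing job's job cluster; it uses the Databricks SDK for Python rather than the raw CLI, and the job ID, policy ID, and cluster sizing are placeholders:

# Placeholders throughout; jobs.update() overwrites the fields passed in new_settings
# (here, the whole job_clusters list), so include every job cluster the job needs.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import compute, jobs

w = WorkspaceClient()

w.jobs.update(
    job_id=123456,
    new_settings=jobs.JobSettings(
        job_clusters=[
            jobs.JobCluster(
                job_cluster_key="main_cluster",
                new_cluster=compute.ClusterSpec(
                    policy_id="ABC123DEF456",          # the cluster policy to apply
                    apply_policy_default_values=True,
                    spark_version="15.4.x-scala2.12",
                    node_type_id="Standard_DS3_v2",
                    num_workers=2,
                ),
            )
        ]
    ),
)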
akeel-rehman
by New Contributor
  • 2885 Views
  • 1 reply
  • 0 kudos

Best Practices for Reusable Workflows & Cluster Management Across Repos.

Hi everyone, I am looking for best practices around reusable workflows in Databricks, particularly in these areas: Reusable Workflows Instead of Repetition: How can we define reusable workflows rather than repeating the same steps across multiple jobs?...

Latest Reply
AbhaySingh
Databricks Employee

Here are my recommendations: 1. Databricks Asset Bundles (DABs) for reusable workflows; 2. API-based triggering and Run Job Tasks for cross-repo workflows; 3. Instance Pools as the #1 game-changer for cluster optimization (5-10 seconds vs 5-10 minu...

jorperort
by Contributor
  • 99 Views
  • 5 replies
  • 4 kudos

Resolved! Spark JDBC Write Fails for Record Not Present - PK error

Good afternoon everyone, I'm writing this post to see if anyone has encountered this problem and if there is a way to resolve it or understand why it happens. I'm working in a Databricks Runtime 15.4 LTS environment, which includes Apache Spark 3.5.0 ...

Latest Reply
ManojkMohan
Honored Contributor

@jorperort When writing to SQL Server tables with composite primary keys from Databricks using JDBC, unique constraint violations are often caused by Spark's distributed retry logic (https://docs.databricks.com/aws/en/archive/connectors/jdbc). Solution...

4 More Replies
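One common mitigation pattern for this class of error (not necessarily the accepted answer in the thread): de-duplicate on the composite key, write to a staging table, and make the final insert idempotent on the SQL Server side. Connection details, secret names, and table names are placeholders:

# Placeholders throughout; `spark` and `dbutils` are the ambient Databricks objects.
jdbc_url = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

source = spark.table("main.silver.orders")                  # hypothetical source table
deduped = source.dropDuplicates(["order_id", "line_id"])    # composite primary-key columns

(
    deduped.write.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "dbo.orders_staging")
    .option("user", dbutils.secrets.get("jdbc", "user"))
    .option("password", dbutils.secrets.get("jdbc", "password"))
    .mode("overwrite")   # staging table is rebuilt each run, so task retries are harmless
    .save()
)
# Then MERGE dbo.orders_staging into dbo.orders on the SQL Server side (e.g. via a stored
# procedure), so the load into the PK-constrained table is idempotent.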
Dimitry
by Contributor III
  • 64 Views
  • 4 replies
  • 0 kudos

Databricks notebook parameter works in interactive mode but not in the job

Hi guys, I've added a parameter "files_mask " to a notebook, with a default value. The job running this notebook broke with the error: com.databricks.dbutils_v1.InputWidgetNotDefined: No input widget named files_mask is defined. Code: mask = dbutils.widgets....

Latest Reply
szymon_dybczak
Esteemed Contributor III

Hi @Dimitry, do you use Python or Scala in your notebook?

3 More Replies
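A small pattern that usually avoids InputWidgetNotDefined in both interactive and job runs: declare the widget with a default at the top of the notebook and pass a task parameter with the same name from the job. The default value below is hypothetical:

# Declare the widget so it exists whether the notebook runs interactively or as a job task;
# a job/task parameter named "files_mask" overrides the default.
dbutils.widgets.text("files_mask", "*.csv")   # hypothetical default
files_mask = dbutils.widgets.get("files_mask")

# Defensive variant if the widget may not exist in some execution contexts:
try:
    files_mask = dbutils.widgets.get("files_mask")
except Exception:
    files_mask = "*.csv"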
janm2
by New Contributor II
  • 1172 Views
  • 5 replies
  • 1 kudos

Auto Loader cleanSource option does not take effect

Hello everyone, I was very keen to try out Auto Loader's new cleanSource option so we can clean up our landing folder easily. However, I found it does not have any effect whatsoever. As I cannot create a support case, I am creating this post. A sim...

Latest Reply
SanthoshU
New Contributor II

Any solution?

4 More Replies
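A hedged sketch, assuming the documented option names cloudFiles.cleanSource and cloudFiles.cleanSource.retentionDuration: files are only cleaned up once they are older than the retention duration (which defaults to a much longer period), a common reason the option appears to do nothing in short tests. Paths are placeholders:

# Placeholders; DELETE removes processed files from the landing folder after the retention period.
stream = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.cleanSource", "DELETE")
    .option("cloudFiles.cleanSource.retentionDuration", "7 days")  # assumed option name
    .load("s3://my-bucket/landing/")
)

(
    stream.writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/landing/")
    .toTable("main.bronze.landing_events")
)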
ashraf1395
by Honored Contributor
  • 2764 Views
  • 4 replies
  • 1 kudos

Resolved! How to capture dlt pipeline id / name using dynamic value reference

Hi there, I have a use case where I want to set the DLT pipeline id in the configuration parameters of that DLT pipeline. The same way we can use workspace ids or task ids in a notebook task (task_id = {{task.id}} / {{task.name}}) and can save them as parameters a...

Latest Reply
CaptainJack
New Contributor III

Was anyone able to get the pipeline_id programmatically?

3 More Replies
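One approach that gets suggested for this (an assumption on my part, not confirmed against the current docs): read the pipelines.id Spark conf key from inside the running pipeline, with a fallback value if the key is absent:

import dlt

# Assumption: a running DLT/Lakeflow pipeline exposes its id under the Spark conf key
# "pipelines.id"; "unknown" is returned if the key is not set.
pipeline_id = spark.conf.get("pipelines.id", "unknown")

@dlt.table(name="pipeline_metadata")
def pipeline_metadata():
    return spark.createDataFrame([(pipeline_id,)], ["pipeline_id"])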
shadowinc
by New Contributor III
  • 3274 Views
  • 1 reply
  • 0 kudos

Call SQL Function via API

Background: I created a SQL function with the name schema.function_name, which returns a table. In a notebook, the function works perfectly; however, I want to execute it via API using a SQL endpoint. In the API, I got an insufficient privileges error, so gr...

Latest Reply
AbhaySingh
Databricks Employee

Do you know if the API service principal / user has USAGE on the database itself? This seems like the most likely issue based on the information in the question. Quick fix checklist: run these commands in order (replace api_user with the actual user from ...

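A sketch of the grants typically needed before a principal can call a Unity Catalog SQL table function through a SQL warehouse, plus the call itself via the SQL Statement Execution API (Databricks SDK). The principal, warehouse id, and object names are placeholders:

# Placeholders throughout; run the GRANTs as an owner/admin of the objects.
from databricks.sdk import WorkspaceClient

principal = "api-service-principal"
for stmt in [
    f"GRANT USE CATALOG ON CATALOG my_catalog TO `{principal}`",
    f"GRANT USE SCHEMA ON SCHEMA my_catalog.my_schema TO `{principal}`",
    f"GRANT EXECUTE ON FUNCTION my_catalog.my_schema.my_function TO `{principal}`",
    f"GRANT SELECT ON SCHEMA my_catalog.my_schema TO `{principal}`",  # tables the function reads
]:
    spark.sql(stmt)

w = WorkspaceClient()
resp = w.statement_execution.execute_statement(
    warehouse_id="1234567890abcdef",
    statement="SELECT * FROM my_catalog.my_schema.my_function()",
)
print(resp.result)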
Pw76
by New Contributor II
  • 2287 Views
  • 4 replies
  • 1 kudos

CDC with Snapshot - next_snapshot_and_version() function

I am trying to use create_auto_cdc_from_snapshot_flow (formerly apply_changes_from_snapshot()) (see: https://docs.databricks.com/aws/en/dlt/cdc#cdc-from-snapshot). I am attempting to do SCD type 2 changes using historic snapshot data. In the first coup...

Data Engineering
CDC
dlt
Snapshot
Latest Reply
fabdsp
New Contributor

I have the same issue: this is a very big limitation of create_auto_cdc_from_snapshot_flow, and there is no solution.

3 More Replies
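A hedged sketch of the snapshot-CDC pattern discussed in the thread; the parameter names are assumed to match the apply_changes_from_snapshot() API it replaces, and the snapshot paths, keys, and versioning scheme are placeholders:

import dlt

dlt.create_streaming_table("customers_scd2")

def next_snapshot_and_version(latest_snapshot_version):
    # Return (snapshot_df, version) for the next unprocessed snapshot, or None when done.
    next_version = 1 if latest_snapshot_version is None else latest_snapshot_version + 1
    path = f"s3://my-bucket/snapshots/v{next_version}/"   # placeholder layout
    try:
        return (spark.read.format("parquet").load(path), next_version)
    except Exception:
        return None  # no newer snapshot available

dlt.create_auto_cdc_from_snapshot_flow(
    target="customers_scd2",
    source=next_snapshot_and_version,
    keys=["customer_id"],
    stored_as_scd_type=2,
)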
jeremy98
by Honored Contributor
  • 854 Views
  • 3 replies
  • 0 kudos

How to pass secret keys using a spark_python_task

Hello community, I was searching for a way to pass secrets to a spark_python_task. Using a notebook file is easy, you just use dbutils.secrets.get(...), but how do you do the same thing with a spark_python_task set to use serverless compute? Kind regards,

Latest Reply
analytics_eng
New Contributor III

@Renu_ But passing them as spark_env will not work with serverless, I guess? See also the limitations in the docs: Serverless compute limitations | Databricks on AWS

2 More Replies
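A sketch for a Python file run as a spark_python_task: dbutils is not predefined in a .py file, but it can be obtained from the SDK runtime helper (or a WorkspaceClient), and as far as I know secrets access works this way on serverless too. The scope and key names are placeholders:

# Placeholders; requires the databricks-sdk package, which is available on Databricks compute.
from databricks.sdk.runtime import dbutils

api_token = dbutils.secrets.get(scope="my_scope", key="my_api_token")

# Alternative without the runtime helper:
# from databricks.sdk import WorkspaceClient
# api_token = WorkspaceClient().dbutils.secrets.get(scope="my_scope", key="my_api_token")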
dpc
by Contributor
  • 146 Views
  • 5 replies
  • 3 kudos

Resolved! Pass parameters between jobs

Hello, I have a job. In that job, it runs a task (GetGid) that executes a notebook and obtains some value using dbutils.jobs.taskValues.set, e.g. dbutils.jobs.taskValues.set(key = "gid", value = gid). As a result, I can use this and pass it to another task for ...

Latest Reply
dpc
Contributor

Thanks @Hubert-Dudek and @ilir_nuredini, I see this now. I'm setting the value using dbutils.jobs.taskValues.set(), passing it to the job task using Key - gid; Value - {{tasks.GetGid.values.gid}}, then reading it using pid = dbutils.widgets.get()

4 More Replies
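To recap the working pattern from this thread as a short sketch (the key name gid and task name GetGid come from the posts above; the example value is a placeholder):

# In the upstream task's notebook (task key: GetGid):
gid = "example-gid-123"                              # placeholder value
dbutils.jobs.taskValues.set(key="gid", value=gid)

# In the downstream task, define a task parameter named "gid" with the value
# {{tasks.GetGid.values.gid}}, then read it in that notebook:
dbutils.widgets.text("gid", "")
gid = dbutils.widgets.get("gid")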
