Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vijsharm
by New Contributor II
  • 28 Views
  • 4 replies
  • 0 kudos

checkpoint changes not working on my databricks job

Hi, I have a job processing a Kafka stream using a Kafka readStream process, and due to some issue we changed the checkpoint path to another path, which pulled all the records. Later, when I changed back to the original checkpoint location, it is not pulling ...

Data Engineering
checkpoint
Latest Reply
cgrant
Databricks Employee
  • 0 kudos

When you swapped back to the old checkpoint, were any records flowing through, and were batches completing? It's possible that you've accumulated a big backlog with the old checkpoint, and/or records in Kafka have expired. And the "startingOffsets" o...

3 More Replies
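To make the behavior under discussion concrete, here is a minimal sketch of a Kafka stream with an explicit checkpoint; the broker, topic, table, and paths are placeholder assumptions, not taken from this thread. The key point is that `startingOffsets` is consulted only on the first run against a given checkpoint, which is why swapping checkpoint paths changes what gets (re)read.

```python
# Minimal sketch; broker, topic, table, and paths are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")  # placeholder broker
    .option("subscribe", "events")                     # placeholder topic
    # Consulted only on the FIRST run against a given checkpoint; after that,
    # progress comes from the checkpoint itself.
    .option("startingOffsets", "latest")
    .load()
)

query = (
    df.writeStream
    .option("checkpointLocation", "/Volumes/cat/sch/vol/_chk/events")  # placeholder
    .toTable("cat.sch.events_bronze")                                  # placeholder
)
```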
csondergaardp
by New Contributor
  • 46 Views
  • 1 replies
  • 0 kudos

[PATH_NOT_FOUND] Structured Streaming uses wrong checkpoint location

I'm trying to run a simple example using Structured Streaming on a directory created as a Volume. The use case is purely educational; I am investigating various forms of triggers. Basic info: Catalog: "dev_catalog", Schema: "stream", Volume: "streamin...

Latest Reply
cgrant
Databricks Employee
  • 0 kudos

Your checkpoint code looks correct. What is the source of `df`? Is it `/Volumes/dev_catalog/default/streaming_basics/`? That path looks incorrect; add `stream` to it.

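For reference, a hedged sketch of the corrected path the reply implies: Volume paths follow /Volumes/&lt;catalog&gt;/&lt;schema&gt;/&lt;volume&gt;/, so with schema "stream" the source becomes the line below. The Auto Loader format choice is an assumption for illustration.

```python
# Hedged sketch; the cloudFiles format is assumed, not from the thread.
df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "csv")  # assumed file format
    .load("/Volumes/dev_catalog/stream/streaming_basics/")  # schema segment included
)
```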
SatabrataMuduli
by New Contributor II
  • 40 Views
  • 1 replies
  • 0 kudos

Unable to Connect to Oracle from Databricks UC Cluster (DBR 15.4) – ORA-12170 Timeout Error

Hi all, I'm trying to connect to an Oracle database from my Databricks UC cluster (DBR 15.4) using the ojdbc8.jar driver, which I've installed on the cluster. Here's the code I'm using: df = spark.read.format("jdbc")\ .option("url", jdbc_url)\ ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @SatabrataMuduli, I'm quite sure this is a networking issue. You didn't provide many details about your environment, so I'll give you general advice. You cannot reach an on-premises Oracle database unless networking is explicitly configured or your dat...

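A minimal sketch of the JDBC read in question, with placeholder connection details; since ORA-12170 is a TCP connect timeout, the network path to the Oracle listener (default port 1521) usually matters more than the code itself.

```python
# Sketch only; URL, table, user, and secret scope/key are placeholders.
jdbc_url = "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1"  # placeholder

df = (
    spark.read.format("jdbc")
    .option("url", jdbc_url)
    .option("dbtable", "SCHEMA.TABLE")  # placeholder
    .option("user", "app_user")         # placeholder
    .option("password", dbutils.secrets.get("scope", "oracle-pwd"))  # assumed secret
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)
```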
Dhruv-22
by Contributor II
  • 66 Views
  • 2 replies
  • 0 kudos

Feature request: Allow setting a value to null when not present in schema evolution

I want to raise a feature request as follows. Currently, with automatic schema evolution for MERGE, when a column is not present in the source dataset it is not changed in the target dataset. For example: %sql CREATE OR REPLACE TABLE edw_nprd_aen.bronze.t...

Latest Reply
ManojkMohan
Honored Contributor II
  • 0 kudos

@Dhruv-22 Problem: When using MERGE INTO ... WITH SCHEMA EVOLUTION, if a column exists in the target table but is not present in the source dataset, that column is left unchanged on matched rows. Solution thinking: This can be emulated by introspecting th...

1 More Replies
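A hedged sketch of the emulation direction the reply describes: name the missing column explicitly in the UPDATE clause so it is set to NULL rather than left unchanged. Table and column names are placeholders.

```python
spark.sql("""
    MERGE INTO tgt
    USING src
    ON tgt.id = src.id
    WHEN MATCHED THEN UPDATE SET
        tgt.value = src.value,
        -- explicit NULL: schema evolution alone leaves this column unchanged
        tgt.col_missing_in_source = NULL
    WHEN NOT MATCHED THEN INSERT (id, value) VALUES (src.id, src.value)
""")
```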
RevanthV
by New Contributor III
  • 67 Views
  • 3 replies
  • 2 kudos

Data validation with df writes using append mode

Hi Team, recently I came across a situation where I had to write a huge amount of data and it took 6 hrs to complete. Later, when I checked the target data, I saw 20% of the total records were written incorrectly or corrupted because the source data itself was corr...

Latest Reply
RevanthV
New Contributor III
  • 2 kudos

Hey @K_Anudeep, thanks a lot for tagging me in the GitHub issue. This "validate and commit" feature is exactly what I want, and I see you have already raised a PR for the same with a new option called . I will try this out and check if it satisfie...

2 More Replies
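The PR's option name is elided above, so as a stopgap here is a sketch of the manual pattern: validate the batch first, and append only if it passes. Table names and the validation rule are placeholder assumptions.

```python
# Validate-then-write sketch; names and the rule are placeholders.
from pyspark.sql import functions as F

src = spark.read.table("cat.sch.source")  # placeholder source
bad = src.filter(F.col("amount").isNull() | (F.col("amount") < 0)).count()  # assumed rule

if bad == 0:
    src.write.mode("append").saveAsTable("cat.sch.target")  # placeholder target
else:
    raise ValueError(f"Validation failed: {bad} corrupt rows; nothing written")
```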
ramsai
by New Contributor
  • 76 Views
  • 5 replies
  • 2 kudos

Updating Job Creator to Service Principal

Regarding data governance best practices: I have jobs created by a user who has left the organization, and I need to change the job creator to a service principal. Currently, it seems the only option is to clone the job and update it. Is this the rec...

Latest Reply
Sanjeeb2024
Contributor III
  • 2 kudos

I agree with @nayan_wylde; for auditing, the creator is important and it should be immutable by nature.

4 More Replies
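For context, while the creator field stays immutable, the identity a job runs as can be changed without cloning. A hedged sketch against the Jobs API follows; host, token, job ID, and service principal ID are placeholders, and the `run_as` field shape should be verified against current API docs.

```python
# Hedged sketch: change the run-as identity (NOT the creator) of a job.
import requests

resp = requests.post(
    "https://<workspace-host>/api/2.1/jobs/update",  # placeholder host
    headers={"Authorization": f"Bearer {token}"},     # assumed admin credential
    json={
        "job_id": 123,  # placeholder
        "new_settings": {
            "run_as": {"service_principal_name": "<sp-application-id>"}
        },
    },
)
resp.raise_for_status()
```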
Rose_15
by New Contributor
  • 83 Views
  • 3 replies
  • 0 kudos

Databricks SQL Warehouse fails when streaming ~53M rows via Python (token/session expiry)

Hello Team, I am facing a consistent issue when streaming a large table (~53 million rows) from a Databricks SQL Warehouse using Python (databricks-sql-connector) with OAuth authentication. I execute a single long-running query and fetch data in batche...

Latest Reply
Sanjeeb2024
Contributor III
  • 0 kudos

Hi @Rose_15, thanks for the details. It is better to plan around the number of tables, their size, and the number of records, and to extract the files to cloud storage and then reload the data using any mechanism. Once your extraction is complete, you wi...

2 More Replies
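For completeness, a hedged sketch of batched fetching with databricks-sql-connector; for ~53M rows the reply's advice (extract to cloud storage first) remains the safer route, since one long session can outlive an OAuth token. Host, HTTP path, query, and the downstream handler are placeholders.

```python
from databricks import sql

with sql.connect(
    server_hostname="<workspace-host>",  # placeholder
    http_path="<warehouse-http-path>",   # placeholder
    access_token=token,                  # assumed valid credential
) as conn:
    with conn.cursor() as cursor:
        cursor.execute("SELECT * FROM cat.sch.big_table")  # placeholder query
        while True:
            rows = cursor.fetchmany(100_000)
            if not rows:
                break
            process(rows)  # hypothetical downstream handler
```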
jfvizoso
by New Contributor II
  • 12917 Views
  • 6 replies
  • 0 kudos

Can I pass parameters to a Delta Live Table pipeline at running time?

I need to execute a DLT pipeline from a Job, and I would like to know if there is any way of passing a parameter. I know you can have settings in the pipeline that you use in the DLT notebook, but it seems you can only assign values to them when crea...

Latest Reply
Sudharsan
New Contributor II
  • 0 kudos

@DeepakAI: May I know how you resolved it?

5 More Replies
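One common pattern (not necessarily how this was resolved): set a key in the pipeline's `configuration` settings and read it inside the DLT notebook. The key name, path, and format below are assumptions.

```python
# DLT notebook sketch; "mypipeline.source_path" is an assumed configuration key.
import dlt

source_path = spark.conf.get("mypipeline.source_path")

@dlt.table
def bronze():
    return spark.read.format("json").load(source_path)  # assumed format
```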
Phani1
by Databricks MVP
  • 3149 Views
  • 8 replies
  • 0 kudos

Triggering DLT Pipelines with Dynamic Parameters

Hi Team, we have a scenario where we need to pass a dynamic parameter to a Spark job that will trigger a DLT pipeline in append mode. Can you please suggest an approach for this? Regards, Phani

Latest Reply
Sudharsan
New Contributor II
  • 0 kudos

@koji_kawamura: I have more or less the same scenario; say I have 3 tables. The sources and targets are different, but I would like to use a generic pipeline, pass in the source and target as parameters, and run them in parallel. @sas30: can you be m...

7 More Replies
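A hedged sketch of one approach: edit the pipeline's configuration via the Pipelines REST API and then start an update. Endpoint payload shapes should be verified against current docs; host, token, pipeline ID, key, and value are all placeholders.

```python
import requests

base = "https://<workspace-host>/api/2.0/pipelines/<pipeline-id>"  # placeholders
hdrs = {"Authorization": f"Bearer {token}"}

# 1) Fetch the current spec and override one configuration key
spec = requests.get(base, headers=hdrs).json()["spec"]
spec.setdefault("configuration", {})["mypipeline.source"] = "table_a"  # dynamic value

# 2) Push the edited spec, then trigger an update
requests.put(base, headers=hdrs, json=spec).raise_for_status()
requests.post(f"{base}/updates", headers=hdrs, json={}).raise_for_status()
```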
Ved88
by New Contributor II
  • 151 Views
  • 4 replies
  • 1 kudos

databricks all-purpose cluster

Getting the below error while executing a notebook: "Failure starting repl. Try detaching and re-attaching the notebook." I can see the cluster has all the libraries installed.

Latest Reply
nayan_wylde
Esteemed Contributor
  • 1 kudos

SQLNonTransientConnectionException with port 3306 strongly points to egress being blocked from your Databricks compute to the default Hive Metastore (which runs on Azure Database for MySQL). Databricks recently published the reserved IP ranges and gu...

3 More Replies
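Along the lines of the diagnosis above, a quick notebook check of whether compute can reach the metastore host on port 3306; the host is a placeholder.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.settimeout(5)
result = sock.connect_ex(("<metastore-host>", 3306))  # placeholder host
print("reachable" if result == 0 else f"blocked (errno {result})")
sock.close()
```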
Dhruv-22
by Contributor II
  • 66 Views
  • 2 replies
  • 1 kudos

BUG: Merge with schema evolution doesn't work in update clause

I am referring to this link in the Databricks documentation; here is a screenshot of the same. According to the documentation, the UPDATE command should work when the target table doesn't have the column but it is present in the source. I tried the same with ...

Latest Reply
Dhruv-22
Contributor II
  • 1 kudos

Hi @iyashk-DB, thanks for the response; it will help in resolving the issue. But can you mark it as a bug and report it? Specifying just the column without the table name is a little risky.

1 More Replies
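For readers hitting the same thing, a hedged sketch of the workaround being discussed, with the unqualified target column the poster flags as risky; all names are placeholders.

```python
spark.sql("""
    MERGE WITH SCHEMA EVOLUTION INTO tgt
    USING src
    ON tgt.id = src.id
    -- Unqualified target column (the suggested workaround); the poster notes
    -- this is risky because it can be ambiguous with a source column.
    WHEN MATCHED THEN UPDATE SET new_col = src.new_col
""")
```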
SaugatMukherjee
by New Contributor III
  • 122 Views
  • 2 replies
  • 0 kudos

Structured streaming for iceberg tables

According to this https://iceberg.apache.org/docs/latest/spark-structured-streaming/ , we can stream from Iceberg tables. I have ensured that my source table is Iceberg version 3, but no matter what I do, I get an error that Iceberg does not support streaming reads. Looki...

Latest Reply
SaugatMukherjee
New Contributor III
  • 0 kudos

Hi, Iceberg streaming is possible in Databricks. One does not need to change to Delta Lake. In my previous attempt, I used "load" while reading the source Iceberg table. One should instead use "table"; "load" apparently takes a path and not a ta...

1 More Replies
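A minimal sketch of the fix described: read the Iceberg source by table identifier rather than by path; the table name is a placeholder.

```python
# Works: table identifier resolved through the catalog.
df = spark.readStream.table("cat.sch.iceberg_source")

# The earlier attempt, which treats the argument as a path:
# df = spark.readStream.format("iceberg").load("cat.sch.iceberg_source")
```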
rcatelli
by New Contributor
  • 239 Views
  • 1 replies
  • 0 kudos

OBO auth implementation in Streamlit not working

Hello, I am currently trying to implement OBO auth in a Streamlit Databricks app, but I'm getting the following error message: requests.exceptions.HTTPError: 400 Client Error: PERMISSION_DENIED: User does not have USE CATALOG on Catalog '...'. Config: host=, a...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @rcatelli, here's a quick example: https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth#user-authorization and https://docs.databricks.com/aws/en/dev-tools/databricks-apps/auth#example-query-with-user-authorization. Get the user token from...

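A hedged sketch along the lines of the linked docs: Databricks Apps forward the user's token in the `x-forwarded-access-token` request header, which a Streamlit app can read and pass to the SQL connector. Host and HTTP path are placeholders.

```python
import streamlit as st
from databricks import sql

# Per the linked docs, the app receives the user's token in this header.
user_token = st.context.headers.get("x-forwarded-access-token")

with sql.connect(
    server_hostname="<workspace-host>",  # placeholder
    http_path="<warehouse-http-path>",   # placeholder
    access_token=user_token,             # on-behalf-of-user credential
) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_user()")
        st.write(cur.fetchone())
```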
Naren1
by New Contributor
  • 77 Views
  • 1 replies
  • 1 kudos

Cluster Config

Hi, can we pass a parameter into job activity from ADF side to change the environment inside the job cluster configuration?

Latest Reply
K_Anudeep
Databricks Employee
  • 1 kudos

Hello @Naren1, yes, you can pass parameters from ADF to a Databricks Job run, but you generally can't use those parameters to change the job cluster configuration (node type, Spark version, autoscale, init scripts, etc.) for that run. In an ADF Data...

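A minimal sketch of the working direction from the reply: ADF's Databricks Notebook activity passes base parameters, which the notebook reads as widgets; the parameter name `env` is an assumption.

```python
# In the notebook run by the ADF activity; "env" is an assumed parameter name.
env = dbutils.widgets.get("env")
print(f"Running against environment: {env}")
```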
