Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

P10d
by New Contributor
  • 61 Views
  • 0 replies
  • 0 kudos

Connect Databrick's cluster with Artifactory

Hello, I'm trying to connect Databricks with our own JFrog Artifactory. The objective is to download both pip and JAR dependencies from it instead of connecting to Maven Central/PyPI. I'm struggling with the JARs. My approach to solving the problem is: 1. Cre...

IM_01
by Contributor II
  • 225 Views
  • 3 replies
  • 0 kudos

Structured streaming error- NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING

Hi, I was using the window functions row_number(), min, and sum in my code, and the Lakeflow SDP pipeline failed with the error NON_TIME_WINDOW_NOT_SUPPORTED_IN_STREAMING - Window function is not supported on streaming DataFrames. What is the recommended a...

Latest Reply
IM_01
Contributor II
  • 0 kudos

@Louis_Frolio Suppose I use foreachBatch; I might end up with duplicates, as the state is not maintained. Can you please share more information on max_by?

2 More Replies
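The "latest record per key" semantics that max_by (or a row_number-based dedup) would give inside a foreachBatch handler can be sketched in plain Python. This is only an illustration of the logic, not the thread's actual pipeline code; the function name and data are made up:

```python
# Sketch: the per-key "keep the row with the max timestamp" logic that
# SQL's max_by(value, ts) GROUP BY key computes. Inside foreachBatch you
# would apply the equivalent aggregation to each micro-batch so
# re-delivered rows do not produce duplicates downstream.

def latest_per_key(rows):
    """rows: iterable of (key, ts, value) tuples.
    Returns {key: value of the row with the largest ts for that key}."""
    best = {}  # key -> (ts, value)
    for key, ts, value in rows:
        if key not in best or ts > best[key][0]:
            best[key] = (ts, value)
    return {k: v for k, (t, v) in best.items()}

batch = [("a", 1, "x"), ("a", 3, "z"), ("b", 2, "y"), ("a", 2, "w")]
print(latest_per_key(batch))  # {'a': 'z', 'b': 'y'}
```

In Spark SQL the same result comes from `SELECT key, max_by(value, ts) FROM batch GROUP BY key`, which is allowed inside foreachBatch because each micro-batch is processed as a static DataFrame.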
TheBeacon
by New Contributor II
  • 2037 Views
  • 5 replies
  • 2 kudos

Exploring Postman Alternatives for API Testing in VSCode?

Has anyone here explored Postman alternatives within VSCode? I’ve seen mentions of Thunder Client and Apidog. Would love to know if they offer a smoother integration or better functionality.

Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

I may be old fashioned but curl is the only valid answer IMO

4 More Replies
yit337
by Contributor
  • 88 Views
  • 2 replies
  • 1 kudos

Is it required to run Lakeflow Connect on Serverless?

As the subject states, my question is: is it required to run the Ingestion Pipeline in Lakeflow Connect on Serverless compute? I tried to define my own cluster in the DAB, but it raises an error: `Error: cannot create pipeline: You cannot provide c...

Latest Reply
saurabh18cs
Honored Contributor III
  • 1 kudos

Yes, Lakeflow Connect ingestion pipelines always run on Serverless compute. Databricks overrides your compute config and switches back to serverless, because the ingestion connector requires it.

1 More Replies
bhargavabasava
by New Contributor III
  • 752 Views
  • 2 replies
  • 1 kudos

Support for JDBC writes from serverless compute

Hi team, are there any plans to support JDBC writes using serverless compute?

Latest Reply
CarlosPH
Databricks Partner
  • 1 kudos

Hello! And what is the standard way to write to an external database through Databricks? General-purpose compute? Thanks very much.

1 More Replies
JIWON
by New Contributor III
  • 235 Views
  • 2 replies
  • 3 kudos

Resolved! Questions on Auto Loader auto Listing Logic

Hi everyone, I'm investigating some performance patterns in our Auto Loader (S3) pipelines and would like to clarify the internal listing logic. Context: we run a batch job every hour using Auto Loader. Recently, after March 10th, we noticed our execut...

Latest Reply
aleksandra_ch
Databricks Employee
  • 3 kudos

Hi @JIWON, 1. There is no such option. 2. Assuming that the job is triggered every hour, the spikes every 8 hours can be explained by this: to ensure eventual completeness of data in auto mode, Auto Loader automatically triggers a full directory lis...

1 More Replies
jacovangelder
by Databricks MVP
  • 4441 Views
  • 4 replies
  • 10 kudos

How do you define PyPi libraries on job level in Asset Bundles?

Hello, reading the documentation, it does not state whether it is possible to define libraries at the job level instead of at the task level. It feels really counter-intuitive to put libraries at the task level in Databricks workflows provisioned by Asset Bundles. Is th...

Latest Reply
jacovangelder
Databricks MVP
  • 10 kudos

Thanks @Witold! Thought so. I decided to go with an init script where I install my dependencies rather than installing libraries. For future reference, this is what it looks like:

job_clusters:
  - job_cluster_key: job_cluster
    new_cluster:
      ...

3 More Replies
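For comparison, the task-level placement the thread is working around looks roughly like the fragment below. Resource and package names are illustrative, not from the post; only the shape of the `libraries` mapping matters:

```yaml
# databricks.yml fragment (sketch): in Asset Bundles, libraries attach
# to each task, not to the job as a whole.
resources:
  jobs:
    my_job:
      tasks:
        - task_key: main
          job_cluster_key: job_cluster
          notebook_task:
            notebook_path: ../src/main_notebook.py
          libraries:
            - pypi:
                package: "requests==2.32.3"
```

With many tasks sharing one job cluster, repeating this block per task is what makes the init-script approach above attractive.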
zenwanderer
by New Contributor
  • 185 Views
  • 4 replies
  • 0 kudos

Kill/Cancel a Notebook Cell Running Too Long on an All-purpose Cluster

Hi everyone, I'm facing an issue when running a notebook on a Databricks all-purpose cluster. Some of my cells/pipelines run for a very long time, and I want to automatically cancel/kill them when they exceed a certain time limit. I tried setting spar...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

@zenwanderer Have you looked into Query Watchdog? For Classic All-Purpose clusters this might be your best bet. https://docs.databricks.com/aws/en/compute/troubleshooting/query-watchdog

3 More Replies
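The Query Watchdog settings mentioned in that reply are cluster Spark configs. A sketch of the relevant keys, as I recall them from the linked docs (verify the names and defaults there before relying on them):

```ini
# Spark config sketch for Query Watchdog on an all-purpose cluster.
spark.databricks.queryWatchdog.enabled true
# Kill queries whose output rows exceed N times the input rows:
spark.databricks.queryWatchdog.outputRatioThreshold 1000
spark.databricks.queryWatchdog.minTimeSecs 10
spark.databricks.queryWatchdog.minOutputRows 100000
```

Note that Query Watchdog targets runaway, output-heavy queries rather than enforcing a hard wall-clock timeout, so it may not cancel a long-running cell that simply computes slowly.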
guidotognini
by New Contributor II
  • 241 Views
  • 2 replies
  • 2 kudos

Resolved! Rename Column Name of Streaming Table in Lakeflow Spark Declarative Pipeline

Hi, I would like to know if it is possible to rename a column of a streaming table defined in a Lakeflow Spark Declarative Pipeline without having to run a full refresh. Could you give me any ideas on how I can achieve this?

Latest Reply
balajij8
Contributor
  • 2 kudos

You can update the pipeline code to rename the old column and trigger an incremental update (both old_column and new_column exist after it). Old data will have NULL for new_column after the incremental update; update the table to fill new_column for such cases from old_co...

1 More Replies
ittzzmalind
by New Contributor
  • 126 Views
  • 1 replies
  • 0 kudos

DLT Pipeline Error -key not found: all_info_dlt_cx_utils_cod resulting in a NoSuchElementException.

Databricks ETL pipeline, specifically an error with the @DP.expectorfail decorator causing the pipeline update to fail. The error message indicated 'key not found: all_info_dlt_cx_utils_cod', resulting in a NoSuchElementException. Note: if we commen...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

I don't know if it's a copy-paste error on your side, but you reference an error: 'key not found: all_info_dlt_cx_utils_cod'. I notice it's missing an "e". Can you check the code base for any typos? Maybe there's a typo in a dlt.read() or spark.table...

AnilKumarM
by New Contributor
  • 244 Views
  • 3 replies
  • 1 kudos

Best-practice structure for config.yaml, utils, and databricks.yaml in ML project (Databricks)

Hi everyone, I'm working on an ML project in Databricks and want to design a clean, scalable, and production-ready project structure. I'd really appreciate guidance from those with real-world experience. My requirement: I need to organize my project ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @AnilKumarM, I agree with @-werners- here. There isn't a single "one true" repo layout we mandate, but there are a few public references that show the patterns Databricks recommends. For bundles/databricks.yml + multi-env, you may want to check the ...

2 More Replies
maikel
by Contributor II
  • 689 Views
  • 5 replies
  • 1 kudos

Resolved! SQL schemas migration

Hello Community! I would like to ask for your recommendations on SQL schema migration best practices. In our project, we currently have different SQL schema definitions and data seeding saved in SQL files. Since we are going to higher environm...

Latest Reply
maikel
Contributor II
  • 1 kudos

@anuj_lathi and @Louis_Frolio, thank you very much! This is a really great approach and example!

4 More Replies
js5
by New Contributor II
  • 279 Views
  • 1 replies
  • 0 kudos

Resolved! UNSUPPORTED_TIME_TYPE despite 18.1 runtime?

Hello, I have tried using the TimeType data type, which is supported since Spark 4.1: https://spark.apache.org/docs/latest/sql-ref-datatypes.html. Unfortunately, I am still getting the UNSUPPORTED_TIME_TYPE error when trying to run display() on a pandas DataFrame ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @js5, This is expected today on Databricks. You can check this out for reference. Spark 4.1 introduces a standard TIME type (TimeType) in the SQL type system, and Databricks runtimes based on Spark 4.x already expose it at the engine level (for ex...

malterializedvw
by New Contributor II
  • 421 Views
  • 8 replies
  • 3 kudos

Parametrizing queries in DAB deployments

Hi folks, I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments in different environments, as well as [dev] deployments vs...

Latest Reply
-werners-
Esteemed Contributor III
  • 3 kudos

Hm, the IDENTIFIER({{var}} || string) should work for CREATE statements with DAB. I also spent way too much time on AI giving me wrong answers (Jinja templating format in the first place). Mind that there are no spaces in {{var}}. BUT there are some lim...

7 More Replies
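A sketch of the IDENTIFIER pattern that reply describes, in a SQL file run from a DAB SQL task. The variable name `catalog` and the object names are illustrative; only the `IDENTIFIER({{var}} || string)` shape comes from the thread:

```sql
-- DAB-parametrized CREATE sketch: {{catalog}} is a bundle/query variable
-- (note: no spaces inside the braces). IDENTIFIER() builds the qualified
-- name from the variable plus a literal suffix at run time.
CREATE TABLE IF NOT EXISTS IDENTIFIER({{catalog}} || '.my_schema.my_table') (
  id BIGINT,
  name STRING
);
```

Without IDENTIFIER(), the variable would only substitute as a string literal and could not be used where an object name is expected.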
Neelimak
by Databricks Partner
  • 513 Views
  • 5 replies
  • 3 kudos

Resolved! ingestion pipeline configuration

When trying to create an ingestion pipeline, the auto-generated cluster is hitting quota limit errors. The VM type it's trying to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help h...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 3 kudos

Hi @Neelimak, thanks for the feedback. I've passed it on to our product team.

4 More Replies