Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JIWON
by New Contributor III
  • 14 Views
  • 0 replies
  • 0 kudos

Questions on Auto Loader auto Listing Logic

Hi everyone, I'm investigating some performance patterns in our Auto Loader (S3) pipelines and would like to clarify the internal listing logic. Context: we run a batch job every hour using Auto Loader. Recently, after March 10th, we noticed our execut...

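A pure-Python sketch of the incremental-listing idea that the question touches on (the function and file names here are hypothetical, not Auto Loader internals): when file names arrive in lexical order, a listing can resume from the last checkpointed key instead of re-listing the whole prefix on every hourly run.

```python
def files_since(all_files, last_seen_key):
    # Keep only files whose name sorts strictly after the checkpointed key;
    # this is the property lexically ordered file names make cheap.
    return sorted(f for f in all_files if f > last_seen_key)

listing = ["2024/03/09/a.json", "2024/03/10/a.json", "2024/03/10/b.json"]
new = files_since(listing, "2024/03/09/a.json")
# new == ["2024/03/10/a.json", "2024/03/10/b.json"]
```

Auto Loader exposes a related knob (`cloudFiles.useIncrementalListing`); whether it applies depends on the file-naming scheme, so treat this as an illustration of the idea only.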
Neelimak
by New Contributor
  • 120 Views
  • 4 replies
  • 2 kudos

Resolved! ingestion pipeline configuration

When trying to create an ingestion pipeline, the auto-generated cluster hits quota limit errors. The VM type it tries to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help h...

Latest Reply
Neelimak
New Contributor
  • 2 kudos

Thanks Ashwin. I hope future improvements will take SKU availability and quota into account when creating pipelines through the UI. As it stands today, for simpler/POC-type implementations this is a major roadblock. Thank you.

3 More Replies
Phani1
by Databricks MVP
  • 104 Views
  • 2 replies
  • 0 kudos

Databricks Cost Estimation Template

Hi Databricks Team, is there a standard Databricks cost estimation template (xl), sizing calculator, or TCO tool that allows us to provide the following inputs and derive an approximate monthly and annual platform cost: source systems and their types (...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, there isn't anything publicly available that I'm aware of. For this kind of complex migration, I'd recommend working with your account team. As somebody who does Databricks sizing a lot, it's a nuanced art, which I suspect is why we don't have any ...

1 More Replies
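While no official template exists, the core arithmetic behind a rough estimate is simple. A back-of-envelope sketch (all rates below are illustrative placeholders, not published Databricks pricing): monthly platform cost is DBUs per hour times usage hours times your contracted $/DBU rate.

```python
def monthly_cost(dbu_per_hour, hours_per_day, days_per_month, rate_per_dbu):
    # DBUs consumed per hour x usage hours in the month x $/DBU rate.
    return dbu_per_hour * hours_per_day * days_per_month * rate_per_dbu

# e.g. a small job cluster at ~6 DBU/hr, running 8h on 22 working days:
estimate = monthly_cost(dbu_per_hour=6, hours_per_day=8, days_per_month=22,
                        rate_per_dbu=0.15)
# 6 * 8 * 22 * 0.15 = 158.4 (currency units, illustrative)
```

On classic compute you would add the cloud VM cost on top; as the reply notes, real sizing is far more nuanced than this.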
maikel
by Contributor II
  • 376 Views
  • 4 replies
  • 0 kudos

SQL schemas migration

Hello Community! I would like to ask for your recommendations on best practices for SQL schema migration. In our project, we currently have different SQL schema definitions and data seeding saved in SQL files. Since we are going to higher environm...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Great question — and since you already have DABs and numbered SQL files, you're most of the way there. You do not need Alembic or SQLAlchemy. Here's a concrete implementation of the migration runner pattern that plugs directly into your existing DABs...

3 More Replies
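The "migration runner" pattern the reply describes can be sketched in a few lines (file names and the version-tracking mechanism here are illustrative; on Databricks the `execute` callback would be `spark.sql` and the applied set would live in a control table):

```python
from pathlib import Path

def pending_migrations(sql_dir, applied_versions):
    """Numbered .sql files (e.g. 001_init.sql) not yet applied, in order."""
    files = sorted(Path(sql_dir).glob("*.sql"))
    return [f for f in files if f.stem.split("_")[0] not in applied_versions]

def run_migrations(sql_dir, applied_versions, execute):
    for f in pending_migrations(sql_dir, applied_versions):
        execute(f.read_text())                      # e.g. spark.sql(...) on Databricks
        applied_versions.add(f.stem.split("_")[0])  # record the version as applied
```

Because files run in sorted order and applied versions are skipped, re-running the job after adding `003_...sql` executes only the new file, which is what makes the pattern fit a DABs-deployed job.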
AndriusVitkausk
by New Contributor III
  • 26 Views
  • 0 replies
  • 0 kudos

Logging inside foreachbatch

I'm having difficulty carrying out logging inside a foreachBatch. All logging outside the foreachBatch works as expected, but inside, log messages are only visible in the driver logs via the Spark UI. Is there a way to get this to work (including serverless)?

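One hedged sketch of a workaround: the function passed to `foreachBatch` runs on the driver, so a standard Python logger configured with an explicit stdout handler there tends to surface records in the job run output rather than only in the log4j driver log (logger and sink names below are placeholders):

```python
import logging
import sys

# Attach a stdout handler once; without it, records may land only in the
# driver's log4j output that the poster saw in the Spark UI.
logger = logging.getLogger("stream")
if not logger.handlers:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
logger.setLevel(logging.INFO)

def process_batch(batch_df, batch_id):
    logger.info("batch %s started", batch_id)
    # ... batch_df.write sink logic goes here ...
    logger.info("batch %s finished", batch_id)

# On Databricks: stream.writeStream.foreachBatch(process_batch).start()
```

Whether this is sufficient on serverless compute is worth verifying; the behavior there is not confirmed by the thread.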
seefoods
by Valued Contributor
  • 142 Views
  • 2 replies
  • 0 kudos

Set up a justfile command to launch your Spark application

Hello guys, I built a justfile for my project that executes my wheel job task from the command line, but when I run the wheel task I encounter this error: from pyspark.sql.connect.expressions import PythonUDFEnvironment ImportErr...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This ImportError happens because you have both standalone pyspark and databricks-connect installed, and they conflict with each other. databricks-connect bundles its own version of PySpark internally — when the standalone pyspark package is also pres...

1 More Replies
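A quick way to confirm the conflict the reply describes is to check whether both distributions are installed in the same environment (this helper is illustrative; the fix, per the reply, is removing the standalone `pyspark` package):

```python
import importlib.metadata as md

def conflicting_spark_installs():
    # databricks-connect bundles its own PySpark; a separately installed
    # 'pyspark' distribution can shadow it and break pyspark.sql.connect imports.
    installed = {(dist.metadata["Name"] or "").lower() for dist in md.distributions()}
    return "pyspark" in installed and "databricks-connect" in installed

if conflicting_spark_installs():
    print("Conflict detected: run `pip uninstall pyspark` and keep databricks-connect")
```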
IM_01
by Contributor II
  • 79 Views
  • 5 replies
  • 0 kudos

OrderBy is not sorting the results

Hi, I am currently using Lakeflow SDP. First I create two views, then join them to create a materialized view, using ORDER BY in the materialized view's create function, but the results are not sorted. Does ORDER BY not work on materializ...

Latest Reply
IM_01
Contributor II
  • 0 kudos

Hi @Ashwin_DSA, so even tables do not guarantee ordering? Could you please explain the reason, just curious. I was under the impression that using Delta tables would solve the problem, and since a view is a wrapper around a SELECT, I thought it would work.

4 More Replies
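A toy analogy (plain Python, not Spark code) for why ORDER BY belongs on the final query rather than in the table or materialized-view definition: relational tables, Delta included, behave like unordered sets of rows, so any sort applied at write time need not survive storage and scans.

```python
rows = [{"id": 3}, {"id": 1}, {"id": 2}]

stored = {frozenset(r.items()) for r in rows}      # storage keeps no row order
read_back = [dict(r) for r in stored]              # comes back in arbitrary order
result = sorted(read_back, key=lambda r: r["id"])  # ORDER BY on the reading query
# result == [{"id": 1}, {"id": 2}, {"id": 3}]
```

The practical takeaway matching the thread: put the ORDER BY on the SELECT that consumes the materialized view, not in its definition.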
DineshOjha
by New Contributor II
  • 125 Views
  • 3 replies
  • 2 kudos

Service Principal access notebooks created under /Workspace/Users

What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, we are exploring the notebooks-first development approach with Databricks Bundles, and we've run into a wor...

Latest Reply
DineshOjha
New Contributor II
  • 2 kudos

Thank you so much for your response. We don't prefer to keep the notebooks under Shared or run our jobs pointing to the Shared location. We have more than 200 applications and different teams working on them. Each application has a service principal as...

2 More Replies
malterializedvw
by New Contributor II
  • 185 Views
  • 5 replies
  • 3 kudos

Parametrizing queries in DAB deployments

Hi folks, I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments on different environments, as well as [dev] deployments vs...

Latest Reply
Pat
Esteemed Contributor
  • 3 kudos

Not sure if I understand correctly, but is the issue that you are using a .sql file that has a hardcoded env? Use bundle substitutions/variables inside the SQL file itself. Databricks Asset Bundles support variable substitution not only in YAML but also in...

4 More Replies
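The substitution idea from the reply can be sketched as plain string templating (this helper is illustrative only; in a real bundle, DABs performs the `${var.*}` substitution itself for files it manages, driven by per-target variable values in the bundle YAML):

```python
import re

def substitute(sql_text, variables):
    # Replace ${var.name} placeholders with per-environment values.
    return re.sub(r"\$\{var\.(\w+)\}", lambda m: variables[m.group(1)], sql_text)

sql = "SELECT * FROM ${var.catalog}.sales.orders"
print(substitute(sql, {"catalog": "dev_catalog"}))
# SELECT * FROM dev_catalog.sales.orders
```

Keeping one parametrized .sql file and resolving the catalog/schema per target avoids maintaining near-duplicate queries per environment.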
amrim
by New Contributor III
  • 35 Views
  • 1 reply
  • 0 kudos

Notebook dashboards: ugly export

Hello, recently (within the last 2 weeks) the appearance of the notebook dashboard export got quite ugly. Consider this notebook dashboard as seen from Databricks: Databricks-dashboard-view. When exporting it as a file, it shows as this: file-dashboard-vie...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, this sounds like a regression in the Databricks platform from a recent release. My recommendation would be to file a support ticket or raise it with your account team. They'll be able to look into whether a fix is available and when it will b...

Danish11052000
by Contributor
  • 111 Views
  • 2 replies
  • 1 kudos

Resolved! Discrepancy between Azure Billing and Databricks System Tables

Hi everyone, I am currently building a cost-tracking dashboard, but I've run into a major reconciliation issue between the Azure Portal and Databricks. The numbers (current month): Azure Cost Management Portal: $2,041.79 (total cost as per specific adb ...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, yes, your conclusion is correct: the Databricks tables just use list price and therefore don't apply any negotiated discounts. They also won't include the underlying VM cost when using classic compute. Most of our customers just handle the di...

1 More Replies
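The reconciliation in the reply reduces to one line of arithmetic (the figures below are placeholders, not the poster's actual breakdown): the portal amount should roughly equal the system tables' list-price DBU spend after the negotiated discount, plus the underlying VM cost on classic compute.

```python
def expected_portal_cost(list_dbu_cost, discount_pct, vm_cost):
    # Portal bill = discounted DBU spend + underlying VM (infrastructure) cost.
    return list_dbu_cost * (1 - discount_pct / 100) + vm_cost

portal = expected_portal_cost(list_dbu_cost=1000.0, discount_pct=20, vm_cost=300.0)
# 1000 * 0.8 + 300 = 1100.0
```

A cost dashboard built only on the system tables will therefore diverge from Azure Cost Management by exactly the discount and VM components.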
AngelShrestha
by New Contributor II
  • 49 Views
  • 4 replies
  • 2 kudos

Error updating schema: SCHEMA_FOREIGN_SQLSERVER update_mask requirement.

I'm running into an issue while trying to update the description for a schema. What I tried: updating the description via the UI (AI Suggested Description / manual edit). Context: Type: SCHEMA_FOREIGN_SQLSERVER. Error message: Failed to save description. Please...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, yes, 100%: if you use Lakeflow Connect, it will ingest the data and the tables will become managed tables, which support descriptions and comments. You should also see some query improvement, since you're actually moving the data rather than queryi...

3 More Replies
bricks_2026
by New Contributor
  • 69 Views
  • 4 replies
  • 0 kudos

Issue while handling Deletes and Inserts in Structured Streaming

Hello, we have a framework which reads the CDF logs from the source table and then merges them into the target table. The logic is implemented in such a way that (if there are multiple commit_versions in the source table) a window function is applied to iden...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @bricks_2026, I recommend considering a move to AUTO CDC, which handles the merge and window logic of a CDF flow automatically. You need SCD Type 1 to keep the last operation only. Check out these docs: Stop hand-coding change data capture pipeli...

3 More Replies
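A pure-Python sketch of the "latest change per key" logic the framework implements with a window function, and which AUTO CDC with SCD Type 1 handles for you (field names here are illustrative, not the poster's schema):

```python
def latest_changes(changes):
    # Process changes in commit order; later commit versions overwrite
    # earlier ones per key, leaving only the final operation for each id.
    latest = {}
    for row in sorted(changes, key=lambda r: r["commit_version"]):
        latest[row["id"]] = row
    return list(latest.values())

events = [
    {"id": 1, "commit_version": 1, "op": "insert"},
    {"id": 1, "commit_version": 2, "op": "delete"},
    {"id": 2, "commit_version": 1, "op": "insert"},
]
final = latest_changes(events)
# keeps the delete for id=1 and the insert for id=2
```

This is exactly the dedup step that gets tricky to hand-code when a microbatch spans multiple commit_versions, which is the reply's motivation for AUTO CDC.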
NageshPatil
by New Contributor III
  • 267 Views
  • 4 replies
  • 1 kudos

Lakeflow partial data ingestion for first load

Hi Team, I am ingesting 10 tables from Azure SQL through Lakeflow Connect. I have created the gateway and ingestion pipelines using the Databricks SDK. I start the ingestion pipeline only when the gateway is in Running status with resources. I observed...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, the recommended approach here is to just run the pipeline multiple times on the initial load until all the data is captured. You can also monitor the snapshot completed events in the gateway's completed log before triggering the ingestion, but t...

3 More Replies
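The "monitor snapshot events before triggering ingestion" option from the reply amounts to a polling loop; a generic sketch (the predicate passed in is a placeholder, since the real check would read the gateway's event log via the SDK):

```python
import time

def wait_until(check, timeout_s=600, interval_s=10):
    # Poll `check` until it returns True or the timeout elapses.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Hypothetical usage before starting the ingestion pipeline:
# wait_until(lambda: snapshot_completed(gateway_id))
```

As the reply notes, simply re-running the ingestion pipeline until the initial load completes is the simpler alternative.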