Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

JIWON
by New Contributor III
  • 14 Views
  • 0 replies
  • 0 kudos

Questions on Auto Loader auto Listing Logic

Hi everyone, I'm investigating some performance patterns in our Auto Loader (S3) pipelines and would like to clarify the internal listing logic. Context: we run a batch job every hour using Auto Loader. Recently, after March 10th, we noticed our execut...

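A pure-Python sketch of the incremental-listing idea that the question touches on (the function and file names here are hypothetical, not Auto Loader internals): when file names arrive in lexical order, a listing can resume from the last checkpointed key instead of re-listing the whole prefix on every hourly run.

```python
def files_since(all_files, last_seen_key):
    # Keep only files whose name sorts strictly after the checkpointed key;
    # this is the property lexically ordered file names make cheap.
    return sorted(f for f in all_files if f > last_seen_key)

listing = ["2024/03/09/a.json", "2024/03/10/a.json", "2024/03/10/b.json"]
new = files_since(listing, "2024/03/09/a.json")
# new == ["2024/03/10/a.json", "2024/03/10/b.json"]
```

Auto Loader exposes a related knob (`cloudFiles.useIncrementalListing`); whether it applies depends on the file-naming scheme, so treat this as an illustration of the idea only.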
Neelimak
by New Contributor
  • 120 Views
  • 4 replies
  • 2 kudos

Resolved! ingestion pipeline configuration

When trying to create an ingestion pipeline, the auto-generated cluster hits quota limit errors. The VM type it tries to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help h...

Latest Reply
Neelimak
New Contributor
  • 2 kudos

Thanks Ashwin. I hope future improvements will take SKU availability and quota into account when creating pipelines through the UI. As it stands today, for simpler/POC-type implementations this is a major roadblock. Thank you.

3 More Replies
Phani1
by Databricks MVP
  • 104 Views
  • 2 replies
  • 0 kudos

Databricks Cost Estimation Template

Hi Databricks Team, is there a standard Databricks cost estimation template (xl), sizing calculator, or TCO tool that allows us to provide the following inputs and derive an approximate monthly and annual platform cost: source systems and their types (...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, there isn't anything publicly available that I'm aware of. For this kind of complex migration, I'd recommend working with your account team. As somebody who does Databricks sizing a lot, it's a nuanced art, which I suspect is why we don't have any ...

1 More Replies
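While no official template exists, the core arithmetic behind a rough estimate is simple. A back-of-envelope sketch (all rates below are illustrative placeholders, not published Databricks pricing): monthly platform cost is DBUs per hour times usage hours times your contracted $/DBU rate.

```python
def monthly_cost(dbu_per_hour, hours_per_day, days_per_month, rate_per_dbu):
    # DBUs consumed per hour x usage hours in the month x $/DBU rate.
    return dbu_per_hour * hours_per_day * days_per_month * rate_per_dbu

# e.g. a small job cluster at ~6 DBU/hr, running 8h on 22 working days:
estimate = monthly_cost(dbu_per_hour=6, hours_per_day=8, days_per_month=22,
                        rate_per_dbu=0.15)
# 6 * 8 * 22 * 0.15 = 158.4 (currency units, illustrative)
```

On classic compute you would add the cloud VM cost on top; as the reply notes, real sizing is far more nuanced than this.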
maikel
by Contributor II
  • 376 Views
  • 4 replies
  • 0 kudos

SQL schemas migration

Hello Community! I would like to ask for your recommendations on best practices for SQL schema migration. In our project, we currently have different SQL schema definitions and data seeding saved in SQL files. Since we are going to higher environm...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Great question — and since you already have DABs and numbered SQL files, you're most of the way there. You do not need Alembic or SQLAlchemy. Here's a concrete implementation of the migration runner pattern that plugs directly into your existing DABs...

3 More Replies
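The "migration runner" pattern the reply describes can be sketched in a few lines (file names and the version-tracking mechanism here are illustrative; on Databricks the `execute` callback would be `spark.sql` and the applied set would live in a control table):

```python
from pathlib import Path

def pending_migrations(sql_dir, applied_versions):
    """Numbered .sql files (e.g. 001_init.sql) not yet applied, in order."""
    files = sorted(Path(sql_dir).glob("*.sql"))
    return [f for f in files if f.stem.split("_")[0] not in applied_versions]

def run_migrations(sql_dir, applied_versions, execute):
    for f in pending_migrations(sql_dir, applied_versions):
        execute(f.read_text())                      # e.g. spark.sql(...) on Databricks
        applied_versions.add(f.stem.split("_")[0])  # record the version as applied
```

Because files run in sorted order and applied versions are skipped, re-running the job after adding `003_...sql` executes only the new file, which is what makes the pattern fit a DABs-deployed job.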
AndriusVitkausk
by New Contributor III
  • 26 Views
  • 0 replies
  • 0 kudos

Logging inside foreachbatch

I'm having difficulty carrying out logging inside a foreachBatch. All logging outside the foreachBatch works as expected, but inside, log messages are only visible in the driver logs via the Spark UI. Is there a way to get this to work (including serverless)?

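One hedged sketch of a workaround: the function passed to `foreachBatch` runs on the driver, so a standard Python logger configured with an explicit stdout handler there tends to surface records in the job run output rather than only in the log4j driver log (logger and sink names below are placeholders):

```python
import logging
import sys

# Attach a stdout handler once; without it, records may land only in the
# driver's log4j output that the poster saw in the Spark UI.
logger = logging.getLogger("stream")
if not logger.handlers:
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
logger.setLevel(logging.INFO)

def process_batch(batch_df, batch_id):
    logger.info("batch %s started", batch_id)
    # ... batch_df.write sink logic goes here ...
    logger.info("batch %s finished", batch_id)

# On Databricks: stream.writeStream.foreachBatch(process_batch).start()
```

Whether this is sufficient on serverless compute is worth verifying; the behavior there is not confirmed by the thread.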
seefoods
by Valued Contributor
  • 142 Views
  • 2 replies
  • 0 kudos

Set up a justfile command to launch your Spark application

Hello guys, I built a justfile for my project that executes my wheel job task from the command line, but when I run the wheel task I encounter this error: from pyspark.sql.connect.expressions import PythonUDFEnvironment ImportErr...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This ImportError happens because you have both standalone pyspark and databricks-connect installed, and they conflict with each other. databricks-connect bundles its own version of PySpark internally — when the standalone pyspark package is also pres...

1 More Replies
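A quick way to confirm the conflict the reply describes is to check whether both distributions are installed in the same environment (this helper is illustrative; the fix, per the reply, is removing the standalone `pyspark` package):

```python
import importlib.metadata as md

def conflicting_spark_installs():
    # databricks-connect bundles its own PySpark; a separately installed
    # 'pyspark' distribution can shadow it and break pyspark.sql.connect imports.
    installed = {(dist.metadata["Name"] or "").lower() for dist in md.distributions()}
    return "pyspark" in installed and "databricks-connect" in installed

if conflicting_spark_installs():
    print("Conflict detected: run `pip uninstall pyspark` and keep databricks-connect")
```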
IM_01
by Contributor II
  • 79 Views
  • 5 replies
  • 0 kudos

OrderBy is not sorting the results

Hi, I am currently using Lakeflow SDP. First I create two views, then join them to create a materialized view, using ORDER BY in the materialized view's create function, but the results are not sorted. Does ORDER BY not work on materializ...

Latest Reply
IM_01
Contributor II
  • 0 kudos

Hi @Ashwin_DSA, so even tables do not guarantee ordering? Could you please explain the reason, just curious. I was under the impression that using Delta tables would solve the problem, and since a view is a wrapper around a SELECT, I thought it would work.

4 More Replies
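A toy analogy (plain Python, not Spark code) for why ORDER BY belongs on the final query rather than in the table or materialized-view definition: relational tables, Delta included, behave like unordered sets of rows, so any sort applied at write time need not survive storage and scans.

```python
rows = [{"id": 3}, {"id": 1}, {"id": 2}]

stored = {frozenset(r.items()) for r in rows}      # storage keeps no row order
read_back = [dict(r) for r in stored]              # comes back in arbitrary order
result = sorted(read_back, key=lambda r: r["id"])  # ORDER BY on the reading query
# result == [{"id": 1}, {"id": 2}, {"id": 3}]
```

The practical takeaway matching the thread: put the ORDER BY on the SELECT that consumes the materialized view, not in its definition.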
DineshOjha
by New Contributor II
  • 125 Views
  • 3 replies
  • 2 kudos

Service Principal access notebooks created under /Workspace/Users

What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, we are exploring the notebooks-first development approach with Databricks Bundles, and we've run into a wor...

Latest Reply
DineshOjha
New Contributor II
  • 2 kudos

Thank you so much for your response. We don't prefer to keep the notebooks under Shared or run our jobs pointing to the Shared location. We have more than 200 applications and different teams working on them. Each application has a service principal as...

2 More Replies
malterializedvw
by New Contributor II
  • 185 Views
  • 5 replies
  • 3 kudos

Parametrizing queries in DAB deployments

Hi folks, I would like to ask about best practices for parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments on different environments, as well as [dev] deployments vs...

Latest Reply
Pat
Esteemed Contributor
  • 3 kudos

Not sure if I understand correctly, but is the issue that you are using a .sql file that has a hardcoded env? Use bundle substitutions/variables inside the SQL file itself. Databricks Asset Bundles support variable substitution not only in YAML but also in...

4 More Replies
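The substitution idea from the reply can be sketched as plain string templating (this helper is illustrative only; in a real bundle, DABs performs the `${var.*}` substitution itself for files it manages, driven by per-target variable values in the bundle YAML):

```python
import re

def substitute(sql_text, variables):
    # Replace ${var.name} placeholders with per-environment values.
    return re.sub(r"\$\{var\.(\w+)\}", lambda m: variables[m.group(1)], sql_text)

sql = "SELECT * FROM ${var.catalog}.sales.orders"
print(substitute(sql, {"catalog": "dev_catalog"}))
# SELECT * FROM dev_catalog.sales.orders
```

Keeping one parametrized .sql file and resolving the catalog/schema per target avoids maintaining near-duplicate queries per environment.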
amrim
by New Contributor III
  • 35 Views
  • 1 reply
  • 0 kudos

Notebook dashboards: ugly export

Hello, recently (within the last 2 weeks) the appearance of the notebook dashboard export got quite ugly. Consider this notebook dashboard as seen from Databricks: Databricks-dashboard-view. When exporting it as a file, it shows as this: file-dashboard-vie...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

Hi, this sounds like a regression in the Databricks platform from a recent release. My recommendation would be to file a support ticket or raise it with your account team. They'll be able to look into whether a fix is available and when it will b...

Danish11052000
by Contributor
  • 111 Views
  • 2 replies
  • 1 kudos

Resolved! Discrepancy between Azure Billing and Databricks System Tables

Hi everyone, I am currently building a cost-tracking dashboard, but I've run into a major reconciliation issue between the Azure Portal and Databricks. The numbers (current month): Azure Cost Management Portal: $2,041.79 (total cost as per specific adb ...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, yes, your conclusion is correct: the Databricks tables just use list price and therefore don't apply any negotiated discounts. They also won't include the underlying VM cost when using classic compute. Most of our customers just handle the di...

1 More Replies
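The reconciliation in the reply reduces to one line of arithmetic (the figures below are placeholders, not the poster's actual breakdown): the portal amount should roughly equal the system tables' list-price DBU spend after the negotiated discount, plus the underlying VM cost on classic compute.

```python
def expected_portal_cost(list_dbu_cost, discount_pct, vm_cost):
    # Portal bill = discounted DBU spend + underlying VM (infrastructure) cost.
    return list_dbu_cost * (1 - discount_pct / 100) + vm_cost

portal = expected_portal_cost(list_dbu_cost=1000.0, discount_pct=20, vm_cost=300.0)
# 1000 * 0.8 + 300 = 1100.0
```

A cost dashboard built only on the system tables will therefore diverge from Azure Cost Management by exactly the discount and VM components.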
AngelShrestha
by New Contributor II
  • 49 Views
  • 4 replies
  • 2 kudos

Error updating schema: SCHEMA_FOREIGN_SQLSERVER update_mask requirement.

I'm running into an issue while trying to update the description for a schema. What I tried: updating the description via the UI (AI Suggested Description / manual edit). Context: Type: SCHEMA_FOREIGN_SQLSERVER. Error message: Failed to save description. Please...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, yes, 100%: if you use Lakeflow Connect, it will ingest the data and the tables will become managed tables, which support descriptions and comments. You should also see some query improvement, since you're actually moving the data rather than queryi...

3 More Replies
bricks_2026
by New Contributor
  • 69 Views
  • 4 replies
  • 0 kudos

Issue while handling Deletes and Inserts in Structured Streaming

Hello, we have a framework which reads the CDF logs from the source table and then merges them into the target table. The logic is implemented in such a way that (if there are multiple commit_versions in the source table) a window function is applied to iden...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @bricks_2026, I recommend considering a move to AUTO CDC, which handles the merge and window logic of a CDF flow automatically. You need SCD Type 1 to keep the last operation only. Check out these docs: Stop hand-coding change data capture pipeli...

3 More Replies
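A pure-Python sketch of the "latest change per key" logic the framework implements with a window function, and which AUTO CDC with SCD Type 1 handles for you (field names here are illustrative, not the poster's schema):

```python
def latest_changes(changes):
    # Process changes in commit order; later commit versions overwrite
    # earlier ones per key, leaving only the final operation for each id.
    latest = {}
    for row in sorted(changes, key=lambda r: r["commit_version"]):
        latest[row["id"]] = row
    return list(latest.values())

events = [
    {"id": 1, "commit_version": 1, "op": "insert"},
    {"id": 1, "commit_version": 2, "op": "delete"},
    {"id": 2, "commit_version": 1, "op": "insert"},
]
final = latest_changes(events)
# keeps the delete for id=1 and the insert for id=2
```

This is exactly the dedup step that gets tricky to hand-code when a microbatch spans multiple commit_versions, which is the reply's motivation for AUTO CDC.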
NageshPatil
by New Contributor III
  • 267 Views
  • 4 replies
  • 1 kudos

Lakeflow partial data ingestion for first load

Hi Team, I am ingesting 10 tables from Azure SQL through Lakeflow Connect. I have created the gateway and ingestion pipelines using the Databricks SDK. I start the ingestion pipeline only when the gateway is in Running status with resources. I observed...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi, the recommended approach here is to just run the pipeline multiple times on the initial load until all the data is captured. You can also monitor the snapshot completed events in the gateway's completed log before triggering the ingestion, but t...

3 More Replies
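The "monitor snapshot events before triggering ingestion" option from the reply amounts to a polling loop; a generic sketch (the predicate passed in is a placeholder, since the real check would read the gateway's event log via the SDK):

```python
import time

def wait_until(check, timeout_s=600, interval_s=10):
    # Poll `check` until it returns True or the timeout elapses.
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval_s)
    return False

# Hypothetical usage before starting the ingestion pipeline:
# wait_until(lambda: snapshot_completed(gateway_id))
```

As the reply notes, simply re-running the ingestion pipeline until the initial load completes is the simpler alternative.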