by
JIWON
• New Contributor III
- 14 Views
- 0 replies
- 0 kudos
Hi everyone, I’m investigating some performance patterns in our Auto Loader (S3) pipelines and would like to clarify the internal listing logic. Context: we run a batch job every hour using Auto Loader. Recently, after March 10th, we noticed our execut...
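For context, a minimal sketch of the hourly "Auto Loader as a batch job" pattern described above, assuming JSON input, the default directory-listing mode, and illustrative S3 paths and table names (all placeholders, not the poster's actual setup):

```python
# Minimal sketch of an hourly Auto Loader batch run. Paths, the JSON format,
# and the target table are assumptions for illustration only.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "json")                          # assumed input format
    .option("cloudFiles.schemaLocation", "s3://bucket/_schemas/events")
    .load("s3://bucket/raw/events/")
)

query = (
    df.writeStream
    .option("checkpointLocation", "s3://bucket/_checkpoints/events")
    .trigger(availableNow=True)   # process everything discovered so far, then stop
    .toTable("main.bronze.events")
)
query.awaitTermination()
```

With availableNow, each hourly run lists (or consumes notifications for) only what is new relative to the checkpoint, which is usually where listing-related slowdowns show up.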
- 120 Views
- 4 replies
- 2 kudos
When trying to create ingestion pipelines, the auto-generated cluster is hitting quota limit errors. The type of VM it's trying to use is not available in our region, and there seems to be no way to add a fallback to different VM types. Can you please help h...
Latest Reply
Thanks Ashwin. I hope that when creating pipelines through the UI, SKU availability and quota are taken into account in future improvements. As it stands today, for simpler/POC-type implementations this is a major roadblock. Thank you.
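As a possible stopgap until the UI accounts for SKU availability, here is a hedged sketch of creating the pipeline through the Python SDK with an explicitly pinned node type instead of the auto-selected one. The pipeline name, notebook path, worker count, and node type below are placeholders, and whether your specific pipeline type accepts a cluster spec needs to be verified:

```python
# Hedged sketch: pin the pipeline cluster's VM type via the SDK so the auto-selected,
# unavailable SKU is never requested. All names and values are illustrative only.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import pipelines

w = WorkspaceClient()

created = w.pipelines.create(
    name="poc_ingestion_pipeline",                      # placeholder name
    libraries=[pipelines.PipelineLibrary(
        notebook=pipelines.NotebookLibrary(path="/Repos/team/project/ingest")
    )],
    clusters=[pipelines.PipelineCluster(
        label="default",
        node_type_id="Standard_D4ds_v5",                # a SKU you know is available in your region
        num_workers=2,
    )],
)
print(created.pipeline_id)
```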
3 More Replies
by
IM_01
• Contributor II
- 27 Views
- 0 replies
- 0 kudos
Hi, is there any option to control the refresh behavior of materialized views, such that even if the DLT pipeline is triggered in full-refresh mode, the full refresh does not happen on the MVs when the source streaming tables have not been updated? Is there any way to achieve this?
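One avenue that may be worth checking (hedged, since I can't confirm it covers this exact scenario) is the pipelines.reset.allowed table property, which excludes a dataset from full refreshes. A minimal sketch in the Lakeflow/DLT Python API, with placeholder view and column names:

```python
import dlt

# Hedged sketch: mark the materialized view so a pipeline-level full refresh skips it.
# The upstream view names and join key are placeholders; verify that
# pipelines.reset.allowed behaves as expected for your materialized views.
@dlt.table(
    name="orders_enriched_mv",
    table_properties={"pipelines.reset.allowed": "false"},  # opt out of full refresh
)
def orders_enriched_mv():
    orders = dlt.read("orders_view")          # placeholder upstream view
    customers = dlt.read("customers_view")    # placeholder upstream view
    return orders.join(customers, "customer_id")
```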
- 104 Views
- 2 replies
- 0 kudos
Hi Databricks Team, is there a standard Databricks cost estimation template (Excel), sizing calculator, or TCO tool that allows us to provide the following inputs and derive an approximate monthly and annual platform cost: source systems and their types (...
Latest Reply
Hi, There isn't anything publicly available that I'm aware of. For this kind of complex migration I'd recommend working with your account team. As somebody who does Databricks sizing a lot, it's a nuanced art which I suspect is why we don't have any ...
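For a very rough order of magnitude while you work with your account team, the arithmetic is just DBUs per hour × hours per month × $/DBU, plus any cloud VM cost for classic compute. A tiny sketch where every rate and workload figure is a made-up assumption to be replaced with your own numbers:

```python
# Back-of-envelope monthly estimate. All rates and workload figures below are
# illustrative assumptions, not Databricks pricing.
workloads = [
    # (name, DBUs per hour, hours per month, $ per DBU, VM $ per hour)
    ("nightly_etl_jobs", 24.0, 120, 0.15, 3.20),
    ("bi_sql_warehouse", 12.0, 300, 0.22, 0.00),   # serverless: no separate VM line
]

monthly_total = 0.0
for name, dbu_per_hr, hrs, usd_per_dbu, vm_per_hr in workloads:
    cost = hrs * (dbu_per_hr * usd_per_dbu + vm_per_hr)
    monthly_total += cost
    print(f"{name}: ${cost:,.2f}/month")

print(f"Total: ${monthly_total:,.2f}/month, ${monthly_total * 12:,.2f}/year")
```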
1 More Replies
- 376 Views
- 4 replies
- 0 kudos
Hello Community! I would like to ask for your recommendations on SQL schema migration best practices. In our project, we currently have various SQL schema definitions and data seeding saved in SQL files. Since we are going to higher environm...
Latest Reply
Great question — and since you already have DABs and numbered SQL files, you're most of the way there. You do not need Alembic or SQLAlchemy. Here's a concrete implementation of the migration runner pattern that plugs directly into your existing DABs...
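The reply above is cut off, so here is a hedged sketch of one common shape of that migration-runner pattern (not necessarily the reply's exact code): numbered SQL files applied in order, with applied versions recorded in a tracking table so reruns are idempotent. The file layout, tracking table, and catalog names are placeholder assumptions.

```python
# Hedged sketch of a migration runner: applies numbered .sql files in order and
# records each applied version so reruns skip what has already been applied.
import re
from pathlib import Path
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
HISTORY_TABLE = "main.ops.schema_migrations"     # placeholder tracking table
MIGRATIONS_DIR = Path("migrations")              # e.g. 001_create_schema.sql, 002_seed.sql

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS {HISTORY_TABLE} (
        version INT, filename STRING, applied_at TIMESTAMP
    )
""")

applied = {row.version for row in spark.table(HISTORY_TABLE).select("version").collect()}

for path in sorted(MIGRATIONS_DIR.glob("*.sql")):
    match = re.match(r"(\d+)_", path.name)
    if not match:
        continue
    version = int(match.group(1))
    if version in applied:
        continue                                  # already applied, skip
    for statement in path.read_text().split(";"): # naive split; fine for simple files
        if statement.strip():
            spark.sql(statement)
    spark.sql(
        f"INSERT INTO {HISTORY_TABLE} VALUES ({version}, '{path.name}', current_timestamp())"
    )
    print(f"applied {path.name}")
```

In a DABs setup this kind of runner is typically wired up as a spark_python_task or notebook task that runs right after the bundle deploy step for each target environment.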
3 More Replies
- 26 Views
- 0 replies
- 0 kudos
I'm having difficulty carrying out logging inside a foreachBatch. It seems that all logging outside the foreachBatch works as expected, but inside, log messages are only visible from the Spark UI in the driver logs. Is there a way to get this to work (including on serverless)?
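One pattern that sidesteps the question of where stdout/stderr surfaces is to persist per-batch log records to a small Delta table from inside foreachBatch; that works regardless of compute type. A hedged sketch, where the source, log table, and checkpoint path are placeholders:

```python
# Hedged sketch: instead of relying on where driver log output ends up, write
# per-batch log records to a Delta table from inside foreachBatch.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
LOG_TABLE = "main.ops.stream_batch_log"   # placeholder audit/log table

def process_batch(batch_df, batch_id):
    row_count = batch_df.count()
    # ... your merge / transform logic here ...
    (
        spark.createDataFrame([(batch_id, row_count)], "batch_id LONG, row_count LONG")
        .withColumn("logged_at", F.current_timestamp())
        .write.mode("append").saveAsTable(LOG_TABLE)
    )

(
    spark.readStream.table("main.bronze.events")   # placeholder streaming source
    .writeStream
    .foreachBatch(process_batch)
    .option("checkpointLocation", "/Volumes/main/ops/checkpoints/stream_batch_log")
    .trigger(availableNow=True)
    .start()
)
```

The table then doubles as an audit trail you can query from a dashboard, which plain driver logging cannot do.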
- 142 Views
- 2 replies
- 0 kudos
Hello guys, I built a just file for my project which will execute my wheel job task using the command line, but when I run my wheel task I encountered this error: from pyspark.sql.connect.expressions import PythonUDFEnvironment ImportErr...
Latest Reply
This ImportError happens because you have both standalone pyspark and databricks-connect installed, and they conflict with each other. databricks-connect bundles its own version of PySpark internally — when the standalone pyspark package is also pres...
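To confirm the conflict the reply describes, a small check you can run in the affected environment (the two package names are the ones the reply mentions; resolving it usually means keeping only databricks-connect, but verify against your project's requirements):

```python
# Quick sanity check: databricks-connect ships its own PySpark, so a separately
# installed `pyspark` distribution in the same environment causes import clashes
# like the PythonUDFEnvironment error above.
from importlib.metadata import version, PackageNotFoundError

for pkg in ("pyspark", "databricks-connect"):
    try:
        print(f"{pkg}=={version(pkg)}")
    except PackageNotFoundError:
        print(f"{pkg}: not installed")

# If both print a version, remove the standalone distribution, e.g.:
#   pip uninstall pyspark
#   pip install --force-reinstall databricks-connect==<your DBR version>
```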
1 More Replies
by
IM_01
• Contributor II
- 79 Views
- 5 replies
- 0 kudos
Hi, I am currently using Lakeflow SDP. First I create two views, then join them and create a materialized view, using ORDER BY in the materialized view create function, but the results are not sorted. Does ORDER BY not work on materializ...
Latest Reply
Hi @Ashwin_DSA, even tables do not guarantee ordering? Could you please explain the reason, just curious. I was under the impression that using Delta tables would solve the problem. And since a view is a wrapper around a SELECT, I thought it would work.
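For what it's worth, a SQL result set only has a defined order when the outermost query asks for one, so the usual workaround is to apply the ordering when the MV is read. A minimal sketch with placeholder table and column names:

```python
# Hedged sketch: ordering is applied by the consumer of the materialized view,
# not baked into its definition. The MV and column names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.read.table("main.gold.sales_summary_mv").orderBy("order_date", ascending=False)
df.show(20)
```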
4 More Replies
- 125 Views
- 3 replies
- 2 kudos
What permissions does a Service Principal need to run Databricks jobs that reference notebooks created by a user and stored in Git? Hi everyone, we are exploring the notebooks-first development approach with Databricks Bundles, and we've run into a wor...
Latest Reply
Thank you so much for your response. We'd prefer not to keep the notebooks under Shared or run our jobs pointing to the Shared location. We have more than 200 applications and different teams working on them. Each application has a service principal as...
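One option that avoids granting the service principal access to user folders entirely is to have the job pull the notebooks from the Git provider at run time. A hedged sketch via the Python SDK, where the repo URL, branch, and notebook path are placeholders and the service principal still needs a Git credential configured:

```python
# Hedged sketch: define the job with a git_source so the notebook is fetched from
# the repository at run time instead of from a user's workspace folder.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

job = w.jobs.create(
    name="app_123_nightly",                         # placeholder job name
    git_source=jobs.GitSource(
        git_url="https://github.com/acme/app-123",  # placeholder repo
        git_provider=jobs.GitProvider.GIT_HUB,
        git_branch="main",
    ),
    tasks=[
        jobs.Task(
            task_key="run_etl",
            notebook_task=jobs.NotebookTask(
                notebook_path="notebooks/etl",      # path inside the repo
                source=jobs.Source.GIT,
            ),
        )
    ],
)
print(job.job_id)
```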
2 More Replies
- 185 Views
- 5 replies
- 3 kudos
Hi folks, I would like to ask for best practices concerning parametrizing queries in Databricks Asset Bundle deployments. This topic is relevant for differentiating between deployments on different environments as well as [dev]-deployments vs...
Latest Reply
Not sure if I understand correctly, but is the issue that you are using .sql files that have a hardcoded env? Use bundle substitutions/variables inside the SQL file itself: Databricks Asset Bundles support variable substitution not only in YAML but also in...
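Complementing the substitution approach above, a hedged sketch of the read side: the bundle target passes the environment-specific value as a job/notebook parameter (e.g. via a bundle variable in the job YAML), and the code binds it into a parameterized query. The parameter name, default, and table are placeholders:

```python
# Hedged sketch, intended to run inside a Databricks notebook where `dbutils`,
# `spark`, and `display` are provided. The "catalog" parameter and table name
# are placeholders for whatever your bundle target passes in.
dbutils.widgets.text("catalog", "dev_catalog")
catalog = dbutils.widgets.get("catalog")

df = spark.sql(
    "SELECT * FROM IDENTIFIER(:tbl) WHERE order_date >= current_date() - INTERVAL 7 DAYS",
    args={"tbl": f"{catalog}.sales.orders"},
)
display(df)
```

This keeps the SQL text itself environment-agnostic; only the bundle-level parameter changes between dev, staging, and prod targets.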
4 More Replies
by
amrim
• New Contributor III
- 35 Views
- 1 reply
- 0 kudos
Latest Reply
Hi,
This sounds like a regression in the Databricks platform from a recent release. My recommendation would be to file a support ticket or raise it with your account team. They'll be able to look into whether a fix for it is available and when it will b...
- 111 Views
- 2 replies
- 1 kudos
Hi everyone, I am currently building a cost-tracking dashboard, but I've run into a major reconciliation issue between the Azure Portal and Databricks. The numbers (current month): Azure Cost Management Portal: $2,041.79 (Total cost as per specific adb ...
Latest Reply
Hi,
Yes, you're correct in your conclusion: the Databricks tables just use list price and therefore don't apply any negotiated discounts. They also won't include the underlying VM cost when using classic compute. Most of our customers just handle the di...
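If you do want the dashboard side to approximate the bill, here is a hedged sketch of applying your own discount factor on top of the list-price system tables. The discount value is a placeholder, the join is simplified, and classic-compute VM cost is still not covered:

```python
# Hedged sketch, intended for a Databricks notebook: list-price cost from the
# system.billing tables with a manually supplied negotiated discount.
NEGOTIATED_DISCOUNT = 0.20   # placeholder: 20% off list, substitute your contract terms

usage_cost = spark.sql("""
    SELECT
        u.usage_date,
        u.sku_name,
        SUM(u.usage_quantity * p.pricing.default) AS list_price_usd
    FROM system.billing.usage u
    JOIN system.billing.list_prices p
      ON u.sku_name = p.sku_name
     AND u.usage_start_time >= p.price_start_time
     AND (p.price_end_time IS NULL OR u.usage_start_time < p.price_end_time)
    GROUP BY u.usage_date, u.sku_name
""")

discounted = usage_cost.withColumn(
    "estimated_cost_usd", usage_cost.list_price_usd * (1 - NEGOTIATED_DISCOUNT)
)
display(discounted)
```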
1 More Replies
- 49 Views
- 4 replies
- 2 kudos
I'm running into an issue while trying to update the description for the schema. What I tried: updating the description via the UI (AI Suggested Description / manual edit). Context: Type: SCHEMA_FOREIGN_SQLSERVER. Error message: Failed to save description. Please...
Latest Reply
Hi, yes, 100%: if you use Lakeflow Connect, it will ingest the data and the tables will become managed tables, which will support descriptions and comments. You should also get some query improvement, as you're actually moving the data rather than queryi...
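And once the objects are managed, setting the description is just a comment statement. A tiny sketch with placeholder catalog, schema, and table names:

```python
# Hedged sketch: once the schema/tables are managed (rather than the foreign
# SQL Server catalog objects), descriptions can be set directly.
spark.sql(
    "COMMENT ON SCHEMA main.sales_ingested IS 'Tables ingested from SQL Server via Lakeflow Connect'"
)
spark.sql(
    "COMMENT ON TABLE main.sales_ingested.orders IS 'Order headers, refreshed daily'"
)
```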
3 More Replies
- 69 Views
- 4 replies
- 0 kudos
Hello, we have a framework which reads the CDF logs from the source table and then merges them into the target table. The logic is implemented in such a way that (if there are multiple commit_versions in the source table) a window function is applied to iden...
Latest Reply
Hi @bricks_2026 ,
I recommend considering a move to AUTO CDC, which handles the merge and window logic of the CDF flow automatically. You need SCD Type 1 to get only the last operation. Check out these docs:
Stop hand-coding change data capture pipeli...
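A hedged sketch of what that flow replaces the hand-written window + MERGE with, using the DLT Python API (dlt.apply_changes here; recent docs describe the same functionality under the AUTO CDC name, so the exact entry point may differ on your release). Table, key, and column names are placeholders:

```python
import dlt
from pyspark.sql.functions import expr

# Hedged sketch: SCD Type 1 target maintained from the CDC/CDF feed, replacing the
# hand-written window-function + MERGE logic. All names below are placeholders.
dlt.create_streaming_table("target_table")

dlt.apply_changes(
    target="target_table",
    source="source_cdf_feed",                       # streaming source carrying the change rows
    keys=["id"],                                    # business key(s)
    sequence_by="commit_version",                   # latest version wins, like the window logic
    apply_as_deletes=expr("operation = 'DELETE'"),  # assumed column/value for delete records
    stored_as_scd_type=1,                           # keep only the last operation per key
)
```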
3 More Replies
- 267 Views
- 4 replies
- 1 kudos
Hi Team, I am ingesting 10 tables from Azure SQL through Lakeflow Connect. I have created the gateway and ingestion pipelines using the Databricks SDK. I am starting the ingestion pipeline only when the gateway is in Running status with resources. I observed...
Latest Reply
Hi, the recommended approach on this is to just run the pipeline multiple times on the initial load until all the data is captured. You can also monitor the snapshot completed events in the gateway completed log before triggering the ingestion, but t...
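A hedged sketch of that monitoring step via the SDK: poll the gateway pipeline's event log and look for the snapshot-completion signal before starting the ingestion pipeline. The pipeline IDs are placeholders, and the substring matched below is an assumption to confirm against the actual wording in your gateway's event log:

```python
import time
from databricks.sdk import WorkspaceClient

# Hedged sketch: wait for snapshot-completion events in the gateway pipeline's event
# log, then trigger the ingestion pipeline. IDs are placeholders; the matched text
# is an assumption, check your gateway event log for the real message.
w = WorkspaceClient()
GATEWAY_PIPELINE_ID = "<gateway-pipeline-id>"
INGESTION_PIPELINE_ID = "<ingestion-pipeline-id>"

def snapshot_completed() -> bool:
    events = w.pipelines.list_pipeline_events(pipeline_id=GATEWAY_PIPELINE_ID, max_results=100)
    for event in events:
        msg = (event.message or "").lower()
        if "snapshot" in msg and "complete" in msg:
            return True
    return False

while not snapshot_completed():
    time.sleep(60)   # poll once a minute

w.pipelines.start_update(pipeline_id=INGESTION_PIPELINE_ID)
```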
3 More Replies