Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best prac...
Join discussions on data engineering best practices, architectures, and optimization strategies with...
Join discussions on data governance practices, compliance, and security within the Databricks Commun...
Explore discussions on generative artificial intelligence techniques and applications within the Dat...
Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithm...
Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Communi...
Hello Community, We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. When we read CSV or JSON files with multiLine=true, the load becomes sing...
spark_partition_id is the closest and most performant function available as an alternative, and I migrated to use this function. So far, no issues. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.spark_p...
Hello Community, We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. We have several scenarios where we need to send an inline email via Pytho...
As an alternative for sending email from Serverless, we implemented a solution based on the Microsoft Graph API: https://learn.microsoft.com/en-us/graph/api/user-sendmail?view=graph-rest-1.0&tabs=python
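For readers following the Graph API route, a request to the `sendMail` endpoint might be assembled roughly as below. This is a minimal sketch, not the poster's implementation: the helper name `build_sendmail_request`, the addresses, and the token value are placeholders, and acquiring the OAuth access token (with a permission that allows sending mail) is assumed to happen elsewhere.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_sendmail_request(sender, recipients, subject, html_body, access_token):
    """Build (but do not send) a POST request for Graph's user sendMail endpoint."""
    payload = {
        "message": {
            "subject": subject,
            # Inline HTML body; use "Text" instead of "HTML" for plain text.
            "body": {"contentType": "HTML", "content": html_body},
            "toRecipients": [
                {"emailAddress": {"address": addr}} for addr in recipients
            ],
        },
        "saveToSentItems": False,
    }
    return urllib.request.Request(
        url=f"{GRAPH_BASE}/users/{sender}/sendMail",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration only.
req = build_sendmail_request(
    sender="jobs@example.com",
    recipients=["team@example.com"],
    subject="Nightly load finished",
    html_body="<p>All tables refreshed.</p>",
    access_token="<token>",
)
# urllib.request.urlopen(req) would submit it; it is left out here on purpose.
```

Building the request separately from sending it keeps the payload easy to inspect in a notebook before any mail goes out.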
Hi, We are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...
Hi @Nick_Hughes, I am very late to the party, but I was searching the internet for other people discussing a related problem, for which I am building a definitive solution, and came across your post from 3 years ago. Times have changed...
Hi, I have created 2 identical tables, but one is partitioned and the other is Liquid Clustered with Auto Clustering. I inserted 30M rows x 2 (60M) for two dates, date 1 = 2026-06-01 and date 2 = 2026-06-02, then I overwrote the date 2026-06-02 with a s...
Hi, I need to compare the sizes of my Delta tables. What's the correct approach? The table size reported by the ANALYZE command? And how do I check the Delta log size? If I enable CDF, how do I know the CDF log size (the overhead it adds)? Kind of l...
Hi @RGSLCA, DESCRIBE DETAIL is the best starting point if you're comparing Delta table sizes, but it's important to understand what it reports. The sizeInBytes value represents only the latest active snapshot of the table, not the total storage consum...
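To see how storage splits between the transaction log, change data feed files, and data files, one rough option is to walk the table directory and group file sizes by top-level folder. The sketch below assumes the table path is reachable as an ordinary filesystem path (on cloud object storage a storage listing API would be needed instead); the function name `delta_storage_breakdown` is made up for illustration. It relies on the standard Delta layout, where the log lives under `_delta_log` and change data feed files under `_change_data`.

```python
import os
from collections import defaultdict


def delta_storage_breakdown(table_path):
    """Sum file sizes (bytes) under a Delta table directory, grouped by category."""
    totals = defaultdict(int)
    for root, _dirs, files in os.walk(table_path):
        # First path component below the table root decides the category.
        top = os.path.relpath(root, table_path).split(os.sep)[0]
        if top == "_delta_log":
            category = "delta_log"
        elif top == "_change_data":
            category = "change_data"
        else:
            # Table root and partition folders: active and stale data files alike.
            category = "data_files"
        for name in files:
            totals[category] += os.path.getsize(os.path.join(root, name))
    return dict(totals)
```

Note that `data_files` here counts every file still on disk, including files no longer referenced by the latest snapshot, so it can be larger than the `sizeInBytes` value from DESCRIBE DETAIL.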
Hi community, I've been trying to connect Databricks Serverless to a SQL Server running on an EC2 instance using NCC Private Endpoint, but traffic is not being routed through PrivateLink.

## Setup
- Databricks Serverless (AWS, us-east-1)
- NCC attached to ...
Louis, thanks. My issue was with the NLB. The Databricks documentation does not specify some of the required settings for this configuration. I found the solution in this post: https://medium.com/databricks-platform-sme/aws-databricks-serverless-private...
The article "Manage budgets and cost controls for Genie" (Azure Databricks | Microsoft Learn) mentions setting the resource type of Unity AI gateway from the dropdown. However, when using Account Conso...
Hi @AndyRoyle, Per the public docs, the expected flow is that once that preview is enabled, you go to the Budgets page in Account Console, and the budget definition should show a Resource types dropdown where you can choose Unity AI Gateway. You can ...
Is the "move fast and break things" startup mindset actually fatal for custom AI agent development? I read that rushing MVPs creates massive tech debt and unstable guardrails that cause projects to crash by year two. Have any of you hit this "year 2 wa...
Hello! I'm getting "Error while starting SQL warehouse. Sorry, cannot run the resource because you have hit your free daily limit. Please come back again tomorrow." Do you know when exactly the reset happens? Thank you!
Hi @mark_lenders, I don’t believe there’s a publicly documented exact reset time for this limit. The Databricks Free Edition limitations page shared by @balajij8 explains that if you exceed your quota, compute resources can become unavailable for...
Hi everyone, Don't know where to put this specific question. I'm working on a reference data mesh implementation to connect and combine datasets to find matching trials for patients with a specific genetic profile. - Do you know anyone that might be i...
Hi @steff_horemans, My guess is that the Free Edition is probably not the best place to demonstrate true external sharing. Free Edition is positioned as a serverless-only, quota-limited, non-commercial environment, with one workspace, one metastore, ...
Hi everyone, Don't know where to put this specific question. I'm working on a reference data mesh implementation to connect and combine datasets to find matching trials for patients with a specific genetic profile. - Do you know anyone that might be i...
Hi @steff_horemans, Yes, this is absolutely fine to ask here. You're touching a few quite different areas, though... Trial matching/reference data design, GenAI extraction of eligibility criteria, and governance/serving of bespoke models. You’ll like...
Hi Databricks Champions, I am using the SQL MCP server. I was able to connect to MCP and run my SQL queries. However, when my query output goes above 32,768 it gets truncated and I am not able to get the complete output. I can only pass warehouse_id in "_meta"...
Hi @Manas2000, This appears to be a Databricks SQL MCP limitation, not a Databricks SQL warehouse limitation. A hacky workaround might be to manually paginate in SQL, e.g., with `ROW_NUMBER()` or `LIMIT/OFFSET`, and run separate MCP calls for each pag...
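The `LIMIT/OFFSET` variant of that workaround could be sketched as a small generator that emits one statement per page. This is an illustration under assumptions, not an MCP feature: the helper name `paginated_queries` and the table `main.demo.items` are hypothetical, the number of pages is supplied by the caller, and each generated statement would still have to be sent as its own MCP call. A stable `ORDER BY` column is assumed, since pages are not guaranteed to be consistent without one.

```python
def paginated_queries(base_query, order_by, page_size, num_pages):
    """Yield one LIMIT/OFFSET statement per page over base_query."""
    for page in range(num_pages):
        offset = page * page_size
        yield (
            f"SELECT * FROM ({base_query}) AS q "
            f"ORDER BY {order_by} "
            f"LIMIT {page_size} OFFSET {offset}"
        )


# Hypothetical table name; pick page_size so each page stays under the output cap.
pages = list(
    paginated_queries("SELECT id, name FROM main.demo.items", "id", 500, 3)
)
```

Large offsets make the warehouse re-scan skipped rows, so for big results a keyset filter on the ordering column (for example `WHERE id > <last id seen>`) may scale better than a growing `OFFSET`.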
Hi, I’m setting up an external model serving endpoint for Azure OpenAI through Databricks Model Serving / Mosaic AI Gateway, and I’m trying to understand whether the newer (more than a year old at this point) Azure OpenAI v1 API is currently supported...
Short answer: based on the current Databricks docs, I would treat the built-in Azure OpenAI external model provider as expecting the older Azure OpenAI configuration shape, not the newer `/openai/v1/` shape. The key clue is that the Databricks Azure O...
As I understand it, streaming tables and materialized views produced by Lakeflow Spark Declarative Pipelines (DLT) are always Unity Catalog managed tables; there's no LOCATION/path option on create_streaming_table or apply_changes. Is that correct? A...
Hi @nidhin, What you’re saying is basically correct for a Unity Catalog-enabled Lakeflow Spark Declarative Pipelines setup. In that model, pipelines publish streaming tables and materialized views into the target catalog and schema, the data is store...
Hi all, I’m currently exploring how we can format notebook code using Black (installed via libraries) with specific configurations. I understand that we can configure Black locally using a pyproject.toml file. However, I’d like to know if there’s a way...
I followed this description (black, pyproject.toml) and it worked for months. But now we have found that the "format code" task no longer honors the line-length defined in pyproject.toml. Has anything changed in its availability?