Databricks Platform Discussions
Dive into comprehensive discussions covering various aspects of the Databricks platform. Join the conversation to deepen your understanding and maximize your usage of the Databricks platform.

Browse the Community

Data Engineering

Join discussions on data engineering best practices, architectures, and optimization strategies with...

12404 Posts

Data Governance

Join discussions on data governance practices, compliance, and security within the Databricks Commun...

540 Posts

Generative AI

Explore discussions on generative artificial intelligence techniques and applications within the Dat...

421 Posts

Machine Learning

Dive into the world of machine learning on the Databricks platform. Explore discussions on algorithm...

1029 Posts

Warehousing & Analytics

Engage in discussions on data warehousing, analytics, and BI solutions within the Databricks Communi...

700 Posts

Activity in Databricks Platform Discussions

Ramana
by > Valued Contributor II
  • 2670 Views
  • 6 replies
  • 4 kudos

Resolved! Serverless Compute - pySpark - Any alternative for rdd.getNumPartitions()

Hello Community, We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. When we read CSV or JSON files with multiLine=true, the load becomes sing...

Latest Reply
Ramana
Valued Contributor II
  • 4 kudos

spark_partition_id is the closest and most performant function available as an alternative, and I migrated to use this function. So far, no issues. https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.functions.spark_p...
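
A minimal sketch of what that replacement could look like, not the poster's actual code. It assumes a Spark DataFrame `df` is already available, and it calls the SQL form of `spark_partition_id()` through `selectExpr` so the snippet needs no imports. The helper name `count_nonempty_partitions` is hypothetical. Unlike `rdd.getNumPartitions()`, this counts only partitions that hold at least one row, because the function is evaluated per row.

```python
def count_nonempty_partitions(df):
    """Return how many distinct partition IDs hold at least one row of `df`.

    `df` is assumed to be a Spark DataFrame. Empty partitions are not
    counted, so the result can be lower than rdd.getNumPartitions().
    """
    # spark_partition_id() tags each row with the ID of the partition it lives in.
    partition_ids = df.selectExpr("spark_partition_id() AS pid")
    # One row per distinct ID, then count them (this triggers a Spark job).
    return partition_ids.distinct().count()
```

A result of 1 after a multiLine read would indicate that all rows landed in a single partition.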

5 More Replies
Ramana
by > Valued Contributor II
  • 1288 Views
  • 3 replies
  • 0 kudos

Resolved! Serverless Compute - Python - Custom Emails via SMTP (smtplib.SMTP(host_name)) - Any alternative?

Hello Community, We have been trying to migrate our jobs from Classic Compute to Serverless Compute. As part of this process, we face several challenges, and this is one of them. We have several scenarios where we need to send an inline email via Pytho...

Latest Reply
Ramana
Valued Contributor II
  • 0 kudos

The solution we implemented as an alternative for email sending from Serverless is via the Microsoft Graph API. https://learn.microsoft.com/en-us/graph/api/user-sendmail?view=graph-rest-1.0&tabs=python
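
As a rough illustration of the Graph `sendMail` call, here is a sketch that only builds the HTTP request and does not send it. It is not the poster's implementation. It assumes you already hold an OAuth access token for an app with the `Mail.Send` permission; the helper name `build_sendmail_request` and the example addresses are hypothetical.

```python
import json
import urllib.request

GRAPH_BASE = "https://graph.microsoft.com/v1.0"


def build_sendmail_request(sender, recipients, subject, html_body, access_token):
    """Build (but do not send) a POST request for the Graph sendMail endpoint."""
    payload = {
        "message": {
            "subject": subject,
            "body": {"contentType": "HTML", "content": html_body},
            # Graph expects one emailAddress object per recipient.
            "toRecipients": [{"emailAddress": {"address": addr}} for addr in recipients],
        },
        "saveToSentItems": False,
    }
    return urllib.request.Request(
        url=f"{GRAPH_BASE}/users/{sender}/sendMail",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Hypothetical usage; acquiring the token is out of scope for this sketch.
request = build_sendmail_request(
    sender="jobs@example.com",
    recipients=["team@example.com"],
    subject="Job finished",
    html_body="<p>The nightly load completed.</p>",
    access_token="<access-token>",
)
# urllib.request.urlopen(request) would send it.
```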

2 More Replies
Nick_Hughes
by > New Contributor III
  • 17215 Views
  • 4 replies
  • 1 kudos

Best way to generate fake data using underlying schema

Hi, We are trying to generate fake data to run our tests. For example, we have a pipeline that creates a gold layer fact table from 6 underlying source tables in our silver layer. We want to generate the data in a way that recognises the relationships ...

Latest Reply
muhammedrasin
  • 1 kudos

Hi @Nick_Hughes, I am very late to the party, but I was searching the internet for other people discussing a similar problem, one I am building a definitive solution for, and came across your post from 3 years ago. Times have changed...

3 More Replies
RGSLCA
by > New Contributor II
  • 56 Views
  • 0 replies
  • 0 kudos

Selective overwrite on Partition and Liquid clustered tables

Hi, I have created 2 identical tables, but one is partitioned and the other is Liquid Clustered with Auto Clustering. I inserted 30M rows x 2 (60M) for two dates, date 1 = 2026-06-01 and date 2 = 2026-06-02, then I overwrote the date 2026-06-02 with a s...

RGSLCA
by > New Contributor II
  • 381 Views
  • 7 replies
  • 0 kudos

Sizing Tables and delta logs/CDF

Hi, I need to compare the sizes of my Delta tables. What's the correct approach? The table size reported by the ANALYZE command? And how do I check the Delta log size? If I enable CDF, how do I know the CDF log size (the overhead it adds)? Kind of l...

Latest Reply
Vikram10
New Contributor II
  • 0 kudos

Hi @RGSLCA, DESCRIBE DETAIL is the best starting point if you're comparing Delta table sizes, but it's important to understand what it reports. The sizeInBytes value represents only the latest active snapshot of the table, not the total storage consum...

6 More Replies
Javier_Epad
by > New Contributor II
  • 367 Views
  • 2 replies
  • 0 kudos

Resolved! Serverless NCC Private Endpoint ESTABLISHED but traffic routes via eth0 instead of PrivateLink (AWS)

Hi community, I've been trying to connect Databricks Serverless to a SQL Server running on an EC2 instance using NCC Private Endpoint, but traffic is not being routed through PrivateLink.

## Setup
- Databricks Serverless (AWS, us-east-1)
- NCC attached to ...

Latest Reply
Javier_Epad
New Contributor II
  • 0 kudos

Thanks, Louis. My issue was with the NLB. The Databricks documentation does not specify some of the required settings for this configuration. I found the solution in this post: https://medium.com/databricks-platform-sme/aws-databricks-serverless-private...

1 More Replies
AndyRoyle
by > New Contributor II
  • 245 Views
  • 5 replies
  • 0 kudos

Resolved! Manage budgets and cost controls for Genie

The article "Manage budgets and cost controls for Genie" (Azure Databricks | Microsoft Learn) mentions selecting the Unity AI Gateway resource type from the dropdown. However, when using Account Conso...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @AndyRoyle, Per the public docs, the expected flow is that once that preview is enabled, you go to the Budgets page in Account Console, and the budget definition should show a Resource types dropdown where you can choose Unity AI Gateway. You can ...

4 More Replies
Inument
by > New Contributor
  • 71 Views
  • 0 replies
  • 0 kudos

Does "move fast and break things" ruin AI agents?

Is the "move fast and break things" startup mindset actually fatal for custom AI agent development? I read that rushing MVPs creates massive tech debt and unstable guardrails that cause projects to crash by year two. Have any of you hit this "year 2 wa...

mark_lenders
by > New Contributor
  • 179 Views
  • 2 replies
  • 0 kudos

You have hit your free daily limit

Hello! I'm getting "Error while starting SQL warehouse. Sorry, cannot run the resource because you have hit your free daily limit. Please come back again tomorrow." Do you know when exactly the reset happens? Thank you!

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @mark_lenders, I don’t believe there’s a publicly documented exact reset time for this limit. The Databricks Free Edition limitations page shared by @balajij8 explains that if you exceed your quota, compute resources can become unavailable for...

1 More Replies
steff_horemans
by > New Contributor
  • 80 Views
  • 1 reply
  • 0 kudos

First time community member working on a project

Hi everyone, Don't know where to put this specific question. I'm working on a reference data mesh implementation to connect and combine datasets to find matching trials for patients with a specific genetic profile. - Do you know anyone that might be i...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @steff_horemans, My guess is that the Free Edition is probably not the best place to demonstrate true external sharing. Free Edition is positioned as a serverless-only, quota-limited, non-commercial environment, with one workspace, one metastore, ...

steff_horemans
by > New Contributor
  • 78 Views
  • 1 reply
  • 0 kudos

First time user of the community platform

Hi everyone, Don't know where to put this specific question. I'm working on a reference data mesh implementation to connect and combine datasets to find matching trials for patients with a specific genetic profile. - Do you know anyone that might be i...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 0 kudos

Hi @steff_horemans, Yes, this is absolutely fine to ask here. You're touching a few quite different areas, though... Trial matching/reference data design, GenAI extraction of eligibility criteria, and governance/serving of bespoke models. You’ll like...

Manas2000
by > New Contributor
  • 180 Views
  • 1 reply
  • 0 kudos

DBSQL MCP output limit

Hi Databricks Champions, I am using the SQL MCP server. I was able to connect to MCP and run my SQL queries. However, when my query output goes above 32,768 it gets truncated and I am not able to get the complete output. I can only pass warehouse_id in "_meta"...

Latest Reply
frankieseabrook
New Contributor
  • 0 kudos

Hi @Manas2000, This appears to be a Databricks SQL MCP limitation, not a Databricks SQL warehouse limitation. A hacky workaround might be to manually paginate in SQL, e.g., with `ROW_NUMBER()` or `LIMIT/OFFSET`, and run separate MCP calls for each pag...
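
To make the `LIMIT/OFFSET` idea concrete, here is a small sketch that generates one statement per page. It is only an illustration: the standard-library `sqlite3` module stands in for the SQL warehouse, the MCP call itself is not shown, and the helper `paged_queries`, the `events` table, and the page size are all hypothetical. A stable `ORDER BY` is assumed so that pages do not overlap or skip rows.

```python
import sqlite3


def paged_queries(base_query, order_by, page_size, total_rows):
    """Yield one LIMIT/OFFSET statement per page of `base_query`."""
    for offset in range(0, total_rows, page_size):
        yield (
            f"SELECT * FROM ({base_query}) AS q "
            f"ORDER BY {order_by} LIMIT {page_size} OFFSET {offset}"
        )


# Stand-in database with 10 rows so the pagination can be run locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(i, f"row-{i}") for i in range(1, 11)],
)

# Each statement would be sent as its own MCP call; here it runs against sqlite3.
rows = []
for statement in paged_queries("SELECT id, payload FROM events", "id", 4, 10):
    rows.extend(conn.execute(statement).fetchall())
```

The page size has to be chosen so that each page stays under the truncation threshold, which depends on how wide the rows are.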

mbecker
by > New Contributor
  • 318 Views
  • 1 reply
  • 0 kudos

Azure OpenAI v1 API support for External Model Serving / Mosaic AI Gateway?

Hi, I’m setting up an external model serving endpoint for Azure OpenAI through Databricks Model Serving / Mosaic AI Gateway, and I’m trying to understand whether the newer (more than a year old at this point) Azure OpenAI v1 API is currently supported...

Latest Reply
frankieseabrook
New Contributor
  • 0 kudos

Short answer: based on the current Databricks docs, I would treat the built-in Azure OpenAI external model provider as expecting the older Azure OpenAI configuration shape, not the newer `/openai/v1/` shape. The key clue is that the Databricks Azure O...

nidhin
by > New Contributor III
  • 110 Views
  • 2 replies
  • 1 kudos

Can Lakeflow SDP (DLT) produce external tables, or only UC-managed?

As I understand it, streaming tables and materialized views produced by Lakeflow Spark Declarative Pipelines (DLT) are always Unity Catalog managed tables; there's no LOCATION/path option on create_streaming_table or apply_changes. Is that correct? A...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @nidhin, What you’re saying is basically correct for a Unity Catalog-enabled Lakeflow Spark Declarative Pipelines setup. In that model, pipelines publish streaming tables and materialized views into the target catalog and schema, the data is store...

1 More Replies
Barnita
by > New Contributor III
  • 1518 Views
  • 5 replies
  • 2 kudos

Resolved! How to run Black code formatting on notebooks using custom configurations in the UI

Hi all, I’m currently exploring how we can format notebook code using Black (installed via libraries) with specific configurations. I understand that we can configure Black locally using a pyproject.toml file. However, I’d like to know if there’s a way...

Latest Reply
holunder42
New Contributor III
  • 2 kudos

I followed this description (black, pyproject.toml) and it worked for months. But now we have found that the "format code" task no longer respects the line-length defined in pyproject.toml. Has something changed in how this is supported?

4 More Replies