Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ChristianRRL
by Honored Contributor
  • 1036 Views
  • 3 replies
  • 1 kudos

Spark Declarative Pipelines use in All-purpose compute?

Hi there, I know Spark Declarative Pipelines (previously DLT) has undergone some changes since last year and is even now open source (announcement). For a long while, I know that SDP/DLT was locked to only working with job compute. I'm wondering, wit...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @ChristianRRL, To address both your original question and your follow-up about the open-source angle: CURRENT STATE ON DATABRICKS Lakeflow Spark Declarative Pipelines (SDP), the current name for what was previously known as DLT, runs on its own ma...

2 More Replies
JothyGanesan
by New Contributor III
  • 850 Views
  • 2 replies
  • 0 kudos

DLT Continuous Pipeline load

Hi All, In our project we are working on the DLT pipeline with the DLT tables as target running in continuous mode. These tables are common for multiple countries, and we go live in batches for different countries. So, every time a new change is request...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @JothyGanesan, This is a common scenario when running Lakeflow Spark Declarative Pipelines (SDP), previously known as DLT, in continuous mode across multi-country rollouts. There are several strategies to handle metadata changes on your streaming t...

1 More Replies
Rose_15
by New Contributor II
  • 1799 Views
  • 4 replies
  • 0 kudos

Databricks SQL Warehouse fails when streaming ~53M rows via Python (token/session expiry)

Hello Team, I am facing a consistent issue when streaming a large table (~53 million rows) from a Databricks SQL Warehouse using Python (databricks-sql-connector) with OAuth authentication. I execute a single long-running query and fetch data in batche...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Rose_15, The behavior you are seeing is expected for a result set of this size fetched over a single long-lived cursor. Here is what is happening and several approaches to resolve it. WHY THE FAILURE OCCURS The Databricks SQL Connector for Python us...

3 More Replies
Chandana_Ramesh
by New Contributor II
  • 1091 Views
  • 6 replies
  • 1 kudos

Lakebridge SetUp Issue

Hi, I'm getting the below error upon executing databricks labs lakebridge analyze command. All the dependencies have been installed before execution of the command. Can someone please give a solution, or suggest if anything is missing? Below attached ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Chandana_Ramesh, The FileNotFoundError you are seeing means the analyzer.exe binary is not present on disk at the expected path: C:\Users\chandana.r\.databricks\labs\lakebridge\state\venv\lib\site-packages\databricks\labs\bladespector\Analyzer\Wind...

5 More Replies
Malthe
by Valued Contributor II
  • 838 Views
  • 4 replies
  • 0 kudos

Intermittent task execution issues

We're getting intermittent errors: [ISOLATION_STARTUP_FAILURE.SANDBOX_STARTUP] Failed to start isolated execution environment. Sandbox startup failed. Exception class: INTERNAL. Exception message: INTERNAL: LaunchSandboxRequest create failed - Error e...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Malthe, The ISOLATION_STARTUP_FAILURE.SANDBOX_STARTUP error you are seeing is a transient infrastructure-level issue where the serverless execution sandbox fails its internal liveness check before your code even starts running. Since the error mes...

3 More Replies
Punit_Prajapati
by Databricks Partner
  • 2361 Views
  • 3 replies
  • 1 kudos

Long-lived authentication for Databricks Apps / FastAPI when using Service Principal (IoT use case)

Hi Community, I’m working with Databricks Apps (FastAPI) and invoking the API from external IoT devices. Currently, the recommended approach is to authenticate using a Bearer token generated via a Databricks Apps Service Principal (Client ID + Client S...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @Punit_Prajapati, This is a common architectural pattern for IoT scenarios, and there are a few approaches to consider depending on exactly how your IoT devices are calling the Databricks App. UNDERSTANDING THE ARCHITECTURE First, it helps to clarify...

2 More Replies
Kyu-007
by New Contributor II
  • 999 Views
  • 6 replies
  • 1 kudos

ai_query() Failing in DLT Serverless Pipelines During SETTING_UP_TABLES Phase

I am getting an error message: org.apache.spark.SparkException: [INTERNAL_ERROR] Cannot get workspace url and credential from data plane. Workspace URL: Some(url here ) SQLSTATE: XX000. When I am trying to run the ai_query command to access my personal...

Latest Reply
Kyu-007
New Contributor II
  • 1 kudos

Hi Steve, thank you for the reply. It worked again after a couple of days (without any code change); I suspect it had something to do with the runtime serverless compute rollout. All is resolved and I will be much more cognizant of similar issues shoul...

5 More Replies
lw2
by New Contributor
  • 847 Views
  • 1 reply
  • 0 kudos

Read sqlite file from s3 bucket into databricks, creating delta tables

I have a SQLite database that I want to read into Databricks to create Delta tables/DataFrames in Python that I can export to Power BI with a live connection. When new data is added to my SQLite database, the changes will need to reflect i...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @lw2, The approach you have been using (copy SQLite to local, read with sqlite3, export to CSV, then manually create tables) works as a one-shot load but, as you noticed, it does not give you an easy path to keep things in sync. Below is a streaml...

Charansai
by New Contributor III
  • 920 Views
  • 3 replies
  • 0 kudos

Notebooks Not Deploying in Development Mode Using Databricks Asset Bundles (Deploying from Workspace

Hi everyone, I’m using Databricks Asset Bundles and running into an issue when deploying to my dev environment in development mode. Even though my bundle includes sync paths and notebook directories, the deployment only creates the .databricks/artifac...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Charansai, The behavior you are seeing is actually expected when deploying a bundle from the Databricks Workspace Editor. Here is why. SOURCE-LINKED DEPLOYMENT When you deploy a bundle from within the workspace (as opposed to using the Databricks...

2 More Replies
dtb_usr
by New Contributor III
  • 1467 Views
  • 10 replies
  • 1 kudos

Resolved! SELECT Permission error when reading materialised views associated with a pipeline

I am having to pass ownership of pipelines to users for them to read materialised views associated with any pipeline otherwise they get a 'User does not have SELECT on table...' error. This is obviously bonkers as any pipeline can only have one owner...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @dtb_usr, Based on the error message and the fact that the query works in the SQL Editor (which uses a SQL warehouse) but fails on a personal/dedicated cluster in notebooks, this is almost certainly a compute access mode issue rather than a Unity ...

9 More Replies
Seunghyun
by Contributor
  • 1704 Views
  • 2 replies
  • 0 kudos

Resolved! Deploy dashboard with asset bundle

Hello, I have some questions regarding dashboard development using Asset Bundles. I have been following the procedure for developing dashboards by referring to this page: Databricks CI/CD for Dashboard Developers. Here is the workflow I followed: Create...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Seunghyun, This is a common workflow question when getting started with AI/BI Dashboard deployment through Databricks Asset Bundles. Here is a walkthrough of the recommended approach to maintain a single dashboard and handle ongoing modifications...

1 More Replies
lw2
by New Contributor
  • 796 Views
  • 3 replies
  • 0 kudos

Read Sqlite file in to create delta table/dataframe with live connection

I have a SQLite database that I want to read into Databricks to create Delta tables/DataFrames in Python that I can export to Power BI with a live connection. When new data is added to my SQLite database, the changes will need to reflect i...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @lw2, The key to getting a "live connection" end-to-end is replacing the manual CSV export with a scheduled pipeline that writes directly to Delta tables, then connecting Power BI to those Delta tables via DirectQuery. Here is a complete approach....

2 More Replies
ChristianRRL
by Honored Contributor
  • 1587 Views
  • 4 replies
  • 0 kudos

Asset Bundles Overriding Existing Jobs (despite different name_prefix)

Hi there, I'm seeing what seems to be unexpected behavior in Databricks Asset Bundle deployment, and I'm hoping to get clarification on this. Basically, what I'm trying to do is to deploy the same asset bundle twice (two different variations), with ...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @ChristianRRL, This behavior comes down to how Databricks Asset Bundles track deployed resources using Terraform state, and specifically where that state is stored locally. HOW BUNDLE STATE TRACKING WORKS When you run "databricks bundle deploy", t...

3 More Replies
RutujaKadam
by New Contributor II
  • 621 Views
  • 2 replies
  • 1 kudos

Getting Error when connecting azure databricks to azure sql server using lakeflow connect

Hi, can anyone please let me know how to resolve this error? I am trying to connect Azure SQL Server to Azure Databricks using Lakeflow Connect data ingestion. I am able to create the connection, but afterwards it gives me this error: Error starting ga...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @RutujaKadam, The error you are seeing, "Error starting gateway compute resources" with a message about VM quota exhaustion, is related to your Azure subscription's vCPU quota rather than a misconfiguration in Databricks itself. Here is what is ha...

1 More Replies
bts136
by Databricks Partner
  • 3192 Views
  • 2 replies
  • 1 kudos

Reading Excel files with Spark returns formula values instead of computed values

Hi, I'm seeing inconsistent behavior when reading Excel files using the built-in Lakeflow Connector with spark.read.format("excel") (doc: https://docs.databricks.com/aws/en/query/formats/excel). I read an .xlsx file from S3 using this functi...

Latest Reply
SteveOstrowski
Databricks Employee
  • 1 kudos

Hi @bts136, This behavior is related to how Excel files store formula results internally, and it is something you can work around. BACKGROUND: HOW EXCEL STORES FORMULAS Excel files (.xlsx) store both the formula text and a cached computed result for ...

1 More Replies