Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

eric-c
by New Contributor II
  • 261 Views
  • 3 replies
  • 0 kudos

Job tasks failing with error "Failed to fetch SQL file" when file exists

I have a job with anywhere from 500-1000 SQL tasks, where each SQL task uses a SQL warehouse and runs a SQL script stored at a workspace path like /Workspace/folder/file.sql. The SQL task will fail with the error: Run failed with error m...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @antgei, thanks for sharing your experience and the workaround. You raise a valid point -- the platform should ensure task files are fully synced before attempting execution, regardless of API rate limiting on the backend. When a job has hundreds...

2 More Replies
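As a general client-side mitigation for transient fetch failures like the one in this thread (a sketch with illustrative names, not the platform-level fix), a retry-with-backoff wrapper around the flaky call can paper over brief sync delays:

```python
import time

def retry_with_backoff(fn, retries=5, base_delay=1.0, retriable=(IOError,)):
    """Call fn(), retrying with exponential backoff on retriable errors."""
    for attempt in range(retries):
        try:
            return fn()
        except retriable:
            if attempt == retries - 1:
                raise  # out of attempts: surface the error
            time.sleep(base_delay * (2 ** attempt))

# Demo: a stand-in operation that fails twice, then succeeds.
calls = {"n": 0}
def flaky_fetch():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("Failed to fetch SQL file")
    return "SELECT 1"

print(retry_with_backoff(flaky_fetch, base_delay=0.01))  # → SELECT 1
```

This doesn't address the root cause (files not yet synced), but it keeps large fan-out jobs from failing on a momentary race.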
praveenm00
by New Contributor
  • 180 Views
  • 3 replies
  • 0 kudos

How to read and use physical plans in Spark to optimize TB- and PB-scale data workflows

In one of the Amazon interviews I attended, for a Big Data Engineer role, I was asked about this particular skill of reading and understanding physical plans in Spark to optimize MASSIVE data loads. But I thought Spark automatically does all these optimiza...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @praveenm00, good question, and you're right that AQE handles a lot automatically. But understanding physical plans is still worth the investment, especially at TB/PB scale, because AQE works within constraints. It can't fix a bad query s...

2 More Replies
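Reading plans starts with spotting the expensive operators. As an illustration (pure Python, with a hypothetical plan excerpt of the kind `df.explain("formatted")` prints), you can scan a captured plan for shuffle-heavy nodes:

```python
import re

# Hypothetical physical-plan excerpt, as copied from notebook output.
PLAN = """\
== Physical Plan ==
AdaptiveSparkPlan (7)
+- SortMergeJoin Inner (6)
   :- Sort (3)
   :  +- Exchange hashpartitioning(id#0L, 200) (2)
   :     +- Scan parquet spark_catalog.db.orders (1)
   +- Sort (5)
      +- Exchange hashpartitioning(id#4L, 200) (4)
         +- Scan parquet spark_catalog.db.customers (3)
"""

# Operators that usually dominate cost at TB scale.
EXPENSIVE = ("Exchange", "SortMergeJoin", "BroadcastNestedLoopJoin", "CartesianProduct")

def summarize(plan: str) -> dict:
    """Count occurrences of shuffle/join operators in a plan string."""
    counts = {op: len(re.findall(rf"\b{op}\b", plan)) for op in EXPENSIVE}
    return {op: n for op, n in counts.items() if n}

print(summarize(PLAN))  # → {'Exchange': 2, 'SortMergeJoin': 1}
```

Each `Exchange` is a full shuffle of that subtree's data; at PB scale those (plus join strategy choices) are the usual first things to investigate.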
praveenm00
by New Contributor
  • 119 Views
  • 2 replies
  • 2 kudos

Question on cluster sizing as per SLA - No resources in DE certification

How do we optimally size clusters and set configs for a given SLA in production workloads? It would have been great to have a real-life project or implementation to understand this in detail. I wish Databricks had a good resource in their certification ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Greetings @praveenm00, good question, and honestly a fair callout on the cert — it covers cluster config conceptually but never puts you in front of a real sizing problem, and there is a good reason for this: it is hard and depends on many factors....

1 More Replies
maikel
by Contributor II
  • 259 Views
  • 5 replies
  • 2 kudos

Resolved! Do not deploy all notebooks to a given environment

Hello Community! What is the best way to avoid deploying some notebooks from the asset bundle to higher environments? Given I have the following resources structure: resources/ ├── jobs/ │ ├── notebook_a.yml │ ├── notebook_b.yml ← dev onl...

Latest Reply
maikel
Contributor II
  • 2 kudos

This is perfect! Thank you very much @Ashwin_DSA !

4 More Replies
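One commonly used Databricks Asset Bundles pattern for this (a sketch; bundle and job names are illustrative, and the accepted answer in the thread is truncated above): include only shared resource files at the top level and declare dev-only resources inline under the dev target, so they are never deployed to prod:

```yaml
# databricks.yml — sketch with illustrative names
bundle:
  name: my_bundle

# Shared resources, deployed to every target.
include:
  - resources/jobs/notebook_a.yml

targets:
  dev:
    mode: development
    # Dev-only resources: scoped to this target, never deployed to prod.
    resources:
      jobs:
        notebook_b_job:
          name: notebook_b (dev only)
  prod:
    mode: production
```

The key idea is that anything under `targets.dev.resources` exists only when deploying with `--target dev`.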
xwu
by New Contributor II
  • 213 Views
  • 1 replies
  • 1 kudos

Resolved! Iceberg native table Streaming in databricks

Hi! I’ve been exploring the new Managed Iceberg tables integration and noticed a potential discrepancy between the documentation and actual behavior regarding streaming/incremental workloads. According to the official limitations, managed Iceberg tab...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @xwu, Given that managed Iceberg and many of its features are still in Public Preview and explicitly "subject to change," you should treat this as a preview or advanced usage, not as a contractually supported workaround. In other words, it is not ...

thackman
by New Contributor III
  • 134 Views
  • 2 replies
  • 1 kudos

Intermittent failure with Python import statements after upgrading to DBR 18.0

We have a Python module (WidgetUtil.py) that sits in the same folder as our notebook. For the past few years we have been using a simple import statement to use it. Starting with DBR 18.0, the import fails intermittently (25% of the time) when running...

Latest Reply
thackman
New Contributor III
  • 1 kudos

Thanks for the suggestion, Fabricio. We tried your suggestion of using sys.path.insert and it didn't improve the reliability. We found that converting some of the modules into notebooks improved reliability a lot, but other Python modules we couldn'...

1 More Replies
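One generic workaround pattern for flaky sibling-module imports (a sketch with illustrative names; it bypasses sys.path entirely by loading the file directly via importlib):

```python
import importlib.util
import os
import tempfile

def load_module_from_path(name: str, path: str):
    """Load a Python module from an explicit file path, independent of sys.path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Demo with a throwaway module file standing in for WidgetUtil.py.
with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "widget_util.py")
    with open(path, "w") as f:
        f.write("def greet():\n    return 'hello'\n")
    widget_util = load_module_from_path("widget_util", path)

print(widget_util.greet())  # → hello
```

Whether this helps with the DBR 18.0 behavior is untested here; it simply removes sys.path ordering from the equation, which is one variable in intermittent import failures.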
Saf4Databricks
by Contributor
  • 160 Views
  • 2 replies
  • 0 kudos

Resolved! Why does this notebook return an error only when called by another notebook?

When I uncomment the last two lines of Called_Notebook.py and run it manually by itself, it correctly returns the output: Status: SUCCESS, Circle area: 50.26544. But when I comment out the last two lines of Called_Notebook.py and run it from the Caller...

Latest Reply
Saf4Databricks
Contributor
  • 0 kudos

Hi @pradeep_singh, your suggestion worked. Thank you for sharing your knowledge. Worth noting that not including dbutils.notebook.exit(f"{Value to return}") raised the error in the exception block of the function inside the Called_Notebook - and th...

1 More Replies
utkarshamone
by New Contributor III
  • 156 Views
  • 3 replies
  • 0 kudos

Getting driver error for my job when migrating to Unity

I am in the process of migrating our jobs from the legacy Hive metastore to Unity Catalog. I have modified my existing job to read and write from a different bucket as part of the migration. The only change I have made to my job config is to enable this sett...

Latest Reply
balajij8
Contributor
  • 0 kudos

You can validate using the "SINGLE_USER" access mode

2 More Replies
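For reference, the access mode is set per cluster. In a job's cluster spec it is the data_security_mode field of the Clusters API (a sketch; the Spark version, node type, and user shown here are illustrative):

```json
{
  "new_cluster": {
    "spark_version": "15.4.x-scala2.12",
    "node_type_id": "i3.xlarge",
    "num_workers": 2,
    "data_security_mode": "SINGLE_USER",
    "single_user_name": "user@example.com"
  }
}
```

Running the job once in SINGLE_USER mode is a common way to isolate whether a Unity Catalog migration error is caused by the shared (USER_ISOLATION) access mode's restrictions.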
Jake3
by New Contributor II
  • 91 Views
  • 1 replies
  • 1 kudos

Matching SAS PROC SURVEYMEANS quantiles in Databricks

Hi, I currently have the following code in Databricks that I am using to calculate survey estimates and quantiles. I wish to match (or get as close as possible to) SAS results using PROC SURVEYMEANS for quantiles (I am able to match proportions fine...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Jake3, I didn’t expect to jump into this thread given my complete lack of SAS knowledge and the rather serious-looking statistics you’re working with. But I decided to treat your post as a chance to see what Genie Code could do with it...  and it...

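For background, a plain weighted-quantile function looks like the sketch below (pure Python, inverse-CDF rule). Note that SAS PROC SURVEYMEANS applies its own interpolation conventions, so matching its output exactly requires replicating that specific rule rather than this one:

```python
def weighted_quantile(values, weights, q):
    """Weighted quantile via the cumulative-weight (inverse-CDF) rule.

    This is a starting point, not an exact match for SAS's
    PROC SURVEYMEANS interpolation.
    """
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cum = 0.0
    for v, w in pairs:
        cum += w
        if cum / total >= q:
            return v
    return pairs[-1][0]

# With equal weights this reduces to the usual empirical quantile.
print(weighted_quantile([1, 2, 3, 4], [1, 1, 1, 1], 0.5))  # → 2
# A heavily weighted observation pulls the quantile toward itself.
print(weighted_quantile([1, 2, 3], [1, 1, 10], 0.5))       # → 3
```

When two implementations disagree on quantiles, the interpolation rule (how the quantile is defined between cumulative-weight steps) is almost always the culprit.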
tsmith-11
by New Contributor
  • 155 Views
  • 1 replies
  • 1 kudos

Resolved! Azure Databricks S3 External Location

Hi, I have recently created a new Azure Databricks account and several workspaces. I need to ingest data from an S3 bucket and am trying to follow the documentation detailed here: https://learn.microsoft.com/en-us/azure/databricks/connect/unity-c...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @tsmith-11, Having checked internally and from the screenshot, this doesn’t look like a configuration issue on your side but rather that the cross‑cloud S3 feature isn’t enabled on your Azure Databricks account/metastore yet. You should see an AWS...

Anandhi-Sekaran
by New Contributor II
  • 116 Views
  • 3 replies
  • 1 kudos

Refresh streaming table error

The REFRESH STREAMING TABLE SQL statement succeeds the first time, but subsequent refresh statements fail with a TABLE_OR_VIEW_NOT_FOUND error, even though the streaming table is still available in the same catalog and schema.

Latest Reply
Anandhi-Sekaran
New Contributor II
  • 1 kudos

Hi, here is the query I use: REFRESH STREAMING TABLE `cxxxxx`.`tgt_dev`.`ldp_csv`. It is successful when I execute it the first time. If I run the same query after 30 min, it throws the error TABLE_OR_VIEW_NOT_FOUND.

2 More Replies
Sega2
by New Contributor III
  • 187 Views
  • 2 replies
  • 0 kudos

Resolved! Creating a synced table from a workspace catalog to a project

We have a table in a workspace that we would like to sync to a project. We can choose the database project fine, but we cannot see the database in the first section (Destination); see the attached file.

Latest Reply
sv_databricks
Databricks Employee
  • 0 kudos

Hi @Sega2, Thanks for sharing the screenshot — this helps clarify what's happening. There are two likely reasons why you're not seeing your database in the Destination section: 1. The source table must be in Unity Catalog. Synced tables only support ...

1 More Replies
Jpeterson
by New Contributor III
  • 7278 Views
  • 7 replies
  • 4 kudos

Databricks SQL Warehouse, Tableau and spark.driver.maxResultSize error

I'm attempting to create a Tableau extract on Tableau Server with a connection to a Databricks large SQL warehouse. The extract process fails due to a spark.driver.maxResultSize error. Using a Databricks interactive cluster in the Data Science & Engineer...

Latest Reply
IsabellaNelson
New Contributor
  • 4 kudos

This is a common headache, I've definitely hit this wall myself when dealing with large datasets! My go-to is usually optimizing queries to return less data initially, maybe aggregating more in SQL. Have you considered checking your click speed on a ...

6 More Replies
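For context on the error itself: on an interactive cluster the driver-side result cap can be raised in the cluster's Spark config (the value below is illustrative; setting it to 0 removes the limit entirely but risks driver out-of-memory):

```
spark.driver.maxResultSize 8g
```

This only moves the ceiling; reducing the amount of data pulled back to the driver (e.g. aggregating further in SQL before extraction) is the more durable fix.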
Yousry_Ibrahim
by New Contributor III
  • 2922 Views
  • 9 replies
  • 4 kudos

Resolved! Directories added to the Python sys.path do not always work fine on executors for shared access mode

Let's assume we have a workspace folder containing two Python files. module1, with a simple addition function: def add_numbers(a, b): return a + b. module2, with a dummy PySpark custom data source: from pyspark.sql.datasource import DataSource, DataSource...

Latest Reply
Yousry_Ibrahim
New Contributor III
  • 4 kudos

Hi all, thanks for the feedback and proposed ideas. @szymon_dybczak Your idea of relative imports works when the module is hosted in a child directory of the currently running notebook. It does not work if we need to go up one or two directories and navi...

8 More Replies
IM_01
by Contributor
  • 732 Views
  • 18 replies
  • 3 kudos

Resolved! Lakeflow SDP failed with DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG

Hi, a column was deleted on the source table. When I ran LSDP it failed with the error DELTA_STREAMING_INCOMPATIBLE_SCHEMA_CHANGE_USE_LOG: Streaming read is not supported on tables with read-incompatible schema changes (e.g. rename, drop, or datatype ch...

Latest Reply
IM_01
Contributor
  • 3 kudos

Thanks @SteveOstrowski, now it's working.

17 More Replies