Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Adrianj
by New Contributor III
  • 27412 Views
  • 21 replies
  • 13 kudos

Databricks Bundles - How to select which jobs resources to deploy per target?

Hello, my team and I are experimenting with bundles. We follow the pattern of having one main file, Databricks.yml, with each job definition specified in a separate YAML file for modularization. We wonder if it is possible to select from the main Databricks....

Latest Reply
IM_01
Contributor III
  • 13 kudos

One solution is to create a separate databricks.yml file for each target, such as qa/databricks.yml and prod/databricks.yml, where qa and prod are folders named after the target environment. Hope this helps.

  • 13 kudos
20 More Replies
Annie420
by Databricks Partner
  • 1176 Views
  • 3 replies
  • 2 kudos

Resolved! Workspace folder is visible but .py file cannot be read on job cluster (DBR 18)

Hi everyone, we are running into a strange issue when running notebooks on Databricks job clusters using DBR 18. It looks like the Workspace folder is mounted, but the .py file inside cannot be read immediately. I wanted to check if anyone else has ex...

Latest Reply
stbjelcevic
Databricks Employee
  • 2 kudos

+1 to @pradeep_singh  The Workspace FUSE (WSFS) daemons use ports 1015, 1017, and 1021 for communication between the driver and the executor. NFS tooling (hardcoded in glibc) can race with these ports during cluster startup, causing FUSE daemons to f...

  • 2 kudos
2 More Replies
DB1To3
by New Contributor III
  • 1515 Views
  • 6 replies
  • 5 kudos

Resolved! Apache "Spark Connect"

Can someone confirm whether this is the right message board for discussing the open-source Apache core of "Spark Connect" (aka Databricks Connect)? We are hosting workloads on Azure Databricks, but would like to ensure that these workloads are following the...

Latest Reply
DB1To3
New Contributor III
  • 5 kudos

>> there is no native R UDF pathway over the wire. sparklyr works around this using rpy2, a Python library that embeds and executes R code

This is interesting. I would not think of Python as the best runtime for bridging. I'm wondering if this invol...

  • 5 kudos
5 More Replies
dpc
by Contributor III
  • 905 Views
  • 3 replies
  • 4 kudos

Resolved! Allowing a job parameter that is pushed down to be overridden

Hello, I have a job that calls a job which calls a job (this could go on). I want to generate an id for each job and log that id along with a parent id and job name. So I am creating an id at each level as the first task, then passing this to the next level...

Latest Reply
stbjelcevic
Databricks Employee
  • 4 kudos

+1 to @jooguilhermesc Option 1 response above.

  • 4 kudos
2 More Replies
jessicaY26
by New Contributor
  • 460 Views
  • 1 reply
  • 1 kudos

Unable To Complete Lab Due To Missing Lab Data

I’m a Databricks Academy Labs Subscription customer currently working through the self-paced learning plan “Deploy Workloads with Lakeflow Jobs” in preparation for the Databricks DE Associate certification. However, I have encountered an unexpected t...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @jessicaY26, Sorry you're experiencing an issue with the labs. Unfortunately, this is not something the community can fix. I recommend submitting a support ticket. Visit the Databricks Training & Certification page, scroll to the Resources section...

  • 1 kudos
kenmyers-8451
by Contributor II
  • 2133 Views
  • 3 replies
  • 2 kudos

should have the option to mark succeeded with failures as a failure rather than a success

Hi, we are having an issue with the way "succeeded with failures" is handled. We get emails telling us that we have a failure, which is correct, but then the pipeline treats it as a success and keeps going. We would actually like to ...

Latest Reply
Anish_2
New Contributor III
  • 2 kudos

@Advika Currently, if we put run_if: ALL_DONE at the task level, the run is flagged as succeeded with failures. But there is another workflow under which this workflow runs. In the master workflow, it is marked only as succeeded, not succeeded with failures. My orches...

  • 2 kudos
2 More Replies
gkapri
by New Contributor II
  • 3426 Views
  • 15 replies
  • 0 kudos

DLT table reading not performing file pruning on partition column

I have created a bronze table partitioned on processing date, which is a date column. In the silver table I filter on the processing date column to read the last 2 days of data, but it is reading 37 million records while I have only 24722 in the last 2 day...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @Anish_2, Looking at your pipeline DAG, the issue is that you have two separate APPLY CHANGES INTO flows both targeting the same silver table (ag_vlc_hist), one from ag_swt_vlchistory_historical and one from ag_swt_vlchistory. When you define mult...

  • 0 kudos
14 More Replies
eric-c
by New Contributor II
  • 809 Views
  • 3 replies
  • 0 kudos

Job tasks failing with error "Failed to fetch SQL file" when file exists

I have a job with anywhere from 500 to 1000 SQL tasks, where each SQL task uses a SQL warehouse instance and runs a SQL script stored in a workspace path like /Workspace/folder/file.sql. The SQL task will fail with the error: Run failed with error m...

Latest Reply
SteveOstrowski
Databricks Employee
  • 0 kudos

Hi @antgei , Thanks for sharing your experience and the workaround. You raise a valid point -- the platform should ensure task files are fully synced before attempting execution, regardless of API rate limiting on the backend. When a job has hundreds...

  • 0 kudos
2 More Replies
praveenm00
by Databricks Partner
  • 2118 Views
  • 3 replies
  • 0 kudos

How to read and optimize Physical plans in Spark to optimize for TBs and PBs of data workflows

In one of the Amazon interviews I attended, for a Big Data Engineer role, I was asked about this particular skill of reading and understanding physical plans in Spark to optimize MASSIVE data loads. But I thought Spark automatically does all these optimiza...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @praveenm00 ,  good question, and you're right that AQE handles a lot automatically. But understanding physical plans is still worth the investment, especially at TB/PB scale, because AQE works within constraints. It can't fix a bad query s...

  • 0 kudos
2 More Replies
praveenm00
by Databricks Partner
  • 859 Views
  • 2 replies
  • 3 kudos

Resolved! Question on cluster sizing as per SLA - No resources in DE certification

How do we optimally size clusters and set configs for any given SLA in production workloads? It would have been great to have a real-life project or implementation to understand in detail. I wish Databricks had a good resource in their certification ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 3 kudos

Greetings @praveenm00 ,  Good question, and honestly a fair callout on the cert — it covers cluster config conceptually but never puts you in front of a real sizing problem and there is a good reason for this - it is hard and depends on many factors....

  • 3 kudos
1 More Reply
maikel
by Contributor III
  • 904 Views
  • 5 replies
  • 2 kudos

Resolved! Do not deploy all notebook to the given environment

Hello Community! What is the best way to avoid deploying some notebooks from the asset bundle to higher environments? Given I have the following resources structure:
resources/
├── jobs/
│   ├── notebook_a.yml
│   ├── notebook_b.yml ← dev onl...

Latest Reply
maikel
Contributor III
  • 2 kudos

This is perfect! Thank you very much @Ashwin_DSA !

  • 2 kudos
4 More Replies
xwu
by Databricks Partner
  • 1257 Views
  • 1 reply
  • 1 kudos

Resolved! Iceberg native table Streaming in databricks

Hi! I’ve been exploring the new Managed Iceberg tables integration and noticed a potential discrepancy between the documentation and actual behavior regarding streaming/incremental workloads. According to the official limitations, managed Iceberg tab...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @xwu, Given that managed Iceberg and many of its features are still in Public Preview and explicitly "subject to change," you should treat this as a preview or advanced usage, not as a contractually supported workaround. In other words, it is not ...

  • 1 kudos
Saf4Databricks
by Contributor
  • 613 Views
  • 2 replies
  • 0 kudos

Resolved! Why this notebook is returning an error only when called by another notebook?

When I uncomment the last two lines of Called_Notebook.py and run it manually by itself, it correctly returns the output as:
Status: SUCCESS
Circle area: 50.26544
But when I comment out the last two lines of Called_Notebook.py and run it from the Caller...

Latest Reply
Saf4Databricks
Contributor
  • 0 kudos

Hi @pradeep_singh, your suggestion worked. Thank you for sharing your knowledge. Worth noting that not including dbutils.notebook.exit(f"{Value to return}") raised the error in the exception block of the function inside the Called_Notebook - and th...

  • 0 kudos
1 More Reply
utkarshamone
by New Contributor III
  • 604 Views
  • 3 replies
  • 0 kudos

Getting driver error for my job when migrating to Unity

I am in the process of migrating our jobs from the legacy hive metastore to Unity. I have modified my existing job to read and write from a different bucket as part of the migration. The only change I have made to my job config is to enable this sett...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can validate using "SINGLE_USER"

  • 0 kudos
2 More Replies
Jake3
by New Contributor III
  • 391 Views
  • 1 reply
  • 1 kudos

matching sas proc survey means for quantiles in databricks

Hi, I currently have the following code in Databricks that I am using to calculate survey estimates and quantiles. I wish to match the SAS results from proc survey means for quantiles, or get as close as possible (I am able to match proportions fine...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @Jake3, I didn’t expect to jump into this thread given my complete lack of SAS knowledge and the rather serious-looking statistics you’re working with. But I decided to treat your post as a chance to see what Genie Code could do with it...  and it...

  • 1 kudos