Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by deano2025 (New Contributor II)
  • 1997 Views
  • 1 reply
  • 1 kudos

Databricks asset bundles CI/CD design for GitHub Actions

We want to use Databricks asset bundles to deploy code changes and tests using GitHub Actions. We have seen lots of content online, but nothing concrete on how this is done at scale. So I'm wondering, if we have many changes and therefore man...

Data Engineering
asset bundles
Latest Reply
AbhaySingh
Databricks Employee
  • 1 kudos

Have you read about the following approach before?

Repository Structure Options

1. Monorepo with Multiple Bundles

repo-root/
├── .github/
│   └── workflows/
│       ├── bundle-ci.yml
│       └── bundle-deploy.yml
├── bundles/
│   ├...
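
A common companion to a monorepo layout is a CI step that deploys only the bundles a change actually touches. The sketch below is a minimal, hypothetical example: it assumes each bundle lives in its own folder directly under `bundles/` (the tree in the reply is cut off, so that layout is an assumption), that the changed-file list comes from something like `git diff --name-only origin/main...HEAD`, and that the result feeds a GitHub Actions matrix through the `GITHUB_OUTPUT` file. The function names are illustrative and not part of any Databricks tooling.

```python
import json
import os
from pathlib import PurePosixPath


def changed_bundles(changed_files, bundles_dir="bundles"):
    """Return sorted bundle folder names touched by the changed file paths.

    Assumes a hypothetical repo layout of bundles/<bundle_name>/...
    """
    names = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # Need at least bundles/<name>/<file> to attribute a change to a bundle.
        if len(parts) >= 3 and parts[0] == bundles_dir:
            names.add(parts[1])
    return sorted(names)


def write_matrix(bundle_names):
    """Serialize bundle names as a matrix payload; append to GITHUB_OUTPUT if set."""
    payload = json.dumps({"bundle": bundle_names})
    output_file = os.environ.get("GITHUB_OUTPUT")
    if output_file:
        with open(output_file, "a", encoding="utf-8") as fh:
            fh.write(f"matrix={payload}\n")
    return payload
```

A downstream job could read this output with GitHub's `fromJSON()` expression to build its matrix and run `databricks bundle deploy -t <target>` from each selected bundle folder; wiring the job outputs together is left out of this sketch.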

by JanFalta (New Contributor)
  • 508 Views
  • 1 reply
  • 0 kudos

Data Masking

Hi all, I need some help with this masking problem. If you create a view that uses a masking function on top of a table, the user reading this view has to have read access to the underlying table. So, theoretically, they can access the unmasked data in the table. I would...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

Are you on Unity Catalog? Databricks has a solution for this through Unity Catalog Column Masking (also called Dynamic Views or Column-Level Security). https://learn.microsoft.com/en-us/azure/databricks/data-governance/unity-catalog/filters-and-mask...
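
With a Unity Catalog column mask, the masking function is attached to the table column itself, so readers can query the table directly and see masked values unless the function lets them through; no separate view over an unmasked table is needed. The helper below is a minimal sketch that only builds the SQL statements. The table, column, function, and group names are hypothetical, and the statements would be run with `spark.sql` (or in a SQL editor) by a user allowed to create functions and alter the table.

```python
def column_mask_sql(table, column, mask_function, allowed_group,
                    column_type="STRING", masked_value="'***'"):
    """Build SQL that creates a mask UDF and attaches it to a column.

    Members of allowed_group see the real value; everyone else sees masked_value.
    All identifiers passed in are illustrative placeholders.
    """
    create_fn = (
        f"CREATE OR REPLACE FUNCTION {mask_function}(val {column_type}) "
        f"RETURN CASE WHEN is_account_group_member('{allowed_group}') "
        f"THEN val ELSE {masked_value} END"
    )
    attach_mask = f"ALTER TABLE {table} ALTER COLUMN {column} SET MASK {mask_function}"
    return [create_fn, attach_mask]


# Hypothetical usage inside a Databricks notebook:
# for stmt in column_mask_sql("main.hr.employees", "ssn", "main.hr.ssn_mask", "hr_admins"):
#     spark.sql(stmt)
```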

by bhawana-pandey (Databricks Partner)
  • 552 Views
  • 1 reply
  • 0 kudos

Looking for reference DABs bundle yaml and resources for Databricks app deployment (FastAPI redirect

Looking for example databricks.yml and bundle resources for deploying a FastAPI Databricks app using DABs from one environment to another. Deployment works but FastAPI redirects to localhost after deployment, though the homepage loads fine. Need refe...

Latest Reply
AbhaySingh
Databricks Employee
  • 0 kudos

This is a great place to start: https://apps-cookbook.dev/resources/ Happy to answer specifics as they come after you've reviewed that resource. 

by kfoster (Databricks Partner)
  • 7338 Views
  • 8 replies
  • 7 kudos

Azure DevOps Repo - Invalid Git Credentials

I have a repo in Databricks connected to Azure DevOps Repositories. The repo had been working fine for almost a month, until last week. Now when I try to open the Git settings in Databricks, I get "Invalid Git Credentials". Nothing has change...

Latest Reply
klaas
New Contributor II
  • 7 kudos

I had a similar problem. I fixed it by following these steps:
1. In the Azure DevOps repository: User Settings -> Personal access tokens -> + New token
2. In Databricks: Settings -> User -> Linked accounts -> Azure DevOps (Personal access token)
You could also...
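
The same PAT swap can also be scripted against the Databricks Git Credentials REST API instead of the Linked accounts page. This is a minimal sketch that only builds the request; the endpoint path, the field names, and the `azureDevOpsServices` provider value reflect our reading of the Git Credentials API and should be checked against the current API reference. The host, tokens, and username are placeholders. If a credential already exists for the user, the API may expect an update to that entry instead of a new one.

```python
import json
import urllib.request


def build_git_credential_request(host, databricks_token, ado_username, ado_pat):
    """Build (but do not send) a POST that registers an Azure DevOps PAT.

    Endpoint and payload fields are assumptions based on the Git Credentials API.
    """
    body = {
        "git_provider": "azureDevOpsServices",
        "git_username": ado_username,
        "personal_access_token": ado_pat,
    }
    return urllib.request.Request(
        url=f"{host.rstrip('/')}/api/2.0/git-credentials",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {databricks_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send it (placeholders shown):
# req = build_git_credential_request("https://<workspace-host>", "<databricks-token>",
#                                    "<ado-user>", "<ado-pat>")
# with urllib.request.urlopen(req) as resp:
#     print(resp.status)
```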

by whatever (New Contributor)
  • 1551 Views
  • 1 reply
  • 0 kudos

broken file API and inconsistent behavior

Since there is no way to file a bug, I'll post it here. Honestly, I have never seen such a broken and inconsistent API in a production system in my life. What is worse, the same issue exists in the 'os' module. And their UI (despite actually showing the f...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @whatever, thanks for sharing this. I will test this and report it internally. Meanwhile, you can also submit a new idea/request/bug from your end using this portal: https://docs.databricks.com/en/resources/ideas.html#create-an-idea-in-the-ideas-port...

by Rainier_dw (Databricks Partner)
  • 4142 Views
  • 2 replies
  • 0 kudos

Help Needed: Obtaining and Applying a Blade Bridge License for SSIS-to-DB SQL Conversion

Hello everyone, I’m in the process of using Blade Bridge to convert my SSIS .dtsx packages into Databricks SQL, but I’ve run into a licensing issue and could use some guidance. What I’m doing: Installed Blade Bridge and followed the required folder stru...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Hi @Rainier_dw @Eric_Kieft, https://github.com/databrickslabs/lakebridge/issues/1819 is now tracked under https://github.com/databrickslabs/lakebridge/issues/1836 as an enhancement for the product, and https://github.com/databrickslabs/lakebridge/pull/1...

by rajanchaturvedi (New Contributor)
  • 2972 Views
  • 2 replies
  • 0 kudos

Executors getting killed while scaling Spark jobs on GPU using RAPIDS (NVIDIA)

Hi Team, I want to take advantage of Spark distribution over GPU clusters using RAPIDS (NVIDIA). Everything is set up: 1. The JAR is loaded correctly via an init script; the JAR is downloaded and uploaded to a volume (the workspace is Unity Catalog enabled) and via In...

Latest Reply
NandiniN
Databricks Employee
  • 0 kudos

Also, try gradually reducing spark.executor.memory. You need to allocate less memory to the JVM heap because the GPU needs a large chunk of the node's off-heap (system) memory. The GPU memory is allocated outside the JVM heap. If the heap is too large...
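
One way to follow the "gradually reduce" advice is to step the executor heap down in fixed increments and retry the job at each size. The sketch below generates candidate heap sizes and a matching Spark config dictionary; the starting size, step, floor, and pinned-pool size are hypothetical values to tune for your node type, not recommendations. `spark.plugins`, `spark.executor.memory`, and `spark.rapids.memory.pinnedPool.size` are real Spark and RAPIDS Accelerator settings, but check your cluster's existing RAPIDS configuration before overriding them.

```python
def heap_steps(start_gb, step_gb, floor_gb):
    """Return descending executor heap sizes (GB) from start_gb down to floor_gb."""
    if step_gb <= 0:
        raise ValueError("step_gb must be positive")
    sizes = []
    size = start_gb
    while size >= floor_gb:
        sizes.append(size)
        size -= step_gb
    return sizes


def rapids_spark_conf(heap_gb, pinned_pool_gb=2):
    """Spark settings for one trial run; a smaller heap leaves more host memory off-heap."""
    return {
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.executor.memory": f"{heap_gb}g",
        # Pinned host memory that RAPIDS uses for faster host/GPU transfers.
        "spark.rapids.memory.pinnedPool.size": f"{pinned_pool_gb}g",
    }
```

Because `spark.executor.memory` cannot be changed on a running cluster, each trial size would typically go into the cluster's Spark config followed by a restart.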

by toproximahk (New Contributor II)
  • 860 Views
  • 4 replies
  • 1 kudos

Inquiry on GraphFrame Library Upgrade Timeline for Databricks Runtime for Machine Learning

Thanks to the Databricks community for maintaining such a valuable platform. I would like to ask whether there is a planned timeline for upgrading the GraphFrames library. We’ve noticed that the latest release on GitHub is v0.9.3, while the Databricks ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Greetings @toproximahk, thanks for the kind words and for the detailed pointers.

What’s in Databricks Runtime 17.3 LTS ML today: The preinstalled GraphFrames JAR in Databricks Runtime 17.3 LTS for Machine Learning is org.graphframes:graphframes_2.1...

by gudurusreddy99 (New Contributor II)
  • 1087 Views
  • 4 replies
  • 1 kudos

Resolved! Databricks DLT Joins: Streaming table join with Delta table is reading 2 billion records per batch

Databricks DLT Joins: A streaming table join with a Delta table is reading 2 billion records from the Delta table for each and every micro-batch. How can I avoid reading 2 billion records for every micro-batch? Your suggestions and feedback w...

Latest Reply
ManojkMohan
Honored Contributor II
  • 1 kudos

@gudurusreddy99 Any update here? Did you try the solutions above?

by Dhruv-22 (Contributor III)
  • 807 Views
  • 6 replies
  • 3 kudos

BUG - withColumns in pyspark doesn't handle empty dictionary

Today, while reading a delta load, my notebook failed, and I want to report a bug. The withColumns command does not tolerate an empty dictionary and gives the following error in PySpark.
flat_tuple = namedtuple("flat_tuple", ["old_col", "new_col", "lo...

Latest Reply
K_Anudeep
Databricks Employee
  • 3 kudos

Hello @Dhruv-22, I have tested this internally, and this seems to be a bug in the new Serverless environment version 4. As a solution, you can switch the environment version to 3 and re-run the above code; it should work.
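
Until the environment is switched, a defensive workaround is to skip the call when there is nothing to add. This is a minimal sketch, assuming the empty mapping is the only trigger for the error reported above; `with_columns_safe` is a hypothetical helper name, while `DataFrame.withColumns` itself is the standard PySpark method (available since Spark 3.3).

```python
def with_columns_safe(df, columns_map):
    """Call df.withColumns(columns_map) only when the mapping is non-empty.

    Returns df unchanged for an empty (or None) mapping, which sidesteps the
    empty-dict failure reported on Serverless environment version 4.
    """
    if not columns_map:
        return df
    return df.withColumns(columns_map)


# Hypothetical usage:
# from pyspark.sql import functions as F
# renames = {}  # may be empty for some delta loads
# df = with_columns_safe(df, {new: F.col(old) for old, new in renames.items()})
```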

by a_user12 (Contributor)
  • 515 Views
  • 1 reply
  • 2 kudos

Resolved! Drop Delta Log seems not to be working

I have a Delta table where I set the following property: logRetentionDuration: "interval 1 days". I was doing some table operations and see files in the _delta_log folder such as 00000000000000000000.json 00000000000000000001.json 00000000000000000002.js...

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @a_user12, delta.logRetentionDuration is the interval after which the Delta log files will be removed from the Delta log. Delta Lake adheres to a set of internal rules to clean up the Delta log when the retention duration is exceeded. Setting de...
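
To see why files such as `00000000000000000000.json` can outlive a one-day retention, it helps to model the cleanup as happening only when a checkpoint is written. The sketch below is a deliberately simplified model, not Delta Lake's actual implementation: it assumes a checkpoint every `delta.checkpointInterval` commits (default 10 in open-source Delta Lake) and treats a log file as removable only if it is older than the retention window and precedes the latest checkpoint. Under that model, a table with fewer than 10 commits has no checkpoint yet, so nothing is removed no matter how old the files are.

```python
from datetime import datetime, timedelta


def log_file_name(version):
    """Delta commit files use the version number zero-padded to 20 digits."""
    return f"{version:020d}.json"


def cleanup_candidates(commit_times, now, retention, checkpoint_interval=10):
    """Simplified model of which commit files a log cleanup could remove.

    commit_times[v] is the commit timestamp (datetime) of version v.
    """
    latest = len(commit_times) - 1
    if latest < checkpoint_interval:
        return []  # no checkpoint has been written yet in this model
    latest_checkpoint = (latest // checkpoint_interval) * checkpoint_interval
    cutoff = now - retention
    return [
        log_file_name(v)
        for v in range(latest_checkpoint)
        if commit_times[v] < cutoff
    ]
```

On a real table, `DESCRIBE HISTORY` shows how many commits exist and `SHOW TBLPROPERTIES` shows the configured `delta.logRetentionDuration`.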

by adhi_databricks (Contributor)
  • 4002 Views
  • 1 reply
  • 1 kudos

Size of output data increased to 4 times the average size

Hey guys, we have a Databricks job that dumps data to S3 daily; the average file size is 60GB and the file format is ORC. One inner join operation was taking more than 3 hours, and when we debugged it, the join was not auto-broadcast and it was d...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Hey @adhi_databricks, I did some digging and have come up with some helpful tips. The significant increase in file size from 60GB to 200GB after implementing a broadcast join, despite the data being identical, is most likely caused by poor compression ...
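
The compression effect itself is easy to reproduce outside Spark. ORC relies on run-length and dictionary encoding plus a general-purpose codec, so the same values compress far better when equal values sit next to each other. One plausible link to the join change (an assumption, not something confirmed in this thread) is that a sort-merge join emits rows ordered by the join key within each partition, while a broadcast hash join keeps the order of the streamed side. The sketch below uses Python's `zlib` as a stand-in codec to compare a shuffled column with the same values grouped together; the country codes are made-up sample data.

```python
import random
import zlib


def compressed_size(values):
    """Return the zlib-compressed size in bytes of newline-joined values."""
    return len(zlib.compress("\n".join(values).encode("utf-8")))


def compare_ordering(n=10_000, seed=42):
    """Compress the same low-cardinality column in random vs. grouped order."""
    rng = random.Random(seed)
    countries = ["US", "DE", "FR", "IN"]
    shuffled = [rng.choice(countries) for _ in range(n)]
    clustered = sorted(shuffled)  # identical values, grouped into long runs
    return compressed_size(shuffled), compressed_size(clustered)
```

If ordering turns out to be the cause, one common mitigation is to sort rows by low-cardinality columns before writing, for example with PySpark's `DataFrame.sortWithinPartitions`, so the writer sees longer runs of repeated values.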

by crami (New Contributor III)
  • 445 Views
  • 2 replies
  • 1 kudos

Quota Limit Exhausted Error when Creating declarative pipeline

I am trying to develop a declarative pipeline. Per platform policy, I cannot use serverless, which is why I am using an asset bundle to create the declarative pipeline. In the bundle, I am trying to specify compute for the pipeline. However, I am constantly f...

Latest Reply
Khaja_Zaffer
Esteemed Contributor
  • 1 kudos

Hello @crami, good day! As the error says, you need to increase the VM size. I know you have enough in place, but spot fallback + Photon + autoscale triggers the failure.
Go to Azure Portal → Subscriptions → Usage + quotas. Filter: Provide...
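
Quota checks compare vCPUs, not VM counts, so it helps to work out the cluster's peak demand before raising a request. An autoscaling cluster needs quota for its driver plus the maximum number of workers, and if spot capacity falls back to on-demand VMs, that peak also has to fit in the regular (non-spot) quota for the VM family. The sketch below does that arithmetic; the node sizes, worker count, and quota numbers are hypothetical, so take the real values from the error message and the Usage + quotas page.

```python
def peak_vcpus(driver_vcpus, worker_vcpus, max_workers):
    """vCPUs needed when an autoscaling cluster reaches its maximum size."""
    return driver_vcpus + worker_vcpus * max_workers


def quota_shortfall(required_vcpus, quota_limit, current_usage):
    """Extra vCPUs of quota needed; 0 means the request already fits."""
    available = quota_limit - current_usage
    return max(0, required_vcpus - available)


# Hypothetical example: 8-vCPU driver and workers, autoscaling up to 10 workers,
# with a 100-vCPU family quota of which 40 vCPUs are already in use.
needed = peak_vcpus(driver_vcpus=8, worker_vcpus=8, max_workers=10)
shortfall = quota_shortfall(needed, quota_limit=100, current_usage=40)
```

Lowering the autoscale maximum or choosing a smaller node type shrinks the peak, which is the alternative to raising the quota.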

by Nidhig (Databricks Partner)
  • 1152 Views
  • 2 replies
  • 1 kudos

Resolved! Conversational Agent App integration with genie in Databricks

Hi, I recently explored the conversational agent app from the Marketplace and its integration with a Genie Space. The connection setup went well, but I found a sync issue between the app and the Genie Space. Even after multiple deployments I couldn't see...

Latest Reply
HariSankar
Contributor III
  • 1 kudos

Hi @Nidhig, this isn’t expected behavior. It usually happens when the app's service principal lacks permissions to access the SQL warehouse, Genie Space, or underlying Unity Catalog tables. Try these fixes:
--> SQL Warehouse: Go to Compute -> SQL Warehou...
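
For the Unity Catalog part of that checklist, the service principal needs `USE CATALOG` and `USE SCHEMA` on the containing objects and `SELECT` on each table the Genie Space reads. The sketch below only generates those `GRANT` statements; the catalog, schema, and table names and the application ID are hypothetical placeholders. SQL warehouse `CAN USE` permission and Genie Space sharing are managed through permission settings rather than SQL `GRANT`, so they are not covered here.

```python
def genie_grant_sql(principal, catalog, schema, tables):
    """GRANT statements giving a principal read access to specific UC tables.

    principal is a service principal application ID or a user/group name;
    it is wrapped in backticks, which names containing hyphens require.
    """
    who = f"`{principal}`"
    statements = [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO {who}",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO {who}",
    ]
    statements += [
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO {who}" for table in tables
    ]
    return statements


# Hypothetical usage in a notebook run by a catalog owner or admin:
# for stmt in genie_grant_sql("<sp-application-id>", "main", "sales", ["orders"]):
#     spark.sql(stmt)
```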

by Dhruv-22 (Contributor III)
  • 622 Views
  • 1 reply
  • 2 kudos

Resolved! Reading empty json file in serverless gives error

I have a pipeline that puts JSON files in a storage location after reading a daily delta load. Today I encountered a case where the file was empty. I tried running the notebook manually on a serverless cluster (Environment version 4) and encountered...

Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Solution provided here:  https://community.databricks.com/t5/data-engineering/reading-empty-json-file-in-serverless-gives-error/m-p/137022#M50682
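
The linked solution is not reproduced here. As a general defensive pattern, independent of that fix, a pipeline can skip zero-byte files before handing paths to the JSON reader. This sketch assumes the file listing comes from `dbutils.fs.ls`, whose `FileInfo` entries expose `path` and `size`; the helper name and the folder path in the usage comment are hypothetical.

```python
def non_empty_json_paths(file_infos):
    """Return paths of .json files with a non-zero size.

    file_infos: objects with .path and .size attributes, such as the
    FileInfo entries returned by dbutils.fs.ls on Databricks.
    """
    return [f.path for f in file_infos if f.path.endswith(".json") and f.size > 0]


# Hypothetical usage in a notebook:
# paths = non_empty_json_paths(dbutils.fs.ls("/Volumes/main/raw/daily_load/"))
# if paths:
#     df = spark.read.json(paths)
# else:
#     print("No non-empty JSON files to load")
```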
