Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ShankarM
by Contributor
  • 337 Views
  • 2 replies
  • 0 kudos

Notebook exposure

I have created a notebook per a client requirement. I have to migrate the notebook to the client environment for testing with live data, but I do not want to expose the Databricks notebook code to the testers in the client environment. Is there a way to package the not...

Latest Reply
WiliamRosa
Contributor III
  • 0 kudos

Hi @ShankarM, I've had to do something similar: packaging a Python class as a wheel. This documentation might help: https://docs.databricks.com/aws/en/dev-tools/bundles/python-wheel
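
For reference, a minimal sketch of a package that could be built into a wheel (all names are hypothetical; build with python -m build):

# setup.py - minimal wheel packaging sketch; package/module names are hypothetical
from setuptools import setup, find_packages

setup(
    name="client_notebook_logic",  # hypothetical distribution name
    version="0.1.0",
    packages=find_packages(),      # picks up e.g. a client_notebook_logic/ package dir
    install_requires=[],           # declare runtime dependencies as needed
)

Once the wheel is installed on the cluster, a thin notebook can import and call the packaged functions without exposing their source to testers.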

1 More Reply
DatabricksEngi1
by Contributor
  • 639 Views
  • 2 replies
  • 1 kudos

Resolved! Databricks Asset Bundles issue

Hi all, I'm working with Databricks Asset Bundles (DAB) and trying to move from a single repository-level bundle to a structure where each workflow (folder under resources/jobs) has its own bundle. My repository contains: shared src/variables.yml a...

Latest Reply
DatabricksEngi1
Contributor
  • 1 kudos

I solved it. For some reason, the Terraform folder created under the bundles wasn't set up correctly. I copied it from a working bundle, and everything completed successfully.

1 More Replies
JPNP
by New Contributor
  • 453 Views
  • 3 replies
  • 1 kudos

Not able to create a secret scope in Azure Databricks

Hello, I am trying to create an Azure Key Vault-backed secret scope, but it is failing with the error below. I have tried clearing the cache, logging out, and using an incognito browser as well, but I am still not able to create a scope. Can you please help here?

(error screenshot attached: JPNP_0-1755692310711.jpeg)
Latest Reply
Yogesh_Verma_
Contributor
  • 1 kudos

If the UI keeps failing with that vague error, the CLI approach suggested above is the best next step, since it usually gives a clearer error message. Also make sure that: the service principal you're using to create the scope has Key Vault Administra...
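
As a rough sketch of that API route (the Secrets API endpoint is documented; the workspace URL, token, and Key Vault identifiers below are placeholders, and creating an AKV-backed scope generally requires an Azure AD token rather than a PAT):

import requests

host = "https://adb-1234567890123456.7.azuredatabricks.net"  # placeholder workspace URL
token = "<azure-ad-access-token>"  # AKV-backed scopes generally need an AAD token, not a PAT

resp = requests.post(
    f"{host}/api/2.0/secrets/scopes/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "scope": "kv-backed-scope",
        "scope_backend_type": "AZURE_KEYVAULT",
        "backend_azure_keyvault": {
            "resource_id": "/subscriptions/<sub>/resourceGroups/<rg>/providers/Microsoft.KeyVault/vaults/<kv>",
            "dns_name": "https://<kv>.vault.azure.net/",
        },
    },
)
# A non-2xx response body usually names the actual problem more clearly than the UI.
print(resp.status_code, resp.text)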

2 More Replies
jar
by Contributor
  • 265 Views
  • 1 reply
  • 0 kudos

Excluding job update from DAB .yml deployment

Hi. We have a range of scheduled jobs and _one_ continuous job, all defined in .yml and deployed with DAB. The continuous job is paused by default, and we use a scheduled notebook job to pause and unpause it so that it only runs during business ho...

Latest Reply
Yogesh_Verma_
Contributor
  • 0 kudos

You’re running into this because DAB treats the YAML definition as the source of truth — so every time you redeploy, it will reset the job state (including the paused/running status) back to what’s defined in the file. Unfortunately, there isn’t curr...
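
A minimal sketch of the pause/unpause call such a scheduled notebook could make (workspace URL, token, and job ID are placeholders):

import requests

host = "https://<workspace-url>"  # placeholder
token = "<token>"                 # placeholder
job_id = 1234                     # placeholder: the continuous job's ID

# jobs/update changes only the fields provided in new_settings;
# other job settings are left as-is.
requests.post(
    f"{host}/api/2.1/jobs/update",
    headers={"Authorization": f"Bearer {token}"},
    json={"job_id": job_id, "new_settings": {"continuous": {"pause_status": "UNPAUSED"}}},
)

As the reply notes, the next bundle deploy will still reset this back to the YAML-defined state.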

karthik_p
by Esteemed Contributor
  • 15329 Views
  • 5 replies
  • 1 kudos

Does Delta Live Tables support identity columns?

We are able to test identity columns using SQL/Python, but when we try the same using DLT, we are not seeing values under the identity column. It is always empty for the column we created: "id BIGINT GENERATED ALWAYS AS IDENTITY".

Latest Reply
Gowrish
New Contributor II
  • 1 kudos

Hi, I see from the following Databricks documentation - https://docs.databricks.com/aws/en/dlt/limitations - that it states the following, which gives the impression that you can define an identity column on a streaming table: "Identity columns might be recom...
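
A minimal sketch of what that would look like: declaring the identity column in the schema of a DLT streaming table (table and source names are hypothetical; spark is provided by the pipeline runtime):

import dlt

@dlt.table(
    name="events_with_id",  # hypothetical table name
    schema="id BIGINT GENERATED ALWAYS AS IDENTITY, payload STRING",
)
def events_with_id():
    # readStream makes this a streaming table (rows are appended),
    # not a materialized view (which is recomputed).
    return spark.readStream.table("raw_events").select("payload")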

4 More Replies
mtreigelman
by New Contributor III
  • 320 Views
  • 1 reply
  • 3 kudos

First Lakeflow (DLT) Pipeline Best Practice Question

Hi, I am writing my first streaming pipeline and trying to ensure it is set up to work as a "Lakeflow" pipeline. It is connecting an external Oracle database with some external Azure Blob storage data (all managed in the same Unity Catalog). The pipe...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 3 kudos

@mtreigelman thanks for providing the update. If you wouldn't mind, could you explain why you think the first way didn't work and why the second way did? Then you can mark your response as the solution to the question. I found this article to be useful ...

ck7007
by Contributor
  • 561 Views
  • 1 reply
  • 2 kudos

Cost

Reduced monthly Databricks bill from $47K to $12.7K.
The problem: we were scanning 2.3TB for queries needing only 8GB of data.
Three quick wins:
1. Multi-dimensional partitioning (30% savings)
# Before
df.write.partitionBy("date").parquet(path)
# After
parti...
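
The truncated "after" snippet presumably partitions on more than one column; a minimal sketch of that idea (table, column names, and path are guesses):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.read.table("raw_events")  # hypothetical source table

# Partitioning on the columns queries actually filter by lets reads prune
# to a few directories instead of scanning the whole dataset.
df.write.partitionBy("date", "region").mode("overwrite").parquet("/mnt/lake/events")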

Latest Reply
BS_THE_ANALYST
Esteemed Contributor III
  • 2 kudos

@ck7007 thanks so much for sharing! That's such a saving, by the way. Congrats. Out of curiosity, did you consider using Liquid Clustering, which is meant to replace partitioning and Z-ORDER: https://docs.databricks.com/aws/en/delta/clustering I found...
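
A minimal sketch of the liquid clustering alternative (table and column names are hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# CLUSTER BY replaces hive-style partitionBy/Z-ORDER on Delta tables and
# can be changed later without rewriting the directory layout.
spark.sql("""
    CREATE TABLE IF NOT EXISTS events_clustered (
        event_date DATE,
        region STRING,
        payload STRING
    )
    CLUSTER BY (event_date, region)
""")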

AbhishekNakka15
by New Contributor II
  • 442 Views
  • 1 reply
  • 1 kudos

Resolved! Unable to login to partner account

When I try to log in with my office email to the partner account, it says, "The service is currently unavailable. Please try again later." It also says, "You are not authorized to access https://partner-academy.databricks.com. Please select a platform you ca...

Latest Reply
Advika
Databricks Employee
  • 1 kudos

Hello @AbhishekNakka15! Please raise a ticket with the Databricks Support Team, and include your email address so they can review your account and provide further assistance.

viralpatel
by New Contributor II
  • 621 Views
  • 2 replies
  • 1 kudos

Lakebridge Synapse Conversion to DBX and Custom transpiler

I have two questions about the Lakebridge solution. Synapse dedicated pool conversion: we were conducting a PoC for a Synapse-to-DBX migration using Lakebridge. What we have observed is that the conversions are not correct. I was anticipating all tables wi...

Latest Reply
yourssanjeev
New Contributor II
  • 1 kudos

We are also checking on this use case, but Databricks confirmed that it does not work for this use case yet; we are not sure whether it is on their roadmap.

1 More Reply
vishalv4476
by New Contributor III
  • 310 Views
  • 1 reply
  • 0 kudos

Databricks job run failures: Py4JJavaError: An error occurred while calling o404.sql. : java.util.No

Hi, we had a successfully running pipeline, but it started failing on 20th August; no changes were published. Can you please guide me in resolving this issue? I've tried increasing 'delta.deletedFileRetentionDuration' = 'interval 365 days', but it didn't hel...

Latest Reply
SP_6721
Honored Contributor
  • 0 kudos

Hi @vishalv4476, the error is likely due to a corrupted Delta transaction log or files deleted manually/outside of Delta. Check the table history and verify that no user or automated process removed data files. If issues are found, restore the table ...
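
A minimal sketch of that check and, if needed, the rollback (table name and version number are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Look for VACUUM/DELETE/overwrite operations around the date the failures began.
spark.sql("DESCRIBE HISTORY main.sales.orders").show(truncate=False)

# If a bad operation is found, roll back to the last known-good version.
spark.sql("RESTORE TABLE main.sales.orders TO VERSION AS OF 42")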

anazen13
by New Contributor III
  • 805 Views
  • 9 replies
  • 2 kudos

Databricks API to create a serverless job

I am trying to follow your documentation on how to create a serverless job via the API: https://docs.databricks.com/api/workspace/jobs/create#environments-spec-environment_version. So I see that sending the JSON request resulted in me seeing a serverless clus...

Latest Reply
siennafaleiro
New Contributor II
  • 2 kudos

It looks like you're hitting one of the current limitations of Databricks serverless jobs. Even though the API supports passing an environments object, only certain fields are honored right now. In particular: the environment_version parameter will de...
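
For context, a rough sketch of the request shape from the linked doc (host, token, and paths are placeholders, and per the reply above, some of these fields may not be honored yet):

import requests

host = "https://<workspace-url>"  # placeholder
token = "<token>"                 # placeholder

# Serverless job: tasks omit any cluster spec and reference a shared environment.
requests.post(
    f"{host}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {token}"},
    json={
        "name": "serverless-demo",
        "environments": [
            {"environment_key": "default", "spec": {"environment_version": "2"}}
        ],
        "tasks": [
            {
                "task_key": "main",
                "environment_key": "default",
                "notebook_task": {"notebook_path": "/Workspace/Users/me/demo"},
            }
        ],
    },
)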

8 More Replies
zero234
by New Contributor III
  • 6549 Views
  • 3 replies
  • 1 kudos

I have created a materialized view using a Delta Live Tables pipeline and it's not appending data

I have created a materialized view using a Delta Live Tables pipeline, and for some reason it is overwriting data every day. I want it to append data to the table instead of doing a full refresh. Suppose I had 8 million records in the table; if I run the...

Latest Reply
UMAREDDY06
New Contributor II
  • 1 kudos

[expect_table_not_view.no_alternative] 'insert' expects a table but dim_airport_unharmonised is a view. Can you please help with how to resolve this? Thanks, Uma Devi
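
Stepping back to the original question: a materialized view is recomputed from its definition and cannot be INSERTed into; append semantics require a streaming table. A minimal sketch (names are hypothetical; spark is provided by the pipeline runtime):

import dlt

@dlt.table(name="dim_airport_stream")  # hypothetical table name
def dim_airport_stream():
    # A streaming source makes this a streaming table: new rows are appended
    # across runs instead of the table being rebuilt like a materialized view.
    return spark.readStream.table("raw_airports")  # hypothetical source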

2 More Replies
ManojkMohan
by Honored Contributor II
  • 421 Views
  • 1 reply
  • 2 kudos

Best practices: Silver Layer to Salesforce

Need the community's view to evaluate whether my solution follows best practice. The problem I am solving is reading match data from a CSV, which was uploaded into a volume; then I clean and transfo...

Labels: Data Engineering, Bestpractice
Latest Reply
-werners-
Esteemed Contributor III
  • 2 kudos

- skip the pandas conversion
- persist the transformed data in a Databricks table and then write to Salesforce.
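
A minimal sketch of that flow (the simple-salesforce client and every name/credential below are illustrative assumptions, not part of the answer above):

from pyspark.sql import SparkSession
from simple_salesforce import Salesforce  # assumed client: pip install simple-salesforce

spark = SparkSession.builder.getOrCreate()
transformed_df = spark.table("bronze.match_raw")  # stand-in for the cleaned DataFrame

# 1) Persist in a Databricks table instead of converting to pandas.
transformed_df.write.mode("overwrite").saveAsTable("silver.match_data")

# 2) Read back and push to Salesforce via the Bulk API.
rows = [r.asDict() for r in spark.table("silver.match_data").limit(10000).collect()]
sf = Salesforce(username="<user>", password="<pw>", security_token="<token>")  # placeholders
sf.bulk.Match_Data__c.insert(rows)  # hypothetical custom object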

seefoods
by Valued Contributor
  • 1008 Views
  • 10 replies
  • 4 kudos

Resolved! sync delta table to Nosql

Hello guys, what is the best way to build a sync process that keeps data in sync across two database engines, such as a Delta table and a NoSQL table (Mongo)? Thanks. Cordially,

Latest Reply
nayan_wylde
Esteemed Contributor
  • 4 kudos

The other option I can think of is change streams. Here is a blog post on it: https://contact-rajeshvinayagam.medium.com/mongodb-changestream-spark-delta-table-an-alliance-a70962133b95
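
For the Delta-to-Mongo direction, a rough sketch using Delta change data feed with the MongoDB Spark connector as a streaming sink (connector options, URI, and table names are assumptions):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# The source table must have delta.enableChangeDataFeed = true as a table property.
(
    spark.readStream
    .option("readChangeFeed", "true")
    .table("silver.orders")                # hypothetical Delta table
    .writeStream
    .format("mongodb")                     # MongoDB Spark connector 10.x sink
    .option("spark.mongodb.connection.uri", "mongodb://<host>:27017")
    .option("spark.mongodb.database", "shop")
    .option("spark.mongodb.collection", "orders")
    .option("checkpointLocation", "/Volumes/main/ops/checkpoints/orders_to_mongo")
    .outputMode("append")
    .start()
)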

9 More Replies
collierd
by New Contributor III
  • 898 Views
  • 7 replies
  • 5 kudos

Resolved! Timestamp date filter does not work

Hello, I have a column called LastUpdated defined as a timestamp. If I select from the table, it displays as (e.g.) 2025-08-27T10:50:31.610+00:00. How do I filter on this without having to be specific with the year, month, day, ...? This does not work: select *...

Latest Reply
Pilsner
Contributor III
  • 5 kudos

Hello @collierd, the way I would tackle this involves date-time specifiers. Because your value is likely stored as a timestamp, which you can see via Catalog Explorer, you cannot compare it to a string value such as "2025-08-27T10:50:31.610+0...
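
A minimal sketch of the non-string comparison (table name is hypothetical):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Filter on the DATE part, or use an explicit timestamp literal,
# instead of comparing against a string.
spark.sql("""
    SELECT * FROM my_table
    WHERE CAST(LastUpdated AS DATE) = DATE'2025-08-27'
""").show()

spark.sql("""
    SELECT * FROM my_table
    WHERE LastUpdated >= TIMESTAMP'2025-08-27 00:00:00'
""").show()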

6 More Replies
