Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

niruban
by New Contributor II
  • 3630 Views
  • 3 replies
  • 0 kudos

Databricks Asset Bundle to deploy only one workflow

Hello Community - I am trying to deploy only one workflow from my CI/CD. But whenever I try to deploy one workflow using "databricks bundle deploy - prod", it deletes all the existing workflows in the target environment. Is there any option av...

Data Engineering
CICD
DAB
Databricks Asset Bundle
DevOps
Latest Reply
nvashisth
New Contributor III
  • 0 kudos

Hi Team, a deployment via DAB (Databricks Asset Bundle) reads all the YAML files present, and workflows are generated based on them. In versions of the Databricks CLI prior to 0.236 (or the latest one), it used to delete all the workflows by making dele...

2 More Replies
cltj
by New Contributor III
  • 13184 Views
  • 5 replies
  • 2 kudos

Experiences using managed tables

We are looking into the use of managed tables on Databricks. As this decision won't be easy to reverse, I am reaching out to all of you fine folks to learn more about your experience with using this. If I understand correctly, we don't have to deal with ...

Latest Reply
JimmyEatBrick
Databricks Employee
  • 2 kudos

Databricks recommends ALWAYS using managed tables UNLESS: your tables are not Delta, or you explicitly need to have the table files in a specific location. Managed tables are just better... Databricks manages the upgrades (Deletion Vectors? Column M...
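For readers weighing the trade-off, a minimal sketch of the difference (hypothetical catalog, schema, and storage path; run in a notebook where `spark` is predefined). Omitting LOCATION gives a managed table; pinning a LOCATION gives an external table whose files you manage yourself.

```python
# Managed table: Databricks owns the data files, layout, and lifecycle.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders (
        order_id BIGINT,
        amount   DOUBLE
    )
""")

# External table: the files are explicitly pinned to a storage path
# (hypothetical ADLS path shown) and managed by you.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.sales.orders_ext (
        order_id BIGINT,
        amount   DOUBLE
    )
    LOCATION 'abfss://data@myaccount.dfs.core.windows.net/tables/orders'
""")
```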

4 More Replies
Subhrajyoti
by New Contributor
  • 3701 Views
  • 1 reply
  • 0 kudos

Deriving a relation between spark job and underlying code

For one of our requirements, we need to derive a relation between the Spark job, stage, and task IDs and the underlying code executed after a workflow job is triggered using a job cluster. So far we have been able to develop a relation between the Workflow ...

Latest Reply
VZLA
Databricks Employee
  • 0 kudos

Hi @Subhrajyoti thanks for your question! I'm not sure if you have tried this already, but by combining listener logs with structured tabular data, you can create a clear mapping between Spark job executions and the corresponding notebook code. You c...
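One way to make that mapping concrete is the sketch below (an assumption about the approach, not necessarily what VZLA described): tag each notebook section with a Spark job description so the job IDs captured in listener logs and the Spark UI point back to the code that triggered them. Section labels, paths, and table names are hypothetical.

```python
# Tag Spark jobs with the notebook section that triggers them, so the
# job/stage/task IDs in the Spark UI and listener logs map back to code.
sc = spark.sparkContext

sc.setJobDescription("ingest_orders: read raw JSON")
raw_df = spark.read.json("/mnt/raw/orders")  # jobs launched here carry the tag

sc.setJobDescription("ingest_orders: aggregate and write")
(raw_df.groupBy("customer_id").count()
       .write.mode("overwrite")
       .saveAsTable("main.sales.order_counts"))

sc.setJobDescription(None)  # clear the tag once the section is done
```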

OldManCoder
by New Contributor II
  • 1096 Views
  • 2 replies
  • 2 kudos

Resolved! Should Vacuum Be Tied to Workflows?

I have a process expected to run every two weeks. Throughout the process (~30 notebooks), when I write to a table for the last time in the overall process, I run my vacuum such as below - I'm never running a vac against the same table twice.  I've no...

Latest Reply
VZLA
Databricks Employee
  • 2 kudos

Hi @OldManCoder, thanks for your question! 1) Yes, separating cleanup tasks into a dedicated workflow is often more efficient. Here's why: Performance: Vacuum and optimization are resource-intensive operations. Running them inline with your primary ...
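A minimal sketch of such a dedicated maintenance workflow (hypothetical table names; schedule this notebook as its own periodic job rather than running VACUUM inline with the ETL):

```python
# Dedicated maintenance job: compact files, then clean up old ones.
tables_to_maintain = ["main.sales.orders", "main.sales.customers"]

for table in tables_to_maintain:
    spark.sql(f"OPTIMIZE {table}")                 # compact small files
    spark.sql(f"VACUUM {table} RETAIN 168 HOURS")  # remove files older than 7 days
```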

1 More Reply
TamD
by Contributor
  • 7258 Views
  • 7 replies
  • 2 kudos

How do I drop a delta live table?

I'm a newbie and I've just done the "Run your first Delta Live Tables pipeline" tutorial. The tutorial downloads a publicly available CSV baby names file and creates two new Delta Live Tables from it. Now I want to be a good dev and clean up the reso...

Latest Reply
ImranA
Contributor
  • 2 kudos

@gchandra, take for example a table called "cars": if I remove the table from the DLT pipeline and drop the table from the catalog, then change the schema of the table and create the table again using the same table name "cars" through the same pipeline, why ...

6 More Replies
cosmicwhoop
by New Contributor
  • 743 Views
  • 1 reply
  • 0 kudos

Delta Live Tables UI - missing EVENTS

I am new to Databricks, and my setup is using Microsoft Azure (Premium Tier) + Databricks. I am trying to build Delta Live Tables and don't see events; without them I am finding it hard to understand the reason for a job failure. Attached are 2 screenshots: 1) ...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, if you are looking for the reason for a job failure, you can navigate to the View Details tab -> Logs to figure out the root cause of the failure. The blank screen with no events might be because you have selected one of the DLT tables. You can navig...

ChristianRRL
by Valued Contributor III
  • 4609 Views
  • 2 replies
  • 1 kudos

DLT Deduping Best Practice in Medallion

Hi there, I have what may be a deceptively simple question but I suspect may have a variety of answers: what is the "right" place to handle deduping using the medallion architecture? In my example, I already have everything properly laid out with data...

Latest Reply
Sidhant07
Databricks Employee
  • 1 kudos

1. Deduplication in the medallion architecture can be handled in the bronze or silver layer.
2. If keeping a complete history of all raw data, including duplicates, in the bronze layer, handle deduplication in the silver layer.
3. If not keeping a complete his...
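A minimal sketch of option 2 (keep bronze raw, dedupe into silver) in a DLT pipeline; the table, key, and timestamp column names are hypothetical:

```python
import dlt

@dlt.table(comment="Deduplicated silver view of the raw bronze events")
def silver_events():
    return (
        spark.readStream.table("LIVE.bronze_events")
             .withWatermark("event_ts", "1 hour")          # bound the dedup state
             .dropDuplicatesWithinWatermark(["event_id"])  # Spark 3.5+ / recent DLT runtimes
    )
```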

1 More Reply
Manzilla
by New Contributor II
  • 4831 Views
  • 2 replies
  • 0 kudos

Delta Live table - Adding streaming to existing table

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformation on the bronze data and stores the results in the silver table. New process: bronze is still the same. A stream has bee...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

When you use `dlt.apply_changes` to update the silver table, it adds four hidden columns for tracking changes. These columns include `event_time`, `read_version`, `commit_version`, and `is_deleted`. When you run this process for the first time agains...
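For context, a minimal sketch of the `dlt.apply_changes` pattern being discussed (hypothetical table, key, and sequencing columns); the change-tracking metadata mentioned above is maintained by this call:

```python
import dlt

dlt.create_streaming_table("silver_customers")

dlt.apply_changes(
    target="silver_customers",
    source="bronze_customers",  # streaming table defined elsewhere in the pipeline
    keys=["customer_id"],       # rows with the same key are merged
    sequence_by="ingest_ts",    # resolves out-of-order changes
)
```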

1 More Reply
MR07
by New Contributor II
  • 4095 Views
  • 1 reply
  • 0 kudos

Optimal Cluster Selection for Continuous Delta Live Tables Pipelines: Bronze and Silver

Hi, I have two Delta Live Tables pipelines. The first one is the Bronze pipeline, which handles bronze tables. These tables are defined as streaming tables, and this pipeline needs to be executed continuously. The second one is the Silver pipeline, wh...

Latest Reply
Sidhant07
Databricks Employee
  • 0 kudos

Hi, the best cluster type can depend on various factors, such as the specific requirements of your pipelines, the volume of data you're processing, and your budget. Therefore, it's always a good idea to test different cluster types and configurations...

bantarobugs
by New Contributor
  • 3941 Views
  • 1 reply
  • 0 kudos

Job Run failure - Azure Container does not exist

Hello, I have an ETL pipeline in Databricks that works perfectly when I execute it manually in the notebook using an all-purpose cluster. However, when I try to schedule it using a job cluster, it fails immediately with the error message: 'Azure conta...

(Attachment: Screenshot 2024-08-28 154926.png)
Latest Reply
PiotrMi
Contributor
  • 0 kudos

Hey @bantarobugs, there might be a problem with the permissions or roles assigned to the user or service principal trying to access the Azure container. Please check who/what is assigned and its role/permission.

flamezi2
by New Contributor
  • 4430 Views
  • 1 reply
  • 0 kudos

Invalid request when using the Manual generation of an account-level access token

I need to generate an access token using the REST API and was using the guide seen here: manually-generate-an-account-level-access-token. When I try this cURL in Postman, I get an error, but the error description is not helpful. Error: I don't know what I'm missi...

(Attachments: flamezi2_1-1727934079195.png, flamezi2_0-1727934045043.png)
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Are you replacing the Account_id with your actual account ID associated with your subscription? Also, what token are you using to authenticate or run this API call?
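For comparison, a minimal sketch of the documented OAuth M2M token request in Python rather than cURL (the account ID and service-principal credentials are placeholders; the host shown is the AWS accounts console, while Azure uses accounts.azuredatabricks.net):

```python
import requests

ACCOUNT_ID = "<account-id>"  # the UUID from the account console URL, not the account name

resp = requests.post(
    f"https://accounts.cloud.databricks.com/oidc/accounts/{ACCOUNT_ID}/v1/token",
    auth=("<client-id>", "<client-secret>"),  # OAuth service principal credentials
    data={"grant_type": "client_credentials", "scope": "all-apis"},
)

if resp.ok:
    print(resp.json()["access_token"])
else:
    print(resp.status_code, resp.text)  # the body usually explains an invalid_request
```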

GodSpeed
by New Contributor
  • 3972 Views
  • 1 reply
  • 0 kudos

Postman Collection Alternatives for Data-Centric API Management?

I’ve been using Postman collections to manage APIs in my data projects, but I’m exploring alternatives. Are there tools like Apidog or Insomnia that perform better for API management, particularly when working with large data sets or data-driven work...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

Insomnia: Insomnia is another strong alternative that is frequently recommended. It is known for its simplicity and effectiveness in making REST API requests. Insomnia supports the import of Postman collections and is praised for its performance and ...

AcrobaticMonkey
by New Contributor II
  • 1130 Views
  • 2 replies
  • 0 kudos

Alerts for Failed Queries in Databricks

How can we set up automated alerts to notify us when queries executed by a specific service principal fail in Databricks?

Latest Reply
AcrobaticMonkey
New Contributor II
  • 0 kudos

@Alberto_Umana Our service principal uses the SQL Statement API to execute queries. We want to receive notifications for each query failure. While SQL Alerts are an option, they do not provide immediate responses. Is there a better solution to achieve...
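One workaround worth sketching (an assumption, not a confirmed recommendation): since the caller already drives the SQL Statement Execution API, it can inspect the returned status itself and push an immediate notification on failure. The webhook URL and placeholders below are hypothetical, and long-running statements may need polling of GET /api/2.0/sql/statements/{statement_id} before a terminal state appears.

```python
import requests

HOST = "https://<workspace-host>"
HEADERS = {"Authorization": "Bearer <token>"}

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers=HEADERS,
    json={"warehouse_id": "<warehouse-id>",
          "statement": "SELECT * FROM main.sales.orders"},
).json()

if resp.get("status", {}).get("state") == "FAILED":
    # Push an immediate alert instead of waiting for a scheduled SQL Alert.
    requests.post("https://hooks.example.com/alerts",  # hypothetical webhook
                  json={"statement_id": resp.get("statement_id"),
                        "error": resp["status"].get("error", {})})
```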

1 More Reply
Balram-snaplogi
by New Contributor II
  • 1370 Views
  • 1 reply
  • 1 kudos

How can we customize the access token expiry duration?

Hi, I am using OAuth machine-to-machine (M2M) authentication. I created a service principal and wrote a Java application that allows me to connect to the Databricks warehouse. My question is regarding the code below: String url = "jdbc:databricks://<se...

Latest Reply
Walter_C
Databricks Employee
  • 1 kudos

I would say that your token should be manually refreshed, as mentioned in the following statement in the docs: Databricks tools and SDKs that implement the Databricks client unified authentication standard will automatically generate, refresh, and use Dat...
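To illustrate the quoted statement, a minimal sketch using the Databricks SDK for Python, which implements unified authentication and therefore mints and refreshes the M2M token for you (host and credentials are placeholders):

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient(
    host="https://<workspace-host>",
    client_id="<service-principal-client-id>",
    client_secret="<service-principal-secret>",
)

# The OAuth token is generated and refreshed behind the scenes.
print(w.current_user.me().user_name)
```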

hk-modi
by New Contributor
  • 698 Views
  • 1 reply
  • 0 kudos

How to increase autoloader speed while working with s3 (AWS)

Hey everyone! I am trying to switch from a batch processing job to an autoloader (directory listing mode) on my S3 bucket that has millions of files. I am using modifiedAfter to create my initial checkpoint for the autoloader and want to speed up the ...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

What mode are you using? File notification?
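For reference, a minimal sketch of enabling file notification mode, which avoids re-listing millions of S3 objects on every trigger (bucket, paths, and schema location are hypothetical, and Databricks needs permissions to create the SQS/SNS resources):

```python
df = (
    spark.readStream.format("cloudFiles")
         .option("cloudFiles.format", "json")
         .option("cloudFiles.useNotifications", "true")   # SQS/SNS instead of directory listing
         .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/events")
         .option("modifiedAfter", "2024-01-01 00:00:00")  # keep the initial-checkpoint filter
         .load("s3://my-bucket/raw/events/")
)
```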
