Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

minhhung0507
by Contributor III
  • 288 Views
  • 14 replies
  • 3 kudos

API for Restarting Individual Failed Tasks within a Job?

Hi everyone, I'm exploring ways to streamline my workflow in Databricks and could really use some expert advice. In my current setup, I have a job (named job_silver) with multiple tasks (e.g., task 1, task 2, task 3). When one of these tasks fails—say...

Latest Reply
RiyazAli
Valued Contributor III
  • 3 kudos

Hey @minhhung0507 - quick question - what is the cluster type you're using to run your workflow? I'm using a shared, interactive cluster, so I'm passing the parameter {'existing_cluster_id': task['existing_cluster_id']} in the payload. This parameter ...

13 More Replies
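Editor's note: the capability asked about here is exposed by the Jobs 2.1 repair-run endpoint, which re-runs only the chosen tasks of a failed job run. A minimal sketch follows; the host, token, run ID, and task key are placeholders, not values from this thread.

```python
# Hedged sketch: re-run only specific failed tasks of a job run via the
# Jobs 2.1 repair-run endpoint. Host, token, run_id and task keys are placeholders.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.1/jobs/runs/repair",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "run_id": 123456789,        # the failed job run to repair
        "rerun_tasks": ["task_2"],  # only the task(s) that failed
    },
)
resp.raise_for_status()
print(resp.json())  # returns a repair_id for this repair attempt
```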
GregTyndall
by New Contributor II
  • 656 Views
  • 4 replies
  • 0 kudos

Resolved! Materialized View Refresh - NUM_JOINS_THRESHOLD_EXCEEDED?

I have a very basic view with 3 inner joins that will only do a full refresh. Is there a limit to the number of joins you can have and still get an incremental refresh? "incrementalization_issues": [{"issue_type": "INCREMENTAL_PLAN_REJECTED_BY_COST_MO...

Latest Reply
RiyazAli
Valued Contributor III
  • 0 kudos

Hey @TheSmike In the DLT pipeline's top right corner, you can click on Settings, scroll down to Advanced, click Add configuration, and give the key as `pipelines.enzyme.numberOfJoinsThreshold` and the value as 5. Hope this helps.

3 More Replies
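Editor's note: the setting described above lands in the pipeline's configuration map, so it can also be applied through the Pipelines REST API. A hedged sketch, assuming the spec returned by GET can be sent back via PUT; host, token, and pipeline ID are placeholders.

```python
# Hedged sketch: set pipelines.enzyme.numberOfJoinsThreshold through the Pipelines API
# instead of the UI. Host, token and pipeline ID are placeholders.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"
PIPELINE_ID = "<pipeline-id>"
headers = {"Authorization": f"Bearer {TOKEN}"}

# Fetch the current settings so the edit preserves the rest of the pipeline spec.
spec = requests.get(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers).json()["spec"]

# Raise the join threshold Enzyme uses when deciding whether a refresh can be incremental.
spec.setdefault("configuration", {})["pipelines.enzyme.numberOfJoinsThreshold"] = "5"

# PUT replaces the pipeline settings with the edited spec.
requests.put(f"{HOST}/api/2.0/pipelines/{PIPELINE_ID}", headers=headers, json=spec).raise_for_status()
```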
Malthe
by New Contributor II
  • 47 Views
  • 1 replies
  • 0 kudos

Parametrize DLT pipeline

If I'm using Databricks Asset Bundles, how would I parametrize a DLT pipeline based on a static configuration file? In pseudo-code, I would have a .py file: import dlt # Something that pulls a pipeline resource (or artifact) and parses from JSON table...

Latest Reply
Emmitt18Lefebvr
New Contributor
  • 0 kudos

Hello! To parametrize a Databricks DLT pipeline with a static configuration file using Asset Bundles, include your JSON/YAML config file in the bundle. In your DLT pipeline code, read this file using Python's file I/O (referencing its deployed path). ...

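Editor's note: a minimal sketch of the approach described in this reply. The config path, its JSON shape, and the source locations are hypothetical, not taken from the thread.

```python
# Hedged sketch: drive DLT table definitions from a JSON config file deployed by the bundle.
# The config path, its shape, and the source locations are hypothetical.
import json
import dlt

with open("/Workspace/Users/me/my_bundle/files/tables.json") as fh:
    TABLES = json.load(fh)  # e.g. [{"name": "orders", "path": "s3://my-bucket/orders/"}]

def define_table(cfg):
    # A factory function so each loop iteration captures its own cfg.
    @dlt.table(name=cfg["name"])
    def _tbl():
        return (
            spark.readStream.format("cloudFiles")   # Auto Loader ingestion
            .option("cloudFiles.format", "json")
            .load(cfg["path"])
        )

for cfg in TABLES:
    define_table(cfg)
```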
yashojha1995
by New Contributor
  • 145 Views
  • 1 replies
  • 0 kudos

Error while running update statement using delta lake linked service through ADF

Hi All, I am getting the below error while running an update query in a lookup activity using the Delta Lake linked service: ErrorCode=AzureDatabricksCommandError, Hit an error when running the command in Azure Databricks. Error details: <span class='a...

Latest Reply
RiyazAli
Valued Contributor III
  • 0 kudos

Hi @yashojha1995 "EOL while scanning string literal" hints that there might be a syntax error in the update query. Could you share your update query here, and any other info such as how you are creating a linked service to your delta lake? Does it mean ...

Dharinip
by New Contributor III
  • 2164 Views
  • 5 replies
  • 3 kudos

Resolved! How to decide on creating views vs Tables in Gold layer?

We have the following use case: We receive a raw form of data from an application, and that is ingested in the Iron layer. The raw data is in JSON format. The Bronze layer will be the first level of transformation. The flattening of the JSON file happens ...

Latest Reply
artus2050189155
New Contributor
  • 3 kudos

The whole medallion architecture is unnecessarily complex: Bronze, Silver, Gold. In some places I have seen people do RAW, Trusted RAW, Silver, Trusted Silver, Gold.

4 More Replies
HoussemBL
by New Contributor III
  • 236 Views
  • 4 replies
  • 1 kudos

DLT Pipeline & Automatic Liquid Clustering Syntax

Hi everyone, I noticed Databricks recently released the automatic liquid clustering feature, which looks very promising. I'm currently implementing a DLT pipeline and would like to leverage this new functionality. However, I'm having trouble figuring o...

Latest Reply
RiyazAli
Valued Contributor III
  • 1 kudos

Hey @HoussemBL You're correct about DLT not supporting Auto LC. You can assign any columns in cluster_by, but if you set it to auto, it will throw an error complaining about auto not being present in the list of columns. Maybe altering the table to ...

3 More Replies
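Editor's note: a minimal sketch of the workaround hinted at above, with hypothetical table and column names. Explicit columns go in cluster_by inside the pipeline; the switch to automatic clustering afterwards is an assumption, not confirmed DLT behavior.

```python
# Hedged sketch: explicit liquid-clustering columns on a DLT table, since (per the
# reply) cluster_by="auto" is rejected inside a DLT pipeline. Names are hypothetical.
import dlt

@dlt.table(
    name="silver_events",
    cluster_by=["event_date", "user_id"],  # explicit columns work; "auto" errors out
)
def silver_events():
    return spark.readStream.table("bronze_events")  # hypothetical upstream table

# Afterwards, outside the pipeline definition, the published table could be switched
# to automatic liquid clustering with SQL (assumption, run from a notebook or SQL editor):
#   ALTER TABLE my_catalog.my_schema.silver_events CLUSTER BY AUTO;
```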
minhhung0507
by Contributor III
  • 80 Views
  • 1 replies
  • 0 kudos

Handling Hanging Pipelines in Real-Time Environments: Leveraging Databricks’ Idle Event Monitoring

Hi everyone, I’m running multiple real-time pipelines on Databricks using a single job that submits them via a thread pool. While most pipelines are running smoothly, I’ve noticed that a few of them occasionally get “stuck” or hang for several hours w...

Latest Reply
-werners-
Esteemed Contributor III
  • 0 kudos

May I ask why you use threadpools? With jobs you can define multiple tasks which do the same. I'm asking because threadpools and Spark resource management can interfere with each other.

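Editor's note: to illustrate the suggestion above, the pipelines can be defined as independent tasks of one job so the Jobs scheduler (rather than a thread pool) manages them. A hedged sketch using the Jobs 2.1 create endpoint; host, token, notebook paths, and cluster ID are placeholders.

```python
# Hedged sketch: define the pipelines as independent tasks of one job instead of a
# thread pool. Host, token, notebook paths and cluster ID are placeholders.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

job_spec = {
    "name": "realtime_pipelines",
    "tasks": [
        {
            # No depends_on, so the tasks start concurrently under the job scheduler.
            "task_key": f"pipeline_{i}",
            "existing_cluster_id": "<cluster-id>",
            "notebook_task": {"notebook_path": f"/Repos/project/pipeline_{i}"},
        }
        for i in range(3)
    ],
}

resp = requests.post(
    f"{HOST}/api/2.1/jobs/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json=job_spec,
)
resp.raise_for_status()
print(resp.json())  # {"job_id": ...}
```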
Dnirmania
by Contributor
  • 533 Views
  • 4 replies
  • 0 kudos

Read file from AWS S3 using Azure Databricks

Hi Team, I am currently working on a project to read CSV files from an AWS S3 bucket using an Azure Databricks notebook. My ultimate goal is to set up an autoloader in Azure Databricks that reads new files from S3 and loads the data incrementally. Howe...

Latest Reply
Aviral-Bhardwaj
Esteemed Contributor III
  • 0 kudos

No, it is very easy. Follow this guide and it will work: https://github.com/aviral-bhardwaj/MyPoCs/blob/main/SparkPOC/ETLProjectsAWS-S3toDatabricks.ipynb

3 More Replies
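Editor's note: for readers who cannot open the linked notebook, the usual pattern is to give the session S3 credentials and point Auto Loader at the bucket. A hedged sketch, assuming access-key authentication stored in a secret scope; bucket names, paths, and the secret scope are placeholders.

```python
# Hedged sketch: incremental CSV ingestion from S3 in an Azure Databricks notebook.
# Assumes access-key auth in a secret scope; bucket, scope and paths are placeholders.
access_key = dbutils.secrets.get(scope="aws", key="access_key")
secret_key = dbutils.secrets.get(scope="aws", key="secret_key")

# Session-scoped S3 credentials (cluster-level `spark.hadoop.fs.s3a.*` properties also work).
spark.conf.set("fs.s3a.access.key", access_key)
spark.conf.set("fs.s3a.secret.key", secret_key)

df = (
    spark.readStream.format("cloudFiles")                # Auto Loader
    .option("cloudFiles.format", "csv")
    .option("header", "true")
    .option("cloudFiles.schemaLocation", "abfss://landing@mystorage.dfs.core.windows.net/_schemas/s3_csv")
    .load("s3a://my-bucket/incoming/")
)

(df.writeStream
   .option("checkpointLocation", "abfss://landing@mystorage.dfs.core.windows.net/_checkpoints/s3_csv")
   .trigger(availableNow=True)                           # process new files, then stop
   .toTable("bronze.s3_csv"))
```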
iskidet_glenny
by New Contributor
  • 94 Views
  • 1 replies
  • 0 kudos

Possibility of creating and running concurrent Job Runs from a single job all parameters driven

Hello Community, I hope everyone is doing well. I’ve been exploring the idea of creating multiple instances of a job, which would be job runs with different parameter configurations. Has anyone else considered this approach? Imagine a scenario where you ...

Latest Reply
SP_6721
New Contributor II
  • 0 kudos

Hi @iskidet_glenny Yes, running multiple instances of a Databricks job with different parameters is a common and solid approach, especially when it comes to backfilling data. So usually, we set up one job and just pass in different parameters each time...

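Editor's note: a hedged sketch of that pattern, triggering the same job several times with different parameters via the run-now endpoint. It assumes the job defines job-level parameters and allows enough concurrent runs; host, token, job ID, and parameter names are placeholders.

```python
# Hedged sketch: launch several concurrent runs of one job, each with its own parameters
# (e.g. for backfills). Host, token, job ID and parameter names are placeholders.
# The job's max_concurrent_runs must be high enough to allow the parallel runs.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"
JOB_ID = 123456789

for start, end in [("2024-01-01", "2024-01-31"),
                   ("2024-02-01", "2024-02-29"),
                   ("2024-03-01", "2024-03-31")]:
    resp = requests.post(
        f"{HOST}/api/2.1/jobs/run-now",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"job_id": JOB_ID,
              "job_parameters": {"start_date": start, "end_date": end}},
    )
    resp.raise_for_status()
    print(resp.json()["run_id"])  # each call starts an independent run
```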
aravind-ey
by New Contributor
  • 612 Views
  • 4 replies
  • 0 kudos

vocareum lab access

Hi, I am doing a data engineering course in Databricks (Partner labs) and would like to have access to the Vocareum workspace to practice using the demo sessions. Can you please help me to get access to this workspace? Regards, Aravind

Latest Reply
twnlBO
New Contributor II
  • 0 kudos

Can you please provide links? Screenshots? More info? This answer is not specific enough. I'm taking the Data Analysis learning path; there are different demos I'd like to practice, and there are no SP Lab environment links as mentioned in the videos.

3 More Replies
Phani1
by Valued Contributor II
  • 3596 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks with Private cloud

Hi Databricks Team, Is it possible for Databricks to offer support for private cloud environments other than Azure, GCP, and AWS? The client intends to utilize Databricks in their own cloud for enhanced security. If this is feasible, what is the proce...

Latest Reply
mtatusDHS
New Contributor II
  • 0 kudos

We're looking at Databricks, but would prefer to use a Pure Storage Array to house data, mostly because of the cost of data storage for cloud providers. We're okay using cloud compute, but storage is much more feasible for us with local/private stora...

1 More Replies
chethankumar
by New Contributor III
  • 1247 Views
  • 4 replies
  • 1 kudos

How to execute SQL statement using terraform

Is there a way to execute SQL statements using Terraform? I can see it can be possible using the API below: https://docs.databricks.com/api/workspace/statementexecution/executestatement but I want to know if there is a straightforward way to run it like the below code provi...

Latest Reply
KartikeyaJain
New Contributor II
  • 1 kudos

The official Databricks provider in Terraform only allows you to create SQL queries, not execute them. To actually run queries, you can either:
  • Use the http provider to make API calls to the Databricks REST API to execute SQL queries.
  • Alternatively, if...

3 More Replies
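Editor's note: for context on what the http-provider suggestion would wrap, the underlying call is the SQL Statement Execution API. A hedged sketch of that raw request, shown in Python for readability; a Terraform http data source or local-exec provisioner would issue the same POST. Host, token, and warehouse ID are placeholders.

```python
# Hedged sketch: the SQL Statement Execution API call that a Terraform http provider
# or local-exec provisioner would wrap. Host, token and warehouse ID are placeholders.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements/",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": "<sql-warehouse-id>",
        "statement": "CREATE SCHEMA IF NOT EXISTS demo",
        "wait_timeout": "30s",        # block up to 30s for the statement to finish
    },
)
resp.raise_for_status()
print(resp.json()["status"]["state"])  # e.g. SUCCEEDED
```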