Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mkEngineer
by New Contributor III
  • 358 Views
  • 1 reply
  • 1 kudos

Resolved! DLT: "cannot run a cell when connected to pipeline databricks"

Hi, I have several different cells in my notebook that are connected to a DLT pipeline. Why are some cells skipped and others aren't? I get the message "cannot run a cell when connected to the pipeline Databricks" when I try running a cell while I'm conne...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 1 kudos

Hi @mkEngineer, when working with Delta Live Tables (DLT) in Databricks, you cannot run individual cells interactively as you would in a standard Databricks notebook. DLT pipeline behavior: Delta Live Tables notebooks are executed as part of a managed ...
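For context, a minimal sketch of what a DLT notebook typically declares (the sample source table below is only a stand-in): the functions just declare datasets, and nothing runs until the pipeline materializes them, which is why individual cells cannot be executed interactively.

import dlt
from pyspark.sql import functions as F

@dlt.table(comment="Raw trips, materialized only when the pipeline runs")
def raw_trips():
    # spark is provided by the Databricks runtime; the sample table is a placeholder source
    return spark.read.table("samples.nyctaxi.trips")

@dlt.table(comment="Derived table, also materialized by the pipeline run")
def trips_per_day():
    return (
        dlt.read("raw_trips")
        .groupBy(F.to_date("tpep_pickup_datetime").alias("pickup_day"))
        .count()
    )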

issa
by New Contributor III
  • 1609 Views
  • 9 replies
  • 5 kudos

Resolved! How to access bronze dlt in silver dlt

I have a job in Workflows that runs two DLT pipelines, one for Bronze_Transaction and one for Silver_Transaction. The reason for two DLT pipelines is that I want the tables to be created in the bronze catalog and erp schema, and the silver catalog and erp...

Data Engineering
dlt
DLT pipeline
Medallion
Workflows
Latest Reply
issa
New Contributor III
  • 5 kudos

Final solution for the Bronze:

# Define view as the source
@dlt.view
def Transactions_Bronze_View():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("inferSchema", True)
        .option(...
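Since the accepted answer is truncated above, here is a self-contained sketch of the same pattern, with a hypothetical landing path and a downstream streaming table added for completeness (names and path are placeholders, not the original poster's values):

import dlt

@dlt.view
def Transactions_Bronze_View():
    # Hypothetical JSON landing path; replace with the real external location
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.inferColumnTypes", "true")
        .load("abfss://landing@mystorageaccount.dfs.core.windows.net/erp/transactions/")
    )

@dlt.table(name="transactions_bronze", comment="Streaming bronze table fed by the Auto Loader view")
def transactions_bronze():
    return dlt.read_stream("Transactions_Bronze_View")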

8 More Replies
Timmes0815
by New Contributor III
  • 446 Views
  • 3 replies
  • 0 kudos

Resolved! Set up Location using widget

I'm struggling to use the Databricks widget to set up the location in a SQL CREATE TABLE statement. I tried the following to set up the location: Step 1: Creating a notebook (Notebook1) to define the variable. Location_Path = 'abfss:xxxxx@xxxx.xxx.ne...

Latest Reply
Timmes0815
New Contributor III
  • 0 kudos

I finally solved my problem by using the parameters in Python f-strings:

Location_Path = dbutils.widgets.text("Location_Path","")

-- create table using widget
query = f""" CREATE OR REPLACE TABLE schema.Tabelname1 LOCATION '{Location_Path}' AS SELECT...
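As a side note, a runnable sketch of the same approach (the table names are hypothetical): dbutils.widgets.text only creates the widget, so its current value is read back with dbutils.widgets.get before being interpolated into the f-string.

# Create the widget (with an empty default) and read its current value
dbutils.widgets.text("Location_Path", "")
location_path = dbutils.widgets.get("Location_Path")

# Build the CREATE TABLE statement with a Python f-string and run it
query = f"""
    CREATE OR REPLACE TABLE my_schema.my_table
    LOCATION '{location_path}'
    AS SELECT * FROM my_schema.my_source_table
"""
spark.sql(query)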

2 More Replies
holychs
by New Contributor III
  • 317 Views
  • 1 reply
  • 0 kudos

Running child job under parent job using run_job_task

Hi Community, I am trying to call another job under a workflow job using run_job_task. Currently I am manually providing the job_id of the child job. I want to know if there is any way to pass job_name instead of job_id. This will automate the deployment ...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @holychs, it is possible to do this using a lookup in Databricks Asset Bundles. You define a job id variable that resolves the id of the job based on its name, and use this variable to specify job_id in the run_job_task. Here is the code: variables: my_job_id...
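For completeness, a sketch of what that bundle configuration can look like, assuming a child job named "Child Job" (the variable, job, and task names are placeholders):

variables:
  my_job_id:
    description: Resolves the child job's id from its name at deploy time
    lookup:
      job: "Child Job"

resources:
  jobs:
    parent_job:
      name: parent_job
      tasks:
        - task_key: run_child
          run_job_task:
            job_id: ${var.my_job_id}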

holychs
by New Contributor III
  • 398 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrent Workflow Jobs

Hi Community, I am trying to run a Databricks workflow job using run_job_task under a for_loop. I have set the concurrent jobs to 2. I can see 2 iteration jobs getting triggered successfully, but both fail with an error: "ConnectException: Connection ...

Latest Reply
holychs
New Contributor III
  • 0 kudos

It was an internal bug, resolved by managing different parameters for each loop's jobs.

1 More Replies
JK2021
by New Contributor III
  • 4527 Views
  • 6 replies
  • 3 kudos

Resolved! Exception handling in Databricks

We are planning to customise code on Databricks to call the Salesforce Bulk API 2.0 to load data from a Databricks delta table into Salesforce. My question is: all the exception handling, retries, and everything around the Bulk API can be coded explicitly in Databricks...

Latest Reply
Rolx
New Contributor II
  • 3 kudos

Is the Bulk API working as expected for loading data?

5 More Replies
suryateja405555
by New Contributor III
  • 980 Views
  • 1 reply
  • 1 kudos

Databricks workflow deployment issue

The below is the Databricks workflow. (Note: ETL_schema_check is an if/else task in the Databricks workflow.) I am declaring the below taskValues based on some conditions in the ETL_data_check notebooks. Based on the below output, the next task "ETL_schema_checks" (if/e...

(Workflow screenshots attached.)
Data Engineering
assetbundles
DAB
Latest Reply
Brahmareddy
Honored Contributor III
  • 1 kudos

Hi @suryateja405555, how are you doing today? As per my understanding, make sure ETL_data_checks is properly declared in the tasks section of your workflow configuration in the root module. For example, add it with a task_key and its respective proper...
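To illustrate the suggestion, a sketch of such a declaration in the bundle's job definition, with hypothetical task keys, notebook path, and task-value name:

resources:
  jobs:
    etl_job:
      name: etl_job
      tasks:
        - task_key: ETL_data_checks
          notebook_task:
            notebook_path: ./notebooks/ETL_data_checks.py
        - task_key: ETL_schema_checks
          depends_on:
            - task_key: ETL_data_checks
          condition_task:
            op: EQUAL_TO
            left: "{{tasks.ETL_data_checks.values.schema_check}}"
            right: "true"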

Faisal
by Contributor
  • 2638 Views
  • 2 replies
  • 1 kudos

DLT maintenance clusters

How do maintenance clusters do the cleanup using OPTIMIZE, Z-ORDER, and VACUUM? I read that it is handled automatically, but how does the maintenance cluster know which columns to optimize? Where do we need to specify that info?

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

At this time, Z-order columns must be specified in the asset definition; the property is pipelines.autoOptimize.zOrderCols. This may change in the future with Predictive Optimization.
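For example, in a Python pipeline the property from the answer can be attached to a table definition like this (the table and column names are hypothetical):

import dlt

@dlt.table(
    table_properties={
        # Columns the maintenance cluster should Z-order by during OPTIMIZE
        "pipelines.autoOptimize.zOrderCols": "event_date,device_id"
    }
)
def sensor_readings_silver():
    return dlt.read("sensor_readings_bronze")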

1 More Replies
Pramod_G
by New Contributor II
  • 429 Views
  • 4 replies
  • 0 kudos

Job Cluster with Continuous Trigger Type: Is Frequent Restart Required?

Hi All, I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The...

Data Engineering
Driver or Cluster Stability Issues
Long-Running Job Challenges
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

How are you ingesting the data? Are you using the Delta Live Table mechanism - https://docs.databricks.com/en/delta-live-tables/index.html?

3 More Replies
rgooch_cfa
by New Contributor III
  • 1772 Views
  • 4 replies
  • 1 kudos

Resolved! Override ruff linter settings for notebook cells

How can I override the Ruff linter settings for my notebooks? I have various projects/git folders in my workspace, and oftentimes they represent different teams and thus different sets of code formatting patterns. I would like to override the default...

Latest Reply
BigRoux
Databricks Employee
  • 1 kudos

If your `pyproject.toml` file is not being picked up by Ruff in your Databricks notebooks, there are a few potential reasons and solutions to address the issue. Common causes and solutions: 1. Ruff version compatibility: ensure you are using a recent...
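As a reference point, a minimal pyproject.toml sketch for overriding Ruff defaults; the line length and rule selections below are placeholders, and it is assumed the file sits at the root of the git folder so Ruff can discover it:

[tool.ruff]
line-length = 120
target-version = "py310"

[tool.ruff.lint]
select = ["E", "F", "I"]
ignore = ["E501"]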

3 More Replies
lauraxyz
by Contributor
  • 1373 Views
  • 4 replies
  • 1 kudos

Put file into volume within Databricks

Hi! From a Databricks job, I want to copy a workspace file into a volume. How can I do that? I tried `dbutils.fs.cp("/Workspace/path/to/the/file", "/Volumes/path/to/destination")` but got "Public DBFS root is disabled. Access is denied on path: /Workspac...

Latest Reply
lauraxyz
Contributor
  • 1 kudos

Found the reason! It's the runtime: it doesn't work on Databricks Runtime 15.4 LTS, but started to work after changing to 16.0. Maybe this is something supported from the latest version?
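For anyone stuck on an older runtime, a commonly used alternative (not the resolution chosen in this thread) is a plain local-file copy, since workspace files and Unity Catalog volumes are both exposed on the driver's filesystem; the paths below are hypothetical placeholders.

import shutil

# /Workspace/... and /Volumes/... are local (FUSE) paths on the driver,
# so this copy does not go through dbutils.fs or the DBFS root.
shutil.copy(
    "/Workspace/Users/someone@example.com/config/settings.json",
    "/Volumes/my_catalog/my_schema/my_volume/settings.json",
)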

3 More Replies
GS_S
by New Contributor III
  • 712 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

15.4 does not directly require serverless, but for fine-grained access control it does indeed require it when running on Single User, as mentioned: "This data filtering is performed behind the scenes using serverless compute." In terms of costs: customers are charged for ...
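As a workaround for logging the affected rows without merge(...).collect(), one commonly used approach is to read the operationMetrics that Delta records for the last MERGE from the table history; the table name below is a placeholder.

from delta.tables import DeltaTable

target = DeltaTable.forName(spark, "my_catalog.my_schema.target_table")  # hypothetical name
last_op = target.history(1).select("operation", "operationMetrics").first()

if last_op["operation"] == "MERGE":
    metrics = last_op["operationMetrics"]
    inserted = int(metrics.get("numTargetRowsInserted", 0))
    updated = int(metrics.get("numTargetRowsUpdated", 0))
    print(f"MERGE inserted {inserted} rows and updated {updated} rows")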

6 More Replies
manojpatil04
by New Contributor III
  • 547 Views
  • 5 replies
  • 0 kudos

External dependency on serverless job from Airflow is not working on s3 path and workspace

I am working on a use case where we have to run a Python script from a serverless job through Airflow. When we try to trigger the serverless job and pass an external dependency as a wheel from an S3 path or workspace path, it is not working, but on a volume it ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As per the serverless compute limitations, I can see the following: "Task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries."
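In practice that means installing the wheel at the top of the notebook the serverless job runs, for example with a notebook-scoped install from the volume; the volume path and wheel name below are hypothetical, and dbutils.library.restartPython() is typically run in the following cell so the freshly installed package is importable.

%pip install /Volumes/my_catalog/my_schema/libs/my_package-1.0.0-py3-none-any.whl

dbutils.library.restartPython()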

4 More Replies
stadelmannkevin
by New Contributor II
  • 525 Views
  • 4 replies
  • 2 kudos

init_script breaks Notebooks

Hi everyone, we would like to use our private company Python repository for installing Python libraries with pip install. To achieve this, I created a simple script which sets the index-url configuration of pip to our private repo. I set this script as a...

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Did you also try cloning the cluster or using another cluster for the testing? The metastore-down error is normally a Hive Metastore issue and should not be having an impact here, but you could check for more details on the error in the log4j output under Driver logs.

3 More Replies
sensanjoy
by Contributor
  • 22933 Views
  • 6 replies
  • 1 kudos

Resolved! Performance issue with pyspark udf function calling rest api

Hi All, I am facing a performance issue with a PySpark UDF function that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark dataframe (df) contains about 30-40k rows. # I am using pyt...

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sanjoy Sen, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback w...
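Since the thread preview does not show the accepted fix, one widely used pattern for this kind of workload (not necessarily the solution chosen in the thread) is to replace the per-row UDF with mapPartitions and a shared HTTP session, so each partition reuses one connection pool. The endpoint URL, the id column, and the results table name below are hypothetical.

import requests
from pyspark.sql import Row

API_URL = "https://example.com/api/ingest"  # placeholder endpoint

def post_partition(rows):
    session = requests.Session()  # one session (connection pool) per partition
    for row in rows:
        resp = session.post(API_URL, json=row.asDict(), timeout=30)
        yield Row(id=row["id"], status=resp.status_code)

# df is the ~30-40k row dataframe from the question
results = df.rdd.mapPartitions(post_partition).toDF()
results.write.mode("append").saveAsTable("my_schema.api_post_results")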

5 More Replies
