Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

mkEngineer
by New Contributor III
  • 868 Views
  • 1 reply
  • 1 kudos

Resolved! DLT: "cannot run a cell when connected to pipeline databricks"

Hi, I have several different cells in my notebook that are connected to a DLT pipeline. Why are some cells skipped and others aren't? I get the message "cannot run a cell when connected to the pipeline Databricks" when trying to run a cell while I'm conne...

Latest Reply
Takuya-Omi
Valued Contributor III
  • 1 kudos

Hi @mkEngineer, when working with Delta Live Tables (DLT) in Databricks, you cannot run individual cells interactively as you would in a standard Databricks notebook. DLT Pipeline Behavior: Delta Live Tables notebooks are executed as part of a managed ...
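
A minimal sketch (not from the thread) of how a DLT notebook cell is typically structured; the table definition below is only registered and executed when the pipeline runs, never by interactive cell execution. The table name and source path are hypothetical.

import dlt

# Executed by the DLT pipeline run, not by running the cell interactively.
@dlt.table(name="orders_bronze", comment="Raw orders ingested by the pipeline")
def orders_bronze():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/orders")  # hypothetical source location
    )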

issa
by New Contributor III
  • 3175 Views
  • 9 replies
  • 5 kudos

Resolved! How to access bronze dlt in silver dlt

I have a job in Workflows that runs two DLT pipelines, one for Bronze_Transaction and one for Silver_Transaction. The reason for two DLT pipelines is because I want the tables to be created in the bronze catalog and erp schema, and the silver catalog and erp...

Data Engineering
dlt
DLT pipeline
Medallion
Workflows
Latest Reply
issa
New Contributor III
  • 5 kudos

Final solution for the Bronze:

# Define view as the source
@dlt.view
def Transactions_Bronze_View():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("inferSchema", True)
        .option(...
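
For the silver side of the same question, a hedged sketch (catalog, schema, and table names are illustrative, not taken from the thread) of the silver pipeline reading the table that the bronze pipeline published to Unity Catalog by its fully qualified name:

import dlt

@dlt.table(name="Transactions_Silver")
def transactions_silver():
    # Read the table the bronze pipeline wrote to its own catalog and schema.
    return (
        spark.readStream
        .table("bronze.erp.transactions_bronze")  # hypothetical fully qualified name
        .where("amount IS NOT NULL")              # placeholder cleansing rule
    )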

8 More Replies
Timmes0815
by New Contributor III
  • 1130 Views
  • 3 replies
  • 0 kudos

Resolved! Set up Location using widget

I'm struggling to use the Databricks widget to set up the location in a SQL CREATE TABLE statement. I tried the following to set up the location: Step 1: Creating a notebook (Notebook1) to define the variable. Location_Path = 'abfss:xxxxx@xxxx.xxx.ne...

Latest Reply
Timmes0815
New Contributor III
  • 0 kudos

I finally solved my problem by using the parameters in Python f-strings:

dbutils.widgets.text("Location_Path", "")
Location_Path = dbutils.widgets.get("Location_Path")

# create table using widget
query = f"""
CREATE OR REPLACE TABLE schema.Tabelname1
LOCATION '{Location_Path}'
AS SELECT...
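
A compact end-to-end sketch of the same approach, assuming it runs in a Databricks notebook; the schema, table, and source names are placeholders:

# Define the widget, read its value, then build and run the SQL via an f-string.
dbutils.widgets.text("Location_Path", "")
location_path = dbutils.widgets.get("Location_Path")

query = f"""
CREATE OR REPLACE TABLE my_schema.my_table
LOCATION '{location_path}'
AS SELECT * FROM my_schema.source_table
"""
spark.sql(query)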

2 More Replies
holychs
by New Contributor III
  • 936 Views
  • 1 reply
  • 0 kudos

Running child job under parent job using run_job_task

Hi Community, I am trying to call another job under a workflow job using run_job_task. Currently I am manually providing the job_id of the child job. I want to know if there is any way to pass job_name instead of run_id. This will automate the deployment ...

Latest Reply
filipniziol
Esteemed Contributor
  • 0 kudos

Hi @holychs, it is possible to do this using a lookup in Databricks Asset Bundles. You define a job id variable that finds the id of the job based on its name, and use this variable to specify job_id in the run_job_task. Here is the code: variables: my_job_id...
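
A hedged sketch of that pattern as Databricks Asset Bundle YAML (the job names, variable name, and task key are illustrative): the lookup resolves the child job's id from its name at deploy time, and run_job_task references the variable.

variables:
  child_job_id:
    description: Resolved from the child job's name at deploy time
    lookup:
      job: "My Child Job"            # hypothetical job name

resources:
  jobs:
    parent_job:
      name: parent_job
      tasks:
        - task_key: run_child
          run_job_task:
            job_id: ${var.child_job_id}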

holychs
by New Contributor III
  • 823 Views
  • 2 replies
  • 0 kudos

Resolved! Concurrent Workflow Jobs

Hi Community, I am trying to run a Databricks workflow job using run_job_task under a for_loop. I have set the concurrent jobs as 2. I can see 2 iteration jobs getting triggered successfully. But both fail with an error: "ConnectException: Connection ...

Latest Reply
holychs
New Contributor III
  • 0 kudos

It was an internal bug, resolved by managing different parameters for each loop job.

1 More Replies
JK2021
by New Contributor III
  • 5518 Views
  • 6 replies
  • 3 kudos

Resolved! Exception handling in Databricks

We are planning to customise code on Databricks to call the Salesforce Bulk API 2.0 to load data from a Databricks Delta table to Salesforce. My question is: can all the exception handling, retries, and everything around the Bulk API be coded explicitly in Databricks...
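
As an illustration of the kind of explicit handling being asked about, a minimal retry sketch in plain Python; the endpoint, headers, and payload are placeholders, not the actual Bulk API 2.0 contract.

import time
import requests

def post_with_retries(url, payload, headers, max_retries=3):
    """Retry transient failures with exponential backoff; re-raise otherwise."""
    for attempt in range(1, max_retries + 1):
        try:
            resp = requests.post(url, json=payload, headers=headers, timeout=60)
            resp.raise_for_status()
            return resp.json()
        except (requests.ConnectionError, requests.Timeout, requests.HTTPError):
            if attempt == max_retries:
                raise
            time.sleep(2 ** attempt)  # back off before the next attempt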

Latest Reply
Rolx
New Contributor II
  • 3 kudos

Is the Bulk API working as expected for loading data?

5 More Replies
suryateja405555
by New Contributor III
  • 1349 Views
  • 1 reply
  • 1 kudos

Databricks workflow deployment issue

Below is the Databricks workflow. (Note: ETL_schema_check is an if/else task in the Databricks workflow.) I am declaring the below taskValues based on some conditions in the ETL_data_check notebooks. Based on the below output, the next task "ETL_schema_checks" (if/e...

[Attached screenshots: suryateja405555_2-1732082532269.png, suryateja405555_3-1732083107096.png]
Data Engineering
assetbundles
DAB
Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi @suryateja405555, how are you doing today? As per my understanding, make sure ETL_data_checks is properly declared in the tasks section of your workflow configuration in the root module. For example, add it with a task_key and its respective proper...
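
A hedged sketch of the taskValues handoff being described, with hypothetical key names: the ETL_data_check notebook task sets a value, and the downstream if/else task's condition references it via a dynamic value reference.

# In the ETL_data_check notebook task (key and value are illustrative):
dbutils.jobs.taskValues.set(key="row_count_ok", value="true")

# The if/else task's condition would then compare a reference such as
# {{tasks.ETL_data_check.values.row_count_ok}} against "true".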

Faisal
by Contributor
  • 3052 Views
  • 2 replies
  • 1 kudos

DLT maintenance clusters

How do maintenance clusters do the cleanup using OPTIMIZE, Z-ORDER, and VACUUM? I read that it is handled automatically, but how does the maintenance cluster know which column to optimize? Where do we need to specify that info?

Latest Reply
cgrant
Databricks Employee
  • 1 kudos

At this time, Z-order columns must be specified in the asset definition; the property is pipelines.autoOptimize.zOrderCols. This may change in the future with Predictive Optimization.
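
A hedged sketch of where that property is set, on the table definition in the pipeline source code (the table name, upstream table, and column are illustrative):

import dlt

@dlt.table(
    name="events_silver",
    table_properties={
        # Column(s) the maintenance run should Z-order by during OPTIMIZE.
        "pipelines.autoOptimize.zOrderCols": "event_date"
    },
)
def events_silver():
    return spark.read.table("events_bronze")  # hypothetical upstream table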

1 More Replies
Pramod_G
by New Contributor II
  • 1075 Views
  • 4 replies
  • 0 kudos

Job Cluster with Continuous Trigger Type: Is Frequent Restart Required?

Hi All, I have a job continuously processing IoT data. The workflow reads data from Azure Event Hub and inserts it into the Databricks bronze layer. From there, the data is read, processed, validated, and inserted into the Databricks silver layer. The...

Data Engineering
Driver or Cluster Stability Issues
Long-Running Job Challenges
Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

How are you ingesting the data? Are you using the Delta Live Tables mechanism - https://docs.databricks.com/en/delta-live-tables/index.html?

3 More Replies
GS_S
by New Contributor III
  • 1849 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

15.4 does not directly require serverless, but for fine-grained access control it does indeed require it when running on Single User, as mentioned: "This data filtering is performed behind the scenes using serverless compute." In terms of costs: customers are charged for ...
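
As a workaround for logging affected rows when the merge result DataFrame is not available, a hedged sketch (not from the thread) that reads the MERGE operation metrics from the Delta table history afterwards; the table name is a placeholder:

from delta.tables import DeltaTable

# The most recent history entry after the MERGE carries its operation metrics.
hist = DeltaTable.forName(spark, "catalog.schema.target_table").history(1)
metrics = hist.select("operationMetrics").collect()[0][0]
print(metrics.get("numTargetRowsInserted"), metrics.get("numTargetRowsUpdated"))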

6 More Replies
manojpatil04
by New Contributor III
  • 1118 Views
  • 5 replies
  • 0 kudos

External dependency on serverless job from Airflow is not working on s3 path and workspace

I am working on a use case where we have to run a Python script from a serverless job through Airflow. When we try to trigger the serverless job and pass an external dependency as a wheel from an S3 path or workspace path, it is not working, but on a volume it ...

Latest Reply
Walter_C
Databricks Employee
  • 0 kudos

As per the serverless compute limitations, I can see the following: "Task libraries are not supported for notebook tasks. Use notebook-scoped libraries instead. See Notebook-scoped Python libraries."
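
The notebook-scoped alternative is a %pip install at the top of the notebook; a one-line sketch with a hypothetical wheel path on a volume:

%pip install /Volumes/main/libs/wheels/my_package-0.1-py3-none-any.whl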

4 More Replies
stadelmannkevin
by New Contributor II
  • 1143 Views
  • 4 replies
  • 2 kudos

init_script breaks Notebooks

Hi everyone, we would like to use our private company Python repository for installing Python libraries with pip install. To achieve this, I created a simple script which sets the index-url configuration of pip to our private repo. I set this script as a...
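
For context, a minimal sketch of what such an init script typically looks like (this is not the poster's script; the private index URL is a placeholder):

#!/bin/bash
# Point pip at the private package index for subsequent pip installs on the node.
cat > /etc/pip.conf <<'EOF'
[global]
index-url = https://pypi.example.com/simple/
EOF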

Latest Reply
Walter_C
Databricks Employee
  • 2 kudos

Did you also try cloning the cluster or using another cluster for testing? The metastore being down is normally a Hive Metastore issue and should not have an impact here, but you could check for more details on the error in the log4j output under Driver logs.

3 More Replies
sensanjoy
by Contributor II
  • 25196 Views
  • 6 replies
  • 1 kudos

Resolved! Performance issue with pyspark udf function calling rest api

Hi All, I am facing a performance issue with one of my PySpark UDF functions that posts data to a REST API (which uses a Cosmos DB backend to store the data). Please find the details below: # The Spark dataframe (df) contains about 30-40k records. # I am using pyt...
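
As an aside on the usual remedy for per-row API calls from a UDF, a hedged sketch that batches POSTs per partition and reuses one HTTP session; the endpoint, payload shape, and batch size are illustrative, and df is the DataFrame from the post.

import requests

def post_partition(rows):
    # One session per partition so connections are reused across requests.
    session = requests.Session()
    batch = []
    for row in rows:
        batch.append(row.asDict())
        if len(batch) >= 500:
            session.post("https://api.example.com/bulk", json=batch, timeout=60)
            batch = []
    if batch:
        session.post("https://api.example.com/bulk", json=batch, timeout=60)

df.foreachPartition(post_partition)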

Latest Reply
Anonymous
Not applicable
  • 1 kudos

Hi @Sanjoy Sen, thank you for your question! To assist you better, please take a moment to review the answer and let me know if it best fits your needs. Please help us select the best solution by clicking on "Select As Best" if it does. Your feedback w...

5 More Replies
shusharin_anton
by New Contributor II
  • 911 Views
  • 1 reply
  • 1 kudos

Resolved! Sort after update on DWH

Running a query on a serverless DWH:

UPDATE catalog.schema.table
SET col_tmp = CAST(col AS DECIMAL(30, 15))

In query profiling, it has some sort and shuffle stages in the graph. The table is partitioned by the partition_date column. Some details in the sort node mention that so...

Latest Reply
Alberto_Umana
Databricks Employee
  • 1 kudos

Hi @shusharin_anton, The sort and shuffle stages in your query profile are likely triggered by the need to redistribute and order the data based on the partition_date column. This behavior can be attributed to the way Spark handles data partitioning ...

rai00
by New Contributor
  • 613 Views
  • 1 reply
  • 0 kudos

Mock user doesn't have the required privileges to access catalog `remorph` while running 'make test'

Utility: Remorph (Databricks). Issue: "User `me@example.com` doesn't have required privileges to access catalog `remorph`" while running the 'make test' cmd. I am encountering an issue while running tests for Databricks Labs Remorph using 'make test'...

Latest Reply
Alberto_Umana
Databricks Employee
  • 0 kudos

Hi @rai00, ensure that the mock user me@example.com has the necessary privileges at both the catalog and schema levels. The user needs specific privileges such as USE_SCHEMA and CREATE_VOLUME. Use the WorkspaceClient to check the effective privilege...
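
A hedged sketch (assuming the grants API of the Databricks Python SDK; exact response fields may differ by SDK version) of checking the mock user's effective privileges on the catalog; the catalog and principal names are taken from the post:

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.catalog import SecurableType

w = WorkspaceClient()
# Inspect what privileges the mock user effectively holds on the catalog.
effective = w.grants.get_effective(
    securable_type=SecurableType.CATALOG,
    full_name="remorph",
    principal="me@example.com",
)
print(effective)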

