Data Engineering

Forum Posts

by Skr7 • New Contributor

09-21-2023 7:27:06 AM

1137 Views
2 replies
1 kudos

Resolved! Scheduled job output export

Hi, I have a Databricks job that results in a dashboard post-run. I'm able to download the dashboard as HTML from the View job runs page, but I want to automate the process, so I tried using the Databricks API, but it says {"error_code":"INVALID_...

Data Engineering

data engineering

Latest Reply

Kaniz
Community Manager

09-22-2023 12:13:20 AM

1 kudos

Hi @Skr7, You cannot automate exporting the dashboard as HTML using the Databricks API. The Databricks API only supports exporting results for notebook task runs, not for job run dashboards. Here's the relevant excerpt from the provided sources: Exp...
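For readers who want to try the notebook-task export mentioned in the reply, here is a minimal sketch of how the request could be built. It assumes the Jobs API `runs/export` endpoint with its `run_id` and `views_to_export` parameters; the host name and token handling are placeholders, and the thread does not confirm that this returns the dashboard in question. Note that `run_id` would need to be the run ID of the notebook task, not of the parent job run.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Values accepted by views_to_export, per the Jobs API documentation.
ALLOWED_VIEWS = {"CODE", "DASHBOARDS", "ALL"}


def build_runs_export_url(host: str, run_id: int, views: str = "DASHBOARDS") -> str:
    """Build the URL for a Jobs API runs/export request."""
    if views not in ALLOWED_VIEWS:
        raise ValueError(f"views must be one of {sorted(ALLOWED_VIEWS)}")
    query = urlencode({"run_id": run_id, "views_to_export": views})
    return f"{host.rstrip('/')}/api/2.1/jobs/runs/export?{query}"


def export_run(host: str, token: str, run_id: int, views: str = "DASHBOARDS") -> dict:
    """Send the export request and return the parsed JSON response."""
    request = urllib.request.Request(
        build_runs_export_url(host, run_id, views),
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


# Hypothetical usage (host, token, and run ID are placeholders):
# result = export_run("https://<workspace-host>", "<token>", 12345)
```

Keeping the URL construction separate from the network call makes it easy to check the parameters before sending anything to the workspace.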

1 More Replies

by Anske • New Contributor II

2 weeks ago

91 Views
1 replies
0 kudos

DLT apply_changes applies only deletes and inserts not updates

Hi, I have a DLT pipeline that applies changes from a source table (cdctest_cdc_enriched) to a target table (cdctest), using the following code: dlt.apply_changes( target = "cdctest", source = "cdctest_cdc_enriched", keys = ["ID"], sequence_by...

Data Engineering

Delta Live Tables

Latest Reply

Kaniz
Community Manager

9m ago

0 kudos

Hi @Anske, It seems you’re encountering an issue with your Delta Live Tables (DLT) pipeline where updates from the source table are not being correctly applied to the target table. Let’s troubleshoot this together! Pipeline Update Process: Whe...

by niruban • New Contributor II

2 weeks ago

79 Views
1 replies
0 kudos

Migrate a notebook that resides in a workspace using Databricks Asset Bundles

Hello Community Folks - Has anyone implemented migration of notebooks that are in a workspace to a production Databricks workspace using Databricks Asset Bundles? If so, can you please point me to any documentation I can refer to? Thanks!! Regards, Niruban ...

Data Engineering

Latest Reply

Kaniz
Community Manager

11m ago

0 kudos

Hi @niruban, Migrating notebooks from one Databricks workspace to another using Databricks Asset Bundles is a useful approach. Let me guide you through the process and provide relevant documentation. Databricks Asset Bundles Overview: Databricks ...

by Oliver_Angelil • Valued Contributor II

2 weeks ago

104 Views
1 replies
0 kudos

Append-only table from non-streaming source in Delta Live Tables

I have a DLT pipeline where all tables are non-streaming (materialized views), except for the last one, which needs to be append-only and is therefore defined as a streaming table. The pipeline runs successfully on the first run. However, on the seco...

Data Engineering

Latest Reply

Kaniz
Community Manager

14m ago

0 kudos

Hi @Oliver_Angelil, It appears that you’re encountering an issue with your DLT (Databricks Delta Live Tables) pipeline, specifically related to having an append-only table at the end of the pipeline. Let’s explore some potential solutions: Stream...

by BerkerKozan • New Contributor III

2 weeks ago

73 Views
1 replies
0 kudos

Using AAD Spn on AWS Databricks

I use AWS Databricks, which has an SSO & SCIM integration with AAD. I generated an SPN in AAD, synced it to Databricks, and want to use this SPN with AAD client secrets in the Databricks SDK. But it doesn't work. I don't want to generate another tok...

Data Engineering

Latest Reply

Kaniz
Community Manager

30m ago

0 kudos

Hi @BerkerKozan, It sounds like you’re trying to set up provisioning to Databricks using Microsoft Entra ID (formerly known as Azure Active Directory) and encountering some issues. Let’s break down the steps and address your concerns: Provisionin...

by sasi2 • New Contributor II

2 weeks ago

246 Views
1 replies
0 kudos

Connecting to MuleSoft from Databricks

Hi, is there any connectivity pipeline already established to access MuleSoft or Anypoint Exchange data from Databricks? I have seen many options to access Databricks data in MuleSoft, but can we read data from MuleSoft into Databricks? Please gi...

Data Engineering

Latest Reply

Kaniz
Community Manager

36m ago

0 kudos

Hi @sasi2, Connecting MuleSoft or AnyPoint to exchange data with Databricks is possible, and there are several options you can explore. Let’s dive into some solutions: Using JDBC Driver for Databricks in Mule Applications: The CData JDBC Driver...

by MartinH • New Contributor II

03-23-2023 3:09:56 PM

2477 Views
7 replies
4 kudos

Azure Data Factory and Photon

Hello, we have Databricks Python workbooks accessing Delta tables. These workbooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify a new job cluster, there does n...

Data Engineering

Latest Reply

CharlesReily
New Contributor III

01-16-2024 11:22:48 PM

4 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...

6 More Replies

by subha2 • New Contributor II

Friday

340 Views
1 replies
0 kudos

Not able to read tables in Unity Catalog in parallel

There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel using a loop and threads and execute the configured query. But the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...
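As a point of comparison for the loop-and-thread approach described in the question, here is a minimal sketch using a thread pool from the Python standard library. It is not the poster's code: the helper name `run_in_parallel` and the table names in the usage comment are made up for illustration. Two things it tries to make visible: Spark reads are lazy, so each thread has to call an action such as `count()` for anything to run, and exceptions raised inside worker threads go unnoticed unless they are collected explicitly.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Any, Callable, Dict, Iterable, Tuple


def run_in_parallel(
    table_names: Iterable[str],
    action: Callable[[str], Any],
    max_workers: int = 4,
) -> Tuple[Dict[str, Any], Dict[str, Exception]]:
    """Run `action` once per table name on a thread pool.

    Returns (results, errors), both keyed by table name, so that a failure
    in one thread is reported instead of being silently dropped.
    """
    results: Dict[str, Any] = {}
    errors: Dict[str, Exception] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Map each future back to the table name it was submitted for.
        futures = {pool.submit(action, name): name for name in table_names}
        for future in as_completed(futures):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:
                errors[name] = exc
    return results, errors


# Hypothetical usage in a Databricks notebook, where `spark` is predefined
# and the three-level table names are placeholders:
# tables = ["main.sales.orders", "main.sales.customers"]
# counts, failures = run_in_parallel(tables, lambda t: spark.read.table(t).count())
```

Inspecting the returned errors dictionary is usually the quickest way to see why a statement appeared not to execute.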

Data Engineering

Latest Reply

Kaniz
Community Manager

48m ago

0 kudos

Hi @subha2, It seems you’re encountering an issue related to executing SQL statements in Spark. Let’s troubleshoot this step by step: Check the Unity Catalog Configuration: Verify that the Unity Catalog configuration is correctly set up. Ensure t...

by DBX-2024 • New Contributor

Friday

79 Views
1 replies
0 kudos

Job Cluster's CPU utilization goes higher than 100% a few times during the workload run

I have a data engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker: i3.4xlarge with 122 GB memory and 16 cores. Driver: i3.4xlarge with 122 GB memory and 16 cores. Min workers: 4, max workers: 8. We noticed...

Data Engineering

Databricks

Latest Reply

Kaniz
Community Manager

51m ago

0 kudos

Hi @DBX-2024, Let’s break down your questions: High CPU Utilization Spikes: Are They Problematic? High CPU utilization spikes can be problematic depending on the context. Here are some considerations: Normal Behavior: It’s common for CPU utilizat...

by smukhi • New Contributor

Friday

82 Views
1 replies
0 kudos

Encountering Error UNITY_CREDENTIAL_SCOPE_MISSING_SCOPE

As of this morning, we started receiving the following error message on a Databricks job with a single PySpark notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...

Data Engineering

Latest Reply

Kaniz
Community Manager

57m ago

0 kudos

Hi @smukhi, The error message you’re encountering, specifically the “Py4JJavaError” with the “Missing Credential Scope” issue, can be quite puzzling. Let’s explore some potential solutions and ideas to troubleshoot this problem: Check Cluster Con...

by Skr7 • New Contributor

58m ago

10 Views
0 replies
0 kudos

Databricks Asset Bundles

Hi, I'm implementing Databricks Asset Bundles. My scripts are in GitHub, and my /resource folder has all the .yml files for my Databricks workflows, which point to the main branch: git_source: git_url: https://github.com/xxxx git_provider: ...

Data Engineering

Databricks

by gabe123 • New Contributor

Friday

336 Views
1 replies
0 kudos

Strange Error with custom module in Delta Live Tables pipeline

The chunk of code in question: sys.path.append( spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/") ) from broker_utils import extract_day_with_suffix, proper_case_address_udf, proper_case_last_name_first_udf, proper_case_ud...

Data Engineering

Latest Reply

Kaniz
Community Manager

an hour ago

0 kudos

Hi @gabe123 , It seems like you’re encountering a ModuleNotFoundError when trying to import the broker_utils module in your Python code. Let’s troubleshoot this issue step by step: Check Module Location: First, ensure that the broker_utils.py fil...

by lieber_augustin • New Contributor

Friday

102 Views
1 replies
0 kudos

Reading from one Postgres table results in several Scan JDBCRelation operations

Hello, I am working on a Spark job where I'm reading several tables from PostgreSQL into DataFrames as follows: df = (spark.read .format("postgresql") .option("query", query) .option("host", database_host) .option("port...

Data Engineering

Latest Reply

Kaniz
Community Manager

an hour ago

0 kudos

Hi @lieber_augustin, Optimizing the performance of your PostgreSQL queries involves several considerations. Let’s address both the potential optimizations and the reason behind multiple Scan JDBCRelation operations. Database Design: Properly des...

by Husky • New Contributor III

02-08-2024 2:16:59 AM

1203 Views
4 replies
1 kudos

Resolved! Upload file from local file system to Unity Catalog Volume (via databricks-connect)

Context: IDE: IntelliJ 2023.3.2; Library: databricks-connect 13.3; Python: 3.10. Description: I develop notebooks and Python scripts locally in the IDE and connect to the Spark cluster via databricks-connect for a better developer experience. I download a...

Data Engineering

Latest Reply

lathaniel
New Contributor III

04-02-2024 11:20:56 AM

1 kudos

Late to the discussion, but I too was looking for a way to do this _programmatically_, as opposed to the UI. The solution I landed on was using the Python SDK (though you could assuredly do this using an API request instead if you're not in Python): w ...
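The reply preview is cut off at "w ...", so the poster's actual code is not shown here. As a rough sketch of what an SDK-based upload could look like, the helper below assumes the `databricks-sdk` package, its `WorkspaceClient`, and the `files.upload` method of its Files API. The helper name `upload_to_volume` and the paths in the usage comment are hypothetical.

```python
from typing import Any


def upload_to_volume(client: Any, local_path: str, volume_path: str) -> None:
    """Upload a local file to a Unity Catalog volume path.

    `client` is expected to be a databricks.sdk.WorkspaceClient (assumption);
    `volume_path` has the form /Volumes/<catalog>/<schema>/<volume>/<file>.
    """
    # Open in binary mode: the Files API takes a binary stream as contents.
    with open(local_path, "rb") as handle:
        client.files.upload(volume_path, handle, overwrite=True)


# Hypothetical usage (requires `pip install databricks-sdk` and configured
# authentication; catalog, schema, and volume names are placeholders):
# from databricks.sdk import WorkspaceClient
# w = WorkspaceClient()
# upload_to_volume(w, "data.csv", "/Volumes/my_catalog/my_schema/my_volume/data.csv")
```

Passing the client in as an argument keeps the helper independent of how authentication is configured.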

3 More Replies

by jainshasha • New Contributor

Monday

85 Views
4 replies
0 kudos

Job Cluster in Databricks workflow

Hi, I have configured 20 different workflows in Databricks. All of them are configured with job clusters with different names. All 20 workflows are scheduled to run at the same time. But even with a different job cluster configured in each of them, they run sequentially w...

Data Engineering

Latest Reply

jainshasha
New Contributor

2 hours ago

0 kudos

Hi @Kaniz, attaching screenshots of 5 of the workflows that are scheduled at the same time.

3 More Replies
