Data Engineering

Forum Posts

by ashraf1395 • Honored Contributor

05-06-2024 1:18:33 AM

7221 Views
1 reply
1 kudos

Optimising Clusters in Databricks on GCP

Hi there everyone, We are trying to get hands-on with Databricks Lakehouse for a prospective client's project. Our major aim for the project is to compare a data lakehouse on Databricks and a BigQuery data warehouse in terms of costs and time to set up and run que...

by smedegaard • New Contributor III

04-19-2024 2:42:49 PM

2363 Views
1 reply
0 kudos

DLT run fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found"

I've created a streaming live table from a foreign catalog. When I run the DLT pipeline it fails with "com.databricks.cdc.spark.DebeziumJDBCMicroBatchProvider not found". I haven't seen any documentation that suggests I need to install Debezium manuall...

by MartinH • New Contributor II

03-23-2023 3:09:56 PM

20543 Views
7 replies
6 kudos

Resolved! Azure Data Factory and Photon

Hello, we have Databricks Python workbooks accessing Delta tables. These workbooks are scheduled/invoked by Azure Data Factory. How can I enable Photon on the linked services that are used to call Databricks? If I specify new job cluster, there does n...

Latest Reply

CharlesReily
New Contributor III

01-16-2024 11:22:48 PM

6 kudos

When you create a cluster on Databricks, you can enable Photon by selecting the "Photon" option in the cluster configuration settings. This is typically done when creating a new cluster, and you would find the option in the advanced cluster configura...

6 More Replies

by dbdude • New Contributor II

08-17-2023 4:01:48 PM

14461 Views
3 replies
1 kudos

AWS Secrets Works In One Cluster But Not Another

Why can I use boto3 to go to Secrets Manager to retrieve a secret with a personal cluster, but I get an error with a shared cluster? NoCredentialsError: Unable to locate credentials

Latest Reply

Husky
New Contributor III

05-06-2024 2:13:29 AM

1 kudos

Hey @dbdude, I am facing the same error. Did you find a solution to access the AWS credentials on a Shared Cluster? This article describes a way of storing credentials in a Unity Catalog Volume to fetch by the Shared Cluster: https://medium.com/@amluci...

2 More Replies

by mamiya • New Contributor II

05-02-2024 5:16:34 AM

1801 Views
1 reply
0 kudos

ODBC PowerBI 2 commands in one query

Hello everyone, I'm trying to use the ODBC DirectQuery option in PowerBI, but I keep getting an error about another command. The SQL query works while using the SQL Editor. Do I need to change the setup of my ODBC connector? DECLARE dateFrom DATE = DA...

by Deepak_Kandpal • New Contributor III

09-27-2022 1:21:37 AM

13819 Views
3 replies
3 kudos

Resolved! Invalid configuration value detected for fs.azure.account.key with com.crealytics:spark-excel

I have set up my Databricks notebook to use a Service Principal to access ADLS using the configuration below. service_credential = dbutils.secrets.get(scope="<scope>",key="<service-credential-key>") spark.conf.set("fs.azure.account.auth.type.<storage-accou...

Latest Reply

Harsha_Dbrs
New Contributor II

05-05-2024 9:32:04 PM

3 kudos

Below is the implementation of the same code in Scala: spark.sparkContext.hadoopConfiguration.set("fs.azure.account.key.<accountName>.dfs.core.windows.net",<accountKey>)

2 More Replies

by prats33 • New Contributor

05-05-2024 1:04:14 AM

1125 Views
1 reply
0 kudos

schedule job termination

Hi, I want to terminate my Databricks job daily at 11:59 AM. How can I achieve this in Databricks?

Latest Reply

Ajay-Pandey
Databricks MVP

05-05-2024 6:14:50 AM

0 kudos

Hi @prats33, you can use the Databricks Clusters API to terminate your cluster at a specific time: create a notebook that calls the API and schedule it as a Databricks workflow job on a job cluster at 11:59.
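
For anyone wanting a starting point, here is a minimal sketch of the call such a notebook might make. It assumes the Clusters API terminate endpoint (`POST /api/2.0/clusters/delete`, which terminates a cluster rather than permanently deleting it) and a personal access token. The function names, the workspace URL, the token and the cluster ID are placeholders for illustration, not something from this thread.

```python
import json
import urllib.request


def build_terminate_request(host: str, token: str, cluster_id: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that asks Databricks to terminate a cluster."""
    # Assumed endpoint: clusters/delete terminates the cluster; it can be restarted later.
    url = f"{host.rstrip('/')}/api/2.0/clusters/delete"
    body = json.dumps({"cluster_id": cluster_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def terminate_cluster(host: str, token: str, cluster_id: str) -> int:
    """Send the terminate request and return the HTTP status code."""
    request = build_terminate_request(host, token, cluster_id)
    with urllib.request.urlopen(request) as response:
        return response.status
```

In a scheduled notebook you would call `terminate_cluster(...)` with your own workspace URL, a token read from a secret scope, and the ID of the cluster to stop. The 11:59 timing comes from the schedule on the workflow job, not from this code.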

by srikanth2 • New Contributor II

05-03-2024 12:35:45 PM

3502 Views
2 replies
0 kudos

Can we use Managed Identity to create mount point for ADLS Gen2

Hi, We would like to use Azure Managed Identity to create a mount point to read/write data from/to ADLS Gen2. We are also using the following code snippet to use MSI authentication to read data from ADLS Gen2, but it is giving an error: storage_account_name = "<<...

Latest Reply

Walter_C
Databricks Employee

05-04-2024 9:23:33 AM

0 kudos

It seems that using User Assigned Managed Identity to read/write from ADLS Gen2 inside a notebook is not directly supported at the moment.

1 More Replies

by stepysamud • New Contributor

04-25-2024 2:27:43 AM

1325 Views
1 reply
0 kudos

Workflow UI broken after creating job via the api

Hi all, I'm in the process of migrating from Databricks on Azure to Databricks on AWS. One part of this is migrating all our workflows, which I wanted to do via the /api/2.1/jobs/create API with the workflow passed in the JSON body. I have successfully created...

Latest Reply

Walter_C
Databricks Employee

05-04-2024 9:19:24 AM

0 kudos

Hello, many thanks for your question. The error message shown mentions a possible timeout or network issue. As a first step, have you tried opening the page in another browser or using incognito mode? Also, have you tried using different ...

by Sasikala • New Contributor

05-03-2024 9:02:53 AM

1688 Views
1 reply
0 kudos

Service Principal Managed by Databricks

I have done the steps below: 1. Created a Databricks-managed service principal. 2. Created an OAuth secret. 3. Gave all necessary permissions to the service principal. I'm trying to use this service principal in Azure DevOps to automate CI/CD, but it fails as...

Latest Reply

Walter_C
Databricks Employee

05-04-2024 9:07:28 AM

0 kudos

Have you followed the steps for using a service principal for CI/CD? They are available here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-sp

by radothede • Valued Contributor II

05-02-2024 1:05:07 PM

2216 Views
1 reply
0 kudos

Can on-demand clusters be shared across multiple jobs using cluster pool with max capacity?

I have a cluster pool with max capacity. I run multiple jobs against that cluster pool. Can on-demand clusters, created within this cluster pool, be shared across multiple different jobs, at the same time? The reason I'm asking is I can see a downgrade...

by lieber_augustin • New Contributor

05-03-2024 7:18:45 PM

2248 Views
0 replies
0 kudos

Reading from one Postgres table results in several Scan JDBCRelation operations

Hello, I am working on a Spark job where I'm reading several tables from PostgreSQL into DataFrames as follows: df = (spark.read .format("postgresql") .option("query", query) .option("host", database_host) .option("port...

by gabe123 • New Contributor

05-03-2024 2:05:03 PM

1351 Views
0 replies
0 kudos

Strange Error with custom module in delta live table pipeline

The chunk of code in question: sys.path.append( spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/") ) from broker_utils import extract_day_with_suffix, proper_case_address_udf, proper_case_last_name_first_udf, proper_case_ud...

by AKUMAR_DEngg • New Contributor II

05-03-2024 7:58:14 AM

2676 Views
0 replies
0 kudos

Job Cluster's CPU utilization goes higher than 100% a few times during the workload run

I have a Data Engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker: i3.4xlarge with 122 GB memory and 16 cores; Driver: i3.4xlarge with 122 GB memory and 16 cores; Min Workers: 4 and Max Workers: 8. We noticed...

by RicardoS • New Contributor II

08-10-2023 10:28:41 PM

10685 Views
3 replies
1 kudos

Value of SQL variable in IF statement using Spark SQL

Hi there, I am new to Spark SQL and would like to know if it is possible to reproduce the below T-SQL query in Databricks. This is a sample query, but I want to determine if a query needs to be executed or not. DECLARE @VariableA AS INT , @Vari...

Latest Reply

Edthehead
Contributor III

05-03-2024 7:54:38 AM

1 kudos

Since you are looking for a single value back, you can use a CASE expression to achieve what you need. %sql SET var.myvarA = (SELECT 6); SET var.myvarB = (SELECT 7); SELECT CASE WHEN ${var.myvarA} = ${var.myvarB} THEN 'Equal' ELSE 'Not equal' END AS resu...

2 More Replies
