- 45 Views
- 1 replies
- 0 kudos
I have done the below steps:
1. Created a Databricks managed service principal
2. Created an OAuth secret
3. Gave all necessary permissions to the service principal
I'm trying to use this service principal in Azure DevOps to automate CI/CD, but it fails as...
Latest Reply
Have you followed the steps for using a service principal for CI/CD, available here: https://learn.microsoft.com/en-us/azure/databricks/dev-tools/ci-cd/ci-cd-sp
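For reference, a minimal sketch (not the poster's exact setup) of authenticating as a Databricks managed service principal from a CI/CD agent via OAuth M2M with the Databricks SDK; the pipeline is assumed to export the workspace host, the service principal's application ID, and its OAuth secret as environment variables:

import os
from databricks.sdk import WorkspaceClient

# Assumed environment variables set by the Azure DevOps pipeline (names are a common
# convention, not taken from the post): DATABRICKS_HOST, DATABRICKS_CLIENT_ID,
# DATABRICKS_CLIENT_SECRET.
w = WorkspaceClient(
    host=os.environ["DATABRICKS_HOST"],
    client_id=os.environ["DATABRICKS_CLIENT_ID"],
    client_secret=os.environ["DATABRICKS_CLIENT_SECRET"],
)
print(w.current_user.me().user_name)  # sanity check: prints the service principal's identity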
by kDev • New Contributor
- 5021 Views
- 4 replies
- 1 kudos
Our jobs have been running fine so far without any issues on a specific workspace. These jobs read data from files on Azure ADLS storage containers and don't use the Hive metastore data at all. Now we attached the Unity Catalog metastore to this workspace, created...
Latest Reply
@Wojciech_BUK Thanks a lot for the feedback! I have a couple of questions: when you say "allow workspace clusters to access storage", I understand that for interactive clusters. In my case I was trying to trigger a Databricks Notebook/Job ...
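For context, a hedged sketch (the location name, storage account, container, credential, and grantee below are hypothetical, not the thread's actual configuration) of how Unity Catalog typically lets workspace compute read those ADLS paths, via a storage credential plus an external location and grants:

# Run once by an admin with the required privileges.
spark.sql("""
  CREATE EXTERNAL LOCATION IF NOT EXISTS raw_files_loc
  URL 'abfss://raw@mystorageacct.dfs.core.windows.net/'
  WITH (STORAGE CREDENTIAL my_managed_identity_credential)
""")
spark.sql("GRANT READ FILES ON EXTERNAL LOCATION raw_files_loc TO `jobs_service_principal`")

# Job code can then read by path, governed by the grants above:
df = spark.read.format("json").load("abfss://raw@mystorageacct.dfs.core.windows.net/events/")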
3 More Replies
- 114 Views
- 2 replies
- 1 kudos
I have a cluster pool with max capacity. I run multiple jobs against that cluster pool. Can on-demand clusters, created within this cluster pool, be shared across multiple different jobs at the same time? The reason I'm asking is I can see a downgrade...
Latest Reply
Hi @radothede,
Cluster Pools and On-Demand Clusters: In Azure Databricks, a cluster pool is a set of idle, ready-to-use instances that can be shared among multiple users or jobs. Instead of giving each user their own dedicated cluster, you...
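For reference, a minimal sketch (the pool ID and runtime version are placeholders) of how a job's cluster can draw from an instance pool: each job run still gets its own job cluster, but its nodes come from the pool's idle instances, so jobs share the pool's capacity rather than a single running cluster:

# Fragment of a job cluster specification; "pool-1234-abcdef" and the runtime are hypothetical.
new_cluster = {
    "spark_version": "14.3.x-scala2.12",
    "instance_pool_id": "pool-1234-abcdef",
    "driver_instance_pool_id": "pool-1234-abcdef",
    "num_workers": 4,
}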
1 More Replies
- 62 Views
- 0 replies
- 0 kudos
Hello, I am working on a Spark job where I'm reading several tables from PostgreSQL into DataFrames as follows:
df = (spark.read
    .format("postgresql")
    .option("query", query)
    .option("host", database_host)
    .option("port...
- 35 Views
- 0 replies
- 0 kudos
The chunk of code in question:
sys.path.append(
    spark.conf.get("util_path", "/Workspace/Repos/Production/loch-ness/utils/")
)
from broker_utils import extract_day_with_suffix, proper_case_address_udf, proper_case_last_name_first_udf, proper_case_ud...
- 41 Views
- 0 replies
- 0 kudos
As of this morning we started receiving the following error message on a Databricks job with a single PySpark Notebook task. The job has not had any code changes in 2 months. The cluster configuration has also not changed. The last successful run of ...
- 49 Views
- 0 replies
- 0 kudos
Hi, we would like to use Azure Managed Identity to create a mount point to read/write data from/to ADLS Gen2. We are also using the following code snippet to use MSI authentication to read data from ADLS Gen2, but it is giving an error:
storage_account_name = "<<...
- 164 Views
- 2 replies
- 1 kudos
Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformation on bronze data and stores results in the silver table. New process: bronze is still the same. A stream has bee...
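For reference, the bronze pattern described above looks roughly like this (the table name, path, and Auto Loader usage are assumptions for illustration, not the poster's code):

import dlt

# A DLT streaming table ingesting JSON files; name and path are hypothetical.
@dlt.table(name="bronze_events")
def bronze_events():
    return (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/raw/events/")
    )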
Latest Reply
Thank you, that's what I understood too. It is just nice to get validation from someone else who works with this.
1 More Replies
- 359 Views
- 2 replies
- 1 kudos
Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...
Latest Reply
If you want to reprocess all the data, you can simply use the "Full Refresh" option in the DLT pipeline.
You can read more about it here: https://docs.databricks.com/en/delta-live-tables/updates.html#how-delta-live-tables-updates-tables-and-views
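Besides the UI button, a full refresh can also be triggered programmatically; a hedged sketch using the Databricks SDK (the pipeline ID is a placeholder, and the exact call should be checked against the SDK version in use):

from databricks.sdk import WorkspaceClient

w = WorkspaceClient()
# Start a pipeline update that recomputes all tables from scratch.
w.pipelines.start_update(pipeline_id="1234-abcd-5678-efgh", full_refresh=True)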
1 More Replies
- 56 Views
- 0 replies
- 0 kudos
I have a Data Engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker i3.4xlarge with 122 GB memory and 16 cores; Driver i3.4xlarge with 122 GB memory and 16 cores; Min Workers 4 and Max Workers 8. We noticed...
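For reference, the described cluster expressed as a cluster spec fragment (the runtime version is an assumption; the node types and autoscale bounds mirror the post):

cluster_spec = {
    "node_type_id": "i3.4xlarge",
    "driver_node_type_id": "i3.4xlarge",
    "autoscale": {"min_workers": 4, "max_workers": 8},
    "spark_version": "14.3.x-scala2.12",  # assumed, not stated in the post
}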
- 3750 Views
- 3 replies
- 1 kudos
Hi there, I am new to Spark SQL and would like to know if it is possible to reproduce the below T-SQL query in Databricks. This is a sample query, but I want to determine if a query needs to be executed or not.
DECLARE
      @VariableA AS INT
,     @Vari...
Latest Reply
Since you are looking for a single value back, you can use the CASE function to achieve what you need.
%sql
SET var.myvarA = (SELECT 6);
SET var.myvarB = (SELECT 7);
SELECT CASE WHEN ${var.myvarA} = ${var.myvarB} THEN 'Equal' ELSE 'Not equal' END AS resu...
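The same idea from a Python cell, for comparison; this is a sketch that substitutes the sample values 6 and 7 directly into a single CASE expression:

# spark is the session provided by the Databricks notebook.
var_a, var_b = 6, 7
result = spark.sql(
    f"SELECT CASE WHEN {var_a} = {var_b} THEN 'Equal' ELSE 'Not equal' END AS result"
).first()["result"]
print(result)  # 'Not equal'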
2 More Replies
- 3718 Views
- 4 replies
- 3 kudos
Parameters can be passed to Tasks and the values can be retrieved with:
dbutils.widgets.get("parameter_name")
More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...
Latest Reply
an update to my answer: Databricks has advised us that the `dbutils.notebook.entry_point` method is not supported (could be deprecated), and the recommended way to read in a job parameter is through widgets, i.e. `dbutils.widgets.get("param_key")` (...
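A minimal sketch of the pattern the reply recommends ("param_key" and the default value are placeholders): in a job run, the job parameter shows up as a widget; the text() call only provides a default for interactive runs:

# Default used when the notebook is run interactively; job runs override it.
dbutils.widgets.text("param_key", "dev_default")
param_value = dbutils.widgets.get("param_key")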
3 More Replies
- 1462 Views
- 5 replies
- 1 kudos
Hello, I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it is not working because I immediately get AWS access denied errors. This is the same issue that is referenced here...
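For reference, a hedged sketch of an Auto Loader stream with file notifications enabled (the bucket path and format are placeholders; the instance-profile and SQS/SNS permissions the post is about are not reproduced here):

df = (spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")
    .option("cloudFiles.useNotifications", "true")  # file notification mode instead of directory listing
    .load("s3://my-bucket/raw/")
)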
by SreeG • New Contributor II
- 194 Views
- 2 replies
- 0 kudos
Hi, I am facing issues when deploying workflows to a different environment. The same works for Notebooks and Scripts; when deploying the workflows, it failed with "Authorization Failed. Your token may be expired or lack the valid scope". Anything shoul...
Latest Reply
Thanks, Yesh. The issue was because of a configuration parameter. After changing that, we could deploy. Thank you
1 More Replies
by subha2 • New Contributor II
- 270 Views
- 0 replies
- 0 kudos
There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel using a loop and threads, and execute the configured query. But the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...
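For reference, a hedged sketch of the pattern described (catalog, schema, and table names are hypothetical): reading several Unity Catalog tables in parallel with a thread pool and spark.sql():

from concurrent.futures import ThreadPoolExecutor

tables = ["main.sales.orders", "main.sales.customers", "main.sales.items"]

def count_rows(table_name):
    # Independent queries can be submitted from separate threads on the same session.
    return table_name, spark.sql(f"SELECT COUNT(*) AS c FROM {table_name}").first()["c"]

with ThreadPoolExecutor(max_workers=4) as pool:
    for name, count in pool.map(count_rows, tables):
        print(name, count)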