by kDev • New Contributor
- 4944 Views
- 2 replies
- 1 kudos
Our jobs have been running fine so far without any issues on a specific workspace. These jobs read data from files on Azure ADLS storage containers and don't use the Hive metastore data at all. Now we attached the Unity metastore to this workspace, created...
Latest Reply
Hello @kDev, were you able to solve this issue? I now have the same issue and it seems like I have already tried everything...
1 More Replies
- 7 Views
- 0 replies
- 0 kudos
I have a data engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: worker i3.4xlarge with 122 GB memory and 16 cores, driver i3.4xlarge with 122 GB memory and 16 cores, min workers 4 and max workers 8. We noticed...
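A minimal sketch of that job cluster expressed as a Jobs API new_cluster spec in Python dict form; the spark_version value is a placeholder, and only the node types and autoscale bounds come from the post above.

new_cluster = {
    "spark_version": "13.3.x-scala2.12",    # placeholder runtime version
    "node_type_id": "i3.4xlarge",           # worker: 122 GB memory, 16 cores
    "driver_node_type_id": "i3.4xlarge",    # driver: 122 GB memory, 16 cores
    "autoscale": {"min_workers": 4, "max_workers": 8},
}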
- 3693 Views
- 3 replies
- 1 kudos
Hi there, I am new to Spark SQL and would like to know if it is possible to reproduce the below T-SQL query in Databricks. This is a sample query, but I want to determine whether a query needs to be executed or not. DECLARE
@VariableA AS INT
, @Vari...
Latest Reply
Since you are looking for a single value back, you can use the CASE function to achieve what you need.
%sql
SET var.myvarA = (SELECT 6);
SET var.myvarB = (SELECT 7);
SELECT CASE WHEN ${var.myvarA} = ${var.myvarB} THEN 'Equal' ELSE 'Not equal' END AS resu...
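A minimal Python sketch of the same idea, assuming a Databricks notebook where spark is predefined; the values 6 and 7 mirror the reply above.

# Compare two Python-side values and return a single-row flag via Spark SQL.
my_var_a = 6
my_var_b = 7
result = spark.sql(
    f"SELECT CASE WHEN {my_var_a} = {my_var_b} THEN 'Equal' ELSE 'Not equal' END AS result"
)
result.show()  # prints "Not equal" for these values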
2 More Replies
- 3666 Views
- 4 replies
- 3 kudos
Parameters can be passed to Tasks and the values can be retrieved with: dbutils.widgets.get("parameter_name"). More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...
Latest Reply
An update to my answer: Databricks has advised us that the `dbutils.notebook.entry_point` method is not supported (and could be deprecated), and the recommended way to read in a job parameter is through widgets, i.e. `dbutils.widgets.get("param_key")` (...
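A minimal sketch of that recommended approach, assuming a job-level parameter named "param_key" (a placeholder) and a notebook task where dbutils is predefined.

# Job parameters are surfaced to notebook tasks through the widgets API.
param_value = dbutils.widgets.get("param_key")
print(f"param_key = {param_value}")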
3 More Replies
- 1408 Views
- 5 replies
- 1 kudos
Hello, I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it is not working because I immediately get AWS access denied errors. This is the same issue that is referenced here...
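For context, a minimal sketch of Auto Loader in file notification mode inside a DLT pipeline; the bucket path is a placeholder, spark is predefined in DLT pipelines, and this mode is what needs the extra AWS (SNS/SQS) permissions on the instance profile.

import dlt

@dlt.table
def bronze_events():
    # cloudFiles.useNotifications switches Auto Loader from directory listing
    # to file notification mode.
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.useNotifications", "true")
        .load("s3://example-bucket/landing/events/")
    )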
by SreeG • New Contributor II
- 135 Views
- 2 replies
- 0 kudos
Hi, I am facing issues when deploying workflows to a different environment. The same works for notebooks and scripts, but when deploying the workflows it fails with "Authorization Failed. Your token may be expired or lack the valid scope". Anything shoul...
Latest Reply
Thanks, Yesh. The issue was caused by a configuration parameter. After changing that, we could deploy. Thank you.
1 More Replies
- 31 Views
- 0 replies
- 0 kudos
There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel using a loop and threads and execute the configured query, but the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...
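A minimal sketch of reading several Unity Catalog tables in parallel from one notebook, assuming spark is predefined and the fully qualified table names are placeholders.

from concurrent.futures import ThreadPoolExecutor

tables = ["main.sales.orders", "main.sales.customers", "main.sales.items"]

def run_query(table_name):
    # Each thread issues its own query; the shared SparkSession handles them concurrently.
    df = spark.sql(f"SELECT COUNT(*) AS row_count FROM {table_name}")
    return table_name, df.collect()[0]["row_count"]

with ThreadPoolExecutor(max_workers=4) as pool:
    for table, count in pool.map(run_query, tables):
        print(table, count)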
- 72 Views
- 2 replies
- 0 kudos
Good morning, I'm trying to run: databricks bundle run --debug -t dev integration_tests_job
My bundle looks like:
bundle:
  name: x
include:
  - ./resources/*.yml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: x
      r...
Latest Reply
Hi @jorperort,
The error message you’re seeing, “no deployment state. Did you forget to run ‘databricks bundle deploy’?”, indicates that the deployment state is missing.
Here are some steps you can take to resolve this issue:
Verify Deploym...
1 More Replies
- 67 Views
- 1 replies
- 0 kudos
I am currently exploring testing methodologies for Databricks notebooks and would like to inquire whether it's possible to write pytest tests for notebooks that contain code not encapsulated within functions or classes.
***********************
a = 4
b ...
Latest Reply
Hi @vinayaka_pallak, Testing Databricks Notebooks is essential to ensure the correctness and reliability of your code. While notebooks are often used for exploratory analysis and prototyping, it’s still possible to write tests for code blocks withi...
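A minimal sketch of one common approach, assuming the notebook's top-level statements (a = 4, b = ..., where the value of b is truncated above) are first moved into a function in an importable module; the file names and the value of b are placeholders.

# notebook_logic.py
def add_values(a=4, b=5):
    return a + b

# test_notebook_logic.py
from notebook_logic import add_values

def test_add_values():
    assert add_values() == 9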
- 2170 Views
- 4 replies
- 0 kudos
We use Terraform to manage most of our infrastructure, and I would like to extend this to Unity Catalog. However, we are extensive users of tagging to categorize our datasets, and the only programmatic method I can find for adding tags is to use SQL ...
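A minimal sketch of that SQL route driven from a Python cell via spark.sql, assuming a Databricks environment with Unity Catalog; the table name and tag keys are placeholders.

# ALTER TABLE ... SET TAGS is the SQL statement referred to above.
spark.sql("""
    ALTER TABLE main.analytics.daily_sales
    SET TAGS ('domain' = 'sales', 'sensitivity' = 'internal')
""")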
Latest Reply
Huge Databricks client here: we also need this.
3 More Replies
- 57 Views
- 1 replies
- 0 kudos
Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformation on the bronze data and stores the results in the silver table. New process: bronze is still the same. A stream has bee...
Latest Reply
Hi @Manzilla, When using Delta Live Tables’ dlt.apply_changes for change data capture (CDC), it’s essential to understand how it works.
Let’s break down the process and address your specific scenario:
CDC with Delta Live Tables:
Delta Live Tables...
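A minimal sketch of the apply_changes pattern discussed here, assuming a bronze streaming table named bronze_events with a business key id and an ordering column event_ts (all placeholders).

import dlt
from pyspark.sql.functions import col

# Target streaming table that apply_changes will maintain.
dlt.create_streaming_table("silver_events")

dlt.apply_changes(
    target="silver_events",
    source="bronze_events",
    keys=["id"],
    sequence_by=col("event_ts"),
    stored_as_scd_type=1,  # keep only the latest row per key
)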
- 65 Views
- 1 replies
- 0 kudos
Getting the following error while saving a dataframe partitioned by two columns. Job aborted due to stage failure: Task 5774 in stage 33.0 failed 4 times, most recent failure: Lost task 5774.3 in stage 33.0 (TID 7736) (13.2.96.110 executor 7): ExecutorLos...
Latest Reply
Hi @amitkmaurya, The error message you’re encountering indicates that your Spark job failed due to a stage failure.
Task Failure and Exit Code 137:
The error message mentions that Task 5774 in stage 33.0 failed 4 times, with the most recent fai...
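A minimal sketch of one mitigation for exit code 137 (executor killed, usually for running out of memory) during a partitioned write, assuming a dataframe df and partition columns col_a and col_b as placeholders for the two columns in the post.

(
    df.repartition("col_a", "col_b")        # group rows of each output partition together
      .write
      .partitionBy("col_a", "col_b")
      .mode("overwrite")
      .format("delta")
      .save("/mnt/example/output_path")      # placeholder path
)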
- 33 Views
- 0 replies
- 0 kudos
Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...
- 68 Views
- 2 replies
- 0 kudos
Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. The local master node has been especially useful in testing, when running unit tests for Spark-related code without any remot...
Latest Reply
Hi @htu,
When you install Databricks Connect, it modifies the behaviour of PySpark in a way that prevents it from working with the local master node. This can be frustrating, especially when you’re trying to run unit tests for Spark-related code w...
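A minimal sketch of a local-Spark pytest fixture, assuming plain pyspark is installed in a test environment that does not also have databricks-connect (for example a separate virtualenv), since the two conflict as described above.

import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Local master node for unit tests; no remote cluster involved.
    session = (
        SparkSession.builder
        .master("local[*]")
        .appName("unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()

def test_simple_count(spark):
    df = spark.createDataFrame([(1,), (2,)], ["value"])
    assert df.count() == 2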
1 More Replies
- 48 Views
- 1 replies
- 0 kudos
Hi all, please help me understand how billing is calculated for using a job cluster. The documentation says they are charged on an hourly basis, so if my job ran for 1 hr 30 mins, will I be charged for the 30 mins based on the hourly rate, or will it be charged f...
Latest Reply
PL_db • New Contributor III
Job clusters consume DBUs per hour depending on the VM size. The Databricks billing happens at "per second granularity", see here. That means if you run your job for 1.5 hours, you will be charged DBUs/hour*1.5*SKU_price; accordingly, if you run your...
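A worked example of that formula; the DBU rate and SKU price below are placeholders, not actual Databricks prices.

dbus_per_hour = 10      # assumed DBU consumption of the job cluster per hour
runtime_hours = 1.5     # job ran for 1 hr 30 mins
sku_price = 0.15        # assumed price per DBU for the Jobs Compute SKU

cost = dbus_per_hour * runtime_hours * sku_price
print(f"Estimated cost: {cost:.2f}")  # 10 * 1.5 * 0.15 = 2.25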