Data Engineering

Forum Posts


by DBX-2024 • New Contributor

8m ago

1 Views
0 replies
0 kudos

Job Cluster's CPU utilization goes higher than 100% few times during the workload run

I have a Data Engineering pipeline workload that runs on Databricks. The job cluster has the following configuration: Worker: i3.4xlarge with 122 GB memory and 16 cores; Driver: i3.4xlarge with 122 GB memory and 16 cores; Min Workers: 4 and Max Workers: 8. We noticed...

Data Engineering


by RicardoS • New Contributor II

08-10-2023 10:28:41 PM

3693 Views
3 replies
1 kudos

Value of SQL variable in IF statement using Spark SQL

Hi there, I am new to Spark SQL and would like to know if it is possible to reproduce the below T-SQL query in Databricks. This is a sample query, but I want to determine whether a query needs to be executed or not. DECLARE @VariableA AS INT , @Vari...

Data Engineering


Latest Reply

Edthehead
New Contributor III

11m ago

1 kudos

Since you are looking for a single value back, you can use the CASE expression to achieve what you need. %sql SET var.myvarA = (SELECT 6); SET var.myvarB = (SELECT 7); SELECT CASE WHEN ${var.myvarA} = ${var.myvarB} THEN 'Equal' ELSE 'Not equal' END AS resu...


2 More Replies

by John_Rotenstein • New Contributor II

09-14-2023 1:44:25 AM

3665 Views
4 replies
3 kudos

Retrieve job-level parameters in Python

Parameters can be passed to Tasks and the values can be retrieved with: `dbutils.widgets.get("parameter_name")` More recently, we have been given the ability to add parameters to Jobs. However, the parameters cannot be retrieved like Task parameters. Quest...

Data Engineering


Latest Reply

cbern
New Contributor II

16m ago

3 kudos

An update to my answer: Databricks has advised us that the `dbutils.notebook.entry_point` method is not supported (could be deprecated), and the recommended way to read in a job parameter is through widgets, i.e. `dbutils.widgets.get("param_key")` (...


3 More Replies

by jaredrohe • New Contributor II

10-26-2023 7:35:09 PM

1408 Views
5 replies
1 kudos

Instance Profiles Do Not Work with Delta Live Tables Default Cluster Policy Access Mode "Shared"

Hello, I am attempting to configure Auto Loader in file notification mode with Delta Live Tables. I configured an instance profile, but it is not working because I immediately get AWS access denied errors. This is the same issue that is referenced here...

Data Engineering

Access Mode

Delta Live Tables

Instance Profiles

No Isolation Shared


Latest Reply

jaredrohe
New Contributor II

33m ago

1 kudos

Unfortunately, I never got this to work.


4 More Replies

by SreeG • New Contributor II

Sunday

134 Views
2 replies
0 kudos

CICD for Work Flows

Hi, I am facing issues when deploying workflows to a different environment. The same works for notebooks and scripts, but when deploying the workflows, it fails with "Authorization Failed. Your token may be expired or lack the valid scope". Anything shoul...

Data Engineering

CICD


Latest Reply

SreeG
New Contributor II

44m ago

0 kudos

Thanks, Yesh. The issue was caused by a configuration parameter. After changing that, we could deploy. Thank you.


1 More Replies

by subha2 • New Contributor

an hour ago

22 Views
0 replies
0 kudos

Not able to read tables in Unity Catalog parallel

There are some tables under a schema/database in Unity Catalog. The notebook needs to read the tables in parallel using a loop and threads and execute the configured query. But the SQL statement is not getting executed via spark.sql() or spark.read.table(). It ...

Data Engineering


by jorperort • New Contributor

yesterday

70 Views
2 replies
0 kudos

[Databricks Assets Bundles] no deployment state

Good morning, I'm trying to run: `databricks bundle run --debug -t dev integration_tests_job` My bundle looks like this: bundle: name: x include: - ./resources/*.yml targets: dev: mode: development default: true workspace: host: x r...

Data Engineering

Databricks Assets Bundles

Deployment Error

pid=265687


Latest Reply

Kaniz
Community Manager

6 hours ago

0 kudos

Hi @jorperort, The error message you’re seeing, “no deployment state. Did you forget to run ‘databricks bundle deploy’?”, indicates that the deployment state is missing. Here are some steps you can take to resolve this issue: Verify Deploym...


1 More Replies

by vinayaka_pallak • New Contributor

Wednesday

65 Views
1 replies
0 kudos

Pytest on Notebook

I am currently exploring testing methodologies for Databricks notebooks and would like to inquire whether it's possible to write pytest tests for notebooks that contain code not encapsulated within functions or classes. *********************** a = 4 b ...

Data Engineering


Latest Reply

Kaniz
Community Manager

4 hours ago

0 kudos

Hi @vinayaka_pallak, Testing Databricks Notebooks is essential to ensure the correctness and reliability of your code. While notebooks are often used for exploratory analysis and prototyping, it’s still possible to write tests for code blocks withi...


by JameDavi_51481 • New Contributor III

01-17-2024 7:30:00 AM

2168 Views
4 replies
0 kudos

Can we add tags to Unity Catalog through Terraform?

We use Terraform to manage most of our infrastructure, and I would like to extend this to Unity Catalog. However, we are extensive users of tagging to categorize our datasets, and the only programmatic method I can find for adding tags is to use SQL ...

Data Engineering


Latest Reply

jakubigla
Visitor

5 hours ago

0 kudos

Huge Databricks client here: we also need this.


3 More Replies

by Manzilla • New Contributor

Wednesday

57 Views
1 replies
0 kudos

Delta Live table - Adding streaming to existing table

Currently, the bronze table ingests JSON files using the `@dlt.table` decorator on a `spark.readStream` function. A daily batch job does some transformation on the bronze data and stores the results in the silver table. New process: Bronze is still the same. A stream has bee...

Data Engineering


Latest Reply

Kaniz
Community Manager

5 hours ago

0 kudos

Hi @Manzilla, When using Delta Live Tables’ dlt.apply_changes for change data capture (CDC), it’s essential to understand how it works. Let’s break down the process and address your specific scenario: CDC with Delta Live Tables: Delta Live Tables...


by amitkmaurya • New Contributor

yesterday

63 Views
1 replies
0 kudos

Databricks job keep getting failed due to executor lost.

Getting the following error while saving a DataFrame partitioned by two columns. Job aborted due to stage failure: Task 5774 in stage 33.0 failed 4 times, most recent failure: Lost task 5774.3 in stage 33.0 (TID 7736) (13.2.96.110 executor 7): ExecutorLos...

Data Engineering

databricks jobs

spark


Latest Reply

Kaniz
Community Manager

5 hours ago

0 kudos

Hi @amitkmaurya, The error message you’re encountering indicates that your Spark job failed due to a stage failure. Task Failure and Exit Code 137: The error message mentions that Task 5774 in stage 33.0 failed 4 times, with the most recent fai...
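
For readers unfamiliar with exit code 137 mentioned in the reply above: by common POSIX shell convention, an exit status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL). On Spark executors this is often, though not always, the operating system or container manager killing a process that exceeded its memory. The helper below is a minimal sketch of that decoding; `describe_exit_code` is a hypothetical name used for illustration and is not part of Spark or Databricks.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a process exit status using the 128 + signal-number convention."""
    if code > 128:
        signum = code - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return f"exited with status {code}"


print(describe_exit_code(137))
```

If the decoded signal is SIGKILL, checking executor memory settings and the size of the partitions being written is a reasonable first step, but the truncated reply above does not confirm the root cause for this thread.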


by gabrieleladd • Visitor

6 hours ago

32 Views
0 replies
0 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something that I can't really grasp: how to clear all the objects generated or upda...

Data Engineering

Data Pipelines

Delta Live Tables


by htu • Visitor

yesterday

66 Views
2 replies
0 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, It seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. This has been especially useful in testing, when unit tests for Spark-related code without any remot...

Data Engineering


Latest Reply

Kaniz
Community Manager

7 hours ago

0 kudos

Hi @htu, When you install Databricks Connect, it modifies the behaviour of PySpark in a way that prevents it from working with the local master node. This can be frustrating, especially when you’re trying to run unit tests for Spark-related code w...


1 More Replies

by Fnazar • New Contributor

yesterday

46 Views
1 replies
0 kudos

Billing of Databricks Job clusters

Hi All, Please help me understand how billing is calculated for a Job cluster. The documentation says they are charged on an hourly basis, so if my job ran for 1hr 30mins, will I be charged for the 30mins based on the hourly rate or will it be charged f...

Data Engineering


Latest Reply

PL_db
New Contributor III

6 hours ago

0 kudos

Job clusters consume DBUs per hour depending on the VM size. The Databricks billing happens at "per second granularity", see here. That means if you run your job for 1.5 hours, you will be charged DBUs/hour*1.5*SKU_price; accordingly, if you run your...
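
The formula in the reply above (DBUs/hour * hours * SKU price, billed per second) can be sketched as a small calculation. This is only an illustration: the function name `job_cluster_dbu_cost` is hypothetical, and the DBU rate and price below are made-up numbers, not real Databricks rates.

```python
def job_cluster_dbu_cost(dbus_per_hour: float, runtime_seconds: float,
                         price_per_dbu: float) -> float:
    """DBU charge with per-second granularity: DBUs/hour * hours * price per DBU."""
    hours = runtime_seconds / 3600.0  # fractional hours, no rounding up to a full hour
    return dbus_per_hour * hours * price_per_dbu


# Hypothetical inputs: 4 DBUs/hour, a 1 h 30 min run, 0.25 currency units per DBU.
cost = job_cluster_dbu_cost(dbus_per_hour=4.0, runtime_seconds=90 * 60, price_per_dbu=0.25)
print(f"{cost:.2f}")  # 4.0 * 1.5 * 0.25 = 1.50
```

Under these assumed numbers, the 90-minute run is charged for 1.5 hours of DBUs rather than being rounded up to 2 hours.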


by mamiya • New Contributor

yesterday

45 Views
1 replies
0 kudos

ODBC PowerBI 2 commands in one query

Hello everyone, I'm trying to use the ODBC DirectQuery option in PowerBI, but I keep getting an error about another command. The SQL query works while using the SQL Editor. Do I need to change the setup of my ODBC connector? DECLARE dateFrom DATE = DA...

Data Engineering


Latest Reply

Kaniz
Community Manager

6 hours ago

0 kudos

Hi @mamiya, Here are a few steps you can take to address the error: Check Power Query Editor Steps: The error might be related to a specific step in the Power Query Editor. Try opening the Power Query Editor and reviewing the steps. If there’s a...


