Data Engineering
Forum Posts

JameDavi_51481
by New Contributor III
  • 2127 Views
  • 4 replies
  • 0 kudos

Can we add tags to Unity Catalog through Terraform?

We use Terraform to manage most of our infrastructure, and I would like to extend this to Unity Catalog. However, we are extensive users of tagging to categorize our datasets, and the only programmatic method I can find for adding tags is to use SQL ...
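As the poster notes, the SQL route is currently the programmatic way to set Unity Catalog tags. A sketch of that route (catalog, schema, table, and tag names here are illustrative, not from the thread):

```sql
-- Tags can be applied at catalog, schema, or table level with SET TAGS.
ALTER CATALOG main SET TAGS ('team' = 'data-eng');
ALTER SCHEMA main.sales SET TAGS ('domain' = 'finance');
ALTER TABLE main.sales.orders SET TAGS ('pii' = 'false', 'cost_center' = 'cc-123');

-- UNSET TAGS removes them again.
ALTER TABLE main.sales.orders UNSET TAGS ('cost_center');
```

Until the Terraform provider supports tags natively, these statements would have to be executed against the workspace outside of Terraform's state management.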

Latest Reply
jakubigla
Visitor
  • 0 kudos

Huge Databricks client here: we also need this.

3 More Replies
Manzilla
by New Contributor
  • 55 Views
  • 1 reply
  • 0 kudos

Delta Live table - Adding streaming to existing table

Currently, the bronze table ingests JSON files using the @dlt.table decorator on a spark.readStream function. A daily batch job does some transformation on bronze data and stores results in the silver table. New process: bronze stays the same. A stream has bee...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Manzilla, When using Delta Live Tables’ dlt.apply_changes for change data capture (CDC), it’s essential to understand how it works. Let’s break down the process and address your specific scenario: CDC with Delta Live Tables: Delta Live Tables...

amitkmaurya
by New Contributor
  • 54 Views
  • 1 reply
  • 0 kudos

Databricks job keeps failing due to executor lost.

Getting the following error while saving a dataframe partitioned by two columns: Job aborted due to stage failure: Task 5774 in stage 33.0 failed 4 times, most recent failure: Lost task 5774.3 in stage 33.0 (TID 7736) (13.2.96.110 executor 7): ExecutorLos...

Data Engineering
databricks jobs
spark
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @amitkmaurya , The error message you’re encountering indicates that your Spark job failed due to a stage failure.  Task Failure and Exit Code 137: The error message mentions that Task 5774 in stage 33.0 failed 4 times, with the most recent fai...

gabrieleladd
by Visitor
  • 8 Views
  • 0 replies
  • 0 kudos

Clearing data stored by pipelines

Hi everyone! I'm new to Databricks and taking my first steps with Delta Live Tables, so please forgive my inexperience. I'm building my first DLT pipeline and there's something I can't really grasp: how to clear all the objects generated or upda...

Data Engineering
Data Pipelines
Delta Live Tables
jorperort
by New Contributor
  • 61 Views
  • 1 reply
  • 0 kudos

[Databricks Assets Bundles] no deployment state

Good morning, I'm trying to run: databricks bundle run --debug -t dev integration_tests_job My bundle looks like:
bundle:
  name: x
include:
  - ./resources/*.yml
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: x
r...

Data Engineering
Databricks Assets Bundles
Deployment Error
pid=265687
Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @jorperort, The error message you’re seeing, “no deployment state. Did you forget to run ‘databricks bundle deploy’?”, indicates that the deployment state is missing. Here are some steps you can take to resolve this issue: Verify Deploym...

htu
by Visitor
  • 50 Views
  • 2 replies
  • 0 kudos

Installing Databricks Connect breaks pyspark local cluster mode

Hi, it seems that when databricks-connect is installed, pyspark is modified at the same time so that it no longer works with a local master node. Local mode has been especially useful in testing, for running unit tests of Spark-related code without any remot...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @htu, When you install Databricks Connect, it modifies the behaviour of PySpark in a way that prevents it from working with the local master node. This can be frustrating, especially when you’re trying to run unit tests for Spark-related code w...

1 More Replies
Fnazar
by New Contributor
  • 44 Views
  • 1 reply
  • 0 kudos

Billing of Databricks Job clusters

Hi all, please help me understand how the billing is calculated for using the job cluster. The documentation says they are charged on an hourly basis, so if my job ran for 1 hr 30 mins, will I be charged for the 30 mins based on the hourly rate, or will it be charged f...

Latest Reply
PL_db
New Contributor III
  • 0 kudos

Job clusters consume DBUs per hour depending on the VM size. The Databricks billing happens at "per second granularity", see here. That means if you run your job for 1.5 hours, you will be charged DBUs/hour*1.5*SKU_price; accordingly, if you run your...
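The per-second billing described above is easy to sketch numerically. The rates below are illustrative placeholders, not actual Databricks prices:

```python
# Sketch of Databricks job-cluster billing at per-second granularity.
# Both rates are example values, not real list prices.
dbus_per_hour = 4.0        # DBU consumption rate of the chosen VM size
sku_price_per_dbu = 0.15   # price per DBU for the Jobs Compute SKU

def job_cost(runtime_seconds: float) -> float:
    """Cost of a run billed per second, never rounded up to whole hours."""
    hours = runtime_seconds / 3600
    return dbus_per_hour * hours * sku_price_per_dbu

# A 1.5-hour run is charged for exactly 1.5 hours, not 2:
print(round(job_cost(1.5 * 3600), 4))  # 0.9
```

So a job running 1 hr 30 mins costs DBUs/hour × 1.5 × SKU price, exactly as the reply states.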

mamiya
by Visitor
  • 43 Views
  • 1 reply
  • 0 kudos

ODBC PowerBI 2 commands in one query

Hello everyone, I'm trying to use the ODBC DirectQuery option in Power BI, but I keep getting an error about another command. The SQL query works when run in the SQL Editor. Do I need to change the setup of my ODBC connector? DECLARE dateFrom DATE = DA...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @mamiya, Here are a few steps you can take to address the error: Check Power Query Editor Steps: The error might be related to a specific step in the Power Query Editor. Try opening the Power Query Editor and reviewing the steps. If there’s a...

stevenayers-bge
by New Contributor II
  • 96 Views
  • 3 replies
  • 2 kudos

Bug: Shallow Clone `create or replace` causing [TABLE_OR_VIEW_NOT_FOUND]

I am having an issue where, when I do a shallow clone using: create or replace table `catalog_a_test`.`schema_a`.`table_a` shallow clone `catalog_a`.`schema_a`.`table_a` I get: [TABLE_OR_VIEW_NOT_FOUND] The table or view catalog_a_test.schema_a.table_a...

Latest Reply
Omar_hamdan
Community Manager
  • 2 kudos

Hi Steven, this is a really strange issue. First, let's exclude some possible causes. We need to check the following: the permissions on table A and catalog B. Take a look at the following link to check what permission is needed: https://docs.d...

2 More Replies
radothede
by Visitor
  • 42 Views
  • 1 reply
  • 0 kudos

Can on-demand clusters be shared across multiple jobs using cluster pool with max capacity ?

I have a cluster pool with max capacity, and I run multiple jobs against that cluster pool. Can on-demand clusters created within this pool be shared across multiple different jobs at the same time? The reason I'm asking is that I can see a downgrade...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @radothede, Cluster Pools and On-Demand Clusters: In Azure Databricks, a cluster pool is a collection of idle, pre-configured clusters that can be shared among multiple users or jobs. Instead of giving each user their own dedicated cluster, you...

Erik_L
by Contributor II
  • 59 Views
  • 1 reply
  • 0 kudos

BUG: Unity Catalog kills UDF

We have UDFs in a few locations, and today we noticed a severe drop in their performance. This seems to be caused by Unity Catalog.
Test environment 1:
Databricks Runtime Environment: 14.3 / 15.1
Compute: 1 master, 4 nodes
Policy: Unrestricted
Access Mode: Shared
Tes...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Erik_L , It appears that you’re experiencing performance issues related to Unity Catalog in your Databricks environment. Let’s explore some potential reasons and solutions: Mismanagement of Metastores: Unity Catalog, with one metastore per re...

Kayl669
by New Contributor III
  • 154 Views
  • 5 replies
  • 0 kudos

SQL code against tables with '>' in headers suddenly failing?

Just want to post this issue we're experiencing here in case other people are facing something similar. Below is the wording of the support ticket I've raised: SQL code that has been working is suddenly failing due to syntax errors today. Ther...

Latest Reply
Kayl669
New Contributor III
  • 0 kudos

Where we've got to with this is that MS Support / Databricks have acknowledged that they did something and are working on a fix: "The issue occurred due to the regression in the recent DBR maintenance release...Our engineering team is workin...
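While the fix lands, the usual way to make Databricks SQL parse column names containing special characters such as '>' is backtick-quoting the identifiers. A sketch (the table and column names are hypothetical, not from this thread):

```sql
-- Backticks quote identifiers that contain characters like '>' or '<'.
SELECT `revenue>forecast`, `cost<actual`
FROM my_schema.my_table
WHERE `revenue>forecast` > 100;
```

If the failing queries already quote these columns, this won't help and the regression fix is the only resolution.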

4 More Replies
erigaud
by Honored Contributor
  • 79 Views
  • 2 replies
  • 1 kudos

Pass Dataframe to child job in "Run Job" task

Hello, I have a Job A that runs a Job B. Job A defines a globalTempView, and I would like to somehow access it in the child job. Is that in any way possible? Can the same cluster be used for both jobs? If it is not possible, does someone know of a...

Latest Reply
erigaud
Honored Contributor
  • 1 kudos

Hello @Kaniz, thank you for the very detailed answer. If I understand correctly, there is no way to do this using temp views and a job cluster? In that case, I need to use the same all-purpose cluster for all my tasks in order to remain in the same spar...

1 More Replies
Mathias_Peters
by New Contributor II
  • 88 Views
  • 1 reply
  • 0 kudos

On the fly transformations on DLT tables

Hi, I am loading data from a Kinesis data stream using DLT.
CREATE STREAMING TABLE Consumers_kinesis_2 (
  ...,
  unbase64(data) String,
  ...
) AS SELECT * FROM STREAM read_kinesis (...)
Is it possible to directly cast, unbase64, and/or transform the resu...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @Mathias_Peters, When working with Amazon Kinesis Data Analytics, you can indeed transform data before writing it into a streaming table. Let’s explore some options: Unbase64 Transformation: To decode Base64-encoded data, you can use the unba...
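What Spark SQL's unbase64() does can be sketched in plain Python: decode a Base64 string back into raw bytes, then interpret those bytes as UTF-8 (the sample payload below is illustrative):

```python
import base64

# unbase64() in Spark SQL returns BINARY; the plain-Python equivalent
# is base64.b64decode, followed by a decode to get a string.
encoded = base64.b64encode("hello kinesis".encode("utf-8")).decode("ascii")
decoded = base64.b64decode(encoded).decode("utf-8")
print(decoded)  # hello kinesis
```

In the streaming table itself, the transformation belongs in the AS SELECT clause rather than the column list, e.g. selecting cast(unbase64(data) AS STRING) from the Kinesis stream.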

stevenayers-bge
by New Contributor II
  • 70 Views
  • 1 reply
  • 0 kudos

Autoloader: Read old version of file. Read modification time is X, latest modification time is X

I'm receiving this error from Autoloader. It seems to be stuck on this one file. I don't care when it was read or last modified, I just want to ingest it. Any ideas?
java.io.IOException: Read old version of file s3a://<file-path>.json. Read modificat...

Latest Reply
Kaniz
Community Manager
  • 0 kudos

Hi @stevenayers-bge, The error message indicates that the file you’re trying to read is an old version, and there’s a discrepancy between the read modification time and the latest modification time. Let’s explore some potential solutions based on ...
