Data Engineering

Forum Posts

Sorted by:

by Kash • Contributor III

01-24-2023 9:45:49 AM

1227 Views
4 replies
0 kudos

Creating a spot only single-node job compute cluster policy

Hi there,I need some help creating a new cluster policy that utilizes a single spot-instnace server to complete a job. I want to set this up as a job-compute to reduce costs and also utilize 1 spot instance.The jobs I need to ETL are very short and c...

Data Engineering

1227 Views
4 replies
0 kudos

01-24-2023 9:45:49 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

02-24-2023 3:41:13 PM

0 kudos

Hi @Avkash Kana,Just a friendly follow-up. Did any of the responses help you to resolve your question? if it did, please mark it as best. Otherwise, please let us know if you still need help.

0 kudos

02-24-2023 3:41:13 PM

3 More Replies

by databicky • Contributor II

01-22-2023 4:58:10 AM

1063 Views
4 replies
0 kudos

how to optimize the runtime in 10.4 cluster

i am loading the 1billion data from spark dataframe into target table, but in the 7.3 cluster it takes 3 hours to complete but after migrated to 10.4 cluster its taking 8 hours to complete , how can i reduce the time duration

Data Engineering

1063 Views
4 replies
0 kudos

01-22-2023 4:58:10 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

01-24-2023 5:33:35 PM

0 kudos

Hi @Mohammed sadamusean,Could you provide more details on what are you doing? What type of transformations/actions are you doing? whats your source and sink? batch or streaming? all that information will help.

0 kudos

01-24-2023 5:33:35 PM

3 More Replies

by RafikiT97 • New Contributor

01-24-2023 11:50:01 AM

2107 Views
3 replies
0 kudos

Query Databricks from Power BI with Row Level Security

I am trying to apply RLS to the solution but Power BI only connects to Databricks(DB) using a token which cant be used in DB groups. Is there no other way to apply Row Level security using Power BI?

Data Engineering

2107 Views
3 replies
0 kudos

01-24-2023 11:50:01 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

02-24-2023 3:37:49 PM

0 kudos

Hi @Daniel Gomes,Just a friendly follow-up. Did any of the responses help you to resolve your question? if it did, please mark it as best. Otherwise, please let us know if you still need help.

0 kudos

02-24-2023 3:37:49 PM

2 More Replies

by farooqurrehman • New Contributor

01-23-2023 8:59:17 AM

1152 Views
3 replies
2 kudos

Unable to connect/read files from ADLS Gen2 using account key

It gives error[RequestId=5e57b66f-b69f-4e8b-8706-3fe5baeb77a0 ErrorClass=METASTORE_DOES_NOT_EXIST] No metastore assigned for the current workspace.using the following codespark.conf.set( "fs.azure.account.key.mystorageaccount.dfs.core.windows.net", ...

Data Engineering

1152 Views
3 replies
2 kudos

01-23-2023 8:59:17 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

02-24-2023 3:36:46 PM

2 kudos

Hi @Farooq ur rehman,Just a friendly follow-up. Did any of the responses help you to resolve your question? if it did, please mark it as best. Otherwise, please let us know if you still need help.

2 kudos

02-24-2023 3:36:46 PM

2 More Replies

by Lizhi_Dong • New Contributor II

01-23-2023 6:27:55 AM

759 Views
3 replies
0 kudos

Tables disappear when I re-start a new cluster on Community Edition

What would be the best plan for independent course creator?Hi folks! I want to use databrick community edition as the platform to teach online courses. As you may know, for community edition, you need to create a new cluster when the old one terminat...

Data Engineering

759 Views
3 replies
0 kudos

01-23-2023 6:27:55 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

01-24-2023 4:29:40 PM

0 kudos

Hi @Lizhi Dong,This might be a limitation from Community Edition. When your cluster gets terminated all your tables will be removed.

0 kudos

01-24-2023 4:29:40 PM

2 More Replies

by SRK • Contributor III

01-23-2023 3:20:31 AM

1460 Views
5 replies
0 kudos

Delta Live Tables data quality rules application.

I have a requirement, where I need to apply inverse DQ rule on a table to track the invalid data. For which I can use the following approach:import dltrules = {}quarantine_rules = {}rules["valid_website"] = "(Website IS NOT NULL)"rules["valid_locatio...

Data Engineering

1460 Views
5 replies
0 kudos

01-23-2023 3:20:31 AM

View Replies

Latest Reply

Hubert-Dudek
Esteemed Contributor III

01-23-2023 5:06:33 AM

0 kudos

You can get additional info from DLT event log which is in delta so you can load it as table https://docs.databricks.com/workflows/delta-live-tables/delta-live-tables-event-log.html#data-quality

0 kudos

01-23-2023 5:06:33 AM

4 More Replies

by Hubert-Dudek • Esteemed Contributor III

01-24-2023 10:59:39 AM

419 Views
1 replies
10 kudos

Since databricks runtime 12.1 "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete ...

Since databricks runtime 12.1 "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete all target rows which doesn't match any source.

Data Engineering

419 Views
1 replies
10 kudos

01-24-2023 10:59:39 AM

View Replies

Latest Reply

jose_gonzalez
Moderator

02-24-2023 3:21:00 PM

10 kudos

Thank you for sharing @Hubert Dudek

10 kudos

02-24-2023 3:21:00 PM

by Daba • New Contributor III

06-07-2022 9:16:24 AM

3794 Views
5 replies
5 kudos

DLT streaming table and LEFT JOIN

I'm trying to build gold level streaming live table based on two streaming silver live tables with left join.This attempt fails with the next error:"Append mode error: Stream-stream LeftOuter join between two streaming DataFrame/Datasets is not suppo...

Data Engineering

3794 Views
5 replies
5 kudos

06-07-2022 9:16:24 AM

View Replies

Latest Reply

Daba
New Contributor III

06-12-2022 1:02:58 AM

5 kudos

Thanks Fatma,I do understand the need for watermarks, but I'm just wondering if this supported by SQL syntax?

5 kudos

06-12-2022 1:02:58 AM

4 More Replies

by ackerman_chris • New Contributor III

02-23-2023 7:06:37 AM

1704 Views
1 replies
1 kudos

Resolved! Azure Devops Git sync failed in Azure Databricks

Hello,I am currently attempting to setup a Git Repo within Azure Devops to use on my Azure Databricks Workspace environment for various notebooks. I went through the process of creating a Personal Access Token (PAT) on Devops, and have inputted the t...

Data Engineering

1704 Views
1 replies
1 kudos

02-23-2023 7:06:37 AM

View Replies

Latest Reply

Kaniz
Community Manager

02-24-2023 1:50:19 PM

1 kudos

Hi @Christopher Ackerman, This error message usually occurs when there is an issue with authentication between Azure Databricks and Azure DevOps. One possible reason for this error is that the token was not granted the necessary permissions to acces...

1 kudos

02-24-2023 1:50:19 PM

by phaezel • New Contributor

02-24-2023 7:56:29 AM

592 Views
1 replies
0 kudos

Just wondering if the DLT Unity Catalog integration is available in preview?

Data Engineering

592 Views
1 replies
0 kudos

02-24-2023 7:56:29 AM

View Replies

Latest Reply

Kaniz
Community Manager

02-24-2023 12:41:32 PM

0 kudos

Hi @peter dhaeseleer, As of my current knowledge in February 2023, I am unaware of any official announcement from Databricks regarding the availability of DLT Unity Catalog integration in preview.

0 kudos

02-24-2023 12:41:32 PM

by Mohit_m • Valued Contributor II

06-15-2022 5:23:13 AM

13908 Views
2 replies
4 kudos

Resolved! How to get the Job ID and Run ID and save into a database

We are having Databricks Job running with main class and JAR file in it. Our JAR file code base is in Scala. Now, when our job starts running, we need to log Job ID and Run ID into the database for future purpose. How can we achieve this?

Data Engineering

13908 Views
2 replies
4 kudos

06-15-2022 5:23:13 AM

View Replies

Latest Reply

User16783853961
New Contributor II

02-24-2023 11:16:33 AM

4 kudos

Here is a blog with code and examples on how to achieve this https://medium.com/@canadiandataguy/how-to-get-the-job-id-and-run-id-for-a-databricks-job-b0da484e66f5

4 kudos

02-24-2023 11:16:33 AM

1 More Replies

by zak • New Contributor II

02-16-2023 11:03:51 AM

2496 Views
4 replies
4 kudos

Resolved! add custom metadata to avro file with pyspark

Hello, i need to add a custom metadata into a avro file. The avro file containt data. we have tried to use "option" within the write function but it's not taken without generated any error.df.write.format("avro").option("avro.codec", "snappy").option...

Data Engineering

2496 Views
4 replies
4 kudos

02-16-2023 11:03:51 AM

View Replies

Latest Reply

Kaniz
Community Manager

02-23-2023 4:06:17 AM

4 kudos

Hi @zakaria belamri, You can add custom metadata to an Avro file in PySpark by creating an Avro schema with the custom metadata fields and passing it to the DataFrameWriter as an option. Here's an example code snippet that demonstrates how to do thi...

4 kudos

02-23-2023 4:06:17 AM

3 More Replies

by JordanYaker • Contributor

02-20-2023 1:56:05 PM

963 Views
3 replies
3 kudos

Resolved! What is the maximum number of workspaces per account using Databricks on AWS?

I've been looking through the documentation and I swear this used to be listed somewhere, but for the life of me I can't find it anymore.

Data Engineering

963 Views
3 replies
3 kudos

02-20-2023 1:56:05 PM

View Replies

Latest Reply

JordanYaker
Contributor

02-24-2023 8:19:00 AM

3 kudos

Thanks @Kaniz Fatma

3 kudos

02-24-2023 8:19:00 AM

2 More Replies

by kkawka1 • New Contributor III

02-23-2023 1:21:58 AM

3314 Views
8 replies
10 kudos

Resolved! Removing files saved in the root FileStore

We have just started working with databricks in one of my university modules, and the lecturers gave us a set of commands to practice saving data in the FileStore. One of the commands was the following:dbutils .fs.cp("/ databricks - datasets / weathh...

Data Engineering

3314 Views
8 replies
10 kudos

02-23-2023 1:21:58 AM

View Replies

Latest Reply

Kaniz
Community Manager

02-24-2023 7:09:49 AM

10 kudos

Hi @Konrad Kawka , Are you using the Community Edition?

10 kudos

02-24-2023 7:09:49 AM

7 More Replies

by RamaTeja • New Contributor II

02-23-2023 12:02:24 AM

1068 Views
3 replies
2 kudos

Unity Catalog metastore list is showing empty

Hi ,I am not able to list the meta-stores in databricks cli using the below command :databricks unity-catalog metastores list{}but when I tried databricks unity-catalog metastores get-summary I am able to get the meta-store info .Can anyone help me ...

Data Engineering

1068 Views
3 replies
2 kudos

02-23-2023 12:02:24 AM

View Replies

Latest Reply

RamaTeja
New Contributor II

02-24-2023 6:13:37 AM

2 kudos

Hi @Kaniz Fatma , Unity catalog is enabled in my workspace and i have been assigned metastore admin and account admin also.databricks unity-catalog metastores list --debugHTTP debugging enabledsend: b'GET /api/2.1/unity-catalog/metastores HTTP/1.1\r...

2 kudos

02-24-2023 6:13:37 AM

2 More Replies

User

Count

1601

736

343

284

247

Databricks

Forum Posts

Creating a spot only single-node job compute cluster policy

how to optimize the runtime in 10.4 cluster

Query Databricks from Power BI with Row Level Security

Unable to connect/read files from ADLS Gen2 using account key

Tables disappear when I re-start a new cluster on Community Edition

Delta Live Tables data quality rules application.

Since databricks runtime 12.1 "WHEN NOT MATCHED BY SOURCE" was added to MERGE syntax. For example, using that option, we can quickly delete ...

DLT streaming table and LEFT JOIN

Resolved! Azure Devops Git sync failed in Azure Databricks

Just wondering if the DLT Unity Catalog integration is available in preview?

Resolved! How to get the Job ID and Run ID and save into a database

Resolved! add custom metadata to avro file with pyspark

Resolved! What is the maximum number of workspaces per account using Databricks on AWS?

Resolved! Removing files saved in the root FileStore

Unity Catalog metastore list is showing empty

DELTA_EXCEED_CHAR_VARCHAR_LIMIT

Not able to set run_as service_principal_name

Pyspark operations slowness in CLuster 14.3LTS as ...

[Databricks Assets Bundles] Workflow trigger on fi...

Addressing Pipeline Error Handling in Databricks b...