Data Engineering

Forum Posts

by Sreyasi_Thakur • New Contributor II

a week ago

188 Views
2 replies
0 kudos

DLT Pipeline on Hive Metastore

I am creating a DLT pipeline with Hive Metastore as the destination, and a notebook in the pipeline reads a Unity Catalog table. However, I am getting an error: [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster. Is it...

Latest Reply

Kaniz_Fatma
Community Manager

a week ago

0 kudos

Hi @Sreyasi_Thakur, Yes, this is a known limitation. When you define the pipeline destination as Hive Metastore, you cannot read tables from Unity Catalog within the same pipeline. Delta Live Tables (DLT) pipelines can either use the Hive Metastore o...

1 More Replies

by Rishabh-Pandey • Honored Contributor III

Sunday

85 Views
0 replies
0 kudos

Welcome to the World's Biggest Databricks Hackathon SPARK-WARS 3.0

Here is the link to register: Welcome to the World's Biggest Databricks Hackathon SPARK-WARS 3.0 @Kaniz_Fatma @Sujitha @Rishabh_Tiwari

by Sheeraj9191 • New Contributor

06-12-2024 12:42:55 PM

158 Views
1 reply
0 kudos

What system tables provide DBU cost incurred by a common job for each iteration?

Latest Reply

brockb
Valued Contributor

Sunday

0 kudos

Hi @Sheeraj9191, I believe the table you are looking for is `system.billing.usage` (docs: https://docs.databricks.com/en/admin/system-tables/billing.html#billable-usage-table-schema). This table contains information at the job level in field `usage...
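As a rough illustration of the kind of query this reply points toward, the sketch below builds a SQL statement that sums DBUs per job run from `system.billing.usage`. The column names used (`usage_metadata.job_id`, `usage_metadata.job_run_id`, `usage_quantity`, `usage_unit`, `usage_date`) are an assumption based on the billing schema linked above and should be checked against the docs. The helper name `dbu_per_run_query` and the sample job ID are hypothetical. The result is in DBUs, not currency.

```python
def dbu_per_run_query(job_id: str) -> str:
    """Build a SQL query that sums DBUs per run of a single job.

    Assumes the system.billing.usage columns named below; verify them
    against the billing system table documentation before relying on it.
    """
    # Accept ASCII digits only so the ID can be embedded in the SQL text safely.
    if not (job_id.isascii() and job_id.isdigit()):
        raise ValueError("job_id must contain digits only")
    return f"""
SELECT
  usage_metadata.job_run_id AS job_run_id,
  usage_date,
  SUM(usage_quantity) AS dbus
FROM system.billing.usage
WHERE usage_metadata.job_id = '{job_id}'
  AND usage_unit = 'DBU'
GROUP BY usage_metadata.job_run_id, usage_date
ORDER BY usage_date
""".strip()


query = dbu_per_run_query("123456789")  # hypothetical job ID
print(query)
```

In a notebook, the resulting string could be passed to `spark.sql(query)` to get one row per run and day.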

by NanthakumarYoga • New Contributor II

03-25-2024 5:31:30 AM

3540 Views
2 replies
2 kudos

Partition in Spark

Hi Community, I need your help understanding the topics below. I have a huge transaction file (20 GB) in Parquet format, partitioned by the transaction_date column. The data is evenly distributed (no skew). There are 10 days of data and we have 10 partition f...

Latest Reply

payalbhatia
New Contributor

Sunday

2 kudos

I have follow-up questions here:
1) The OP mentions 1 GB of data in each folder. So will Spark read ~8 partitions on 8 cores (if available)?
2) What if I get empty partitions after a shuffle?
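The first question can be sanity-checked with some arithmetic. This is a minimal sketch, assuming Spark's default file-source settings (`spark.sql.files.maxPartitionBytes` = 128 MB, `spark.sql.files.openCostInBytes` = 4 MB), splittable files of equal size, and a default parallelism equal to the core count. The function name `estimate_read_splits` is hypothetical, and the formula is a simplification of how Spark sizes file splits, so real partition counts can differ (for example with many small files or large Parquet row groups).

```python
import math

MB = 1024 * 1024


def estimate_read_splits(
    total_bytes: int,
    num_files: int,
    cores: int,
    max_partition_bytes: int = 128 * MB,  # spark.sql.files.maxPartitionBytes default
    open_cost_bytes: int = 4 * MB,        # spark.sql.files.openCostInBytes default
) -> int:
    """Roughly estimate how many splits Spark creates when reading files."""
    # Target bytes per core, counting a fixed "open cost" for every file.
    bytes_per_core = (total_bytes + num_files * open_cost_bytes) / cores
    # Split size is capped by maxPartitionBytes and floored by the open cost.
    max_split_bytes = min(max_partition_bytes, max(open_cost_bytes, bytes_per_core))
    # Simplification: treat the data as evenly sized files.
    file_bytes = total_bytes / num_files
    return num_files * math.ceil(file_bytes / max_split_bytes)


# One 1 GB file read on 8 cores.
print(estimate_read_splits(total_bytes=1024 * MB, num_files=1, cores=8))  # 8
```

Under these assumptions, a 1 GB folder yields 8 splits of 128 MB each, so 8 cores would process it in a single wave of tasks.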

1 More Replies

by iamgoda • New Contributor III

2 weeks ago

415 Views
8 replies
3 kudos

Databricks SQL script slow execution in workflows using serverless

I am running a very simple SQL script within a notebook, using an X-Small SQL Serverless warehouse (that is already running). The execution time is different depending on how it's run:
- 4s if run interactively (and through SQL editor)
- 26s if run within ...

Latest Reply

BilalAslamDbrx
Honored Contributor III

2 weeks ago

3 kudos

@iamgoda we are going to look into how to make this faster. There's a poll loop in Databricks Workflows for SQL notebooks (but not for SQL scripts) which causes things to slow down.

7 More Replies

by 128941 • New Contributor III

06-29-2023 11:51:05 AM

694 Views
2 replies
1 kudos

What are best practices for Databricks workflow jobs?

Recommendations on how many tables per workflow? Interdependency between the workflows? Custom schedule? Monitoring? Reports?

Latest Reply

128941
New Contributor III

a week ago

1 kudos

product max limit and best practices.

1 More Replies

by 128941 • New Contributor III

06-12-2024 12:29:39 PM

962 Views
4 replies
0 kudos

Is there any better way to manage workflow dependencies using Databricks Workflows?

Looking for a solution for managing job dependencies using Databricks Workflows.

Latest Reply

128941
New Contributor III

a week ago

0 kudos

We have not tried Asset Bundles.

3 More Replies

by 8b1tz • New Contributor II

a week ago

149 Views
2 replies
0 kudos

Data Factory logs into Databricks Delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

Latest Reply

Kaniz_Fatma
Community Manager

a week ago

0 kudos

Hi @8b1tz, configure ADF to send pipeline logs to an Azure Storage Account, Azure Log Analytics, or Event Hubs. This ensures that logs are persisted and can be accessed by Databricks. If you need more detailed guidance or run into specific issues, fee...

1 More Replies

by pankaj30 • New Contributor II

03-19-2024 2:52:54 AM

881 Views
4 replies
3 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, we are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display the dataframe using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key...

Latest Reply

an313x
New Contributor II

a week ago

3 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe:
- bson
- mongodb-driver-core
- mongodb-driver-sync
Also, I noticed that both DBR 13.3 LTS and 14...

3 More Replies

by dream • Contributor

4 weeks ago

469 Views
4 replies
2 kudos

Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command:
%sql
create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply

raphaelblg
Honored Contributor II

4 weeks ago

2 kudos

Hello, this is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...

3 More Replies

by a-sky • New Contributor

a week ago

128 Views
0 replies
0 kudos

Databricks job stalls without error, unable to pin-point error, all compute metrics seem ok

I have a job that gets stuck on "Determining DBIO File fragment" and I have not been able to figure out why this job keeps getting stuck. I monitor the job cluster metrics throughout the job and it doesn't seem like it's hitting any bottlenecks with m...

by hayden_blair • New Contributor III

a week ago

108 Views
2 replies
0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply

hayden_blair
New Contributor III

a week ago

0 kudos

Thank you for the response @Slash. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.

1 More Replies

by Chandru • New Contributor III

11-29-2022 6:08:09 AM

3798 Views
3 replies
7 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error:
import librosa
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply

Flo
New Contributor II

a week ago

7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...

2 More Replies

by sinclair • New Contributor II

2 weeks ago

393 Views
7 replies
1 kudos

Py4JJavaError: An error occurred while calling o465.coun

The following error occurred when running .count() on a big sparkDF. Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply

Rishabh_Tiwari
Community Manager

a week ago

1 kudos

Hi @sinclair, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

6 More Replies

by NandaKishoreI • New Contributor II

2 weeks ago

123 Views
2 replies
0 kudos

Resolved! Databricks upon inserting delta table data inserts into folders in Dev

We have a Delta Table in Databricks. When we are inserting data into the Delta Table, in the storage account, it creates folders like: 05, 0H, 0F, 0O, 1T, 1W, etc. and adds the Parquet files there. We have not defined any partitions. We are inserting...

Latest Reply

Rishabh_Tiwari
Community Manager

a week ago

0 kudos

Hi @NandaKishoreI, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

1 More Replies
