Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

Sreyasi_Thakur
by New Contributor II
  • 188 Views
  • 2 replies
  • 0 kudos

DLT Pipeline on Hive Metastore

I am creating a DLT pipeline on Hive Metastore (the destination is Hive Metastore) and using a notebook within the pipeline which reads a Unity Catalog table. But I am getting an error: [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster. Is it...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sreyasi_Thakur, Yes, this is a known limitation. When you define the pipeline destination as Hive Metastore, you cannot read tables from Unity Catalog within the same pipeline. Delta Live Tables (DLT) pipelines can either use the Hive Metastore o...
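A minimal workaround sketch, assuming the Unity Catalog table can be materialized before the pipeline runs; the catalog, schema, and table names below are placeholders, not from the thread:

```python
# Hedged workaround sketch (names are placeholders): copy the Unity Catalog table
# into the Hive Metastore with a separate UC-enabled job, so the Hive-Metastore-
# targeted DLT pipeline only ever reads Hive Metastore tables.
src = spark.read.table("main.sales.transactions")           # UC source (placeholder)

(src.write
    .format("delta")
    .mode("overwrite")
    .saveAsTable("hive_metastore.staging.transactions"))    # HMS copy the pipeline can read
```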

1 More Replies
Sheeraj9191
by New Contributor
  • 158 Views
  • 1 reply
  • 0 kudos
Latest Reply
brockb
Valued Contributor
  • 0 kudos

Hi @Sheeraj9191, I believe the table you are looking for is `system.billing.usage` (docs: https://docs.databricks.com/en/admin/system-tables/billing.html#billable-usage-table-schema). This table contains information at the job level in the field `usage...
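A rough sketch of a per-job aggregation over that table; the column names (`usage_metadata.job_id`, `usage_quantity`, `usage_date`, `sku_name`) follow the linked docs page, but verify them against your workspace before relying on this:

```python
# Aggregate billable usage per job over the last 30 days from system.billing.usage.
# Column names are taken from the billing system-table docs; treat as an assumption.
usage_per_job = spark.sql("""
    SELECT
        usage_metadata.job_id AS job_id,
        sku_name,
        SUM(usage_quantity)   AS total_usage
    FROM system.billing.usage
    WHERE usage_metadata.job_id IS NOT NULL
      AND usage_date >= date_sub(current_date(), 30)
    GROUP BY usage_metadata.job_id, sku_name
    ORDER BY total_usage DESC
""")
display(usage_per_job)
```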

NanthakumarYoga
by New Contributor II
  • 3540 Views
  • 2 replies
  • 2 kudos

Partition in Spark

Hi Community, I need your help understanding the topics below. I have a huge transaction file (20GB), a parquet file partitioned by the transaction_date column. The data is evenly distributed (no skew). There are 10 days of data and we have 10 partition f...

Latest Reply
payalbhatia
New Contributor
  • 2 kudos

I have follow-up questions here: 1) The OP mentions about 1 GB of data in each folder, so will Spark read ~8 partitions on 8 cores (if there are 8)? 2) What if I get empty partitions after the shuffle?
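A small sketch for both follow-up questions; the path and column names are placeholders, not from the thread:

```python
# 1) How many read partitions does Spark actually create? It is driven mainly by
#    spark.sql.files.maxPartitionBytes (default 128 MB), not by the number of date
#    folders, so ~1 GB per folder typically splits into several input partitions.
df = spark.read.parquet("s3://my-bucket/transactions/")   # placeholder path
print(df.rdd.getNumPartitions())

# 2) Empty partitions after a shuffle (e.g. a filter that leaves some of the 200
#    default shuffle partitions empty) can be compacted away explicitly, or by AQE
#    (spark.sql.adaptive.coalescePartitions.enabled).
result = (df.filter("amount > 0")
            .groupBy("transaction_date")
            .count()
            .coalesce(10))   # reduce the number of (possibly empty) output partitions
```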

1 More Replies
iamgoda
by New Contributor III
  • 415 Views
  • 8 replies
  • 3 kudos

Databricks SQL script slow execution in workflows using serverless

I am running a very simple SQL script within a notebook, using an X-Small SQL Serverless warehouse (that is already running). The execution time differs depending on how it's run: 4s if run interactively (and through the SQL editor), 26s if run within ...

Latest Reply
BilalAslamDbrx
Honored Contributor III
  • 3 kudos

@iamgoda  we are going to look into how to make this faster. There's a poll loop in Databricks Workflows for SQL notebooks (but not for SQL scripts) which causes things to slow down. 

7 More Replies
8b1tz
by New Contributor II
  • 149 Views
  • 2 replies
  • 0 kudos

Data factory logs into databricks delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @8b1tz, configure ADF to send pipeline logs to an Azure Storage Account, Azure Log Analytics, or Event Hubs. This ensures that logs are persisted and can be accessed by Databricks. If you need more detailed guidance or run into specific issues, fee...
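A hypothetical follow-on sketch for the storage-account route: once ADF diagnostic logs land as JSON files in a container, an Auto Loader stream could pick them up incrementally and append them to the Delta table behind the dashboard. The paths and table name are placeholders:

```python
# Incrementally ingest ADF log JSON files from a storage container with Auto Loader
# and append them to the dashboard's Delta table. Paths/table names are placeholders.
logs = (spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "json")
        .option("cloudFiles.schemaLocation",
                "abfss://logs@mystorageacct.dfs.core.windows.net/_schema")
        .load("abfss://logs@mystorageacct.dfs.core.windows.net/adf-pipeline-runs/"))

(logs.writeStream
     .option("checkpointLocation",
             "abfss://logs@mystorageacct.dfs.core.windows.net/_checkpoint")
     .trigger(availableNow=True)           # run as a periodic, cost-friendly batch
     .toTable("main.monitoring.adf_pipeline_logs"))
```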

1 More Replies
pankaj30
by New Contributor II
  • 881 Views
  • 4 replies
  • 3 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, we are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display dataframe data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key...

Latest Reply
an313x
New Contributor II
  • 3 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe: bson, mongodb-driver-core, mongodb-driver-sync. Also, I noticed that both DBR 13.3 LTS and 14...
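A minimal read sketch for the 10.x connector referenced above (the "-all" JAR bundles the bson / driver-core / driver-sync dependencies). The connection URI, database, and collection names are placeholders:

```python
# Read a MongoDB collection with the mongo-spark-connector 10.x "mongodb" source.
# URI, database and collection are placeholders; adjust to your deployment.
df = (spark.read
      .format("mongodb")
      .option("connection.uri", "mongodb+srv://user:password@cluster0.example.net")
      .option("database", "sales")
      .option("collection", "orders")
      .load())

df.display()
```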

3 More Replies
dream
by Contributor
  • 469 Views
  • 4 replies
  • 2 kudos

Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command: %sql create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply
raphaelblg
Honored Contributor II
  • 2 kudos

Hello, this is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...
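A sketch of the setup being discussed, using hypothetical paths rather than the truncated ones from the post. The point it illustrates: a shallow clone copies only the Delta log, so queries on the clone still resolve parquet files under the source location, and the querying principal needs access to both external locations:

```python
# Create a source table at the first external location (placeholder paths).
spark.sql("""
  CREATE OR REPLACE TABLE delta.`s3://bucket-a/tmp/source_table`
  AS SELECT * FROM range(10)
""")

# Shallow clone it to the second external location: only the transaction log is
# copied, the data files stay under bucket-a.
spark.sql("""
  CREATE OR REPLACE TABLE delta.`s3://bucket-b/tmp/cloned_table`
  SHALLOW CLONE delta.`s3://bucket-a/tmp/source_table`
""")

# Reading the clone therefore still needs read access to bucket-a's parquet files.
spark.sql("SELECT * FROM delta.`s3://bucket-b/tmp/cloned_table`").show()
```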

3 More Replies
hayden_blair
by New Contributor III
  • 108 Views
  • 2 replies
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply
hayden_blair
New Contributor III
  • 0 kudos

Thank you for the response @Slash. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.
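A hedged illustration of the usual way around the restriction, not taken from the thread: express the same logic with DataFrame APIs, which shared access mode, DLT, and Unity Catalog all support. The table and column names are placeholders:

```python
# Replace an RDD-style transformation with equivalent DataFrame operations so the
# code runs on shared access mode compute. Names below are placeholders.
from pyspark.sql import functions as F

df = spark.read.table("main.default.events")   # placeholder source table

# Instead of something like: df.rdd.map(lambda r: (r["user_id"], len(r["payload"])))
converted = df.select(
    F.col("user_id"),
    F.length("payload").alias("payload_size"),
)
```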

1 More Replies
Chandru
by New Contributor III
  • 3798 Views
  • 3 replies
  • 7 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error on import librosa: OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Flo
New Contributor II
  • 7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...

2 More Replies
sinclair
by New Contributor II
  • 393 Views
  • 7 replies
  • 1 kudos

Py4JJavaError: An error occurred while calling o465.count

The following error occurred when running .count() on a big Spark DataFrame. Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 1 kudos

Hi @sinclair , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

6 More Replies
NandaKishoreI
by New Contributor II
  • 123 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks upon inserting delta table data inserts into folders in Dev

We have a Delta table in Databricks. When we are inserting data into the Delta table, in the storage account it creates folders like 05, 0H, 0F, 0O, 1T, 1W, etc. and adds the parquet files there. We have not defined any partitions. We are inserting...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @NandaKishoreI , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

1 More Replies