Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Sheeraj9191
New Contributor
  • 1004 Views
  • 1 reply
  • 0 kudos
Latest Reply
brockb
Databricks Employee
  • 0 kudos

Hi @Sheeraj9191, I believe the table you are looking for is `system.billing.usage` (docs: https://docs.databricks.com/en/admin/system-tables/billing.html#billable-usage-table-schema). This table contains information at the job level in field `usage...

128941
New Contributor III
  • 2043 Views
  • 2 replies
  • 1 kudos

What are best practices for Databricks workflow jobs?

Recommendations on: how many tables per workflow? Interdependency between workflows? Custom schedules? Monitoring? Reports?

Latest Reply
128941
New Contributor III
  • 1 kudos

Product max limits and best practices.
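On the interdependency question: one way to reason about dependencies between workflows is to treat them as a directed acyclic graph and derive a run order from it. The sketch below uses Python's standard-library `graphlib`; the workflow names are hypothetical, and the code only computes an ordering, it does not call any Databricks API.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical workflows: each key maps to the set of workflows it depends on.
dependencies = {
    "ingest_raw": set(),
    "clean_orders": {"ingest_raw"},
    "clean_customers": {"ingest_raw"},
    "build_reports": {"clean_orders", "clean_customers"},
}

def run_waves(deps):
    """Group workflows into waves; each wave depends only on workflows in earlier waves."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError (a ValueError) on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(run_waves(dependencies))
# [['ingest_raw'], ['clean_customers', 'clean_orders'], ['build_reports']]
```

Workflows in the same wave do not depend on each other, so under these assumptions they could run in parallel; a dependency cycle is reported as an error instead of silently producing an order.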

KosmaS
New Contributor III
  • 2041 Views
  • 0 replies
  • 0 kudos

Lost Databricks' dependency in a job.

Hey, I had a notebook that ran stably within a job. It contains one action: dumping data to S3. Recently it started generating some issues. Maybe someone can suggest either how to investigate it further or what to try to do with such kinds ...

8b1tz
Contributor
  • 1108 Views
  • 1 reply
  • 0 kudos

Data factory logs into databricks delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...
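A possible first step is to flatten each pipeline-run record into one row per run before appending it to the Delta table. A minimal sketch, assuming the logs arrive as JSON shaped like an ADF pipeline-run query response (a `value` list of runs with fields such as `runId`, `pipelineName`, `status`, `runStart`, `runEnd`, `durationInMs`, `message`, `invokedBy`); check these names against your actual log output. The sample run is made up, and writing the rows to Delta (for example with Spark) is not shown.

```python
import json

# Assumed field names, modeled on ADF pipeline-run JSON; verify against your own log output.
RUN_FIELDS = ["runId", "pipelineName", "status", "runStart", "runEnd", "durationInMs", "message"]

def flatten_runs(payload):
    """Turn a JSON payload of the form {"value": [run, ...]} into flat dicts, one per pipeline run."""
    runs = json.loads(payload).get("value", [])
    rows = []
    for run in runs:
        row = {field: run.get(field) for field in RUN_FIELDS}  # missing fields become None
        # Collapse the nested invokedBy object to its name so every column is a scalar.
        row["invokedBy"] = (run.get("invokedBy") or {}).get("name")
        rows.append(row)
    return rows

# Hypothetical sample payload with a single run.
sample = json.dumps({"value": [{
    "runId": "run-1", "pipelineName": "copy_sales", "status": "Succeeded",
    "runStart": "2024-07-19T10:00:00Z", "runEnd": "2024-07-19T10:05:00Z",
    "durationInMs": 300000, "invokedBy": {"name": "Manual"},
}]})
print(flatten_runs(sample))
```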

pankaj30
New Contributor II
  • 3456 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, we are trying to read data from MongoDB in a Databricks notebook using PySpark. When we try to display the DataFrame with the show or display method, it gives the error "org.bson.BsonInvalidOperationException:Document does not contain key...

Latest Reply
an313x
New Contributor III
  • 2 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the following JAR files to be installed on the cluster to display the DataFrame: bson, mongodb-driver-core, mongodb-driver-sync. Also, I noticed that both DBR 13.3 LTS and 14...

dream
Contributor
  • 5155 Views
  • 4 replies
  • 2 kudos

Resolved! Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command:
%sql
create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply
raphaelblg
Databricks Employee
  • 2 kudos

Hello, this is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...
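The reply suggests the Delta log references a data file that no longer exists. For a table on a filesystem you can read directly, a rough way to check is to replay the commit files in `_delta_log` (one JSON action per line, with `add` and `remove` actions carrying a `path`) and test which referenced files are missing. This is a simplified sketch, not Delta's own reader: it ignores checkpoints, deletion vectors and other protocol features, and it skips absolute URIs such as the `s3://` paths in the question rather than checking them.

```python
import json
import os
from urllib.parse import unquote

def active_files(table_dir):
    """Replay the JSON commits in _delta_log and return the data-file paths the latest version references.

    Simplification: Parquet checkpoints are ignored, so every commit JSON since version 0 must still exist.
    """
    log_dir = os.path.join(table_dir, "_delta_log")
    # Plain commits are named <zero-padded version>.json, so name order equals version order.
    commits = sorted(f for f in os.listdir(log_dir) if f.endswith(".json") and f[:-5].isdigit())
    files = set()
    for name in commits:
        with open(os.path.join(log_dir, name)) as fh:
            for line in fh:  # one JSON action per line
                if not line.strip():
                    continue
                action = json.loads(line)
                if "add" in action:
                    files.add(unquote(action["add"]["path"]))  # paths in the log are URL-encoded
                elif "remove" in action:
                    files.discard(unquote(action["remove"]["path"]))
    return files

def missing_files(table_dir):
    """List referenced files absent from local storage; URI paths such as s3://... are skipped, not checked."""
    return sorted(
        p for p in active_files(table_dir)
        if "://" not in p and not os.path.exists(os.path.join(table_dir, p))
    )
```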

hayden_blair
New Contributor III
  • 1487 Views
  • 2 replies
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply
hayden_blair
New Contributor III
  • 0 kudos

Thank you for the response @szymon_dybczak. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.

Chandru
New Contributor III
  • 6804 Views
  • 3 replies
  • 7 kudos

Resolved! Issue importing librosa library on Databricks Runtime 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error:
import librosa
OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Flo
New Contributor III
  • 7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...
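Before importing librosa again, a notebook cell can check whether the native library named in the error is visible to the process at all. A minimal sketch using only the standard-library `ctypes.util.find_library`; it diagnoses but does not install anything (the init script described above is still the fix), and a None result is a strong hint rather than proof, since the lookup depends on the system's loader cache.

```python
import ctypes.util

def find_sndfile():
    """Return the libsndfile library name if the dynamic loader can locate it, else None."""
    # find_library expects the bare name: no "lib" prefix, no ".so" suffix.
    return ctypes.util.find_library("sndfile")

lib = find_sndfile()
if lib is None:
    print("libsndfile not found; install the OS package (for example from an init script) before importing librosa")
else:
    print(f"libsndfile found: {lib}")
```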

sinclair
New Contributor II
  • 4352 Views
  • 6 replies
  • 1 kudos

Py4JJavaError: An error occurred while calling o465.count

The following error occurred when running .count() on a large Spark DataFrame. Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 1 kudos

Hi @sinclair, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

NandaKishoreI
New Contributor II
  • 1271 Views
  • 1 reply
  • 0 kudos

On insert, Databricks writes Delta table data into subfolders in Dev

We have a Delta table in Databricks. When we insert data into it, the storage account gets folders like 05, 0H, 0F, 0O, 1T, 1W, etc., and the Parquet files are added there. We have not defined any partitions. We are inserting...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @NandaKishoreI, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

spicysheep
New Contributor II
  • 2772 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 2 kudos

Hi @spicysheep, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

inagar
New Contributor
  • 2529 Views
  • 1 reply
  • 0 kudos

Copying a file from DBFS to a Databricks table: is there a way to get errors at the record level?

We have a data file to be ingested into a Databricks table, using the following approach:
1. Upload the file to DBFS.
2. Create a temporary table and load the file into it (CREATE TABLE [USING]).
3. Use MERGE INTO to merge temp_table created in...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @inagar, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback...
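On the record-level question itself: a MERGE INTO statement succeeds or fails as a whole, so one option is to validate rows before loading and keep the rejects, with their line numbers and reasons, separately. Databricks also documents options for handling bad records when reading files, which may be worth checking first. The sketch below is plain Python with a hypothetical three-column schema and made-up sample data.

```python
import csv
import io
from datetime import date

# Hypothetical target schema: column -> parser that raises ValueError on bad input.
SCHEMA = {"id": int, "amount": float, "order_date": date.fromisoformat}

def validate_csv(text):
    """Return (good, bad): parsed records, plus one error entry per rejected record."""
    good, bad = [], []
    reader = csv.DictReader(io.StringIO(text))
    for row in reader:
        try:
            missing = [col for col in SCHEMA if not row.get(col)]
            if missing:
                raise ValueError("missing value for " + ", ".join(missing))
            good.append({col: parse(row[col]) for col, parse in SCHEMA.items()})
        except ValueError as exc:
            # reader.line_num is the source line just read, so rejects can be traced back to the file.
            bad.append({"line": reader.line_num, "row": dict(row), "error": str(exc)})
    return good, bad

data = "id,amount,order_date\n1,9.5,2024-07-01\nx,3.0,2024-07-02\n3,,2024-07-03\n"
good, bad = validate_csv(data)
for err in bad:
    print(err["line"], err["error"])
```

Only the good records would then go into the temporary table for the MERGE, while the bad list can be written wherever rejects are reviewed.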

Maatari
New Contributor III
  • 3474 Views
  • 2 replies
  • 0 kudos

Resolved! How to monitor Kafka consumption / lag when working with Spark Structured Streaming?

I have just found out that Spark Structured Streaming does not commit offsets to Kafka but uses its internal checkpoint system, and that there is no way to visualize its consumption lag in a typical Kafka UI: https://community.databricks.com/t5/data-engineering/c...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @Maatari, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...
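On the monitoring question: Structured Streaming exposes per-batch progress through `StreamingQuery.lastProgress` and `StreamingQueryListener`, so lag can be computed from that progress instead of from a Kafka UI. A minimal sketch, assuming each Kafka source entry in the progress carries `endOffset` and `latestOffset` in Spark's {"topic": {"partition": offset}} layout; the topic name and offsets below are made up.

```python
import json

def kafka_lag(end_offset, latest_offset):
    """Per-partition lag: latest available offset minus the offset the query has processed up to.

    Both arguments use the Kafka offset layout {"topic": {"partition": offset}},
    either as JSON strings or as already-parsed dicts.
    """
    if isinstance(end_offset, str):
        end_offset = json.loads(end_offset)
    if isinstance(latest_offset, str):
        latest_offset = json.loads(latest_offset)
    lag = {}
    for topic, partitions in latest_offset.items():
        processed = end_offset.get(topic, {})
        for partition, latest in partitions.items():
            # A partition absent from end_offset is treated as starting at 0, which
            # overstates lag if retention has already deleted the earliest records.
            lag[(topic, int(partition))] = max(latest - processed.get(partition, 0), 0)
    return lag

# Illustrative values shaped like one entry of query.lastProgress["sources"].
source = {
    "endOffset": '{"orders": {"0": 120, "1": 95}}',
    "latestOffset": '{"orders": {"0": 150, "1": 95}}',
}
print(kafka_lag(source["endOffset"], source["latestOffset"]))
# {('orders', 0): 30, ('orders', 1): 0}
```

Emitting these numbers from a listener to a metrics system is one way to chart them over time; that wiring is not shown.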

thackman
New Contributor III
  • 17308 Views
  • 5 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure and we only include two small python l...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @thackman, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

ashraf1395
Honored Contributor
  • 1044 Views
  • 1 reply
  • 0 kudos

Spark code not running because of incorrect compute size

I have a dataset with 260 billion records. I need to group by 4 columns and compute the sum of four other columns. I increased the memory to e32 for the driver and worker nodes, and max workers is 40. The job is still stuck in this aggregate step where I’m wri...

Latest Reply
RishabhTiwari07
Databricks Employee
  • 0 kudos

Hi @ashraf1395, thank you for reaching out to our community! We're here to help you. To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...
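On the aggregation itself: a group-by with sums is typically executed in two phases, a partial aggregation inside each partition followed by a shuffle that brings all partial results for a key together for the final sum. The plain-Python sketch below mimics that pattern on toy data (column names are hypothetical). It illustrates that the data shuffled scales with the number of distinct groups per partition rather than with raw rows, so when the four grouping columns have very high cardinality, the partial step saves little and the shuffle stays large.

```python
from collections import defaultdict

def partial_aggregate(rows, key_cols, sum_cols):
    """Phase 1, run per partition: collapse rows to one running total per group key."""
    totals = defaultdict(lambda: [0] * len(sum_cols))
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        acc = totals[key]
        for i, c in enumerate(sum_cols):
            acc[i] += row[c]
    return dict(totals)

def final_aggregate(partials, num_buckets=4):
    """Phase 2: route every partial result by key hash (the 'shuffle') and add matching keys."""
    buckets = [{} for _ in range(num_buckets)]
    for partial in partials:
        for key, sums in partial.items():
            bucket = buckets[hash(key) % num_buckets]  # equal keys always land in the same bucket
            if key in bucket:
                bucket[key] = [a + b for a, b in zip(bucket[key], sums)]
            else:
                bucket[key] = list(sums)
    merged = {}
    for bucket in buckets:
        merged.update(bucket)  # buckets hold disjoint keys, so nothing is overwritten
    return merged

# Two toy "partitions": group by (region, day), sum (qty, amount).
part_a = [{"region": "eu", "day": 1, "qty": 2, "amount": 10.0},
          {"region": "eu", "day": 1, "qty": 1, "amount": 5.0}]
part_b = [{"region": "eu", "day": 1, "qty": 4, "amount": 20.0},
          {"region": "us", "day": 1, "qty": 1, "amount": 7.5}]
keys, sums = ["region", "day"], ["qty", "amount"]
partials = [partial_aggregate(p, keys, sums) for p in (part_a, part_b)]
print(final_aggregate(partials))
```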
