Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

8b1tz
by Contributor
  • 613 Views
  • 1 replies
  • 0 kudos

Data factory logs into databricks delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

pankaj30
by New Contributor II
  • 2199 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, we are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display the DataFrame data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException:Document does not contain key...

Latest Reply
an313x
New Contributor III
  • 2 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe: bson, mongodb-driver-core, mongodb-driver-sync. Also, I noticed that both DBR 13.3 LTS and 14...

2 More Replies
dream
by Contributor
  • 3602 Views
  • 4 replies
  • 2 kudos

Resolved! Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command: %sql create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply
raphaelblg
Databricks Employee
  • 2 kudos

Hello, this is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...

3 More Replies
a-sky
by New Contributor II
  • 1223 Views
  • 0 replies
  • 1 kudos

Databricks job stalls without error, unable to pin-point error, all compute metrics seem ok

I have a job that gets stuck on "Determining DBIO File fragment" and I have not been able to figure out why this job keeps getting stuck. I monitor the job cluster metrics throughout the job and it doesn't seem like it's hitting any bottlenecks with m...

hayden_blair
by New Contributor III
  • 715 Views
  • 2 replies
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply
hayden_blair
New Contributor III
  • 0 kudos

Thank you for the response @szymon_dybczak. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.

1 More Replies
Chandru
by New Contributor III
  • 5441 Views
  • 3 replies
  • 7 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook with import librosa, we get the following error: OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Flo
New Contributor III
  • 7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...

2 More Replies
sinclair
by New Contributor II
  • 2589 Views
  • 6 replies
  • 1 kudos

Py4JJavaError: An error occurred while calling o465.coun

The following error occurred when running .count() on a big Spark DataFrame. Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 1 kudos

Hi @sinclair , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

5 More Replies
NandaKishoreI
by New Contributor II
  • 800 Views
  • 1 replies
  • 0 kudos

Databricks upon inserting delta table data inserts into folders in Dev

We have a Delta table in Databricks. When we insert data into the Delta table, it creates folders in the storage account such as 05, 0H, 0F, 0O, 1T, 1W, etc., and adds the Parquet files there. We have not defined any partitions. We are inserting...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @NandaKishoreI , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

spicysheep
by New Contributor
  • 1386 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 2 kudos

Hi @spicysheep , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

2 More Replies
inagar
by New Contributor
  • 725 Views
  • 1 replies
  • 0 kudos

Copying file from DBFS to a table of Databricks, Is there a way to get the errors at record level ?

We have a file of data to be ingested into a Databricks table. We follow the approach below: (1) Upload the file to DBFS. (2) Create a temporary table and load the file into it: CREATE TABLE [USING]. (3) Use MERGE INTO to merge temp_table created in...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @inagar , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedback...

Maatari
by New Contributor III
  • 1743 Views
  • 2 replies
  • 0 kudos

Resolved! How to monitor Kafka consumption / lag when working with spark structured streaming?

I have just found out that Spark Structured Streaming does not commit offsets to Kafka but uses its internal checkpoint system, and that there is no way to visualize its consumption lag in a typical Kafka UI: https://community.databricks.com/t5/data-engineering/c...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @Maatari , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
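
One possible way to approach the Kafka lag question above, offered as a sketch rather than a confirmed answer from the thread: Structured Streaming reports per-batch progress (for example via `query.lastProgress` in PySpark), and in recent Spark versions each Kafka source entry carries an `endOffset` (what the batch consumed up to) and a `latestOffset` (what was available on the broker). The helper name `kafka_lag` and the sample progress dictionary are hypothetical, and the code assumes the offsets arrive as nested `{topic: {partition: offset}}` mappings.

```python
def kafka_lag(progress: dict) -> dict:
    """Return {topic: {partition: lag}} from a streaming progress report.

    Assumes each Kafka source entry has 'endOffset' and 'latestOffset'
    shaped as {topic: {partition: offset}}; missing values count as 0.
    """
    lag = {}
    for source in progress.get("sources", []):
        end = source.get("endOffset") or {}
        latest = source.get("latestOffset") or {}
        for topic, partitions in latest.items():
            for partition, latest_offset in partitions.items():
                consumed = end.get(topic, {}).get(partition, 0)
                # Clamp at zero so a stale 'latestOffset' never yields negative lag.
                lag.setdefault(topic, {})[partition] = max(latest_offset - consumed, 0)
    return lag


# Hypothetical progress payload, trimmed to the fields used above.
sample_progress = {
    "sources": [
        {
            "description": "KafkaV2[Subscribe[events]]",
            "endOffset": {"events": {"0": 100, "1": 250}},
            "latestOffset": {"events": {"0": 130, "1": 250}},
        }
    ]
}

print(kafka_lag(sample_progress))
```

In a notebook, the same function could be fed the progress of a running query and the result pushed to whatever metrics system you already use, since the Kafka consumer-group view will not reflect these offsets.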
thackman
by New Contributor II
  • 9332 Views
  • 5 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure and we only include two small python l...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @thackman , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

4 More Replies
ashraf1395
by Valued Contributor III
  • 703 Views
  • 1 replies
  • 0 kudos

Spark code not running because of incorrect compute size

I have a dataset with 260 billion records. I need to group by 4 columns and find the sum of four other columns. I increased the memory to e32 for the driver and worker nodes; max workers is 40. The job is still stuck in this aggregate step where I’m wri...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @ashraf1395 , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

seefoods
by New Contributor III
  • 412 Views
  • 1 replies
  • 1 kudos

audit log for workspace users

Hello everyone, how can I retrieve the execution trace of a notebook in a Databricks GCP user workspace? Thanks

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @seefoods, I think you can use system tables to get such information: https://docs.databricks.com/en/admin/system-tables/audit-logs.html
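
A rough sketch of what that suggestion might look like in practice, assuming the audit system table `system.access.audit` is enabled for the account. The helper name `notebook_audit_query`, the seven-day window, and the `service_name = 'notebook'` filter are illustrative assumptions; the linked audit-log documentation lists the service and action names actually emitted.

```python
from datetime import date, timedelta
from typing import Optional


def notebook_audit_query(days: int = 7, today: Optional[date] = None) -> str:
    """Build a SQL query over the audit system table for recent notebook events."""
    if days < 0:
        raise ValueError("days must be non-negative")
    start = (today or date.today()) - timedelta(days=days)
    # The 'notebook' service name is an assumed filter value; adjust as needed.
    return (
        "SELECT event_time, user_identity.email AS user_email, "
        "action_name, request_params\n"
        "FROM system.access.audit\n"
        "WHERE service_name = 'notebook'\n"
        f"  AND event_date >= DATE '{start.isoformat()}'\n"
        "ORDER BY event_time DESC"
    )


query = notebook_audit_query(days=7, today=date(2024, 7, 20))
print(query)
# In a Databricks notebook, the string could then be run with spark.sql(query).
```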

WAHID
by New Contributor II
  • 447 Views
  • 0 replies
  • 0 kudos

GDAL on Databricks serverless compute

I am wondering if it's possible to install and use GDAL on Databricks serverless compute. I couldn't manage to do that using pip install gdal, and I discovered that init scripts are not supported on serverless compute.

