Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

8b1tz
by Contributor
  • 469 Views
  • 1 reply
  • 0 kudos

Data Factory logs into Databricks Delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

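One low-cost pattern for the question above is to pull pipeline-run records from Data Factory, flatten them to plain rows, and append those rows to the existing Delta table. A minimal sketch of the flattening step in plain Python follows; the field names (`runId`, `pipelineName`, `status`, `runStart`, `runEnd`) are illustrative assumptions, not the exact ADF payload schema.

```python
# Minimal sketch: flatten Azure Data Factory pipeline-run records into flat
# rows. The resulting rows could later be appended to a Delta table, e.g. via
# spark.createDataFrame(rows).write.format("delta").mode("append").
# Field names here are assumptions for illustration, not the exact ADF schema.

def flatten_adf_runs(runs):
    """Turn a list of ADF pipeline-run dicts into flat row dicts."""
    rows = []
    for run in runs:
        rows.append({
            "run_id": run.get("runId"),
            "pipeline": run.get("pipelineName"),
            "status": run.get("status"),
            "start": run.get("runStart"),
            "end": run.get("runEnd"),
        })
    return rows

sample = [
    {"runId": "r-1", "pipelineName": "ingest", "status": "Succeeded",
     "runStart": "2024-07-01T00:00:00Z", "runEnd": "2024-07-01T00:05:00Z"},
]
print(flatten_adf_runs(sample)[0]["status"])  # Succeeded
```

Keeping the flattening in a small scheduled job that appends only new run IDs keeps both compute and storage cost low.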
pankaj30
by New Contributor II
  • 1798 Views
  • 3 replies
  • 2 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, We are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display data frame data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key...

Latest Reply
an313x
New Contributor III
  • 2 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe: bson, mongodb-driver-core, mongodb-driver-sync. Also, I noticed that both DBR 13.3 LTS and 14...

2 More Replies
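The "Document does not contain key" error above typically means a schema inferred from a sample of documents is being applied to documents that lack one of the inferred fields. Before debugging the connector itself, a plain-Python sanity check over a sample of documents can locate the offending field; `missing_keys` below is a hypothetical helper, not part of the MongoDB connector.

```python
# Minimal sketch: find which required keys are missing from which documents,
# the usual cause of "Document does not contain key <x>" when a schema
# inferred from a sample is applied to the full collection.
# Pure Python; documents are represented as dicts.

def missing_keys(docs, required):
    """Map each required key to the indexes of the documents that lack it."""
    gaps = {}
    for i, doc in enumerate(docs):
        for key in required:
            if key not in doc:
                gaps.setdefault(key, []).append(i)
    return gaps

docs = [{"_id": 1, "name": "a"}, {"_id": 2}]  # second doc lacks "name"
print(missing_keys(docs, ["_id", "name"]))  # {'name': [1]}
```

If the check reports gaps, the usual fixes are to supply an explicit schema with nullable fields or to clean the offending documents.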
dream
by Contributor
  • 3198 Views
  • 4 replies
  • 2 kudos

Resolved! Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command: %sql create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply
raphaelblg
Databricks Employee
  • 2 kudos

Hello, This is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...

3 More Replies
a-sky
by New Contributor II
  • 875 Views
  • 0 replies
  • 0 kudos

Databricks job stalls without error, unable to pin-point error, all compute metrics seem ok

I have a job that gets stuck on "Determining DBIO File fragment" and I have not been able to figure out why this job keeps getting stuck. I monitor the job cluster metrics throughout the job and it doesn't seem like it's hitting any bottlenecks with m...

hayden_blair
by New Contributor III
  • 536 Views
  • 2 replies
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access comput...

Latest Reply
hayden_blair
New Contributor III
  • 0 kudos

Thank you for the response @szymon_dybczak. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.

1 More Replies
Chandru
by New Contributor III
  • 4839 Views
  • 3 replies
  • 7 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook with import librosa, we get the following error: OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Flo
New Contributor III
  • 7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...

2 More Replies
sinclair
by New Contributor II
  • 1891 Views
  • 6 replies
  • 1 kudos

Py4JJavaError: An error occurred while calling o465.count

The following error occurred when running .count() on a big Spark DataFrame: Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 1 kudos

Hi @sinclair , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

5 More Replies
NandaKishoreI
by New Contributor II
  • 626 Views
  • 1 reply
  • 0 kudos

Databricks upon inserting delta table data inserts into folders in Dev

We have a Delta Table in Databricks. When we are inserting data into the Delta Table, in the storage account, it creates folders like: 05, 0H, 0F, 0O, 1T,1W, etc... and adds the parquet files there.We have not defined any partitions. We are inserting...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @NandaKishoreI , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

spicysheep
by New Contributor
  • 1013 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 2 kudos

Hi @spicysheep , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

2 More Replies
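For the question above: in a bundle, cluster settings live in the `new_cluster` block of a job, which follows the Jobs API cluster specification, so instance type, worker count, and spot policy are all set there. A hedged sketch, assuming AWS; the job name and cluster key are illustrative, and exact field support should be checked against the bundle configuration reference.

```yaml
# Sketch of job-cluster settings in a databricks.yaml bundle.
# The new_cluster block follows the Jobs API cluster spec.
resources:
  jobs:
    my_job:                        # illustrative job name
      name: my_job
      job_clusters:
        - job_cluster_key: main    # illustrative key
          new_cluster:
            spark_version: 14.3.x-scala2.12
            node_type_id: m5d.xlarge          # AWS instance type
            num_workers: 4                    # number of machines
            aws_attributes:
              availability: SPOT_WITH_FALLBACK  # spot vs on-demand
              first_on_demand: 1
```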
Maatari
by New Contributor III
  • 1303 Views
  • 2 replies
  • 0 kudos

Resolved! How to monitor Kafka consumption / lag when working with spark structured streaming?

I have just found out that Spark Structured Streaming does not commit offsets to Kafka but uses its own internal checkpoint system, and that there is no way to visualize its consumption lag in a typical Kafka UI - https://community.databricks.com/t5/data-engineering/c...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @Maatari , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedbac...

1 More Replies
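Since Structured Streaming keeps its progress in the checkpoint rather than in Kafka's consumer-offset topic, lag can still be computed externally: per partition, lag is the log end offset minus the offset recorded by the query. A minimal plain-Python sketch of that arithmetic follows; how the two offset dicts are obtained (e.g. from a Kafka admin client and from the checkpoint's offsets files or the query's lastProgress) is environment-specific and omitted.

```python
# Minimal sketch: consumer lag per partition = log end offset minus the
# offset recorded by the streaming query. Fetching the two dicts is
# environment-specific (Kafka admin client, checkpoint offsets files)
# and intentionally left out.

def consumer_lag(end_offsets, committed_offsets):
    """Return {partition: lag}; partitions absent from the committed
    offsets are treated as fully unread (offset 0)."""
    return {
        p: end - committed_offsets.get(p, 0)
        for p, end in end_offsets.items()
    }

end = {0: 1200, 1: 800}        # latest offsets in Kafka
committed = {0: 1150, 1: 800}  # offsets the query has processed
print(consumer_lag(end, committed))  # {0: 50, 1: 0}
```

Emitting the result to a metrics system on a schedule gives the lag graph that a typical Kafka UI would otherwise provide.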
thackman
by New Contributor II
  • 6467 Views
  • 5 replies
  • 0 kudos

Databricks cluster random slow start times.

We have a job that runs on single user job compute because we've had compatibility issues switching to shared compute. Normally the cluster (1 driver, 1 worker) takes five to six minutes to start. This is on Azure and we only include two small Python l...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @thackman , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

4 More Replies
ashraf1395
by Valued Contributor
  • 541 Views
  • 1 reply
  • 0 kudos

Spark code not running because of incorrect compute size

I have a dataset of 260 billion records. I need to group by 4 columns and find the sum of four other columns. I increased the memory to E32 for the driver and worker nodes; max workers is 40. The job is still stuck in this aggregate step where I'm wri...

Latest Reply
Rishabh_Tiwari
Databricks Employee
  • 0 kudos

Hi @ashraf1395 , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

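The aggregation described above (group by four key columns, sum four measures) is the classic shuffle-heavy pattern: each task builds a hash table of partial sums keyed by the 4-tuple, then those partial sums are shuffled by key. A plain-Python sketch of that shape, with illustrative column names, shows why the shuffle volume depends on the number of distinct key combinations rather than the raw row count.

```python
# Plain-Python equivalent of "group by 4 columns, sum 4 others": a hash
# aggregation keyed by the 4-tuple of key values. Spark builds the same
# partial sums per task before shuffling them by key, so distinct key
# combinations, not input rows, drive the shuffle size.
from collections import defaultdict

def group_sum(rows, keys, measures):
    agg = defaultdict(lambda: [0] * len(measures))
    for row in rows:
        k = tuple(row[c] for c in keys)  # the grouping key
        for i, m in enumerate(measures):
            agg[k][i] += row[m]          # partial sums per key
    return dict(agg)

rows = [
    {"a": 1, "b": 1, "c": 1, "d": 1, "x": 10, "y": 1, "z": 2, "w": 3},
    {"a": 1, "b": 1, "c": 1, "d": 1, "x": 5, "y": 1, "z": 2, "w": 3},
]
print(group_sum(rows, ["a", "b", "c", "d"], ["x", "y", "z", "w"]))
# {(1, 1, 1, 1): [15, 2, 4, 6]}
```

If the key cardinality is very high or skewed, the aggregate step can stall on a few oversized shuffle partitions even when overall cluster metrics look healthy.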
seefoods
by New Contributor III
  • 325 Views
  • 1 reply
  • 1 kudos

audit log for workspace users

Hello Everyone, how can I retrieve the execution trace of a notebook in a Databricks GCP users workspace? Thanks

Latest Reply
szymon_dybczak
Contributor III
  • 1 kudos

Hi @seefoods, I think you can use system tables to get such information: https://docs.databricks.com/en/admin/system-tables/audit-logs.html

WAHID
by New Contributor II
  • 368 Views
  • 0 replies
  • 0 kudos

GDAL on Databricks serverless compute

I am wondering if it's possible to install and use GDAL on Databricks serverless compute. I couldn't manage to do that using pip install gdal, and I discovered that init scripts are not supported on serverless compute.

mr_robot
by New Contributor
  • 1252 Views
  • 3 replies
  • 3 kudos

Update datatype of a column in a table

I have a table in Databricks with fields name: string, id: string, orgId: bigint, metadata: struct. Now I want to rename one of the columns and change its type. In my case I want to update orgId to orgIds and change its type to map<string, string>. One...

Data Engineering
tables delta-tables
Latest Reply
jacovangelder
Honored Contributor
  • 3 kudos

You can use REPLACE COLUMNS: ALTER TABLE your_table_name REPLACE COLUMNS (name STRING, id BIGINT, orgIds MAP<STRING, STRING>, metadata STRUCT<...>);

2 More Replies

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.
