Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Data + AI Summit 2024 - Data Engineering & Streaming

Forum Posts

adityarai316
by Visitor
  • 32 Views
  • 0 replies
  • 0 kudos

Mount point in unity catalog

Hi Everyone, In my existing notebooks we have used mount point URLs (/mnt/), and we have more than 200 notebooks that use these URLs to fetch data/files from the container. Now, as we are upgrading to Unity Catalog, these URLs will no lon...
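Since these notebooks reference storage through mount URLs, a migration typically means rewriting /mnt/ paths to Unity Catalog volume paths. A minimal sketch of a path-rewriting helper, assuming the files are exposed through a UC volume; the catalog and schema names ("main"/"default") are placeholders, not from the post:

```python
# Hypothetical path rewriter for a /mnt/ -> Unity Catalog volume migration.
# The catalog and schema names are assumptions; substitute your own.
def mount_to_volume(path: str,
                    catalog: str = "main",
                    schema: str = "default") -> str:
    """Rewrite /mnt/<container>/<rest> to /Volumes/<catalog>/<schema>/<container>/<rest>."""
    prefix = "/mnt/"
    if not path.startswith(prefix):
        return path  # leave non-mount paths untouched
    return f"/Volumes/{catalog}/{schema}/" + path[len(prefix):]

print(mount_to_volume("/mnt/raw/sales/2024/01.csv"))
# -> /Volumes/main/default/raw/sales/2024/01.csv
```

A helper like this can drive a scripted search-and-replace across the 200+ notebooks rather than editing each one by hand.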

8b1tz
by New Contributor
  • 69 Views
  • 2 replies
  • 0 kudos

Data factory logs into databricks delta table

Hi Databricks Community, I am looking for a solution to efficiently integrate Azure Data Factory pipeline logs with Databricks at minimal cost. Currently, I have a dashboard that consumes data from a Delta table, and I would like to augment this table...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @8b1tz, configure ADF to send pipeline logs to an Azure Storage Account, Azure Log Analytics, or Event Hubs. This ensures that logs are persisted and can be accessed by Databricks. If you need more detailed guidance or run into specific issues, fee...
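One low-cost pattern along the lines of this reply is to land ADF run logs as JSON in storage and flatten them into rows for the existing Delta table. A sketch of the flattening step only; the field names (runId, pipelineName, status, durationInMs) are illustrative and not guaranteed to match the exact ADF log schema:

```python
import json

# Hypothetical flattener: one ADF pipeline-run log record -> one table row.
# Field names are assumptions; align them with the real log schema.
def flatten_run(record: str) -> dict:
    run = json.loads(record)
    return {
        "run_id": run.get("runId"),
        "pipeline": run.get("pipelineName"),
        "status": run.get("status"),
        "duration_ms": run.get("durationInMs"),
    }

sample = '{"runId": "abc-123", "pipelineName": "ingest", "status": "Succeeded", "durationInMs": 5400}'
print(flatten_run(sample))
```

The resulting rows could then be appended to the Delta table backing the dashboard, e.g. via Auto Loader over the log landing path.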

1 More Replies
pankaj30
by New Contributor II
  • 798 Views
  • 4 replies
  • 3 kudos

Resolved! Databricks Pyspark Dataframe error while displaying data read from mongodb

Hi, We are trying to read data from MongoDB using a Databricks notebook with PySpark connectivity. When we try to display dataframe data using the show or display method, it gives the error "org.bson.BsonInvalidOperationException: Document does not contain key...

Latest Reply
an313x
New Contributor
  • 3 kudos

UPDATE: Installing mongo-spark-connector_2.12-10.3.0-all.jar from Maven does NOT require the JAR files below to be installed on the cluster to display the dataframe: bson, mongodb-driver-core, mongodb-driver-sync. Also, I noticed that both DBR 13.3 LTS and 14...

3 More Replies
dream
by Contributor
  • 419 Views
  • 4 replies
  • 2 kudos

Accessing shallow cloned data through an External location fails

I have two external locations. On both of these locations I have `ALL PRIVILEGES` access. I am creating a table on the first external location using the following command: %sql create or replace table delta.`s3://avinashkhamanekar/tmp/test_table_origina...

Latest Reply
raphaelblg
Honored Contributor
  • 2 kudos

Hello, this is an underlying exception that should occur with any SQL statement that requires access to this file: part-00000-36ee2e95-cfb1-449b-a986-21657cc01b22-c000.snappy.parquet. It looks like the Delta log is referencing a file that doesn't exi...

3 More Replies
thiagoawstest
by Contributor
  • 50 Views
  • 0 replies
  • 0 kudos

Error Sent message larger than max

Hello, I'm receiving a large amount of data in a dataframe; when trying to save or display it, I receive the error below. How can I fix it, or where do I change the setting? SparkConnectGrpcException: <_MultiThreadedRendezvous of RPC that terminated...
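If the failure really is the Spark Connect gRPC message cap, one avenue is raising the limit; the config key below is an assumption to verify against the documentation for your runtime, not a confirmed setting. In practice the more robust fix is to avoid shipping the whole dataframe to the client at all, e.g. display a bounded subset with df.limit(1000) or write results to a table instead:

```python
# Hypothetical Spark Connect setting sketch -- the key name, and whether it is
# settable on your cluster, are assumptions to verify before relying on this.
MAX_MESSAGE_BYTES = 256 * 1024 * 1024  # 256 MiB

spark_conf = {
    "spark.connect.grpc.maxInboundMessageSize": str(MAX_MESSAGE_BYTES),
}
print(spark_conf)
```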

Olaoye_Somide
by New Contributor II
  • 44 Views
  • 1 reply
  • 0 kudos

AutoLoader File Notification Setup on AWS

I’m encountering issues setting up Databricks Auto Loader in file notification mode. The error seems to be related to UC access to the S3 bucket. I have tried running it on a single-node dedicated cluster, but no luck. Any guidance or assistance on reso...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Olaoye_Somide, ensure that the IAM policy attached to the role or instance profile used by your Databricks cluster allows the necessary S3 actions. Your current policy includes actions like s3:GetObject, s3:PutObject, and s3:ListBucket, which ar...
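File notification mode needs more than plain S3 read/write: Databricks also has to manage the SQS queue and SNS topic that deliver the bucket events. The fragment below is an illustrative policy sketch; the bucket name and the exact action list are assumptions, so check the official Databricks Auto Loader documentation for the authoritative policy:

```python
import json

# Illustrative IAM policy sketch for Auto Loader file notification mode.
# Bucket name and action list are assumptions, not the official policy.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {   # object access plus bucket-notification management (assumed)
            "Effect": "Allow",
            "Action": [
                "s3:GetObject", "s3:ListBucket",
                "s3:GetBucketNotification", "s3:PutBucketNotification",
            ],
            "Resource": ["arn:aws:s3:::my-bucket", "arn:aws:s3:::my-bucket/*"],
        },
        {   # queue/topic plumbing used by file notifications (assumed)
            "Effect": "Allow",
            "Action": [
                "sqs:CreateQueue", "sqs:ReceiveMessage", "sqs:DeleteMessage",
                "sqs:GetQueueAttributes", "sns:CreateTopic", "sns:Subscribe",
            ],
            "Resource": "*",
        },
    ],
}
print(json.dumps(policy, indent=2))
```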

hayden_blair
by New Contributor II
  • 73 Views
  • 2 replies
  • 0 kudos

Why Shared Access Mode for Unity Catalog enabled DLT pipeline?

Hello all, I am trying to use an RDD API in a Unity Catalog enabled Delta Live Tables pipeline. I am getting an error because Unity Catalog enabled DLT can only run on "shared access mode" compute, and RDD APIs are not supported on shared access compu...

Latest Reply
hayden_blair
New Contributor II
  • 0 kudos

Thank you for the response @Slash. Do you know if single user clusters are inherently less secure? I am still curious about why single user access mode is not allowed for DLT + Unity Catalog.

1 More Replies
Sreyasi_Thakur
by New Contributor
  • 56 Views
  • 1 reply
  • 0 kudos

DLT Pipeline on Hive Metastore

I am creating a DLT pipeline on Hive Metastore (the destination is Hive Metastore) and using a notebook within the pipeline which reads a Unity Catalog table. But I am getting an error: [UC_NOT_ENABLED] Unity Catalog is not enabled on this cluster. Is it...

Latest Reply
Kaniz_Fatma
Community Manager
  • 0 kudos

Hi @Sreyasi_Thakur, Yes, this is a known limitation. When you define the pipeline destination as Hive Metastore, you cannot read tables from Unity Catalog within the same pipeline. Delta Live Tables (DLT) pipelines can either use the Hive Metastore o...

Chandru
by New Contributor III
  • 3732 Views
  • 3 replies
  • 7 kudos

Resolved! Issue in importing librosa library while using databricks runtime engine 11.2

I have installed the library via PyPI on the cluster. When we import the package in a notebook, we get the following error: import librosa → OSError: cannot load library 'libsndfile.so': libsndfile.so: cannot open shared object file: No such file or direct...

Latest Reply
Flo
New Contributor II
  • 7 kudos

If anybody ends up here after 2024: the init file must now be placed in the workspace for the cluster to accept it. So in Workspace, use Create/File to create the init script. Then add it to the cluster config in Compute - Your cluster - Advanced Config...

2 More Replies
mr_robot
by New Contributor
  • 73 Views
  • 4 replies
  • 4 kudos

Update datatype of a column in a table

I have a table in Databricks with fields name: string, id: string, orgId: bigint, metadata: struct. Now I want to rename one of the columns and change its type. In my case I want to update orgId to orgIds and change its type to map<string, string>. One...

Data Engineering
tables delta-tables
Latest Reply
jacovangelder
Contributor III
  • 4 kudos

You can use REPLACE COLUMNS (note that id stays STRING, matching the original table; only orgId is renamed and retyped):

ALTER TABLE your_table_name REPLACE COLUMNS (
  name STRING,
  id STRING,
  orgIds MAP<STRING, STRING>,
  metadata STRUCT<...>
);

3 More Replies
sinclair
by New Contributor
  • 308 Views
  • 7 replies
  • 1 kudos

Py4JJavaError: An error occurred while calling o465.count

The following error occurred when running .count() on a big Spark DataFrame: Py4JJavaError: An error occurred while calling o465.count. : org.apache.spark.SparkException: Job aborted due to stage failure: Task 6 in stage 3.0 failed 4 times, most recent failur...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 1 kudos

Hi @sinclair , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feedba...

6 More Replies
NandaKishoreI
by New Contributor II
  • 71 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks upon inserting delta table data inserts into folders in Dev

We have a Delta Table in Databricks. When we are inserting data into the Delta Table, in the storage account, it creates folders like 05, 0H, 0F, 0O, 1T, 1W, etc., and adds the parquet files there. We have not defined any partitions. We are inserting...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 0 kudos

Hi @NandaKishoreI , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your f...

1 More Replies
spicysheep
by New Contributor
  • 238 Views
  • 3 replies
  • 2 kudos

Where to find comprehensive docs on databricks.yaml / DAB settings options

Where can I find documentation on how to set cluster settings (e.g., AWS instance type, spot vs on-demand, number of machines) in Databricks Asset Bundle databricks.yaml files? The only documentation I've come across mentions these things indirectly, ...

Latest Reply
Rishabh_Tiwari
Community Manager
  • 2 kudos

Hi @spicysheep , Thank you for reaching out to our community! We're here to help you.  To ensure we provide you with the best support, could you please take a moment to review the response and choose the one that best answers your question? Your feed...

2 More Replies