Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

User16826994223
by Databricks Employee
  • 2640 Views
  • 2 replies
  • 1 kudos

Does Databricks have a data processing agreement?

Latest Reply
liam_noah
New Contributor II
  • 1 kudos

Yes, Databricks typically provides a Data Processing Agreement (DPA) to comply with data protection regulations like GDPR. It's important for businesses to thoroughly review these agreements to ensure alignment with their data privacy policies. You c...

1 More Replies
hadoan
by New Contributor II
  • 2098 Views
  • 3 replies
  • 1 kudos

How to define DLT table with cyclic reference

@dlt.table
def table_A():
    return dlt.read_stream(...)

@dlt.table
def table_join_A_and_C():
    df_A = dlt.read_stream(table_A)
    df_C = dlt.read_stream(table_C)
    return df_A.join(df_C)

@dlt.table
def table_C():
    return ( ...

Latest Reply
dilipdiwakar
New Contributor II
  • 1 kudos

Could you please describe the best approach here? Thanks.

2 More Replies
Dejian
by New Contributor II
  • 1805 Views
  • 3 replies
  • 0 kudos

DLT Append Flow Parameterization

Hi all, I'm currently using a DLT append flow to merge multiple streaming flows into one output. While trying to turn the append flow into a dynamic function for scalability, the append flow seems to hit some errors. stat_table = f"{catalog}.{bronze_s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error you're encountering occurs because Delta Live Tables (DLT) append flows currently do not support streaming aggregations or other transformations on streaming DataFrames unless a watermark is applied properly to handle late data. Based on yo...

2 More Replies
AntonDBUser
by New Contributor III
  • 4922 Views
  • 1 reply
  • 0 kudos

Oracle Lakehouse Federation with CA Certificate

Hi! We have been pulling data from Oracle to Databricks by installing the Oracle driver and certificates directly on the cluster. We are now looking into using Lakehouse Federation for Oracle instead, but it seems like the connection doesn't pick up the c...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hi @AntonDBUser, thanks for sharing your experience; we're looking into using Lakehouse Federation with Oracle too. I haven't tested this myself, but one idea that came to mind is whether switching from a serverless cluster to a standard (Pro) cluster...

mridultuteja
by New Contributor II
  • 4051 Views
  • 6 replies
  • 1 kudos

external table not being written to data lake

I was following a tutorial to learn Databricks from https://youtu.be/7pee6_Sq3VY (great video, btw). I am stuck at 2:52:24. I am trying to create an external table directly in the data lake, but I am facing a weird issue saying no such location exists. I h...

[screenshots attached]
Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hey @mridultuteja, to register an external location you have to first create a Storage Credential, and then create the External Location. This process allows Databricks to securely access data stored in Azure Data Lake Storage Gen2 (ADLS Gen2), while ...

5 More Replies
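For reference, the two steps the reply describes can be expressed in Databricks SQL. This is a sketch: the credential, container, account, catalog, and table names below are all hypothetical, and it assumes a storage credential was already created (for example via Catalog Explorer, backed by an Azure access connector).

```sql
-- Step 1: register the external location, using an existing storage credential.
CREATE EXTERNAL LOCATION IF NOT EXISTS my_lake_loc
  URL 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/landing'
  WITH (STORAGE CREDENTIAL my_adls_cred);

-- Step 2: once the location exists, an external table can point inside it.
CREATE TABLE my_catalog.my_schema.sales_ext (id INT, amount DOUBLE)
  LOCATION 'abfss://mycontainer@mystorageacct.dfs.core.windows.net/landing/sales';
```

Without the external location (or with a path outside any registered location), the "no such location exists" style of error is expected.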
SeekingSolution
by New Contributor II
  • 661 Views
  • 1 reply
  • 0 kudos

Unity Catalog Enablement

Hello, after scouring documentation yesterday I was finally able to get Unity Catalog enabled and assigned to my workspace. Or so I thought. When I run the current_metastore() command I get the error shown below. However, when I look at my catalog I can see...

[screenshots attached]
Latest Reply
Nivethan
New Contributor III
  • 0 kudos

Hi, please check that the cluster you are using to run the query has also been upgraded to Unity Catalog. Also, follow the best practices outlined here for enablement: https://docs.databricks.com/aws/en/data-governance/unity-catalog/enable-workspaces Best Rega...

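As a quick sanity check, both statements below should succeed on a cluster or SQL warehouse with Unity Catalog access enabled (note the underscore in the function name):

```sql
-- Returns the metastore ID if the compute is UC-enabled
SELECT current_metastore();

-- Lists the catalogs the metastore exposes to this workspace
SHOW CATALOGS;
```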
vaibhavaher2025
by New Contributor
  • 811 Views
  • 1 reply
  • 0 kudos

How to get response from API call made via executor

Hi guys, I'm trying to call multiple APIs from executors using foreachPartition. However, as the API response is returned at the executor level, I'm unable to see whether the response is 200 or 500. I don't want my APIs to execute on the driver, so I'm ...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Vaibhavaher2025 - I recommend trying the following:
1. Write logs from executors to persistent storage inside process_partition.
2. Use mapPartitions instead of foreachPartition to return responses to the driver as a DataFrame.
3. Check executor log...

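A minimal sketch of the mapPartitions pattern from the reply. The HTTP call is stubbed out (call_api is a hypothetical stand-in for a real request made on the executor); on a cluster, process_partition would be passed to df.rdd.mapPartitions(...) so the statuses flow back to the driver.

```python
def call_api(record):
    # Stand-in for a real HTTP call; a real version would return resp.status_code.
    return 200 if record.get("payload") else 500

def process_partition(rows):
    # Runs once per partition on an executor; yields one result row per
    # input row instead of discarding the response.
    for row in rows:
        yield {"id": row["id"], "status": call_api(row)}

# On Databricks this would look like:
#   result_df = df.rdd.mapPartitions(process_partition).toDF()
#   result_df.filter("status != 200").show()
# Locally, the same function works on any iterator of dicts:
records = [{"id": 1, "payload": "a"}, {"id": 2, "payload": ""}]
statuses = list(process_partition(iter(records)))
print(statuses)  # [{'id': 1, 'status': 200}, {'id': 2, 'status': 500}]
```

Because mapPartitions returns an iterator per partition, failed calls can be filtered or retried from the driver without ever running the requests there.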
anmol-aidora
by New Contributor III
  • 3636 Views
  • 6 replies
  • 0 kudos

Resolved! Serverless: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied

Hello guys! I am getting this error when running a job: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/some-python-package'. I have lis...

Latest Reply
anmol-aidora
New Contributor III
  • 0 kudos

Thanks for clarifying, Isi - really appreciate it.

5 More Replies
soumiknow
by Contributor II
  • 9741 Views
  • 22 replies
  • 1 kudos

Resolved! BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC

We have a date (DD/MM/YYYY) partitioned BQ table. We want to update a specific partition's data in 'overwrite' mode using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the Spark BQ connector documentat...

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@soumiknow, just checking in: are there any further questions, and did my last comment help?

21 More Replies
M_S
by New Contributor II
  • 1471 Views
  • 2 replies
  • 2 kudos

Dataframe is getting empty during execution of daily job with random pattern

Hello, I have a daily ETL job that adds new records to a table for the previous day. However, from time to time it does not produce any output. After investigating, I discovered that one table is sometimes loaded as empty during execution. As a resul...

[screenshot attached]
Latest Reply
M_S
New Contributor II
  • 2 kudos

Thank you very much, @Louis_Frolio, for such a detailed and insightful answer! All tables used in this processing are managed Delta tables loaded through Unity Catalog. I will try running it with spark.databricks.io.cache.enabled set to false just to ...

1 More Replies
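The diagnostic step mentioned in the reply can be done at session level; this only rules the disk cache in or out as the culprit, it is not a permanent fix:

```sql
-- Session-level toggle (SQL); the PySpark equivalent is
-- spark.conf.set("spark.databricks.io.cache.enabled", "false")
SET spark.databricks.io.cache.enabled = false;
```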
5UDO
by New Contributor II
  • 2792 Views
  • 6 replies
  • 4 kudos

Databricks warehouse table optimization

Hi everyone, I just started using Databricks and wanted to evaluate read speeds when using a Databricks warehouse. So I generated a dataset of 100M records containing name, surname, date of birth, phone number, and address. Dat...

Latest Reply
5UDO
New Contributor II
  • 4 kudos

Hi Brahmareddy and AndrewN, thank you for your answers. I first need to apologize, as I accidentally wrote that I got 270 ms by hashing the date of birth, surname, and name and then using Z-ordering. I actually achieved around 290 ms with hashing...

5 More Replies
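For readers following along, the layout discussed in this thread is produced with a standard Delta OPTIMIZE statement; the table name is hypothetical, while the columns come from the thread's description:

```sql
OPTIMIZE people_100m
ZORDER BY (date_of_birth, surname, name);
```

Z-ordering co-locates rows with similar values in these columns in the same files, so selective filters on them can skip more files during reads.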
jtjohnson
by New Contributor II
  • 1654 Views
  • 4 replies
  • 0 kudos

API Definition File

Hello. We are in the process of setting up Azure APIM in front of the Databricks REST APIs. Is there an official definition file available for download? Any help would be greatly appreciated.

Latest Reply
jtjohnson
New Contributor II
  • 0 kudos

Thank you for the feedback. The Postman collection would be ideal, but the link is no longer active.

3 More Replies
harika5991
by New Contributor II
  • 1530 Views
  • 1 reply
  • 0 kudos

Unable to create a metastore for Unity Catalog as I don't have Account Admin rights

Hello guys, I just started learning Databricks. I created a Databricks workspace via the Azure Portal using the Trial (Premium - 14-Days Free DBUs) plan. The workspace name is `easewithdata-adb`. However, I do not currently see the option to create a Un...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @harika5991, you're right about the root cause of your issue: creating a Unity Catalog metastore requires Account Admin privileges, which is separate from just creating a workspace in Azure. These are options you can try: when you create a Databricks...

Louis_Frolio
by Databricks Employee
  • 7075 Views
  • 4 replies
  • 4 kudos

Resolved! What are your most impactful use cases for schema evolution in Databricks?

  Data Engineers, Share Your Experiences with Delta Lake Schema Evolution! We're calling on all data engineers to share their experiences with the powerful schema evolution feature in Delta Lake. This feature allows for seamless adaptation to changin...

Latest Reply
Louis_Frolio
Databricks Employee
  • 4 kudos

Outstanding!

3 More Replies
flashmav
by New Contributor II
  • 1037 Views
  • 1 reply
  • 0 kudos

Resolved! ConcurrentDeleteDeleteException in liquid cluster table

I am doing a merge into a table in parallel via 2 jobs. The table is a liquid clustered table with the following properties:
delta.enableChangeDataFeed=true
delta.enableDeletionVectors=true
delta.enableRowTracking=true
delta.feature.changeDataFeed=supported...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hey @flashmav ,  keep in mind that operations in Delta Lake often occur at the file level rather than the row level. For example, if two sessions attempt to update data in the same file (even if they’re not updating the same row), you may encounter a...

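One common mitigation, following the file-level explanation above, is to make the two jobs' MERGE predicates disjoint on the clustering key so they never rewrite the same files. A sketch with hypothetical table and column names:

```sql
-- Job 1 only touches region = 'EU'; job 2 would use region = 'US'.
MERGE INTO events t
USING updates u
  ON t.event_id = u.event_id AND t.region = 'EU'
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```

Where predicates cannot be made disjoint, the usual fallback is to catch the concurrent-modification exception and retry the merge with backoff.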