Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

804082
by New Contributor III
  • 5655 Views
  • 8 replies
  • 2 kudos

Resolved! DLT Direct Publishing Mode

Hello, I'm working on a DLT pipeline and have a block of SQL that runs... USE CATALOG catalog_a; USE SCHEMA schema_a; CREATE OR REFRESH MATERIALIZED VIEW table_a AS SELECT ... FROM catalog_b.schema_b.table_b; Executing this block returns the following...
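For context, a minimal Python sketch of the same pattern: with direct publishing mode enabled, the pipeline is no longer pinned to a single target schema, so the target can be fully qualified (the catalog/schema/table names below are placeholders, not the poster's real ones):

import dlt

# Assumes direct publishing mode is enabled for the pipeline; the table name is fully qualified.
@dlt.table(name="catalog_a.schema_a.table_a")
def table_a():
    # Source lives in a different catalog/schema; spark is the notebook's session object.
    return spark.read.table("catalog_b.schema_b.table_b")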

Latest Reply
Dorsey
New Contributor II
  • 2 kudos

I'm in East US and I don't have that option on my Previews page. Also, does it only work with serverless?

7 More Replies
moski
by New Contributor II
  • 14491 Views
  • 9 replies
  • 8 kudos

Databricks shortcut to split a cell

Is there a shortcut to split a cell into two in a Databricks notebook, as in a Jupyter notebook? In Jupyter it is Ctrl+Shift+Minus.

Latest Reply
Harshjot
Contributor III
  • 8 kudos

Hi @mundy Jim / All, attached are two snapshots: in the first, a single cell is split into two by pressing Ctrl+Alt+Minus.

8 More Replies
LearnDB1234
by New Contributor III
  • 1699 Views
  • 3 replies
  • 1 kudos

Resolved! How to Update Identity Column for a Databricks Table

Hi All, I have a Databricks table with the below DDL: CREATE TABLE default.Test ( ID BIGINT GENERATED ALWAYS AS IDENTITY (START WITH 1 INCREMENT BY 1), StopFromDateTime TIMESTAMP, StopToDateTime TIMESTAMP, User STRING) USING delta TBLPROPERTIE...

Latest Reply
pdiamond
Contributor
  • 1 kudos

If you recreate the table using BIGINT GENERATED BY DEFAULT instead of BIGINT GENERATED ALWAYS, you can manipulate the column values. "When using the clause GENERATED BY DEFAULT AS IDENTITY, insert operations can specify values for the identity column...
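A minimal sketch of that suggestion (table and column names taken from the question; the inserted values are placeholders):

# Recreate the table so inserts may optionally supply an explicit identity value.
spark.sql("""
    CREATE OR REPLACE TABLE default.Test (
        ID BIGINT GENERATED BY DEFAULT AS IDENTITY (START WITH 1 INCREMENT BY 1),
        StopFromDateTime TIMESTAMP,
        StopToDateTime TIMESTAMP,
        User STRING
    ) USING delta
""")

# Explicit identity value supplied...
spark.sql("""
    INSERT INTO default.Test (ID, StopFromDateTime, StopToDateTime, User)
    VALUES (100, current_timestamp(), current_timestamp(), 'alice')
""")

# ...or omitted, in which case Delta generates it.
spark.sql("""
    INSERT INTO default.Test (StopFromDateTime, StopToDateTime, User)
    VALUES (current_timestamp(), current_timestamp(), 'bob')
""")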

2 More Replies
ramyav7796
by New Contributor II
  • 1510 Views
  • 1 reply
  • 0 kudos

Add custom logs and save them in a logs folder

Hi, I am trying to add custom logging functionality for my code. Please refer to the code I am using; I am trying to save my log files by creating a logs folder in my user's workspace. My intent is to store dynamic custom log files each time I run my n...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Here are some suggestions for your consideration.   The issue with your custom logging setup seems to stem from attempting to save the log files in a path under "/Workspace/Users/ramya.v@point32health.org/CD/", which is not directly writable by your ...
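A minimal sketch along those lines, assuming the durable destination is a Unity Catalog volume (the /Volumes/... path below is a placeholder): write the log to local disk, which the logging module can always open, then copy the finished file out at the end of the run:

import datetime
import logging
import shutil

run_id = datetime.datetime.now().strftime("%Y%m%d_%H%M%S")
local_path = f"/tmp/notebook_run_{run_id}.log"          # driver-local, always writable

logger = logging.getLogger("custom_pipeline")
logger.setLevel(logging.INFO)
handler = logging.FileHandler(local_path)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
logger.addHandler(handler)

logger.info("Run started")
# ... notebook work ...
logger.info("Run finished")

handler.close()
# Copy to durable storage once the run is done (placeholder volume path).
shutil.copy(local_path, f"/Volumes/main/default/logs/notebook_run_{run_id}.log")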

r0nald
by New Contributor II
  • 10192 Views
  • 4 replies
  • 1 kudos

UDF not working inside transform() & lambda (SQL)

Below is a toy example of what I'm trying to achieve, but I don't understand why it fails. Can anyone explain why, and suggest a fix or a not overly bloated workaround? %sql create or replace function status_map(status int) returns string return map(10, "STATU...

Latest Reply
DattaWalake
Databricks Employee
  • 1 kudos

The lambda's variable bindings are scoped to the transform function only, which is why this fails with a UDF: the lambda variable (e in your case) is not available within the UDF's scope. We can use the workaround below for the above example, which generates the same ...
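A sketch of one common workaround of that kind (not necessarily the exact one in the truncated reply; the status labels are placeholders): keep the lookup inline in the lambda instead of routing it through the SQL UDF:

# Fails: the SQL UDF status_map cannot see the lambda variable e.
# spark.sql("SELECT transform(array(10, 20), e -> status_map(e))").show()

# Workaround: inline the same map lookup directly in the lambda.
spark.sql("""
    SELECT transform(
               array(10, 20),
               e -> element_at(map(10, 'STATUS_A', 20, 'STATUS_B'), e)
           ) AS statuses
""").show(truncate=False)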

3 More Replies
User16826994223
by Databricks Employee
  • 2143 Views
  • 2 replies
  • 1 kudos

Does Databricks have a data processing agreement?

Does Databricks have a data processing agreement?

Latest Reply
liam_noah
New Contributor II
  • 1 kudos

Yes, Databricks typically provides a Data Processing Agreement (DPA) to comply with data protection regulations like GDPR. It's important for businesses to thoroughly review these agreements to ensure alignment with their data privacy policies. You c...

1 More Replies
hadoan
by New Contributor II
  • 1932 Views
  • 3 replies
  • 1 kudos

How to define DLT table with cyclic reference

@Dlt.table
def table_A():
    return (
        dlt.read_stream(...)
    )

@dlt.table
def table_join_A_and_C():
    df_A = dlt.read_stream(table_A)
    df_C = dlt.read_stream(table_C)
    return (
        ....df_A.join(df_C)
    )

@dlt.table
def table_C():
    return (
        ...
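For context on why this errors: a DLT pipeline is resolved into a dependency graph that must be acyclic, so table_C cannot both feed table_join_A_and_C and depend on it. A rough sketch of the usual restructuring, where each base table reads only from upstream sources and the join sits strictly downstream (source names and the join key are placeholders):

import dlt

@dlt.table
def table_A():
    return dlt.read_stream("source_a")          # upstream source, placeholder name

@dlt.table
def table_C():
    return dlt.read_stream("source_c")          # reads a source, not the join table

@dlt.table
def table_join_A_and_C():
    # Downstream-only join; nothing reads back from this table.
    return dlt.read_stream("table_A").join(dlt.read_stream("table_C"), "id")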

Latest Reply
dilipdiwakar
New Contributor II
  • 1 kudos

Could you please describe the best approach here? Thanks.

2 More Replies
Dejian
by New Contributor II
  • 1466 Views
  • 3 replies
  • 0 kudos

DLT Append Flow Parameterization

Hi All, I'm currently using the DLT append flow to merge multiple streaming flows into one output. While trying to turn the append flow into a dynamic function for scalability, the DLT append flow seems to hit some errors. stat_table = f"{catalog}.{bronze_s...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

The error you're encountering occurs because Delta Live Tables (DLT) append flows currently do not support streaming aggregations or other transformations on streaming DataFrames unless a watermark is applied properly to handle late data. Based on yo...
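A minimal sketch of a parameterized append flow under that constraint (target, source, and column names are placeholders); the watermark is applied to the stream before any aggregation:

import dlt
from pyspark.sql import functions as F

dlt.create_streaming_table("stat_table")        # placeholder target name

def register_flow(source_table: str):
    @dlt.append_flow(target="stat_table", name=f"flow_{source_table.replace('.', '_')}")
    def _flow():
        return (
            dlt.read_stream(source_table)
            .withWatermark("event_time", "10 minutes")    # handle late data before aggregating
            .groupBy(F.window("event_time", "10 minutes"), "device_id")
            .count()
        )

for src in ["bronze.events_a", "bronze.events_b"]:
    register_flow(src)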

2 More Replies
AntonDBUser
by New Contributor III
  • 4806 Views
  • 1 reply
  • 0 kudos

Oracle Lakehouse Federation with CA Certificate

Hi! We have been pulling data from Oracle to Databricks by installing the Oracle driver and certificates directly on the cluster. We are now looking into using Lakehouse Federation for Oracle instead, but it seems like the connection doesn't pick up the c...

Latest Reply
Isi
Honored Contributor III
  • 0 kudos

Hi @AntonDBUser, thanks for sharing your experience; we're looking into using Lakehouse Federation with Oracle too. I haven't tested this myself, but one idea that came to mind is whether switching from a serverless cluster to a standard (Pro) cluster...

mridultuteja
by New Contributor II
  • 3043 Views
  • 6 replies
  • 1 kudos

external table not being written to data lake

I was following a tutorial to learn Databricks from https://youtu.be/7pee6_Sq3VY (great video btw). I am stuck at 2:52:24. I am trying to create an external table directly in the data lake, but I am facing some weird issue saying no such location exists. I h...

Latest Reply
Isi
Honored Contributor III
  • 1 kudos

Hey @mridultuteja, to register an external location you first have to create a Storage Credential and then create the External Location. This process allows Databricks to securely access data stored in Azure Data Lake Storage Gen2 (ADLS Gen2), while ...
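A rough sketch of that sequence (location, credential, container, and storage account names are placeholders; the storage credential itself is typically created first in Catalog Explorer using an Access Connector managed identity):

# Assumes a storage credential named adls_cred already exists in Unity Catalog.
spark.sql("""
    CREATE EXTERNAL LOCATION IF NOT EXISTS my_ext_loc
    URL 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/external'
    WITH (STORAGE CREDENTIAL adls_cred)
""")

# External table whose files live under the registered location.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.default.sales_ext (id INT, amount DOUBLE)
    USING delta
    LOCATION 'abfss://mycontainer@mystorageaccount.dfs.core.windows.net/external/sales'
""")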

5 More Replies
SeekingSolution
by New Contributor II
  • 584 Views
  • 1 reply
  • 0 kudos

Unity Catalog Enablement

Hello, after scouring documentation yesterday, I was finally able to get Unity Catalog enabled and assigned to my workspace. Or so I thought. When I run the current_metastore() command I get the below error. However, when I look at my catalog I can see...

Latest Reply
Nivethan
New Contributor III
  • 0 kudos

Hi, please check whether the cluster you are using to run the query is also upgraded to Unity Catalog. Also, follow the best practices outlined here for enablement: https://docs.databricks.com/aws/en/data-governance/unity-catalog/enable-workspaces Best Rega...
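A quick sanity check that can be run from a notebook attached to a Unity Catalog-enabled cluster (standard/shared access mode), as a sketch:

# Returns the metastore id instead of erroring when the cluster is UC-enabled.
spark.sql("SELECT current_metastore()").show(truncate=False)

# Catalogs visible through Unity Catalog from this cluster.
spark.sql("SHOW CATALOGS").show(truncate=False)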

vaibhavaher2025
by New Contributor
  • 687 Views
  • 1 reply
  • 0 kudos

How to get response from API call made via executor

Hi Guys, I'm trying to call multiple APIs via executors using foreachPartition. However, as the API response is returned at the executor level, I'm unable to see the response of the API, whether it's 200 or 500. I don't want my APIs to execute on the driver, so I'm ...

Latest Reply
sarahbhord
Databricks Employee
  • 0 kudos

Vaibhavaher2025 - I recommend trying the following: 1. Write logs from executors to persistent storage inside process_partition. 2. Use mapPartitions instead of foreachPartition to return responses back to the driver as a DataFrame. 3. Check executor log...
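A minimal sketch of point 2, assuming a requests-based POST, an id column on the source DataFrame df, and a placeholder endpoint; each partition calls the API on the executor and the status codes come back to the driver as rows:

from pyspark.sql import Row

def process_partition(rows):
    import requests                                   # imported on the executor
    for row in rows:
        try:
            resp = requests.post("https://api.example.com/ingest",   # placeholder endpoint
                                 json=row.asDict(), timeout=30)
            yield Row(id=row["id"], status_code=resp.status_code, error="")
        except Exception as exc:
            yield Row(id=row["id"], status_code=-1, error=str(exc))

responses = df.rdd.mapPartitions(process_partition).toDF()
responses.groupBy("status_code").count().show()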

anmol-aidora
by New Contributor III
  • 2684 Views
  • 6 replies
  • 0 kudos

Resolved! Serverless: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied

Hello guys! I am getting this error when running a job: ERROR: Could not install packages due to an OSError: [Errno 13] Permission denied: '/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.11/site-packages/some-python-package' I have lis...

Latest Reply
anmol-aidora
New Contributor III
  • 0 kudos

Thanks for clarifying Isi, really appreciate it

5 More Replies
soumiknow
by Contributor II
  • 7678 Views
  • 22 replies
  • 1 kudos

Resolved! BQ partition data deleted fully even though 'spark.sql.sources.partitionOverwriteMode' is DYNAMIC

We have a date (DD/MM/YYYY) partitioned BQ table. We want to update a specific partition's data in 'overwrite' mode using PySpark. To do this, I set 'spark.sql.sources.partitionOverwriteMode' to 'DYNAMIC' as per the spark bq connector documentat...
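For reference, one pattern sometimes used with the spark-bigquery connector when dynamic partition overwrite misbehaves is to target the single date partition explicitly; a sketch with placeholder table, bucket, and column names:

from pyspark.sql import functions as F

single_day = df.filter(F.col("event_date") == "2024-01-01")   # placeholder partition filter

(single_day.write
    .format("bigquery")
    .option("table", "my_project.my_dataset.my_table")    # placeholder target table
    .option("temporaryGcsBucket", "my-temp-bucket")        # staging bucket for indirect writes
    .option("datePartition", "20240101")                   # only this partition is replaced
    .mode("overwrite")
    .save())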

Latest Reply
VZLA
Databricks Employee
  • 1 kudos

@soumiknow , Just checking if there are any further questions, and did my last comment help?

21 More Replies
M_S
by New Contributor II
  • 1132 Views
  • 2 replies
  • 2 kudos

DataFrame intermittently comes back empty during daily job execution

Hello, I have a daily ETL job that adds new records to a table for the previous day. However, from time to time, it does not produce any output. After investigating, I discovered that one table is sometimes loaded as empty during execution. As a resul...

Latest Reply
M_S
New Contributor II
  • 2 kudos

Thank you very much, @Louis_Frolio, for such a detailed and insightful answer! All tables used in this processing are managed Delta tables loaded through Unity Catalog. I will try running it with spark.databricks.io.cache.enabled set to false just to ...
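For anyone following along, the setting mentioned can be toggled per session while investigating; a one-line sketch:

# Disable the Databricks disk (IO) cache for this Spark session during the test.
spark.conf.set("spark.databricks.io.cache.enabled", "false")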

1 More Replies
