Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by HoussemBL (New Contributor II)
  • 21 Views
  • 1 reply
  • 0 kudos

Impact of deleting workspace on associated catalogs

Hello Community, I have a specific scenario regarding Unity Catalog and workspace deletion that I'd like to clarify. Current setup: two Databricks workspaces, W1 and W2; a single Unity Catalog instance; Catalog1 created in W1, shared and accessible in W2; Cata...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Hi @HoussemBL, when you delete a Databricks workspace, it does not directly impact Unity Catalog or the data within it. Unity Catalog is a separate entity that manages data access and governance across multiple workspaces. Here's what happens in ...

by minhhung0507 (New Contributor II)
  • 66 Views
  • 2 replies
  • 0 kudos

Handling Dropped Records in Delta Live Tables with Watermark - Need Optimization Strategy

Hi Databricks Community, I'm encountering an issue with watermarks in Delta Live Tables that's causing data loss in my streaming pipeline. Let me explain my specific problem. Current situation: I've implemented watermarks for stateful processing in my De...

Latest Reply
minhhung0507 (New Contributor II)
  • 0 kudos

Dear @Walter_C, thank you for your detailed response regarding watermark handling in Delta Live Tables (DLT). I appreciate the guidance provided, but I would like further clarification on a couple of points related to our use case. 1. Auto-saving dro...
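
Since a watermark makes the engine silently drop records that arrive too late, any "auto-save" has to happen before the stateful operator. A minimal sketch of one common workaround, with hypothetical table names and a processing-time cutoff that only approximates the engine's event-time watermark (the 10-minute bound must match the watermark delay):

    from pyspark.sql import functions as F

    events = spark.readStream.table("bronze.events")  # hypothetical source
    cutoff = F.current_timestamp() - F.expr("INTERVAL 10 MINUTES")

    # Divert records at risk of being dropped by the watermark into a quarantine
    # table for later inspection or backfill, instead of losing them outright.
    late = events.where(F.col("event_time") < cutoff)
    on_time = events.where(F.col("event_time") >= cutoff)

    late.writeStream \
        .option("checkpointLocation", "/tmp/chk/late_events") \
        .toTable("bronze.late_events_quarantine")

    # The stateful aggregation then runs only on the on-time stream.
    on_time.withWatermark("event_time", "10 minutes") \
        .groupBy(F.window("event_time", "5 minutes")) \
        .count() \
        .writeStream \
        .outputMode("append") \
        .option("checkpointLocation", "/tmp/chk/event_counts") \
        .toTable("silver.event_counts")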

1 More Reply
by rt-slowth (Contributor)
  • 1092 Views
  • 1 reply
  • 0 kudos

How to use the dlt module in a streaming pipeline

If anyone has example code for building a CDC live streaming pipeline over data generated by AWS DMS using import dlt, I'd love to see it. I'm currently able to see the parquet file starting with LOAD on the first full load to S3 and the CDC parquet files after ...

Latest Reply
cgrant (Databricks Employee)
  • 0 kudos

There is a blog post for this that includes example code, which you can find here.
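
While that link is tracked down, the usual shape of such a pipeline is Auto Loader over the DMS output plus dlt.apply_changes; a minimal sketch, where the S3 path, key, and the DMS "Op"/sequence columns are assumptions based on typical DMS output:

    import dlt
    from pyspark.sql import functions as F

    @dlt.view
    def dms_changes():
        # DMS writes an initial LOAD*.parquet plus incremental CDC files with an "Op" column.
        return (spark.readStream.format("cloudFiles")
                .option("cloudFiles.format", "parquet")
                .load("s3://my-bucket/dms/orders/"))  # hypothetical path

    dlt.create_streaming_table("orders")

    dlt.apply_changes(
        target="orders",
        source="dms_changes",
        keys=["order_id"],                    # hypothetical primary key
        sequence_by=F.col("transact_ts"),     # hypothetical ordering column
        apply_as_deletes=F.expr("Op = 'D'"),  # DMS marks deletes with Op = 'D'
        except_column_list=["Op", "transact_ts"],
    )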

by GS_S (New Contributor)
  • 159 Views
  • 7 replies
  • 0 kudos

Resolved! Error during merge operation: 'NoneType' object has no attribute 'collect'

Why does merge.collect() not return results in access mode: SINGLE_USER, but it does in USER_ISOLATION? I need to log the affected rows (inserted and updated) and can’t find a simple way to get this data in SINGLE_USER mode. Is there a solution or an...
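
One workaround that sidesteps collecting from the merge result entirely is to read the operation metrics Delta records for the last MERGE in the table history; a minimal sketch with a hypothetical table name:

    # Log affected rows from the latest MERGE commit instead of merge.collect().
    last_merge = (spark.sql("DESCRIBE HISTORY main.sales.orders")
                  .where("operation = 'MERGE'")
                  .orderBy("version", ascending=False)
                  .limit(1)
                  .collect()[0])

    metrics = last_merge["operationMetrics"]
    print("rows inserted:", metrics.get("numTargetRowsInserted"))
    print("rows updated:", metrics.get("numTargetRowsUpdated"))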

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

15.4 does not directly require serverless, but fine-grained access control does require it to run on Single User mode, as mentioned: this data filtering is performed behind the scenes using serverless compute. In terms of costs, customers are charged for ...

6 More Replies
by JothyGanesan (New Contributor II)
  • 66 Views
  • 2 replies
  • 0 kudos

DLT Merge tables into Delta

We are trying to load a Delta table from streaming tables using DLT. This target table needs a MERGE of 3 source tables, but when we use the DLT command with merge, it says merge is not supported. Is this related to the DLT version? Please help u...

Latest Reply
JothyGanesan (New Contributor II)
  • 0 kudos

@Alberto_Umana, thank you for the quick reply. But how are we to use the above? This looks like Structured Streaming with CDF mode. Currently, with our tables in Unity Catalog, finding the start version and end version takes a huge amount of time as the ta...
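
For reference, a streaming CDF read does not need start and end versions at all, because the checkpoint tracks progress; explicit version bounds are only needed for one-off batch reads. A minimal sketch with a hypothetical table name:

    # Streaming: the checkpoint remembers where the last run stopped.
    changes = (spark.readStream
               .format("delta")
               .option("readChangeFeed", "true")
               .table("main.sales.orders"))

    # One-off batch read between explicit versions, if ever needed:
    batch_changes = (spark.read
                     .format("delta")
                     .option("readChangeFeed", "true")
                     .option("startingVersion", 100)  # hypothetical bounds
                     .option("endingVersion", 110)
                     .table("main.sales.orders"))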

1 More Reply
by eballinger (New Contributor II)
  • 81 Views
  • 2 replies
  • 1 kudo

Resolved! Any way to ignore DLT tables in pipeline

Hello, in our testing environment we would like to be able to update only the DLT tables we are testing in our pipeline. This would help speed up testing. We currently have the pipeline code generated dynamically based on how many tables th...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 1 kudo

Hi @eballinger. To address your requirement of updating only specific Delta Live Tables (DLT) in your testing environment without removing the others, you can leverage the @dlt.table decorator and the temporary parameter in your Python code. This app...
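
A minimal sketch of that idea, driving the temporary flag from a hypothetical pipeline configuration so that tables outside the test set are not published to the catalog:

    import dlt

    # Hypothetical pipeline setting listing the tables under test.
    tables_under_test = spark.conf.get("mypipeline.tables_under_test", "").split(",")

    def make_table(name, source_path):
        # Tables not under test are marked temporary so they are not published.
        @dlt.table(name=name, temporary=(name not in tables_under_test))
        def _t():
            return spark.read.format("delta").load(source_path)

    make_table("customers", "/mnt/raw/customers")  # hypothetical tables
    make_table("orders", "/mnt/raw/orders")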

1 More Reply
by eballinger (New Contributor II)
  • 149 Views
  • 1 reply
  • 1 kudo

Resolved! Check for row level security and column masking

Hi all, we have sensitive tables and have applied row-level security and column masking. I would like to build a check into our job to make sure these tables still have the row filters and column masks applied. This would help ensure these security fi...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 1 kudo

Hi @eballinger. Have you tried using DESCRIBE TABLE EXTENDED on the table? That will give you details about the filters applied to it.
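
A minimal sketch of such a job-time guard in PySpark, with a hypothetical table name; the exact labels in the DESCRIBE output can vary by runtime version, so the check scans loosely:

    # Fail fast if the row filter or column masks have been removed.
    rows = spark.sql("DESCRIBE TABLE EXTENDED main.hr.salaries").collect()
    text = " ".join(f"{r['col_name']} {r['data_type']}" for r in rows).lower()

    assert "row filter" in text, "Row filter no longer applied to main.hr.salaries"
    assert "mask" in text, "Column mask no longer applied to main.hr.salaries"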

by zed (New Contributor III)
  • 120 Views
  • 2 replies
  • 0 kudos

ConcurrentAppendException in Feature Engineering write_table

I am using the Feature Engineering client to write to a time series feature table. I have created two Databricks jobs with the code below and am running them with different run_dates (e.g. '2016-01-07' and '2016-01-08'). When they run concurrently,...

Latest Reply
Walter_C (Databricks Employee)
  • 0 kudos

Modify your write_table operations to ensure they are as specific as possible about the data being written. This might involve adding more granular conditions to your data filtering and writing logic. Here is an example adjustment to your code to han...
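
The gist of that adjustment: make each job's merge predicate disjoint, for example by pinning the run date on the target side so concurrent runs touch non-overlapping data. A minimal sketch with hypothetical table and column names (shown as a plain Delta merge rather than the Feature Engineering client's own call):

    from delta.tables import DeltaTable

    def upsert_for_run_date(df, run_date):
        target = DeltaTable.forName(spark, "main.fs.customer_features")
        (target.alias("t")
         .merge(
             df.alias("s"),
             # Pinning run_date keeps concurrent jobs on disjoint partitions,
             # which avoids ConcurrentAppendException.
             f"t.customer_id = s.customer_id AND t.run_date = '{run_date}'")
         .whenMatchedUpdateAll()
         .whenNotMatchedInsertAll()
         .execute())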

1 More Reply
by SteveC527 (New Contributor)
  • 105 Views
  • 2 replies
  • 0 kudos

Medallion Architecture and Databricks Assistant

I am in the process of rebuilding the data lake at my current company with Databricks, and I'm struggling to find comprehensive best practices for naming conventions and structuring medallion architecture to work optimally with the Databricks Assistan...

Latest Reply
Alberto_Umana (Databricks Employee)
  • 0 kudos

Unity Catalog setup, at the catalog and schema levels: use Unity Catalog to manage and organize your tables, and create separate catalogs or schemas for each layer of the medallion architecture. This way, the Assistant can interpret the context based on the cat...
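
A minimal sketch of that layout, with hypothetical catalog and schema names:

    # One schema per medallion layer under a domain catalog, so the layer is
    # always visible in the fully qualified table name.
    spark.sql("CREATE CATALOG IF NOT EXISTS sales")
    for layer in ("bronze", "silver", "gold"):
        spark.sql(f"CREATE SCHEMA IF NOT EXISTS sales.{layer}")

    # e.g. sales.bronze.orders_raw -> sales.silver.orders_clean -> sales.gold.orders_daily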

1 More Reply
by infinitylearnin (New Contributor II)
  • 56 Views
  • 0 replies
  • 0 kudos

Digital Natives on Databricks: Our Experience

They say we should craft solutions tailored to the unique journeys of those we aim to support, paving the way for their success. Digital Native enterprises, with their distinct needs and ambitions in Data and AI, often seek clarity on where to start ...

by infinitylearnin (New Contributor II)
  • 72 Views
  • 0 replies
  • 0 kudos

Learn Data Engineering on Databricks, Step by Step

They say we should build bridges along the paths we've already traveled, making it easier for others to follow. Learning data engineering has often been a confusing journey for many, especially when it comes to figuring out where to start. I faced thi...

by ashraf1395 (Valued Contributor)
  • 162 Views
  • 3 replies
  • 0 kudos

Getting an error while using Live.target_table in a DLT pipeline

I have created a target table in the same DLT pipeline, but when I read that table in a different notebook block with Live.table_path, it is not able to read it. Here is my code. Block 1, creating a streaming table: # Define metadata tables catalog = sp...

Latest Reply
ashraf1395 (Valued Contributor)
  • 0 kudos

Can't we use Live.table_name on a target DLT table with the @dlt.append_flow decorator? If yes, can you share the code, because when I tried it I got an error.
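
For context, the documented pattern has @dlt.append_flow writing into a target created with dlt.create_streaming_table, rather than reading the target back via Live. A minimal sketch with hypothetical source tables:

    import dlt

    # Create the target once; the flows below append into it.
    dlt.create_streaming_table("events_all")

    @dlt.append_flow(target="events_all")
    def events_from_kafka():
        return spark.readStream.table("bronze.events_kafka")

    @dlt.append_flow(target="events_all")
    def events_from_files():
        return spark.readStream.table("bronze.events_files")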

2 More Replies
by Abdul-Mannan (New Contributor III)
  • 291 Views
  • 14 replies
  • 2 kudos

Autoloader with file notification mode sleeps for 5000ms multiple times

Using DBR 15.4, I'm ingesting streaming data from ADLS using Auto Loader with file notification mode enabled. This is older code that uses a foreachBatch sink to process the data before merging with tables in Delta Lake. Issue: the streaming job is u...
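
For readers, a minimal sketch of the setup being described: Auto Loader in file notification mode with an availableNow trigger and a foreachBatch sink (the path, format, and merge body are placeholders):

    stream = (spark.readStream.format("cloudFiles")
              .option("cloudFiles.format", "json")            # hypothetical format
              .option("cloudFiles.useNotifications", "true")  # file notification mode
              .load("abfss://landing@myaccount.dfs.core.windows.net/events/"))  # hypothetical path

    def merge_batch(batch_df, batch_id):
        # Placeholder for the existing merge-into-Delta-Lake logic.
        batch_df.write.mode("append").saveAsTable("bronze.events")

    (stream.writeStream
     .foreachBatch(merge_batch)
     .trigger(availableNow=True)  # the trigger whose shutdown delay is discussed here
     .option("checkpointLocation", "/tmp/chk/events")
     .start())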

Latest Reply
Abdul-Mannan (New Contributor III)
  • 2 kudos

@VZLA, I just tested it, and it seems this Auto Loader behaviour with the availableNow trigger and file notification enabled remains the same in a DLT pipeline: it sleeps 7 times, each time for 5000ms, before finally closing the stream, even tho...

13 More Replies
by sakuraDev (New Contributor II)
  • 352 Views
  • 1 reply
  • 0 kudos

I keep getting PARSE_SYNTAX_ERROR when running Auto Loader with foreachBatch

Hey guys, I keep getting this error message when trying to call a function with Soda DQ checks:

    [PARSE_SYNTAX_ERROR] Syntax error at or near '{'. SQLSTATE: 42601
    File <command-81221799516900>, line 4
    1 dfBronze.writeStream \
    2     .foreachB...

Latest Reply
VZLA (Databricks Employee)
  • 0 kudos

Hi @sakuraDev, this looks like a Soda syntax issue. Try fixing the "fail" and "warn" fields in your Soda checks. For example, instead of writing:

    - missing_count(site) = 0:
        name: Ensure no null values
        fail: 1
        warn: 0

use Soda's thres...

by minhhung0507 (New Contributor II)
  • 362 Views
  • 5 replies
  • 1 kudo

Resolved! Delta Log Files in GCS Not Deleting Automatically Despite Configuration

Hello Databricks Community, I am experiencing an issue with Delta Lake where the _delta_log files are not being deleted automatically in a GCS bucket, even though I have set the table properties to enable this behavior. Here is the configuration I used:...
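
For anyone comparing configurations (the poster's exact settings are truncated above): the relevant knobs are table properties like the ones below, and Delta only deletes expired _delta_log entries when a new checkpoint is written, so cleanup lags the retention window. A hedged sketch with a hypothetical table name:

    spark.sql("""
        ALTER TABLE main.lake.events SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 7 days',
            'delta.enableExpiredLogCleanup' = 'true'
        )
    """)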

Latest Reply
VZLA (Databricks Employee)
  • 1 kudo

Glad it helps, and I agree with monitoring this behaviour closely. Should you need further assistance, please don't hesitate to reach out.

4 More Replies