Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

maikel
by Contributor II
  • 134 Views
  • 2 replies
  • 0 kudos

Job tasks monitoring

Hello Community, We have a case in our project that we would like to solve in an elegant and scalable manner. As always, I would really appreciate your suggestions and experience. In short: We have a multi-step job consisting of 4 stages. In one of the ...

Latest Reply
MoJaMa
Databricks Employee
  • 0 kudos

I don't think there is anything native for this in Databricks. The closest match would have been system tables (system.lakeflow.job_run_timeline / job_task_run_timeline), but I don't think they will have the necessary grain for your pattern. There'...

1 More Replies
lrm_data
by New Contributor III
  • 446 Views
  • 3 replies
  • 2 kudos

Resolved! Lakeflow Connect SQL Server: Snapshots Firing Outside Configured Full Refresh Window?

Has anyone else seen full refresh snapshots trigger outside of their configured refresh window in Lakeflow Connect? Here's our situation: - We have a full refresh window configured to restrict snapshot operations to off-hours - On at least one occasion,...

Latest Reply
lrm_data
New Contributor III
  • 2 kudos

Hello @Sumit_7, I have tested a few scenarios, logged a ticket with Databricks, and discovered the following: Common Misconception: The start_window setting does not define a bounded time window during which full refreshes are contained. It is simply a...

2 More Replies
lrm_data
by New Contributor III
  • 573 Views
  • 4 replies
  • 0 kudos

Resolved! Lakeflow Connect - SQL Server - Issues restarting after failure

Has anyone else run into a situation where a breaking schema change on a SQL Server source table leaves their Lakeflow Connect pipeline in a state it can't recover from, even after destroying and recreating the pipeline? Here's what happened to us: - ...

Latest Reply
lrm_data
New Contributor III
  • 0 kudos

Hey all, Following up. I was able to recover. The one step I was missing was resetting CDC on the source side. After that, I was able to destroy and recreate the bundle and successfully refresh all tables. Thanks!

3 More Replies
Avinash_Narala
by Databricks Partner
  • 587 Views
  • 2 replies
  • 2 kudos

Resolved! Data Loss in Incremental Batch Jobs Due to Latency in Delta File Write to Blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...

Latest Reply
balajij8
Contributor III
  • 2 kudos

You can handle it as below. Fix the Bronze Write - The 20+ minute commit gap suggests metadata contention or "Small File Issues" in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...

1 More Replies
harisrinivasay
by New Contributor II
  • 472 Views
  • 4 replies
  • 1 kudos

Resolved! Unable to View Tables While Setting Up PostgreSQL CDC via Lakeflow Connect

Dear Experts, I have a requirement to implement PostgreSQL CDC using Databricks Lakeflow Connect. While setting up the tables, I am unable to see the list of available tables, even though the connection settings appear to be correct. Could you please s...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @harisrinivasay, @szymon_dybczak is correct. You must enter the database name. Lakeflow Connect can connect to and query that database, and list its schemas and tables, only if you provide the correct name. If the name is incorrect or if you don’t ...

3 More Replies
Raj_DB
by Contributor
  • 959 Views
  • 7 replies
  • 11 kudos

Resolved! Designing Reliable Data Versioning Strategies in Databricks

Hi everyone, I’m working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports. I’m looking for the best approach to reliably maintain this historical data while also making it suitable for r...

Latest Reply
DivyaandData
Databricks Employee
  • 11 kudos

Hey @Raj_DB, the TL;DR is that time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table. Docs 1,2 explicitly say De...

6 More Replies
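
As a rough illustration of the purpose-built history table mentioned in the reply above, the sketch below appends one dated snapshot per day and prunes anything older than 30 days. It uses Python's built-in sqlite3 purely as a stand-in for a Delta table; the table names (`metrics`, `metrics_history`), the columns, and the helper `snapshot_and_prune` are hypothetical and do not come from the thread.

```python
import sqlite3
from datetime import date, timedelta

RETENTION_DAYS = 30  # retention period stated in the question


def snapshot_and_prune(conn, snapshot_date):
    """Append a dated copy of `metrics` to `metrics_history`, then drop expired snapshots."""
    day = snapshot_date.isoformat()
    # Idempotent load: re-running the same day replaces that day's snapshot.
    conn.execute("DELETE FROM metrics_history WHERE snapshot_date = ?", (day,))
    conn.execute(
        "INSERT INTO metrics_history (snapshot_date, id, value) "
        "SELECT ?, id, value FROM metrics",
        (day,),
    )
    cutoff = (snapshot_date - timedelta(days=RETENTION_DAYS)).isoformat()
    # ISO dates sort lexicographically, so comparing them as strings is safe.
    conn.execute("DELETE FROM metrics_history WHERE snapshot_date <= ?", (cutoff,))
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metrics (id INTEGER PRIMARY KEY, value REAL)")
conn.execute("CREATE TABLE metrics_history (snapshot_date TEXT, id INTEGER, value REAL)")
conn.execute("INSERT INTO metrics VALUES (1, 10.0)")

start = date(2024, 1, 1)
for offset in range(45):  # simulate 45 daily runs
    conn.execute("UPDATE metrics SET value = ? WHERE id = 1", (10.0 + offset,))
    snapshot_and_prune(conn, start + timedelta(days=offset))

kept_days = conn.execute(
    "SELECT COUNT(DISTINCT snapshot_date) FROM metrics_history"
).fetchone()[0]
```

Because every row carries its own `snapshot_date`, trend reports can filter or group on that column directly, and retention is an explicit delete rather than a side effect of file cleanup.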
abhijit007
by Databricks Partner
  • 432 Views
  • 2 replies
  • 2 kudos

Resolved! Redshift to Databricks Migration with Lakebridge

We are currently performing an assessment for a client’s Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case. We would appreciate clarification on the following points: Scop...

Latest Reply
pradeep_singh
Contributor III
  • 2 kudos

There is a nice course on Partner Academy as well. It uses SQL Server as a target system for migration, but you can follow the same steps for Redshift. https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-syste...

1 More Replies
helius_205
by New Contributor II
  • 358 Views
  • 1 reply
  • 0 kudos

Resolved! Does a Delta Live Table automatically perform incremental loads without needing timestamp columns?

The code:
import dlt
from pyspark.sql.functions import col

@dlt.table(
    name="silver_customers",
    comment="Cleaned customers data from bronze"
)
@dlt.expect("valid_email", "email IS NOT NULL")
@dlt.expect("valid_customer_id", "customer_id IS NOT NULL...

Latest Reply
Sumit_7
Honored Contributor III
  • 0 kudos

@helius_205 I doubt it. Do check the execution mode, which should be triggered. Also, the code uses a normal read instead of readStream. Read the docs for a better understanding.

AngelShrestha
by Databricks Partner
  • 542 Views
  • 5 replies
  • 2 kudos

Error updating schema: SCHEMA_FOREIGN_SQLSERVER update_mask requirement.

What I tried: Updating the description via UI (AI Suggested Description / manual edit). I’m running into an issue while trying to update the description for the schema. Context: Type: SCHEMA_FOREIGN_SQLSERVER. Error message: Failed to save description. Please...

Latest Reply
emma_s
Databricks Employee
  • 2 kudos

Hi, yes, 100%. If you use Lakeflow Connect, it will ingest the data and the tables will become managed tables, which support descriptions and comments. You should also get some query improvement as you're actually moving the data rather than queryi...

4 More Replies
ittzzmalind
by New Contributor III
  • 477 Views
  • 1 reply
  • 1 kudos

Resolved! Delta Sharing with Materialized View - recipient data not refreshing when using Open Protocol

Scenario: Delta Sharing with Materialized View. Provider Side Setup: -> A Delta Share was created. -> A materialized view was added to the share. -> Recipients created: 1) Open Delta Sharing recipient, accessed using Python (import delta_sharing) 2)...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ittzzmalind, This is expected behaviour and is mainly due to how Delta Sharing handles materialized views for open (non-Databricks) recipients versus Databricks-to-Databricks recipients. For Databricks-to-Databricks recipients, the shared materia...

databrciks
by New Contributor III
  • 680 Views
  • 3 replies
  • 1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into the Bronze layer by connecting to a SQL Server DB. How can I pass the table names dynamically in DLT? That is, one piece of code that takes many table names and loads each table into the Bronze layer.

Latest Reply
databrciks
New Contributor III
  • 1 kudos

Hi Ashwin, thanks for the quick response. Yes, I want to pass all the tables through a config parameter/param file and load them into the bronze layer. I will try this approach. Thanks

2 More Replies
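
For the goal described in this thread (one piece of code that loads many tables named in a config value), a common Python pattern is to loop over the names and create each table definition through a factory function. The sketch below shows only that pattern and makes no claim about the approach suggested in the replies: `table` is a hypothetical stand-in for a pipeline decorator such as `dlt.table`, `read_source` is a hypothetical placeholder for the SQL Server read, and the table names are made up.

```python
from typing import Callable, Dict, List

# Hypothetical registry standing in for the pipeline runtime.
REGISTRY: Dict[str, Callable[[], str]] = {}


def table(name: str):
    """Stand-in for a decorator like dlt.table: registers a function under a table name."""
    def decorator(fn):
        REGISTRY[name] = fn
        return fn
    return decorator


def read_source(source_table: str) -> str:
    """Placeholder for the SQL Server read; returns a description instead of a DataFrame."""
    return f"rows from {source_table}"


def define_bronze_table(source_table: str) -> None:
    """Factory: binds source_table per call so each definition reads its own source."""
    @table(name=f"bronze_{source_table}")
    def load():
        return read_source(source_table)


def parse_table_list(raw: str) -> List[str]:
    """Parse a comma-separated config value such as 'customers, orders'."""
    return [t.strip() for t in raw.split(",") if t.strip()]


# Assumed config value; in a real pipeline this would come from configuration.
for source in parse_table_list("customers, orders, payments"):
    define_bronze_table(source)

results = {name: fn() for name, fn in REGISTRY.items()}
```

The factory function matters because a function defined directly inside the loop would capture the loop variable by reference, so every registered table would end up reading the last name in the list.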
demo-user
by New Contributor III
  • 702 Views
  • 2 replies
  • 0 kudos

S3A Connector Trying to Use AWS STS on Non-AWS S3 Endpoint

Hi everyone, I’m trying to write Delta tables to my S3-compatible (non-AWS) endpoint, and it was writing perfectly fine last week with the same setup. Now, without any changes on my end, it’s failing and giving me an UnknownException: (com.amazonaws.se...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @demo-user, can you share more information about your setup: cluster type and DBR version, and the S3-compatible storage implementation (MinIO / something else?) AFAIK this is not supposed to work, as the Delta client in DBR relies on AWS STS to perform S3 comm...

1 More Replies
BennyBoyW
by New Contributor III
  • 720 Views
  • 4 replies
  • 3 kudos

Resolved! How to Convert a Lateral View to a Table Reference

Hi All, I have a view creation script in Databricks which uses a lateral view to access columns in a structure held within an array field. It is working fine, but I have noted that LATERAL VIEW is now deprecated and that I should be using a TABLE RE...

Latest Reply
balajij8
Contributor III
  • 3 kudos

You can use:
CREATE OR REPLACE VIEW newview AS
SELECT
  t1.field1,
  item.field2,
  item.field3
FROM table1 AS t1
INNER JOIN table2 AS t2 ON t1.id = t2.id
, LATERAL EXPLODE(t1.structure) AS structureitem(item)

3 More Replies
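
To show what the lateral explode in the reply above produces, here is a plain-Python sketch of the same row expansion: one output row per array element, paired with its parent row. The sample rows are hypothetical, the join to table2 is left out for brevity, and this only mimics the semantics; it is not how the engine executes the query.

```python
from typing import Dict, Iterator, List

# Hypothetical sample rows: `structure` is an array of structs, as in the question.
table1: List[Dict] = [
    {"id": 1, "field1": "a", "structure": [{"field2": 10, "field3": "x"},
                                           {"field2": 20, "field3": "y"}]},
    {"id": 2, "field1": "b", "structure": []},  # empty array: contributes no rows
]


def lateral_explode(rows: List[Dict]) -> Iterator[Dict]:
    """Emit one output row per array element, combined with its parent row."""
    for t1 in rows:
        for item in t1["structure"]:
            yield {"field1": t1["field1"],
                   "field2": item["field2"],
                   "field3": item["field3"]}


exploded = list(lateral_explode(table1))
```

Note that the parent row with an empty array disappears from the result, which matches an inner-style explode; keeping such rows would need an outer variant.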
sreya_sahithi
by Databricks Partner
  • 471 Views
  • 1 reply
  • 0 kudos

Resolved! Column Tags Not Accessible in Genie (Azure Databricks)

Hi Team, We’ve applied column-level tags to a table in Azure Databricks and attached the table in our Genie workspace. However, when querying via Genie, the column tag information is not being returned correctly (missing/incomplete results), despite t...

Latest Reply
Ale_Armillotta
Valued Contributor II
  • 0 kudos

Hi @sreya_sahithi, This is an important distinction about how Genie works: Genie queries the actual data rows in the tables attached to its space; it does not natively query Unity Catalog metadata such as column-level tags. Column tags live in INFORM...
