Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

vedanth
by New Contributor
  • 61 Views
  • 1 reply
  • 0 kudos

Salesforce Connector - Lakeflow Connect 400 Error

Hi all, I have been trying to set up Salesforce using Lakeflow Connect and followed the instructions in the docs: https://docs.databricks.com/aws/en/connect/managed-ingestion#sfdc However, I run into an invalid_grant error. However, the login history on Salesforce sh...

Latest Reply
GaneshI
New Contributor
  • 0 kudos

Hi Vedanth, The invalid_grant error usually occurs due to authentication or OAuth configuration issues between Salesforce and Databricks Lakeflow Connect. Could you please verify the following points: Ensure the Salesforce user account is not locked and...

aonurdemir
by Contributor
  • 294 Views
  • 2 replies
  • 2 kudos

Liquid Clustering file pruning breaks when filtering on a high NULL numeric column in dataSkipping

Environment
Cloud: AWS
Compute: Serverless
Table: a_big_table
Table type: Streaming Table (SDP pipeline)
Table size: 641 GB, 6,210 files
Liquid Clustering columns: [event_time, integer_userId]
delta.dataSkippingStatsColumns: event_time, integer_userId, integ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Hello @aonurdemir, I looked into your query and have compiled some helpful tips: I don't have direct access to your workspace internals, so I can't prove this definitively. But what you're seeing is consistent with how Delta's stats-based data skipp...

1 More Replies
kcyugesh
by New Contributor II
  • 310 Views
  • 2 replies
  • 0 kudos

Unity Catalog storage credential fails although same Access Connector works in another credential

In Azure Databricks Unity Catalog, I have two storage credentials that use the same connector_id / Azure Databricks Access Connector. One credential works and can access ADLS Gen2 successfully, but the other fails with: Failed to access cloud storag...

Latest Reply
zoe_unifeye
Databricks Partner
  • 0 kudos

Hi @kcyugesh, how are you getting on so far? It might also be worth checking the privileges associated with each credential to see if they differ. Secondly, check the credential type on the credential, as a managed identity in comparison to a service...

1 More Replies
Avinash_Narala
by Databricks Partner
  • 566 Views
  • 2 replies
  • 2 kudos

Resolved! Data Loss in Incremental Batch Jobs Due to Latency in delta file write to blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...

Latest Reply
balajij8
Contributor III
  • 2 kudos

You can handle it as below: Fix the Bronze Write - The 20+ minute commit gap suggests metadata contention or "Small File Issues" in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...

1 More Replies
DavidKxx
by Contributor
  • 348 Views
  • 2 replies
  • 1 kudos

Resolved! Data in Unity Catalog that can't be previewed

This is a small deficiency, but a fix would be nice to have. For a long time now, the Sample Data previewer in the Unity Catalog explorer has been unable to show tables that contain a certain kind of column. Instead of showing sample rows of the tabl...

Latest Reply
DavidKxx
Contributor
  • 1 kudos

Yes, my vector space is commonly of dimension 4000 or 8000. I don't write any dense vectors to tables, so I can't speak to what happens when previewing that type. Thanks for taking up the issue!

1 More Replies
tsam
by New Contributor II
  • 428 Views
  • 4 replies
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the driver was the main culprit since it was using up all of its memory ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

What you’re seeing (a monotonic / stair‑step climb in driver RAM over thousands of DEEP CLONE statements) is a very common pattern when the driver is not “holding data”, but holding metadata, query artifacts, and per‑command state that accumulates faster ...

3 More Replies
kevinzhang29
by New Contributor III
  • 345 Views
  • 2 replies
  • 1 kudos

Resolved! Auto CDC flow without CDF?

Auto CDC flow works when CDF is enabled on the source table, but fails when CDF is disabled. The source table is updated via INSERT OVERWRITE. Is CDF mandatory?

Latest Reply
DivyaandData
Databricks Employee
  • 1 kudos

Yes, @kevinzhang29. For Auto CDC with a Delta source table, a change data feed (CDF) (i.e., a CDC feed) is required. AUTO CDC is explicitly designed to read from a CDC/change feed source such as Delta CDF, not from plain snapshots. When you don’t ha...

1 More Replies
Raj_DB
by Contributor
  • 916 Views
  • 7 replies
  • 11 kudos

Resolved! Designing Reliable Data Versioning Strategies in Databricks

Hi everyone, I’m working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports. I’m looking for the best approach to reliably maintain this historical data while also making it suitable for r...

Latest Reply
DivyaandData
Databricks Employee
  • 11 kudos

Hey @Raj_DB, the TL;DR is that time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table. Docs 1,2 explicitly say De...

6 More Replies
200649021
by New Contributor II
  • 438 Views
  • 1 reply
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
New Contributor III
  • 1 kudos

That's cool! Why not git it?

databrciks
by New Contributor III
  • 484 Views
  • 2 replies
  • 2 kudos

Resolved! Delta table update

Hi experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created a silver layer of around 20 tables based on some logic. How can I run a specific silver-layer pipeline whenever an update happens in the bronz...

Latest Reply
databrciks
New Contributor III
  • 2 kudos

Thanks @anuj_lathi for the detailed explanation. This helps a lot.

1 More Replies
AnandGNR
by New Contributor III
  • 1562 Views
  • 7 replies
  • 2 kudos

Unable to create secret scope - "Fetch request failed due expired user session"

Hi everyone, I’m trying to create an Azure Key Vault-backed secret scope in a Databricks Premium workspace, but I keep getting this error: Fetch request failed due expired user session. Setup details: Databricks workspace: Premium; Azure Key Vault: Owner p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @AnandGNR, try the following. Go to your Key Vault, then under Firewalls and virtual networks set: "Allow trusted Microsoft services to bypass this firewall."

6 More Replies
Phani1
by Databricks MVP
  • 1372 Views
  • 6 replies
  • 4 kudos

Resolved! Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databric

Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...

Latest Reply
AbhaySingh
Databricks Employee
  • 4 kudos

Here is my action plan if it helps!
Phase 1: Foundation
☐ Migrate to UC managed tables (if not already)
☐ Enable Predictive Optimization at catalog level
☐ Set delta.deletedFileRetentionDuration per layer
Phase 2: Retention Policies
☐ Enab...

5 More Replies
abhishek13
by New Contributor II
  • 368 Views
  • 2 replies
  • 1 kudos

Connection reset error from Databricks notebook but works via curl (GCP)

Hi everyone, I’m facing a connectivity issue in my Databricks workspace on GCP and would appreciate any guidance. Problem: When I run commands from a Databricks notebook, I see intermittent errors like: Connection reset Retrying request to https://us-eas...

Latest Reply
abhishek13
New Contributor II
  • 1 kudos

Can someone help with this?

1 More Replies
sai_sakhamuri
by Databricks Partner
  • 1993 Views
  • 1 reply
  • 1 kudos

Resolved! Databricks optimization for query performance and pipeline run

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @sai_sakhamuri, you're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked. Spark Catalyst Optimizer: Working With the Rules Engine. Catalyst operates in fou...

databrciks
by New Contributor III
  • 673 Views
  • 3 replies
  • 1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into the Bronze layer from a SQL Server database. How can I pass the table names dynamically in DLT? That is, one piece of code that takes many table names and loads each table into the Bronze layer.

Latest Reply
databrciks
New Contributor III
  • 1 kudos

Hi Ashwin, thanks for the quick response. Yes, I want to pass all the tables through a config parameter/param file and load them into the bronze layer. I will try this approach. Thanks!
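
For readers landing on this thread preview: a minimal sketch of the config-driven idea discussed above, written in plain Python so it runs anywhere. It is not the approach from the hidden replies, which are not shown here. The config key `bronze.source_tables`, the `bronze_` prefix, and all helper names are hypothetical and chosen only for illustration. In a real pipeline the raw string would typically be read from the pipeline configuration with `spark.conf.get(...)`, and each planned entry would be registered from a loop using a factory function decorated with `@dlt.table(name=...)`.

```python
from typing import Dict, List


def parse_table_list(raw: str) -> List[str]:
    """Split a comma-separated config value into clean, de-duplicated table names."""
    seen = set()
    tables = []
    for name in raw.split(","):
        name = name.strip()
        # Skip empty entries (e.g. trailing commas) and repeats, keeping first-seen order.
        if name and name not in seen:
            seen.add(name)
            tables.append(name)
    return tables


def bronze_table_name(source_table: str) -> str:
    """Map a source name such as 'dbo.Customers' to 'bronze_dbo_customers' (hypothetical naming rule)."""
    return "bronze_" + source_table.replace(".", "_").lower()


def plan_bronze_tables(raw: str) -> Dict[str, str]:
    """Return a mapping of source table name to target bronze table name."""
    return {table: bronze_table_name(table) for table in parse_table_list(raw)}


# Assumption: in a pipeline this string would come from something like
# spark.conf.get("bronze.source_tables"); the key name is hypothetical.
plan = plan_bronze_tables("dbo.Customers, dbo.Orders, ,dbo.Orders")
for source, target in plan.items():
    print(f"{source} -> {target}")
```

Keeping the parsing and naming logic separate from the table registration makes it easy to unit test the table list outside a pipeline run.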

2 More Replies