Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ccsalt
by New Contributor II
  • 382 Views
  • 4 replies
  • 1 kudos

Inconsistent Cluster Log Persistence to Volume/S3 (stderr, stdout, log4j-active.log)

Saving logs from an all-purpose cluster to Volume or S3 is not consistent, because stderr, stdout, and log4j-active.log get overwritten when the cluster is restarted between minutes 01 and 59. Tested case: A job is configured to start every 20 minutes,...

Latest Reply
aleksandra_ch
Databricks Employee
  • 1 kudos

Hi @ccsalt, this is a known limitation. Log rotation (renaming to log4j-YYYY-MM-DD-HH.log.gz) only happens on the hour boundary. The active log file log4j-active.log always has the same name and is overwritten if a cluster restart happens within one...

  • 1 kudos
3 More Replies
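
A minimal sketch of one possible workaround for the limitation described above (not something proposed in the thread): copy the active log files to timestamped names before a restart can overwrite them. The function name `snapshot_active_logs`, the timestamp format, and both directory arguments are hypothetical; the sketch makes no claim about where a cluster actually writes its logs.

```python
import shutil
from datetime import datetime, timezone
from pathlib import Path

# File names taken from the post; everything else here is an assumption.
ACTIVE_LOGS = ("stderr", "stdout", "log4j-active.log")


def snapshot_active_logs(log_dir, archive_dir, now=None):
    """Copy each active log that exists to a timestamped file in archive_dir.

    Returns the list of paths that were written. The source files are left
    in place, so normal hourly rotation is not affected.
    """
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y-%m-%d-%H-%M-%S")
    archive = Path(archive_dir)
    archive.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in ACTIVE_LOGS:
        src = Path(log_dir) / name
        if src.is_file():
            dst = archive / f"{name}.{stamp}"
            shutil.copy2(src, dst)  # copy2 also preserves modification times
            copied.append(dst)
    return copied
```

Calling this at the end of each job run, with `archive_dir` pointing at a Volume path, would keep one copy per run instead of relying on the hourly rotation alone.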
Alessio_F
by New Contributor
  • 173 Views
  • 1 reply
  • 0 kudos

Extract SQL function in SQL Server federated database

Hi everyone, I'm using Azure Databricks with a customer who has a SQL Server database federated in Unity Catalog. It seems that, while converting some date functions to the SQL Server dialect, Databricks uses the function "extract", which is not re...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @Alessio_F, this happens because in Databricks SQL both the year and month functions are just aliases for the following patterns: extract(YEAR FROM expr) and extract(MONTH FROM expr). When Databricks pushes a predicate or expression down to the remote SQL ...

  • 0 kudos
Raj_DB
by Contributor
  • 181 Views
  • 1 reply
  • 1 kudos

Resolved! Automating Job Permission Updates in Databricks Using a Notebook

Hi everyone, I am looking to create a notebook that, when executed by a user, performs the following actions: retrieves all Databricks jobs created by the current user; checks whether a specific role already has permissions on those jobs; automatically add...

Latest Reply
ziafazal
Databricks Partner
  • 1 kudos

Hi @Raj_DB, you can use the Databricks SDK to retrieve all jobs and filter them by selecting only those where the owner is the current user, something like this: from databricks.sdk import WorkspaceClient w = WorkspaceClient() # Specify the user email/username you want...

  • 1 kudos
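
The reply above is cut off, so here is a hedged sketch of how the three steps from the question might fit together. It is not the reply's own code. The helper names `group_in_acl`, `jobs_missing_group`, and `grant_group_on_my_jobs`, the group name `data-readers`, and the `CAN_VIEW` level are all hypothetical. The SDK calls (`jobs.list`, `jobs.get_permissions`, `jobs.update_permissions`, `current_user.me`) are used as I understand them from the Databricks SDK for Python; check them against the SDK reference for your installed version.

```python
def group_in_acl(acl, group_name):
    """True if any entry in a job's access control list names the group."""
    return any(entry.group_name == group_name for entry in (acl or []))


def jobs_missing_group(jobs, owner, has_group):
    """IDs of jobs created by `owner` for which has_group(job_id) is False."""
    return [
        job.job_id
        for job in jobs
        if job.creator_user_name == owner and not has_group(job.job_id)
    ]


def grant_group_on_my_jobs(group_name="data-readers"):
    """Grant `group_name` view access on every job the current user created.

    Assumption: the databricks-sdk package is installed and authentication
    is already configured (for example, when run inside a notebook).
    """
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service.jobs import (
        JobAccessControlRequest,
        JobPermissionLevel,
    )

    w = WorkspaceClient()
    me = w.current_user.me().user_name

    def has_group(job_id):
        perms = w.jobs.get_permissions(job_id=str(job_id))
        return group_in_acl(perms.access_control_list, group_name)

    updated = jobs_missing_group(w.jobs.list(), me, has_group)
    for job_id in updated:
        # update_permissions adds to the existing ACL rather than replacing it.
        w.jobs.update_permissions(
            job_id=str(job_id),
            access_control_list=[
                JobAccessControlRequest(
                    group_name=group_name,
                    permission_level=JobPermissionLevel.CAN_VIEW,
                )
            ],
        )
    return updated
```

The filtering logic is kept in plain functions so it can be checked without a workspace; only `grant_group_on_my_jobs` talks to Databricks.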
vedanth
by New Contributor
  • 178 Views
  • 1 reply
  • 0 kudos

Salesforce Connector - Lakeflow Connect 400 Error

Hi all, I have been trying to set up Salesforce using Lakeflow Connect and followed the instructions in the docs: https://docs.databricks.com/aws/en/connect/managed-ingestion#sfdc However, I run into an invalid_grant error. The login history on Salesforce, however, sh...

Latest Reply
GaneshI
New Contributor II
  • 0 kudos

Hi Vedanth, the invalid_grant error usually occurs due to authentication or OAuth configuration issues between Salesforce and Databricks Lakeflow Connect. Could you please verify the following points: Ensure the Salesforce user account is not locked and...

  • 0 kudos
aonurdemir
by Contributor
  • 366 Views
  • 2 replies
  • 2 kudos

Liquid Clustering file pruning breaks when filtering on a high NULL numeric column in dataSkipping

Environment
Cloud: AWS
Compute: Serverless
Table: a_big_table
Table type: Streaming Table (SDP pipeline)
Table size: 641 GB, 6,210 files
Liquid Clustering columns: [event_time, integer_userId]
delta.dataSkippingStatsColumns: event_time, integer_userId, integ...

Latest Reply
Louis_Frolio
Databricks Employee
  • 2 kudos

Hello @aonurdemir, I looked into your query and have compiled some helpful tips. I don't have direct access to your workspace internals, so I can't prove this definitively. But what you're seeing is consistent with how Delta's stats-based data skipp...

  • 2 kudos
1 More Replies
kcyugesh
by New Contributor II
  • 390 Views
  • 2 replies
  • 0 kudos

Unity Catalog storage credential fails although same Access Connector works in another credential

In Azure Databricks Unity Catalog, I have two storage credentials that use the same connector_id / Azure Databricks Access Connector. One credential works and can access ADLS Gen2 successfully, but the other fails with: Failed to access cloud storag...

Latest Reply
zoe_unifeye
Databricks Partner
  • 0 kudos

Hi @kcyugesh, how are you getting on so far? It might also be worth checking the privileges associated with each credential to see if they differ. Secondly, check the credential type on the credential, as a managed identity in comparison to a service...

  • 0 kudos
1 More Replies
Avinash_Narala
by Databricks Partner
  • 721 Views
  • 2 replies
  • 2 kudos

Resolved! Data Loss in Incremental Batch Jobs Due to Latency in delta file write to blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...

Latest Reply
balajij8
Contributor III
  • 2 kudos

You can handle it as follows. Fix the Bronze write: the 20+ minute commit gap suggests metadata contention or "small file issues" in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...

  • 2 kudos
1 More Replies
DavidKxx
by Contributor
  • 463 Views
  • 2 replies
  • 1 kudos

Resolved! Data in Unity Catalog that can't be previewed

This is a small deficiency, but a fix would be nice to have. For a long time now, the Sample Data previewer in the Unity Catalog explorer has been unable to show tables that contain a certain kind of column. Instead of showing sample rows of the tabl...

Latest Reply
DavidKxx
Contributor
  • 1 kudos

Yes, my vector space is commonly of dimension 4000 or 8000. I don't write any dense vectors to tables, so I can't speak to what happens when previewing that type. Thanks for taking up the issue!

  • 1 kudos
1 More Replies
tsam
by New Contributor II
  • 519 Views
  • 4 replies
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the driver was the main culprit since it was using up all of its memory ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

What you’re seeing (a monotonic / stair-step climb in driver RAM over thousands of DEEP CLONE statements) is a very common pattern when the driver is not “holding data” but holding metadata, query artifacts, and per-command state that accumulates faster ...

  • 0 kudos
3 More Replies
kevinzhang29
by New Contributor III
  • 419 Views
  • 2 replies
  • 1 kudos

Resolved! Auto CDC flow without CDF?

Auto CDC flow works with source table CDF enabled, but fails when CDF is disabled. The source table is updated via INSERT OVERWRITE. Is CDF mandatory?

Latest Reply
DivyaandData
Databricks Employee
  • 1 kudos

Yes, @kevinzhang29. For Auto CDC with a Delta source table, a change data feed (CDF), i.e. a CDC feed, is required. AUTO CDC is explicitly designed to read from a CDC/change feed source such as Delta CDF, not from plain snapshots. When you don’t ha...

  • 1 kudos
1 More Replies
Raj_DB
by Contributor
  • 1276 Views
  • 7 replies
  • 11 kudos

Resolved! Designing Reliable Data Versioning Strategies in Databricks

Hi everyone, I’m working on a use case where I need to retain 30 days of historical data in a Delta table and use it to build trend reports. I’m looking for the best approach to reliably maintain this historical data while also making it suitable for r...

Latest Reply
DivyaandData
Databricks Employee
  • 11 kudos

Hey @Raj_DB, the TL;DR is that time travel is great for short-term ops and debugging, but brittle as your primary reporting history, and its cost profile is harder to control and reason about than a purpose-built history table. Docs 1,2 explicitly say De...

  • 11 kudos
6 More Replies
200649021
by New Contributor II
  • 487 Views
  • 1 replies
  • 1 kudos

Data System & Architecture - PySpark Assignment

Title: Spark Structured Streaming – Airport Counts by Country. This notebook demonstrates how to set up a Spark Structured Streaming job in Databricks Community Edition. It reads new CSV files from a Unity Catalog volume, processes them to count airport...

Latest Reply
amirabedhiafi
Contributor
  • 1 kudos

That's cool! Why not put it on Git?

  • 1 kudos
databrciks
by New Contributor III
  • 597 Views
  • 2 replies
  • 2 kudos

Resolved! Delta table update

Hi experts, I have around 100 tables in the bronze layer (DLT pipeline). We have created around 20 silver layer tables based on some logic. How can we run a specific pipeline in the silver layer whenever an update happens in the bronz...

Latest Reply
databrciks
New Contributor III
  • 2 kudos

Thanks @anuj_lathi for the detailed explanation. This helps a lot.

  • 2 kudos
1 More Replies
AnandGNR
by New Contributor III
  • 1967 Views
  • 7 replies
  • 2 kudos

Unable to create secret scope - "Fetch request failed due expired user session"

Hi everyone, I’m trying to create an Azure Key Vault-backed secret scope in a Databricks Premium workspace, but I keep getting this error: Fetch request failed due expired user session. Setup details: Databricks workspace: Premium; Azure Key Vault: Owner p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @AnandGNR, try the following. Go to your Key Vault, then under Firewalls and virtual networks set: "Allow trusted Microsoft services to bypass this firewall."

  • 2 kudos
6 More Replies