Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Darshan137
by New Contributor II
  • 150 Views
  • 2 replies
  • 1 kudos

Transitioning from ADF to Databricks Workflows: Best Practices in a Multi-Workspace (dev-prod)

Hi Community, We have a data processing framework running on Azure Databricks with Unity Catalog, and we're evaluating options to consolidate our orchestration entirely within the Databricks ecosystem. CURRENT ARCHITECTURE: ~20 use cases, each containin...

Latest Reply
amirabedhiafi
New Contributor II
  • 1 kudos

Hello @Darshan137! A few things I will add to @Lu_Wang_ENB_DBX's answer, from a similar project I did. If ADF currently passes values such as environment, run date, catalog, schema, or business domain, define a clear parameter contract in Lakeflow Job...
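A minimal sketch of the "parameter contract" idea: validate the values the orchestrator passes before any task uses them. The parameter names, allowed environments, and derived `target` field below are illustrative, not from the thread.

```python
# Hypothetical parameter contract for a Lakeflow Job task.
# In a notebook task the values would typically come from job parameters or
# dbutils.widgets.get(...); here we just pass a plain dict.

REQUIRED_PARAMS = {"environment", "run_date", "catalog", "schema", "business_domain"}
ALLOWED_ENVIRONMENTS = {"dev", "prod"}

def validate_job_params(params: dict) -> dict:
    """Fail fast if the orchestrator passed an incomplete or invalid parameter set."""
    missing = REQUIRED_PARAMS - params.keys()
    if missing:
        raise ValueError(f"Missing job parameters: {sorted(missing)}")
    if params["environment"] not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"Unknown environment: {params['environment']!r}")
    # Derive fully qualified names once, so every task resolves objects the same way.
    params = dict(params)
    params["target"] = f"{params['catalog']}.{params['schema']}"
    return params

validated = validate_job_params({
    "environment": "dev", "run_date": "2025-01-01",
    "catalog": "dev_catalog", "schema": "bronze", "business_domain": "sales",
})
```

Failing fast at the top of the job makes a dev-to-prod promotion mistake surface as one clear error instead of a half-written table.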

1 More Replies
Avinash_Narala
by Databricks Partner
  • 350 Views
  • 2 replies
  • 0 kudos

Data Loss in Incremental Batch Jobs Due to Latency in Delta File Writes to Blob

Hi everyone, I am facing a data consistency issue in my Databricks incremental pipeline where records are being skipped because of a time gap between when a record is processed and when the physical file is finalized in Azure Blob Storage (ABFS). Our A...

Latest Reply
balajij8
Contributor III
  • 0 kudos

You can handle it as below. Fix the bronze write - the 20+ minute commit gap suggests metadata contention or small-file issues in the bronze Delta tables. You can optimize tables manually or enable Optimized Write and Auto Optimize if feasible. This...
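As a sketch of the suggestion above, the Delta table properties for Optimized Write and Auto Compact can be applied per table. The table names are placeholders; on a cluster each generated statement would be run with `spark.sql(...)`.

```python
# Hedged sketch: build the DDL that enables Optimized Write / Auto Compact on a
# list of bronze tables. Table names below are made up for illustration.

BRONZE_TABLES = ["bronze.orders", "bronze.customers"]  # hypothetical tables

def auto_optimize_ddl(table: str) -> str:
    """Return an ALTER TABLE statement enabling the two auto-optimize properties."""
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES ("
        "'delta.autoOptimize.optimizeWrite' = 'true', "
        "'delta.autoOptimize.autoCompact' = 'true')"
    )

statements = [auto_optimize_ddl(t) for t in BRONZE_TABLES]
# for stmt in statements:
#     spark.sql(stmt)   # run on a cluster; commented out here
```

Optimized Write coalesces shuffle output into fewer, larger files at write time, which is usually the first lever to pull before scheduled `OPTIMIZE` runs.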

1 More Replies
Oumeima
by New Contributor III
  • 855 Views
  • 5 replies
  • 2 kudos

Resolved! Lakeflow Connect - SQL Server - Database Setup step keeps failing

Hello, I am trying to ingest data from an Azure SQL Database using Lakeflow Connect. - I'm using a service principal for authentication (created the login and user in the DB I am trying to ingest). - The utility script was executed by a DB owner. === Install...

Latest Reply
Oumeima
New Contributor III
  • 2 kudos

We finally figured out the issue! We checked the database SQL audit logs and noticed that there was a particular query that was taking too long (4 min) for the ingestion user. This was causing a timeout. This query is very simple and usually takes a c...

4 More Replies
tsam
by New Contributor II
  • 284 Views
  • 4 replies
  • 0 kudos

Driver memory utilization grows continuously during job

I have a batch job that runs thousands of Deep Clone commands; it uses a ForEach task to run multiple Deep Clones in parallel. It was taking a very long time, and I realized that the driver was the main culprit, since it was using up all of its memory ...

Latest Reply
nayan_wylde
Esteemed Contributor II
  • 0 kudos

What you're seeing (a monotonic, stair-step climb in driver RAM over thousands of DEEP CLONE statements) is a very common pattern when the driver is not "holding data" but holding metadata, query artifacts, and per-command state that accumulates faster ...
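One common mitigation for this accumulation pattern is to run the clones in bounded batches rather than one unbounded ForEach, so per-command driver state can be reclaimed between batches. A sketch, with made-up table names (on a cluster, each statement would go through `spark.sql`, e.g. via a `ThreadPoolExecutor`):

```python
# Illustrative sketch, not from the thread: chunk DEEP CLONE statements into
# fixed-size batches so driver-side state is bounded per batch.

def clone_statement(src: str, dst: str) -> str:
    return f"CREATE OR REPLACE TABLE {dst} DEEP CLONE {src}"

def batched(items, size):
    """Yield successive fixed-size slices of a list."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

pairs = [(f"src.t{i}", f"dst.t{i}") for i in range(10)]  # stand-in table list
statements = [clone_statement(s, d) for s, d in pairs]
batches = list(batched(statements, 4))
# On a cluster: submit one batch at a time (spark.sql per statement), and call
# spark.catalog.clearCache() - or restart the job cluster - between batches if
# driver memory keeps climbing.
```

Splitting the job this way also lets a failed batch be retried without rerunning thousands of already-completed clones.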

3 More Replies
ittzzmalind
by New Contributor III
  • 132 Views
  • 1 replies
  • 0 kudos

Azure Databricks Serverless – SFTP Connectivity (external provider)

Hi, I need to establish connectivity from Azure Databricks serverless compute to an external SFTP provider hosted outside the organization. When I searched, I figured out one way is IP whitelisting: 1) The SFTP provider requires IP whitelistin...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @ittzzmalind, how to do it is described in the following section of the docs: IP addresses and domains for Azure Databricks services and assets - Azure Databricks | Microsoft Learn. Keep in mind that Azure Databricks might update outbound IPs as often as onc...

MyProfile
by New Contributor
  • 235 Views
  • 1 replies
  • 0 kudos

Disable Public Network Access on Databricks Managed Storage Account - Deny Assignment

Issue Description: I am attempting to disable public network access on the Azure Databricks managed storage account. However, I am encountering the following error: Failed to save resource settings - access is denied due to a deny assignment created by...

Latest Reply
Sumit_7
Honored Contributor III
  • 0 kudos

@MyProfile This would be helpful, check once - https://learn.microsoft.com/en-us/answers/questions/1707749/managed-storage-accounts-compliance

abhijit007
by Databricks Partner
  • 204 Views
  • 2 replies
  • 2 kudos

Resolved! Redshift to Databricks Migration with Lakebridge

We are currently performing an assessment for a client's Redshift to Databricks migration, and we would like to better understand the enhanced capabilities of Lakebridge for this use case. We would appreciate clarification on the following points: Scop...

Latest Reply
pradeep_singh
Contributor III
  • 2 kudos

There is a nice course on Partner Academy as well. It uses SQL Server as the source system for migration, but you can follow the same steps for Redshift. https://partner-academy.databricks.com/learn/courses/4326/lakebridge-for-sql-source-syste...

1 More Replies
ittzzmalind
by New Contributor III
  • 885 Views
  • 1 replies
  • 1 kudos

Resolved! Accessing Azure Databricks Workspace via Private Endpoint and On-Premises Proxy

Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint - api_ui). A private endpoint has already been configured successfully: Virtual Network: Vnet-PE-ENDPOINT; Subnet: Snet-PE-...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

This is a classic hub-spoke + on-premises hybrid networking scenario. Here's how to architect it end-to-end. Architecture overview - the traffic flow will be: VM (VNet-App) --> ExpressRoute/VPN Gateway --> On-Prem Proxy Server --> ExpressRoute/VPN Gate...

ittzzmalind
by New Contributor III
  • 284 Views
  • 1 replies
  • 1 kudos

Resolved! Delta Sharing with Materialized View - recipient data not refreshing when using Open Protocol

Scenario: Delta Sharing with a Materialized View. Provider-side setup: -> A Delta Share was created. -> A materialized view was added to the share. -> Recipients created: 1) Open Delta Sharing recipient, accessed using Python (import delta_sharing); 2)...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ittzzmalind, This is expected behaviour and is mainly due to how Delta Sharing handles materialized views for open (non-Databricks) recipients versus Databricks-to-Databricks recipients. For Databricks-to-Databricks recipients, the shared materia...

ittzzmalind
by New Contributor III
  • 320 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks Workspace - Unknown IP Access

Azure Monitor logs are showing unknown IP authentication requests to the Databricks workspace. -- When I searched the IP at the URL below, the result shows it's from AZURE CLOUD: <Region> (the region is the same as the workspace) https://azureipranges.azurewebsites.net/SearchFor -- ...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 1 kudos

Hi @ittzzmalind, Because the IP is in the same Azure region but not listed in the Azure Databricks control plane ranges, it's very likely not a Databricks-owned control plane IP. It's typically either a user or service coming from another Azure resou...

sai_sakhamuri
by Databricks Partner
  • 1202 Views
  • 1 replies
  • 1 kudos

Resolved! Databricks optimization for query performance and pipeline runs

I am currently working on optimizing several Spark pipelines and wanted to gather community insights on advanced performance tuning. Typically, my workflow for traditional SQL optimization involves a deep dive into the execution plan to identify bott...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 1 kudos

Hi @sai_sakhamuri, You're clearly past the basics. Let me give you a practitioner-level breakdown of each layer you mentioned, plus a few things that often get overlooked. Spark Catalyst Optimizer - Working With the Rules Engine: Catalyst operates in fou...

databrciks
by New Contributor III
  • 546 Views
  • 3 replies
  • 1 kudos

Resolved! Parametrize the DLT pipeline for dynamic loading of many tables

I need to load many tables into the bronze layer, connecting to a SQL Server DB. How can I pass the table names dynamically in DLT? Meaning, one piece of code passes many tables and loads them into the bronze layer.

Latest Reply
databrciks
New Contributor III
  • 1 kudos

Hi Ashwin, thanks for the quick response. Yes, I want to pass all the tables through a config parameter/param file and load them into the bronze layer. I will try this approach. Thanks!
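The config-driven approach discussed in this thread can be sketched as below. The table names, parameter format, and `bronze_` prefix are illustrative; in a real pipeline each generated function would be decorated with `@dlt.table` and read from the SQL Server source instead of returning a label.

```python
# Hypothetical sketch: one factory function per configured table, driven by a
# comma-separated config value (e.g. from spark.conf.get("source_tables")).

TABLES = "orders,customers,products"  # stand-in for the pipeline config value

def make_loader(table_name: str):
    # Bind table_name inside a factory so each loader keeps its own name; a loop
    # variable captured directly in a lambda would late-bind and be shared.
    def load():
        # In DLT this body would be: return spark.read...  and load would carry
        # an @dlt.table(name=f"bronze_{table_name}") decorator.
        return f"bronze_{table_name}"
    load.__name__ = f"bronze_{table_name}"
    return load

loaders = {name: make_loader(name) for name in TABLES.split(",")}
```

The factory-function pattern is the standard way to register many DLT tables from one code path, since the decorator needs a distinct function (and name) per table.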

2 More Replies
ittzzmalind
by New Contributor III
  • 248 Views
  • 2 replies
  • 0 kudos

DLT Pipeline Error - key not found: all_info_dlt_cx_utils_cod, resulting in a NoSuchElementException

Databricks ETL pipeline, specifically an error with the @DP.expectorfail decorator causing the pipeline update to fail. The error message indicated 'key not found: all_info_dlt_cx_utils_cod', resulting in a NoSuchElementException. Note: if we commen...

Latest Reply
ittzzmalind
New Contributor III
  • 0 kudos

@MoJaMa Thanks for the reply. The issue was in the code; the corrected code worked.

1 More Replies
demo-user
by New Contributor III
  • 528 Views
  • 2 replies
  • 0 kudos

S3A Connector Trying to Use AWS STS on Non-AWS S3 Endpoint

Hi everyone, I'm trying to write Delta tables to my S3-compatible (non-AWS) endpoint, and it was writing perfectly fine last week with the same setup. Now, without any changes on my end, it's failing and giving me an UnknownException: (com.amazonaws.se...

Latest Reply
aleksandra_ch
Databricks Employee
  • 0 kudos

Hi @demo-user, Can you share more information about your setup: cluster type and DBR version; S3-compatible storage implementation (MinIO / something else?). AFAIK this is not supposed to work, as the Delta client in DBR relies on AWS STS to perform S3 comm...
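For context on what such a setup usually involves: the Hadoop S3A connector can be pointed at an S3-compatible endpoint with static keys instead of STS via the settings sketched below. Whether this works on a given DBR version is exactly the open question in this thread; the endpoint value is a placeholder.

```python
# Hedged sketch of common S3A settings for a non-AWS, S3-compatible endpoint.
# Endpoint and any credential values are placeholders, not a working config.

s3a_conf = {
    "fs.s3a.endpoint": "https://s3.example.internal",  # placeholder endpoint
    "fs.s3a.path.style.access": "true",                # typical for MinIO-style stores
    "fs.s3a.aws.credentials.provider":
        "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",  # static keys, no STS
}
# On a cluster these would be applied as Spark config with the "spark.hadoop."
# prefix (e.g. spark.hadoop.fs.s3a.endpoint), set at cluster creation time.
```

If a platform update started routing Delta commits through an STS call, settings like these would not necessarily bypass it, which is why the DBR version matters here.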

1 More Replies
rwhitepwt
by New Contributor III
  • 569 Views
  • 4 replies
  • 2 kudos

Resolved! Netsuite Data Connector Not Available

I see that the Azure Databricks Data Connector for NetSuite is in Public Preview. Unfortunately, I am unable to see it in my instance. I have gone into Previews and selected it as enabled, have downloaded the JAR file from NetSuite, and have set up the i...

Latest Reply
Ashwin_DSA
Databricks Employee
  • 2 kudos

Hi @rwhitepwt, From what I can see, having the NetSuite connector in Public Preview doesn’t automatically guarantee that the tile appears in every workspace. In addition to enabling the preview and creating the UC connection + uploading the SuiteAnal...

3 More Replies