Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

subray
by New Contributor II
  • 1257 Views
  • 3 replies
  • 0 kudos

Resolved! databricks-connect serverless GRPC issue

Queries executed via Databricks Connect v17 (Spark Connect / gRPC) on serverless compute COMPLETE SUCCESSFULLY on the server side (Spark tasks finish, results are produced), but the Spark Connect gRPC channel FAILS TO DELIVER results back to the client ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

This is a well-known class of issue with gRPC/HTTP2 long-lived streams being killed by network intermediaries. The fact that the Databricks SQL Connector (poll-based HTTP/1.1) works perfectly while Spark Connect (gRPC/HTTP2 streaming) fails is the ke...

2 More Replies
ittzzmalind
by New Contributor III
  • 3851 Views
  • 1 replies
  • 1 kudos

Resolved! Accessing Azure Databricks Workspace via Private Endpoint and On-Premises Proxy

Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint – api_ui). A private endpoint has already been configured successfully: Virtual Network: Vnet-PE-ENDPOINT; Subnet: Snet-PE-...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

This is a classic hub-spoke + on-premises hybrid networking scenario. Here's how to architect it end-to-end. Architecture Overview The traffic flow will be: VM (VNet-App) --> ExpressRoute/VPN Gateway --> On-Prem Proxy Server --> ExpressRoute/VPN Gate...

FAHADURREHMAN
by New Contributor III
  • 1178 Views
  • 2 replies
  • 2 kudos

Resolved! DELTA Merge taking too much Time

Hi Legends, I have a time-series Delta table of 707.1 GiB, 7,702 files, and 262 billion rows (mainly time-series data). This table is clustered on 2 columns (a timestamp column and a descriptive column). I have designed a pipeline which runs every w...

Latest Reply
anuj_lathi
Databricks Employee
  • 2 kudos

Great question -- slow MERGE is one of the most common Delta Lake performance issues. Here's a systematic checklist: 1. Partition Pruning in the MERGE Condition The #1 cause of slow MERGEs is missing the partition column in your ON clause. If your ta...
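The pruning idea in point 1 of the reply above can be sketched as follows. This is a minimal illustration, not the reply author's code: the table and column names (`events`, `updates`, `sensor_id`, `event_ts`) are hypothetical, and the sketch only builds the statement text, so running it against a table would still need a Spark session. The assumption is that the timestamp bounds are taken from the incoming batch, so every source row falls inside the range.

```python
from datetime import datetime


def build_pruned_merge(target, source, key_col, ts_col, min_ts, max_ts):
    """Build a MERGE whose ON clause bounds the target's timestamp column.

    The literal BETWEEN range gives the engine a static predicate it can use
    to skip target files that lie entirely outside the incoming batch.
    """
    fmt = "%Y-%m-%d %H:%M:%S"
    lower = min_ts.strftime(fmt)
    upper = max_ts.strftime(fmt)
    return "\n".join([
        f"MERGE INTO {target} AS t",
        f"USING {source} AS s",
        f"ON t.{key_col} = s.{key_col}",
        f"AND t.{ts_col} = s.{ts_col}",
        f"AND t.{ts_col} BETWEEN TIMESTAMP '{lower}' AND TIMESTAMP '{upper}'",
        "WHEN MATCHED THEN UPDATE SET *",
        "WHEN NOT MATCHED THEN INSERT *",
    ])


# Hypothetical names; min_ts and max_ts would come from the batch being merged.
merge_sql = build_pruned_merge(
    target="events",
    source="updates",
    key_col="sensor_id",
    ts_col="event_ts",
    min_ts=datetime(2024, 1, 1),
    max_ts=datetime(2024, 1, 7, 23, 59, 59),
)
print(merge_sql)
```

The same pattern applies whether the bounded column is a partition column or a clustering column, as in the question; only the column name changes.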

1 More Replies
shan-databricks
by Databricks Partner
  • 920 Views
  • 3 replies
  • 0 kudos

Resolved! Invoking one job from another to execute a specific task

I have multiple tasks, each working with different tables. Each table has dependencies across Bronze, Silver, and Gold layers. I want to trigger and run a specific task independently, instead of running all tasks in the job. How can I do this? Also, ...

Latest Reply
rohan22sri
New Contributor III
  • 0 kudos

1. Go to the job and left-click the task you want to run. 2. Click the play button (highlighted in yellow in the attachment). 3. This makes sure that you run only one task at a time and not the whole job.

2 More Replies
AanchalSoni
by Databricks Partner
  • 1951 Views
  • 7 replies
  • 6 kudos

Resolved! Primary key constraint not working

I've created a Lakeflow job to run 5 notebook tasks, one for each silver table: Customers, Accounts, Transactions, Loans and Branches. In the Customers notebook, after writing the data to a Delta table using Auto Loader, I'm applying the non null and primar...

Latest Reply
balajij8
Contributor III
  • 6 kudos

@AanchalSoni Declaring the columns as a primary key helps users and tools understand relationships in the data. You can create the primary key with RELY, which in some cases lets the optimizer skip redundant operations. Distinct Elimination: When you apply a DI...
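Primary key constraints on Databricks are informational and are not enforced on write, which is why duplicates can still land in the table. A minimal sketch of the two statements involved, assuming hypothetical names (`silver.customers`, `customer_id`, `customers_pk`) that do not come from the thread: one adds the constraint with RELY, the other checks for duplicate keys before the optimizer is told to rely on uniqueness.

```python
def primary_key_sql(table, column, constraint):
    """Return (add-constraint statement, duplicate-check query) for one key column."""
    add_pk = (
        f"ALTER TABLE {table} "
        f"ADD CONSTRAINT {constraint} PRIMARY KEY ({column}) RELY"
    )
    # The constraint is not enforced, so uniqueness has to be verified separately.
    find_duplicates = (
        f"SELECT {column}, COUNT(*) AS n FROM {table} "
        f"GROUP BY {column} HAVING COUNT(*) > 1"
    )
    return add_pk, find_duplicates


# Hypothetical table, column, and constraint names.
add_pk, find_duplicates = primary_key_sql(
    "silver.customers", "customer_id", "customers_pk"
)
print(find_duplicates)
print(add_pk)
```

Running the duplicate check first is the safer order: if it returns rows, adding RELY would let the optimizer assume a uniqueness that the data does not have.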

6 More Replies
AnandGNR
by New Contributor III
  • 2971 Views
  • 7 replies
  • 2 kudos

Unable to create secret scope -"Fetch request failed due expired user session"

Hi everyone, I’m trying to create an Azure Key Vault-backed secret scope in a Databricks Premium workspace, but I keep getting this error: Fetch request failed due expired user session. Setup details: Databricks workspace: Premium; Azure Key Vault: Owner p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 2 kudos

Hi @AnandGNR, try the following. Go to your Key Vault, then under Firewalls and virtual networks set: "Allow trusted Microsoft services to bypass this firewall."

6 More Replies
Phani1
by Databricks MVP
  • 2380 Views
  • 6 replies
  • 4 kudos

Resolved! Best Practices for Implementing Automated, Scalable, and Auditable Purge Mechanism on Azure Databric

Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...

Latest Reply
AbhaySingh
Databricks Employee
  • 4 kudos

Here is my action plan if it helps!
Phase 1: Foundation
☐ Migrate to UC managed tables (if not already)
☐ Enable Predictive Optimization at catalog level
☐ Set delta.deletedFileRetentionDuration per layer
Phase 2: Retention Policies
☐ Enab...
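The "Set delta.deletedFileRetentionDuration per layer" step from Phase 1 could look like the sketch below. The retention windows and the table name are hypothetical placeholders, not values from the reply; only the property name comes from the plan above. The code just generates the statement text for each layer.

```python
# Hypothetical retention windows per medallion layer, in days.
RETENTION_DAYS = {"bronze": 30, "silver": 14, "gold": 7}


def retention_sql(table, layer):
    """Build the ALTER TABLE statement that sets the deleted-file retention."""
    days = RETENTION_DAYS[layer]
    return (
        f"ALTER TABLE {table} SET TBLPROPERTIES "
        f"('delta.deletedFileRetentionDuration' = 'interval {days} days')"
    )


# Hypothetical three-level Unity Catalog table name.
statement = retention_sql("main.silver.transactions", "silver")
print(statement)
```

Keeping the windows in one mapping makes the policy easy to review and audit, which matches the goal stated in the question.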

5 More Replies
jayhcunningham
by New Contributor
  • 547 Views
  • 1 replies
  • 0 kudos

Does anyone know the Databricks-specific Python syntax highlight rules?

The documentation on databricks.com says the following, in the context of configuring Python linting via pyproject.toml: You can also disable Databricks-written syntax highlighting rules with a block such as: [tool.databricks] disabled_rules = ['DB01', ...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — you're right that these Databricks-specific rule codes (DB01, DB03, etc.) are not documented anywhere publicly. The notebook editor docs only mention them as a configuration example without explaining what each rule checks. What We Know The DB* ...

norbitek
by New Contributor II
  • 652 Views
  • 1 replies
  • 1 kudos

Resolved! variant_explode_outer stop working after the last DBX runtime patch

Hi All, I import the following JSON into a VARIANT column of a Delta table: { "data": [ { "group": 1, "manager": "no", "firstname": "John", "lastname": "Smith", "active": "false", ...

Latest Reply
emma_s
Databricks Employee
  • 1 kudos

Hi,  I've been testing this on a workspace at my end and see exactly the same thing. I'd first recommend raising a support ticket for this.  In the meantime you can use the following workaround: I reproduced it on DBR 18.0 using readStream + cloudFil...

mordex
by New Contributor III
  • 574 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks workflows for APIs with different frequencies (cluster keeps restarting)

Hey everyone, I’m stuck with a Databricks workflow design and could use some advice. Currently, we are calling 70+ APIs. Right now the workflow looks something l...

Latest Reply
emma_s
Databricks Employee
  • 0 kudos

You're right that job clusters are the wrong fit here. The cold start time (including serverless, which is still 25-50s) makes anything under 5 minutes impractical when the cluster terminates between runs. The simplest approach: all-purpose cluster +...
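The reply's full recommendation is truncated above, so the following is only one possible way to drive APIs with different frequencies from a single long-running process on an always-on cluster. The API names and intervals are hypothetical. The helper answers one question: given the seconds elapsed since the loop started, which APIs are due on this tick?

```python
def due_apis(intervals, elapsed_seconds):
    """Return the names of APIs whose polling interval divides the elapsed time."""
    return sorted(
        name
        for name, every in intervals.items()
        if elapsed_seconds % every == 0
    )


# Hypothetical APIs and polling intervals, in seconds.
INTERVALS = {"orders": 60, "inventory": 300, "pricing": 900}

print(due_apis(INTERVALS, 120))
print(due_apis(INTERVALS, 300))
print(due_apis(INTERVALS, 900))
```

A loop that ticks at the smallest interval and calls `due_apis` on each tick avoids paying a cluster cold start for every short-interval run.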

1 More Replies
holychs
by Databricks Partner
  • 1165 Views
  • 2 replies
  • 0 kudos

Resolved! Run failed with error message Cluster was terminated. Reason: JOB_FINISHED (SUCCESS)

I am running a notebook through a workflow using an all-purpose cluster ("data_security_mode": "USER_ISOLATION"). I am seeing some strange behaviour with the cluster during the run. While the job is still running, the cluster gets terminated with the Reason: Re...

Data Engineering
clusterds
clusters
jobs
Workflows
Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — the JOB_FINISHED (SUCCESS) termination reason is the key clue here. It means another job that was using the same all-purpose cluster finished, and its completion triggered the cluster termination — taking your still-running job down with it. Mos...

1 More Replies
vamsi_simbus
by Databricks Partner
  • 1407 Views
  • 2 replies
  • 1 kudos

Resolved! Drill-down support in Databricks SQL (Lakeview) Dashboards

Hi All, Does Databricks SQL (Lakeview) Dashboards support native drill-down functionality (for example: Category → Subcategory → SKU)? Currently, we see support for cross-filtering, parameters, and drill-through within the same dataset, but hierarchica...

Latest Reply
anuj_lathi
Databricks Employee
  • 1 kudos

Hi — good question. You're right that Lakeview doesn't have native hierarchical drill-down (click Category → auto-expand to Subcategory → SKU). But you can get fairly close by combining the features you mentioned. Here are the practical patterns: 1. ...

1 More Replies
fdubourdeau
by New Contributor
  • 784 Views
  • 1 replies
  • 0 kudos

Resolved! Querying CDF on a Delta-Sharing table after data type change in the Table (INT to DECIMAL)

Hi, I am trying to query the CDF of a Delta Sharing table that has had a change in the data type of one of its columns. The change was from an INT to a DECIMAL. When reading the specific version where the schema change happened, I am receiving an error ment...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — this is a known limitation of Change Data Feed. Here's what's happening and your options. Why This Happens Changing a column from INT to DECIMAL is a non-additive schema change. When reading CDF in batch mode, Delta Lake applies a single schema ...

GvReddy
by New Contributor
  • 1322 Views
  • 1 replies
  • 0 kudos

Resolved! Guidance on App Deployment in Databricks Public Marketplace

Hello Team, Hope you are doing well. I am currently learning Databricks and have developed an application in my local workspace under a Databricks Partner account, where I also have Marketplace Admin access. However, I am unsure about the process of pu...

Latest Reply
anuj_lathi
Databricks Employee
  • 0 kudos

Hi — great question! Here's what you need to know. Key Thing to Know First Currently, Databricks Apps (Streamlit, Dash, Gradio, etc.) listed on the Marketplace are first-party Databricks-owned apps only. External/partner app publishing is not yet sup...

NW1000
by New Contributor III
  • 1149 Views
  • 6 replies
  • 0 kudos

Shorten Classic Cluster start up time

We use R notebooks to generate workflows, so we have to use classic clusters. We also need roughly 10 additional R packages in addition to 2 PyPI packages. It takes at least 10-20 min to start the cluster. We found that most of the time taken was the packag...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Hi @NW1000, glad you tried my suggestion, and thanks for sharing the details. 1. Why the init script failed This message: Init script failure: Cluster scoped init script ... failed: Script exit status is non-zero really just means that something ins...

5 More Replies