Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Hi Legends, I have a time-series Delta table of 707.1 GiB, 7,702 files, and 262 billion rows. This table is clustered on 2 columns (a timestamp column and a descriptive column). I have designed a pipeline which runs every w...
Hi there, I'm finding this a bit trickier than originally expected and am hoping someone can help me understand if I'm missing something. I have 3 jobs:
One orchestrator job (tasks are type run_job)
Two "Parent" jobs (tasks are type notebook)
parent1 run...
Hi, I'm wondering if there is an easier way to accomplish this. I can use a Dynamic Value reference to pull the run_id of Parent 1 into Parent 2; however, what I'm looking for is for Child 1's task run_id to be referenced within Parent 2. Currently I am ...
Hi @ChristianRRL you're absolutely right, and I apologize for the earlier suggestion. I've verified that task values from child jobs are not propagated back through run_job tasks.
Your instinct about the REST API was correct. Here's the fix:
Solutio...
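Since task values don't come back through run_job tasks, the REST API route means calling `GET /api/2.1/jobs/runs/get` on the parent's run and reading the per-task run IDs out of the response. A minimal sketch of the parsing step, assuming a response shaped like the documented runs/get schema (the sample payload and task keys below are made up for illustration):

```python
# Extract task-level run_ids from a Jobs API 2.1 runs/get response.
# In practice you would fetch this JSON over HTTPS with a bearer token,
# then, for a run_job task, call runs/get again on the child job's run
# to reach its tasks.

def task_run_ids(run: dict) -> dict:
    """Map each task_key in a job run to its task run_id."""
    return {t["task_key"]: t["run_id"] for t in run.get("tasks", [])}

# Illustrative (not real) payload shaped like a runs/get response:
sample_run = {
    "run_id": 1000,
    "tasks": [
        {"task_key": "child_1", "run_id": 1001},
        {"task_key": "child_2", "run_id": 1002},
    ],
}

print(task_run_ids(sample_run))  # {'child_1': 1001, 'child_2': 1002}
```

The same helper works at each level of nesting: resolve the parent run first, then repeat on whichever child run you need.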
I have multiple tasks, each working with different tables. Each table has dependencies across Bronze, Silver, and Gold layers. I want to trigger and run a specific task independently, instead of running all tasks in the job. How can I do this? Also, ...
1. Go to the job and left-click on the task you want to run.
2. Click on the play button (highlighted in yellow in the attachment).
3. This ensures that you run only one task at a time and not the whole job.
Queries that previously worked started failing in SQL Warehouse (Dashboards) without any changes on our side. The query succeeds, but fails to render results with the error:
"Cannot read properties of undefined (reading 'data')"
This happens with:
- system.b...
Same problem here. I previously reported this issue, and it was resolved at the time. However, the problem has now reoccurred. When ingesting large tables (over 100k rows), the system is unable to properly render the data, preventing the tab...
Public access to the Azure Databricks workspace is currently disabled. Access is required through a Private Link (private endpoint – api_ui). A private endpoint has already been configured successfully:
Virtual Network: Vnet-PE-ENDPOINT
Subnet: Snet-PE-...
I've created a Lakeflow job to run 5 notebook tasks, one for each silver table: Customers, Accounts, Transactions, Loans, and Branches. In the Customers notebook, after writing the data to the delta table using Auto Loader, I'm applying the non-null and primar...
@AanchalSoni Capturing the columns as a primary key helps users and tools understand relationships in the data. You can create a primary key with RELY, which enables optimizations in some cases by skipping redundant operations.
Distinct Elimination
When you apply a DI...
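As a rough sketch of the DDL involved (the catalog, schema, table, and column names below are hypothetical; on Databricks you would pass these strings to spark.sql):

```python
# Declare an informational primary key and assert that it actually holds
# with RELY, so the optimizer is allowed to skip redundant work such as
# de-duplication. Table/column names are placeholders.
add_pk = """
ALTER TABLE main.gold.dim_customer
  ADD CONSTRAINT dim_customer_pk
  PRIMARY KEY (customer_id) RELY
"""

# With the RELY constraint in place, a query like this can have its
# DISTINCT eliminated, since customer_id is already guaranteed unique:
distinct_query = "SELECT DISTINCT customer_id FROM main.gold.dim_customer"
```

Note that the constraint is informational: Databricks does not enforce it, so RELY is a promise from you that the data really is unique.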
Hi all, I have a table called classes, which is already partitioned on three different columns. I want to create a Liquid Clustered table, but as far as I understand from the documentation (and from Dany Lee and his team), it was not possible as of 2024 ...
Is there a plan to implement a way to migrate to liquid clustering for an existing table that has traditional partitioning and that is quite large (over 4 TB)? Re-creating such tables from scratch is not always ideal.
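In the meantime, one common workaround for a partitioned table is a CTAS rewrite into a clustered copy followed by a swap, since partitioning cannot simply be switched off in place. A minimal sketch, with hypothetical table and column names (SQL as strings for spark.sql); the in-place ALTER shown second applies to unpartitioned Delta tables on newer runtimes:

```python
# Rewrite a partitioned table into a liquid-clustered copy.
# For a 4 TB table this is a full rewrite, so schedule it accordingly.
ctas = """
CREATE OR REPLACE TABLE main.schema.classes_clustered
CLUSTER BY (class_date, class_id)
AS SELECT * FROM main.schema.classes
"""

# For an *unpartitioned* Delta table, newer runtimes allow switching
# to liquid clustering in place instead of rewriting:
alter = "ALTER TABLE main.schema.events CLUSTER BY (event_date)"
```

After the CTAS completes and is validated, the old table can be dropped and the new one renamed into its place.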
Hi everyone, I'm trying to create an Azure Key Vault-backed secret scope in a Databricks Premium workspace, but I keep getting this error: "Fetch request failed due to expired user session"
Setup details:
Databricks workspace: Premium
Azure Key Vault: Owner p...
Hi @AnandGNR,
Try the following: go to your Key Vault, then under Firewalls and virtual networks set "Allow trusted Microsoft services to bypass this firewall."
I'm trying to create an Azure Key Vault-backed Secret Scope in Databricks, but when I click Create, I get this error: "Fetch request failed due to expired user session". I've already verified my login and permissions. I also tried refreshing and re-signing i...
Hi @SuMiT1,
It certainly seems to be a networking issue, but I'm not able to zero in on what precisely needs to be done. I added the control plane IPs to the firewall but still no luck. How do we use a Databricks Access Connector to create scopes? Could you ...
Hi, can we create a file-based trigger from a SharePoint location for Excel files in Databricks? My need is to copy Excel files from SharePoint to external volumes in Databricks, so can it be done using a trigger so that whenever a file drops in ...
Hi,
You could possibly achieve something near to this using the Lakeflow Connect SharePoint connector. It's currently in beta, so it would need to be enabled in your workspace. Although it isn't triggered on file updates, because it only ingests incre...
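For the second half of the setup, once the files have landed in a Unity Catalog volume, a job can be started by a file arrival trigger rather than a fixed schedule. A sketch of the trigger settings as they would appear in a Jobs API create/update payload (the volume path is a made-up example):

```python
# Jobs API `trigger.file_arrival` settings: the job fires when new files
# appear under the monitored UC volume path. Path below is hypothetical.
trigger_settings = {
    "trigger": {
        "pause_status": "UNPAUSED",
        "file_arrival": {
            "url": "/Volumes/main/raw/sharepoint_drop/",
            # Debounce: wait at least this long between triggered runs.
            "min_time_between_triggers_seconds": 60,
        },
    }
}
```

This covers the volume-to-job leg; getting the files from SharePoint into the volume still needs the connector (or a copy step) described above.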
When I saw the news that Matei Zaharia received the 2025 ACM Prize in Computing, I felt genuinely happy. It was not just another award announcement. It felt like a proud moment for the whole data engineering community. His work has helped shape the w...
@Brahmareddy, what a beautiful tribute! It’s so inspiring to hear how that meeting at the Summit stayed with you.We’re so lucky to have contributors like you who recognize the heart behind the tech. Cheers to Matei and the whole Databricks family!
Hi @IM_01,
You can’t change the UI to break out those numbers, but you can get per-expectation counts from the DLT (Lakeflow) event log. Each expectation entry records passed_records and failed_records; for EXPECT rules failed_records = warned rows, ...
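The aggregation over the event log can be sketched as follows. Each flow_progress event carries a data_quality.expectations array with name, passed_records, and failed_records; the helper below sums those per expectation name. Note that in the real event log table the details column is a JSON string you would parse first; the sample events here are made up:

```python
# Sum per-expectation pass/fail counts across DLT event log entries.
def expectation_totals(events):
    totals = {}
    for e in events:
        dq = (e.get("details", {})
               .get("flow_progress", {})
               .get("data_quality", {}))
        for exp in dq.get("expectations", []):
            name = exp["name"]
            t = totals.setdefault(name, {"passed": 0, "failed": 0})
            t["passed"] += exp.get("passed_records", 0)
            t["failed"] += exp.get("failed_records", 0)
    return totals

# Illustrative events (already parsed from the details JSON string):
events = [
    {"details": {"flow_progress": {"data_quality": {"expectations": [
        {"name": "not_null_id", "passed_records": 95, "failed_records": 5}]}}}},
    {"details": {"flow_progress": {"data_quality": {"expectations": [
        {"name": "not_null_id", "passed_records": 40, "failed_records": 0}]}}}},
]

print(expectation_totals(events))
# {'not_null_id': {'passed': 135, 'failed': 5}}
```

Remember the caveat above: for EXPECT (warn) rules, failed_records counts rows that were warned but still written, not dropped.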
In today’s hyper-connected world, Data Privacy has become a critical concern for individuals and businesses alike. Every time we browse a website, use an app, or make an online purchase, we leave behind a trail of personal information. This data can ...
Hi All, I'm looking to implement an automated, scalable, and auditable purge mechanism on Azure Databricks to manage data retention, deletion, and archival policies across our Unity Catalog-governed Delta tables. I've come across various approaches, s...
Here is my action plan if it helps!
Phase 1: Foundation
☐ Migrate to UC managed tables (if not already)
☐ Enable Predictive Optimization at catalog level
☐ Set delta.deletedFileRetentionDuration per layer
Phase 2: Retention Policies
☐ Enab...
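The retention property from Phase 1 of the checklist can be applied per layer along these lines (table names are hypothetical; SQL as strings for spark.sql). The idea is that gold keeps deleted files recoverable longer than bronze before VACUUM may remove them:

```python
# Set delta.deletedFileRetentionDuration per medallion layer.
# Shorter retention in bronze (raw, re-ingestable), longer in gold.
set_bronze = """
ALTER TABLE main.bronze.events
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 7 days')
"""

set_gold = """
ALTER TABLE main.gold.facts
SET TBLPROPERTIES ('delta.deletedFileRetentionDuration' = 'interval 30 days')
"""
```

The chosen intervals are illustrative; they should match your audit and time-travel requirements, since shortening them limits how far back you can restore.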