Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Ashok_Vengala
by Visitor
  • 11 Views
  • 1 reply
  • 0 kudos

Unable to Add Multiple Columns in Single ALTER TABLE Statement on Iceberg Table via Unity REST Catalog

Hello Databricks Team, I have implemented code to integrate the Iceberg Unity REST Catalog with the Teradata OTF engine and successfully performed read and write operations, following the documentation at https://docs.databricks.com/aws/en/external-ac...

Latest Reply
nayan_wylde
Honored Contributor III

This error stems from the Iceberg table metadata update constraints enforced by Unity Catalog's REST API. Specifically, the Iceberg REST Catalog currently does not support multiple schema changes in a single commit. Each ALTER TABLE operation tha...
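If it helps, here is a minimal sketch of that workaround: issue each schema change as its own ALTER TABLE statement so that every Iceberg commit carries exactly one change. The catalog, schema, table, and column names below are hypothetical placeholders.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical columns to add: one statement (and hence one Iceberg
# commit) per column, instead of a single multi-column ALTER TABLE.
new_columns = [("order_id", "BIGINT"), ("order_ts", "TIMESTAMP")]

for name, dtype in new_columns:
    spark.sql(f"ALTER TABLE my_catalog.my_schema.my_table ADD COLUMN {name} {dtype}")
```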

TejeshS
by Contributor
  • 3084 Views
  • 3 replies
  • 1 kudos

How to identify which columns we need to consider for liquid clustering from a table of 200+ columns

In Databricks, when working with a table that has a large number of columns (e.g., 200), it can be challenging to determine which columns are most important for liquid clustering. Objective: The goal is to determine which columns to select based on th...

Latest Reply
noorbasha534
Valued Contributor II

@Alberto_Umana is it possible to get from the system tables the columns used in the joins & filters of a table being queried?
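Not an official feature as far as I know, but one rough way to approximate it is to mine statement texts from the system.query.history system table (if enabled in your workspace). A sketch; the table name is hypothetical and the regex is a crude heuristic, not a SQL parser:

```python
import re
from collections import Counter
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

rows = (
    spark.table("system.query.history")
    .filter("statement_text ILIKE '%my_schema.my_table%'")  # hypothetical table
    .select("statement_text")
    .limit(1000)
    .collect()
)

counts = Counter()
for row in rows:
    # Identifiers right after WHERE/ON/AND and before a comparison
    # operator are counted as a proxy for join/filter columns.
    for col in re.findall(r"(?:WHERE|ON|AND)\s+([\w.]+)\s*[=<>]",
                          row.statement_text, re.IGNORECASE):
        counts[col.split(".")[-1].lower()] += 1

print(counts.most_common(10))  # candidate liquid clustering columns
```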

2 More Replies
Alby091
by New Contributor
  • 1245 Views
  • 2 replies
  • 0 kudos

Multiple schedules in workflow with different parameters

I have a notebook that takes a file from the landing zone, processes it, and saves a Delta table. This notebook takes a parameter (time_prm) that lets you do this for the different versions of files that arrive every day. Specifically, for eac...

Data Engineering
parameters
Workflows
Latest Reply
ImranA
Contributor

You can do multiple schedules with cron expressions, e.g. using a cron expression in the Databricks asset bundle YAML, but the limitation is that a single expression can't have one run at 0 past the hour and another at 25 past, i.e.: quartz_cron_expression: 0 45 9,23 ...
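As a sketch of the workaround (hypothetical names and paths; using the Databricks Python SDK rather than a bundle), you can create one job per schedule, both pointing at the same notebook. Note you may also need a cluster spec on the task unless serverless jobs are enabled:

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()

# Two Quartz cron expressions: on the hour, and 25 minutes past.
for cron in ["0 0 * * * ?", "0 25 * * * ?"]:
    w.jobs.create(
        name=f"process_landing_{cron.replace(' ', '_')}",
        schedule=jobs.CronSchedule(quartz_cron_expression=cron,
                                   timezone_id="UTC"),
        tasks=[
            jobs.Task(
                task_key="run_notebook",
                notebook_task=jobs.NotebookTask(
                    notebook_path="/Workspace/etl/process_landing",  # hypothetical
                    base_parameters={"time_prm": "auto"},
                ),
            )
        ],
    )
```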

1 More Replies
Spenyo
by New Contributor II
  • 1602 Views
  • 1 reply
  • 1 kudos

Delta table size not shrinking after Vacuum

Hi team. Every day we overwrite the last X months of data in our tables, so every day this generates a large amount of history. We don't use time travel, so we don't need it. What we did: SET spark.databricks.delta.retentionDurationCheck.enabled = false ALT...
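For reference, a minimal sketch of the usual shrink procedure the truncated snippet appears to follow (table name hypothetical; this permanently removes history older than the retention window, so only do it if time travel really isn't needed):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Allow a retention shorter than the 7-day default safety check.
spark.sql("SET spark.databricks.delta.retentionDurationCheck.enabled = false")
spark.sql("""
    ALTER TABLE my_schema.my_table SET TBLPROPERTIES (
        'delta.deletedFileRetentionDuration' = 'interval 1 hours',
        'delta.logRetentionDuration' = 'interval 1 days'
    )
""")
# VACUUM only deletes files that are BOTH older than the retention
# window AND no longer referenced by the current table version.
spark.sql("VACUUM my_schema.my_table RETAIN 1 HOURS")
```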

Latest Reply
pabloaschieri

Hi, any update on this? Thanks

vamsi_simbus
by New Contributor III
  • 80 Views
  • 2 replies
  • 1 kudos

Migrating Talend ETL Jobs to Databricks – Best Practices & Challenges

Hi All, I'm currently working on a Proof of Concept (POC) to migrate existing Talend ETL jobs to Databricks. The goal is to leverage Databricks for data processing and orchestration while moving away from Talend. I'd appreciate insights on the followin...

Data Engineering
migration
Talend
Latest Reply
vamsi_simbus
New Contributor III

@AbhaySingh Thank you for your insights.

1 More Replies
turagittech
by Contributor
  • 2358 Views
  • 1 reply
  • 0 kudos

Batch reading from SQL Server tables with CDC on SQL Server tables

Hi all, I need to do a batch load from SQL Server into Databricks. I have CDC enabled on some tables. The simple approach appears to be to union the CDC and regular tables to get a single set of records to load, but this appears to be fraught with the risk of out of sequen...

Latest Reply
AbhaySingh
New Contributor II

For Q1, consider this approach or a watermark-based approach: https://learn.microsoft.com/en-us/azure/databricks/ldp/cdc For Q2, you have a few options: pushdown via a JDBC query, or use a SQL view instead if you want to avoid replicating the logic in Spark.
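A hedged sketch of the pushdown option: let SQL Server deduplicate the CDC change table by log sequence number before the rows ever reach Spark. Server, database, change-table, and key names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Keep only the latest change row per key, ordered by the CDC LSN.
pushdown_query = """
    (SELECT x.*
     FROM (
         SELECT t.*,
                ROW_NUMBER() OVER (PARTITION BY t.id
                                   ORDER BY t.__$start_lsn DESC) AS rn
         FROM cdc.dbo_orders_CT t
     ) x
     WHERE x.rn = 1) AS latest_changes
"""

df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:sqlserver://myserver:1433;databaseName=mydb")
    .option("dbtable", pushdown_query)
    .option("user", "etl_user")
    .option("password", dbutils.secrets.get("etl", "sqlserver-pwd"))  # dbutils is notebook-provided
    .load()
)
```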

gudurusreddy99
by Visitor
  • 13 Views
  • 1 reply
  • 0 kudos

Databricks DLT Joins: Streaming table join with Delta table is reading 2 Billion records per batch

Databricks DLT joins: a streaming table joined with a Delta table is reading 2 billion records from the Delta table for each and every micro-batch. How can we avoid reading 2 billion records for every micro-batch? Your suggestions and feedback w...

Latest Reply
ManojkMohan
Honored Contributor

@gudurusreddy99 Root cause: when performing stream-static joins, i.e. joining a streaming source with a static Delta table, Spark will scan the entire Delta table for each micro-batch if the static table is not filtered or incrementally joined...
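A sketch of one common mitigation (table and column names hypothetical): prune the static side to just the rows and columns the join needs, and broadcast it, so each micro-batch re-reads far less than the full table:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

stream_df = spark.readStream.table("catalog.silver.events")

dim_df = (
    spark.table("catalog.silver.customers")
    .filter(F.col("is_active"))        # row pruning
    .select("customer_id", "segment")  # column pruning
)

# Pruning shrinks what gets re-read each micro-batch; broadcast avoids
# shuffling the streaming side against a huge static table.
joined = stream_df.join(F.broadcast(dim_df), "customer_id", "left")
```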

fjrodriguez
by New Contributor III
  • 376 Views
  • 2 replies
  • 1 kudos

Resolved! Ingestion Framework

I would like to update my ingestion framework, which is orchestrated by ADF, runs a couple of Databricks notebooks, and copies the data to the DB afterwards. I want to rely entirely on Databricks, and I thought this could be the design: Step 1. Expose target t...

Latest Reply
fjrodriguez
New Contributor III

Hey @saurabh18cs, it is taking longer than expected to expose Azure SQL tables in UC. I can do that through a foreign catalog, but that is not what I want because it is read-only. As far as I can see, an external connection is for cloud object storage paths (ADLS...
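Since the foreign catalog is read-only, one workaround to sketch out (hypothetical connection details, and certainly not the only design) is writing back to Azure SQL directly over JDBC:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = spark.table("catalog.gold.daily_summary")  # hypothetical source table

(df.write.format("jdbc")
   .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;databaseName=mydb")
   .option("dbtable", "dbo.daily_summary")
   .option("user", "etl_user")
   .option("password", dbutils.secrets.get("etl", "azure-sql-pwd"))  # dbutils is notebook-provided
   .mode("append")
   .save())
```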

1 More Replies
saicharandeepb
by New Contributor III
  • 99 Views
  • 4 replies
  • 1 kudos

How to Retrieve DBU Count per Compute Type for Accurate Cost Calculation?

Hello Everyone, We are currently working on a cost analysis initiative to gain deeper insights into our Databricks usage. As part of this effort, we are trying to calculate the hourly cost of each Databricks compute instance by utilizing the Azure Ret...

Latest Reply
saicharandeepb
New Contributor III

Hi everyone, just to clarify my question: I'm looking for the DBU count per compute type (per instance type), not the total DBU consumption per workload. In other words, I want to know the fixed DBU rate assigned to each compute SKU (for example, DS3...
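As far as I know the fixed per-SKU DBU rate itself isn't published in the system tables, but you can approximate an effective DBUs-per-hour per cluster from system.billing.usage (assuming billing system tables are enabled) and divide by node count to back into a per-node figure. A rough sketch:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

effective_rate = spark.sql("""
    SELECT sku_name,
           usage_metadata.cluster_id AS cluster_id,
           SUM(usage_quantity)
             / SUM(timestampdiff(SECOND, usage_start_time, usage_end_time) / 3600.0)
             AS approx_dbu_per_hour
    FROM system.billing.usage
    WHERE usage_unit = 'DBU'
    GROUP BY sku_name, usage_metadata.cluster_id
""")
effective_rate.show(truncate=False)
```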

3 More Replies
Jpeterson
by New Contributor III
  • 5521 Views
  • 8 replies
  • 4 kudos

Databricks SQL Warehouse, Tableau and spark.driver.maxResultSize error

I'm attempting to create a Tableau extract on Tableau Server with a connection to a large Databricks SQL warehouse. The extract process fails due to a spark.driver.maxResultSize error. Using a Databricks interactive cluster in the Data Science & Engineer...

Latest Reply
Oliverarson

It sounds like you're running into quite a frustrating issue with Databricks and Tableau! Adjusting the spark.driver.maxResultSize is a good idea, but if you're still facing challenges, consider streamlining your data selections or aggregating your r...
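For reference, spark.driver.maxResultSize is a driver JVM setting, so it must go in the cluster's Spark config at create/edit time rather than via spark.conf.set at runtime, and it applies to all-purpose clusters, not SQL warehouses. A sketch with the Databricks Python SDK; all ids and sizes are hypothetical:

```python
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# Editing a cluster replaces its full spec, so the other fields must be
# restated; values here are illustrative only.
w.clusters.edit(
    cluster_id="1234-567890-abc123",
    cluster_name="tableau-extracts",
    spark_version="15.4.x-scala2.12",
    node_type_id="Standard_DS4_v2",
    num_workers=4,
    spark_conf={"spark.driver.maxResultSize": "8g"},
)
```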

7 More Replies
janglais
by Visitor
  • 28 Views
  • 0 replies
  • 0 kudos

DLT Pipeline with unknown deleted source data

Hello, I need help. The context: ERP data for the companies in my group is stored in SQL tables. Currently, once per day we copy the last 2 months of data (by creation date) from each table into our datalake landing zone (we can however do full cop...
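One pattern worth evaluating here (a sketch with hypothetical names; verify against the current Lakeflow docs) is snapshot-based CDC, which infers deletes by diffing successive full copies, so the source never has to tell you what was deleted:

```python
import dlt

dlt.create_streaming_table("erp_orders")

# Each pipeline update reads the latest full snapshot from the landing
# zone; rows missing versus the previous snapshot are treated as deletes.
dlt.apply_changes_from_snapshot(
    target="erp_orders",
    source="landing.erp_orders",  # hypothetical full-copy table
    keys=["order_id"],
    stored_as_scd_type=1,
)
```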

Mits11
by New Contributor
  • 33 Views
  • 0 replies
  • 0 kudos

Community edition cluster - UI shows incorrect cores

Hi, I am a Community Edition user, which gives me a cluster (as per the below image) with 15GB of memory and 2 cores, with one driver node ONLY. However, when I read a CSV file of 181MB size: 1) it generates 8 partitions. As per default, maxPartitionBytes is set to 12...
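To pin down where the 8 comes from, it may help to print the inputs Spark actually uses (file path hypothetical). Spark sizes file splits roughly as min(maxPartitionBytes, max(openCostInBytes, totalBytes / defaultParallelism)), so a defaultParallelism of 8 on a 181MB file would yield ~22MB splits and hence 8 partitions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

print(spark.conf.get("spark.sql.files.maxPartitionBytes"))  # default 128MB
print(spark.conf.get("spark.sql.files.openCostInBytes"))    # default 4MB
print(spark.sparkContext.defaultParallelism)                # tied to cores

df = spark.read.csv("/path/to/file.csv", header=True)
print(df.rdd.getNumPartitions())
```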

Rjdudley
by Honored Contributor
  • 275 Views
  • 3 replies
  • 0 kudos

Resolved! AUTO CDC API and sequence column

The docs for the AUTO CDC API state: "You must specify a column in the source data on which to sequence records, which Lakeflow Declarative Pipelines interprets as a monotonically increasing representation of the proper ordering of the source data." Can this ...
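For context, a minimal sketch of that sequence column (hypothetical names; AUTO CDC is the newer name for the APPLY CHANGES API in Lakeflow Declarative Pipelines):

```python
import dlt
from pyspark.sql import functions as F

dlt.create_streaming_table("customers")

dlt.create_auto_cdc_flow(
    target="customers",
    source="cdc_feed",              # hypothetical CDC source view
    keys=["customer_id"],
    sequence_by=F.col("event_ts"),  # must order records correctly per key
    stored_as_scd_type=1,
)
```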

Latest Reply
Rjdudley
Honored Contributor

Thanks Szymon, I'm familiar with the PostgreSQL implementation and was hoping Databricks would behave the same.

2 More Replies
