Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

ManojkMohan
by Honored Contributor
  • 3 Views
  • 0 replies
  • 0 kudos

Exposing Databricks API in Salesforce

Use case: I want to expose a Databricks API URL in Salesforce. Salesforce will hit that exposed endpoint every time a record is created, and data will be transferred from Salesforce to Databricks. When I try creating a serving endpoint, I am unable to s...
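
A minimal sketch of what the external caller (Salesforce, via an Apex callout or middleware) would send to a Databricks serving endpoint once it exists. The workspace URL, endpoint name, token, and payload fields are hypothetical placeholders, not values from this thread.

    # Sketch of the HTTP call an external system would make to a Databricks
    # serving endpoint. Workspace URL, endpoint name, token, and payload
    # fields are placeholders -- substitute your own values.
    import requests

    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    ENDPOINT_NAME = "salesforce-ingest"                              # placeholder
    TOKEN = "<databricks-pat-or-oauth-token>"                        # placeholder

    payload = {"dataframe_records": [{"record_id": "0015g00000XyZ", "amount": 42.0}]}

    resp = requests.post(
        f"{WORKSPACE_URL}/serving-endpoints/{ENDPOINT_NAME}/invocations",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json=payload,
    )
    resp.raise_for_status()
    print(resp.json())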

AshMod
by Visitor
  • 28 Views
  • 2 replies
  • 1 kudos

Job runs on serverless even though job config has cluster definitions

Hi, I am defining the job along with the job cluster specification using the Python SDK. But when the job runs, it uses serverless compute instead of the defined cluster. I can tell the job uses serverless from the job_run log and also from the system....

Latest Reply
AshMod
Visitor
  • 1 kudos

Thanks for checking @ManojkMohan. I found the issue in the job task definition. There is a job_clusters list in the job definition, where I provide the cluster config details. But this alone is not sufficient to have the task use the cluster. The job...
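
A rough sketch (databricks-sdk) of the fix described above: the cluster is declared once under job_clusters, and each task must opt in to it via job_cluster_key, otherwise the task falls back to serverless compute. Names, node type, and runtime version are placeholders.

    # Sketch: job_clusters defines the cluster, job_cluster_key attaches it to
    # the task. Without job_cluster_key the task runs on serverless compute.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import compute, jobs

    w = WorkspaceClient()

    w.jobs.create(
        name="example-job",  # placeholder
        job_clusters=[
            jobs.JobCluster(
                job_cluster_key="main_cluster",
                new_cluster=compute.ClusterSpec(
                    spark_version="15.4.x-scala2.12",   # placeholder runtime
                    node_type_id="Standard_DS3_v2",     # placeholder node type
                    num_workers=2,
                ),
            )
        ],
        tasks=[
            jobs.Task(
                task_key="etl_task",
                job_cluster_key="main_cluster",  # without this the task ignores the cluster
                notebook_task=jobs.NotebookTask(notebook_path="/Workspace/etl/notebook"),
            )
        ],
    )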

1 More Replies
QuanSun
by New Contributor II
  • 1316 Views
  • 5 replies
  • 2 kudos

How to select performance mode for Databricks Delta Live Tables

Hi everyone, based on the official link: for triggered pipelines, you can select the serverless compute performance mode using the Performance optimized setting in the pipeline scheduler. When this setting is disabled, the pipeline uses standard perfor...

Latest Reply
BF7
Contributor
  • 2 kudos

I have learned that this parameter is not governed in the pipeline configuration itself, but in the job task that runs the pipeline. This is confusing to me and I don't like it. 
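
As the reply notes, the toggle lives on the job that triggers the pipeline rather than on the pipeline itself. A hedged sketch via the Jobs REST API follows; the performance_target field name and values are my assumption based on the "Performance optimized" UI setting, so verify against the current Jobs API docs before relying on them.

    # Hedged sketch: set the serverless performance mode on the job that runs
    # the pipeline. The "performance_target" field name/values are assumptions
    # based on the UI toggle -- confirm against the Jobs API reference.
    import requests

    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    TOKEN = "<token>"                                                # placeholder
    JOB_ID = 123456789                                               # placeholder

    resp = requests.post(
        f"{WORKSPACE_URL}/api/2.2/jobs/update",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={
            "job_id": JOB_ID,
            "new_settings": {"performance_target": "STANDARD"},  # or "PERFORMANCE_OPTIMIZED"
        },
    )
    resp.raise_for_status()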

4 More Replies
saab123
by New Contributor II
  • 3186 Views
  • 1 reply
  • 0 kudos

Not able to connect to Neo4j Aura Db from databricks

I am trying to connect to a Neo4j AuraDB instance (f9374927). I created a free professional instance of Neo4j and am able to connect to this instance and add nodes and relationships. I created a Databricks shared cluster on 14.3 LTS (includes Apache Spark 3.5.0...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The connection issue between your Databricks cluster and Neo4j AuraDB instance (f9374927) with the ServiceUnavailableException: No routing server available message is tied to network-level SSL configuration and connectivity rather than incorrect code...
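
A hedged sketch of reading from AuraDB with the Neo4j Spark Connector once the connector library is installed on the cluster. Using the neo4j+s:// scheme lets the driver handle SSL and routing itself, which is the usual remedy for "No routing server available". Credentials, URI, and label are placeholders.

    # Hedged sketch: read a node label from Neo4j AuraDB via the Neo4j Spark
    # Connector (library must be installed on the cluster). Values are placeholders.
    df = (
        spark.read.format("org.neo4j.spark.DataSource")
        .option("url", "neo4j+s://<instance-id>.databases.neo4j.io")  # AuraDB URI
        .option("authentication.basic.username", "neo4j")
        .option("authentication.basic.password", "<password>")
        .option("labels", "Person")  # node label to read
        .load()
    )
    display(df)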

dbxlearner
by New Contributor II
  • 2749 Views
  • 3 replies
  • 0 kudos

Deploying using Databricks asset bundles (DABs) in a closed network

Hello, I'm trying to deploy Databricks workflows with DABs from an Azure DevOps pipeline, in a network that cannot download the required Terraform Databricks provider package online due to firewall/network restrictions. I have followed this post: https://...

Latest Reply
dbxlearner
New Contributor II
  • 0 kudos

Another thing I noticed: when running the 'databricks bundle debug terraform' command, it mentions these variables. I have tried setting these variables as environment variables in my ADO pipeline, especially the Databricks Terraform provider variab...
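
A hedged sketch of a wrapper step for the ADO pipeline: point the bundle CLI at a pre-downloaded Terraform binary and provider mirror before deploying. The variable names and versions below are assumptions based on what `databricks bundle debug terraform` typically reports; confirm them against the command's output in your own environment.

    # Hedged sketch: set the bundle CLI's terraform-related environment
    # variables to local paths, then deploy. Variable names and version
    # numbers are assumptions/placeholders -- verify with
    # `databricks bundle debug terraform`.
    import os
    import subprocess

    os.environ["DATABRICKS_TF_EXEC_PATH"] = "/opt/terraform/terraform"          # local terraform binary
    os.environ["DATABRICKS_TF_CLI_CONFIG_FILE"] = "/opt/terraform/mirror.tfrc"  # filesystem_mirror config
    os.environ["DATABRICKS_TF_VERSION"] = "1.5.5"                                # must match the binary
    os.environ["DATABRICKS_TF_PROVIDER_VERSION"] = "1.62.0"                      # must match the mirrored provider

    subprocess.run(["databricks", "bundle", "deploy", "-t", "dev"], check=True)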

2 More Replies
ticuss
by New Contributor
  • 45 Views
  • 1 reply
  • 0 kudos

Lakebase / Feature Store error: “Failed to get identity details for username” (service principal)

Hello, I’m running into a Lakebase / Feature Store issue related to service principal authentication when trying to log or read from the Databricks Feature Store. I am migrating from the legacy online tables. Here’s the exact error: psycopg2.OperationalErr...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

The error you’re encountering (psycopg2.OperationalError: FATAL: Failed to get identity details for username: "user_uuid") typically arises from an OAuth identity mismatch or an invalid token scope when a Databricks service principal is used to authent...
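
A hedged sketch of the auth flow this points at: mint an OAuth M2M token for the service principal from the workspace token endpoint and use it as the Postgres password, with the service principal's client (application) ID as the user. The Lakebase hostname, database name, and the exact username convention are assumptions/placeholders; check the Lakebase connection details in your workspace.

    # Hedged sketch: connect to a Lakebase Postgres endpoint as a service
    # principal. The OAuth token is used as the password; host, dbname, and
    # the username convention are assumptions/placeholders.
    import requests
    import psycopg2

    WORKSPACE_URL = "https://<your-workspace>.cloud.databricks.com"  # placeholder
    CLIENT_ID = "<service-principal-application-id>"                 # placeholder
    CLIENT_SECRET = "<service-principal-oauth-secret>"               # placeholder

    token = requests.post(
        f"{WORKSPACE_URL}/oidc/v1/token",
        data={"grant_type": "client_credentials", "scope": "all-apis"},
        auth=(CLIENT_ID, CLIENT_SECRET),
    ).json()["access_token"]

    conn = psycopg2.connect(
        host="<lakebase-instance-host>",  # placeholder
        port=5432,
        dbname="databricks_postgres",     # placeholder
        user=CLIENT_ID,                   # SP identity, not a human username (assumption)
        password=token,                   # short-lived OAuth token
        sslmode="require",
    )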

janglais
by Visitor
  • 47 Views
  • 2 replies
  • 0 kudos

DLT Pipeline with unknown deleted source data

Hello, I need help. The context: ERP data for the companies in my group is stored in SQL tables. Currently, once per day, we copy the last 2 months of data (by creation date) from each table into our data lake landing zone (we can however do full cop...

Latest Reply
madams
Contributor III
  • 0 kudos

Your solution #1 is very frustrating to me as well, for a number of reasons.  Simply put, we have to be able to compare incoming data to target data for normal ETL operations. One way around this is to create a view of your target silver table, outsi...
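
A hedged sketch of the workaround described above: a plain view over the silver target, created outside the DLT pipeline, that can then be read to compare incoming data against what is already loaded. Catalog, schema, table, and column names are placeholders.

    # Sketch: a regular view over the silver target, defined outside the
    # pipeline, used to detect rows that disappeared from the source extract.
    # All object and column names are placeholders.
    spark.sql("""
        CREATE OR REPLACE VIEW main.erp.silver_orders_current_v AS
        SELECT order_id, order_hash, creation_date
        FROM main.erp.silver_orders
    """)

    existing = spark.read.table("main.erp.silver_orders_current_v")
    incoming = spark.read.table("main.erp.landing_orders")
    # Rows present in the target but missing from the latest extract:
    missing_in_source = existing.join(incoming, "order_id", "left_anti")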

1 More Replies
kevinzhang29
by New Contributor
  • 29 Views
  • 1 reply
  • 0 kudos

DLT pipeline failed: streaming table query reading from an unexpected Delta table ID

Hi everyone, I'm running a DLT pipeline that loads data from Bronze to Silver using dlt.apply_changes (SCD Type 2). The first run of the pipeline worked fine -- data was written successfully into the target Silver tables. However, when I ingested new data...

Latest Reply
mark_ott
Databricks Employee
  • 0 kudos

This “unexpected Delta table ID” error typically means your Delta Live Tables (DLT) pipeline detected that the underlying Delta table it was reading from has changed since the last checkpoint. When you use dlt.apply_changes() (for SCD Type 2), this i...
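
For reference, a minimal SCD Type 2 flow with dlt.apply_changes is sketched below; if the bronze source was dropped and recreated (so its Delta table ID changed), the streaming checkpoint no longer matches and the usual fix is a full refresh of the affected target. Table and column names are placeholders.

    # Minimal SCD Type 2 sketch (runs inside a DLT pipeline). If the source
    # table's ID changed since the last checkpoint, a full refresh of the
    # target is typically required. Names are placeholders.
    import dlt
    from pyspark.sql import functions as F

    dlt.create_streaming_table("silver_customers")

    dlt.apply_changes(
        target="silver_customers",
        source="bronze_customers",
        keys=["customer_id"],
        sequence_by=F.col("event_ts"),
        stored_as_scd_type=2,
    )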

Mits11
by New Contributor II
  • 47 Views
  • 2 replies
  • 0 kudos

Community edition cluster - UI shows incorrect cores

Hi, I am a Community Edition user, which gives me a cluster (as per the image below) with 15 GB of memory and 2 cores on one driver node only. However, when I read a CSV file of 181 MB size, 1) it generates 8 partitions. As per the default, maxPartitionBytes is set to 12...

Latest Reply
Mits11
New Contributor II
  • 0 kudos

Thank you Louis for the detailed explanation, including notifying me about CE updates. However, I have noticed this (below is the screenshot): spark.sql.files.minPartitionNum does not return any result. It's weird. Am I missing anything? Thanks
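
One likely explanation, shown as a short sketch: spark.sql.files.minPartitionNum has no value unless someone explicitly sets it, so reading it without a default fails or returns nothing, while maxPartitionBytes (which actually drives the 8 partitions for a 181 MB file) does have a default.

    # Reading the configs safely: unset keys need a default argument.
    print(spark.conf.get("spark.sql.files.maxPartitionBytes"))      # typically 128 MB
    print(spark.conf.get("spark.sql.files.minPartitionNum", None))  # None if never set
    print(spark.sparkContext.defaultParallelism)                    # fallback when minPartitionNum is unset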

1 More Replies
Ashok_Vengala
by Visitor
  • 25 Views
  • 1 reply
  • 0 kudos

Unable to Add Multiple Columns in Single ALTER TABLE Statement on Iceberg Table via Unity REST Catalog

Hello Databricks Team, I have implemented code to integrate the Iceberg Unity REST Catalog with the Teradata OTF engine and successfully performed read and write operations, following the documentation at https://docs.databricks.com/aws/en/external-ac...

Latest Reply
nayan_wylde
Honored Contributor III
  • 0 kudos

This error stems from the Iceberg table metadata update constraints enforced by the Unity Catalog's REST API. Specifically, the Iceberg REST Catalog currently does not support multiple schema changes in a single commit. Each ALTER TABLE operation tha...
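
An illustration of the constraint described above, shown here via Spark SQL (the same one-change-per-commit rule applies from any external engine): issue each schema change as its own statement so each ALTER maps to a single Iceberg commit. Table and column names are placeholders.

    # Sketch: add columns one statement at a time so each change is a
    # separate commit against the Unity REST catalog. Names are placeholders.
    new_columns = {
        "loyalty_tier": "STRING",
        "last_contact_ts": "TIMESTAMP",
        "risk_score": "DOUBLE",
    }

    for name, dtype in new_columns.items():
        spark.sql(f"ALTER TABLE my_catalog.sales.customers ADD COLUMNS ({name} {dtype})")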

TejeshS
by Contributor
  • 3128 Views
  • 3 replies
  • 1 kudos

How to identify which columns we need to consider for liquid clustering from a table of 200+ columns

In Databricks, when working with a table that has a large number of columns (e.g., 200), it can be challenging to determine which columns are most important for liquid clustering. Objective: the goal is to determine which columns to select based on th...

Latest Reply
noorbasha534
Valued Contributor II
  • 1 kudos

@Alberto_Umana is it possible to get, from the system tables, the columns used in the joins & filters of a table being queried?

2 More Replies
Alby091
by New Contributor
  • 1259 Views
  • 2 replies
  • 0 kudos

Multiple schedules in workflow with different parameters

I have a notebook that takes a file from the landing zone, processes it, and saves a Delta table. This notebook contains a parameter (time_prm) that lets you run it for the different versions of files that arrive every day. Specifically, for eac...

Data Engineering
parameters
Workflows
Latest Reply
ImranA
Contributor
  • 0 kudos

You can do multiple schedules with a cron expression if you are using one in the Databricks asset bundle YAML, but the limitation is that you can't have one run at 0 past the hour and another at 25 past, i.e.: quartz_cron_expression: 0 45 9,23 ...
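
A sketch of the same limitation using the Jobs SDK: a job carries a single quartz expression, so runs at 09:45 and 23:45 can share one schedule, but runs with different minute offsets (or different time_prm values) would need separate jobs defined in the bundle. Job ID and timezone are placeholders.

    # Sketch: one quartz expression per job; 09:45 and 23:45 combined here.
    from databricks.sdk import WorkspaceClient
    from databricks.sdk.service import jobs

    w = WorkspaceClient()
    w.jobs.update(
        job_id=123456789,  # placeholder
        new_settings=jobs.JobSettings(
            schedule=jobs.CronSchedule(
                quartz_cron_expression="0 45 9,23 * * ?",  # 09:45 and 23:45 daily
                timezone_id="Europe/Rome",                 # placeholder timezone
            )
        ),
    )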

1 More Replies
Spenyo
by New Contributor II
  • 1607 Views
  • 1 reply
  • 1 kudos

Delta table size not shrinking after Vacuum

Hi team. Once every day we overwrite the last X months of data in our tables, so it generates a larger amount of history every day. We don't use time travel, so we don't need it. What we have done: SET spark.databricks.delta.retentionDurationCheck.enabled = false ALT...
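
For context, a sketch of the usual sequence for trimming Delta history when time travel is not needed. Table name and retention windows are placeholders; shortening retention below 7 days sacrifices time travel and requires the retentionDurationCheck override mentioned in the post, and VACUUM only removes data files older than the deleted-file retention window.

    # Sketch: shorten retention, then vacuum. Name and intervals are placeholders.
    spark.sql("SET spark.databricks.delta.retentionDurationCheck.enabled = false")

    spark.sql("""
        ALTER TABLE main.sales.daily_snapshot SET TBLPROPERTIES (
            'delta.logRetentionDuration' = 'interval 1 days',
            'delta.deletedFileRetentionDuration' = 'interval 1 days'
        )
    """)

    spark.sql("VACUUM main.sales.daily_snapshot RETAIN 24 HOURS")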

Latest Reply
pabloaschieri
  • 1 kudos

Hi, any update on this? Thanks

vamsi_simbus
by New Contributor III
  • 99 Views
  • 2 replies
  • 1 kudos

Migrating Talend ETL Jobs to Databricks – Best Practices & Challenges

Hi All, I’m currently working on a proof of concept (POC) to migrate existing Talend ETL jobs to Databricks. The goal is to leverage Databricks for data processing and orchestration while moving away from Talend. I’d appreciate insights on the followin...

Data Engineering
migration
Talend
Latest Reply
vamsi_simbus
New Contributor III
  • 1 kudos

@AbhaySingh Thank you for your insights.

1 More Replies
turagittech
by Contributor
  • 2365 Views
  • 1 reply
  • 0 kudos

Batch reading from SQL Server tables with CDC enabled

Hi all, I need to do a batch load from SQL Server into Databricks. I have CDC enabled on some tables. The simple approach appears to be to union the CDC and regular tables to get a single set of records to load, but this appears to be fraught with the risk of out-of-sequen...

Latest Reply
AbhaySingh
New Contributor II
  • 0 kudos

For Q1, consider this approach or a watermark-based approach: https://learn.microsoft.com/en-us/azure/databricks/ldp/cdc. For Q2, you have a few options: push down via a JDBC query, or use a SQL view instead if you want to avoid replicating the logic in Spark.
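
A hedged sketch of the "pushdown via JDBC query" option: the watermark filter runs on SQL Server, so only rows changed since the last load cross the wire. Connection details, table, and watermark column are placeholders.

    # Sketch: push the watermark filter down to SQL Server via the "query"
    # option. Connection details and column names are placeholders.
    last_watermark = "2025-10-01 00:00:00"  # e.g. read from a control table

    incremental = (
        spark.read.format("jdbc")
        .option("url", "jdbc:sqlserver://<host>:1433;databaseName=<db>")
        .option("user", "<user>")
        .option("password", "<password>")
        .option(
            "query",
            f"SELECT * FROM dbo.Orders WHERE LastModifiedAt > '{last_watermark}'",
        )
        .load()
    )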

