Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Dhruv-22
by Contributor
  • 4 Views
  • 0 replies
  • 0 kudos

Can't mergeSchema handle int and bigint?

I have a table which has a column of data type 'bigint'. While overwriting it with new data (I do full loads), I used 'mergeSchema' to handle schema changes. The new data's type was int. I thought mergeSchema could easily handle that, but...
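A minimal sketch of the scenario and one workaround, assuming an existing Delta table tgt with an id BIGINT column (all names hypothetical): cast the incoming int column to the existing bigint type before the overwrite, so mergeSchema only has to handle added columns.

from pyspark.sql import functions as F

# Incoming batch whose id column arrives as int rather than bigint.
df_new = spark.range(10).select(F.col("id").cast("int"))

# Casting back to bigint before the overwrite sidesteps the type merge;
# mergeSchema then only needs to cover newly added columns.
(df_new
 .withColumn("id", F.col("id").cast("bigint"))
 .write
 .format("delta")
 .mode("overwrite")
 .option("mergeSchema", "true")
 .saveAsTable("tgt"))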

saicharandeepb
by New Contributor III
  • 18 Views
  • 1 reply
  • 1 kudos

How to Retrieve DBU Count per Compute Type for Accurate Cost Calculation?

Hello everyone, we are currently working on a cost analysis initiative to gain deeper insights into our Databricks usage. As part of this effort, we are trying to calculate the hourly cost of each Databricks compute instance by utilizing the Azure Ret...

Latest Reply
BS_THE_ANALYST
Esteemed Contributor II
  • 1 kudos

@saicharandeepb have you looked at the system billing tables in Databricks yet? https://learn.microsoft.com/en-us/azure/databricks/admin/system-tables/billing There seems to be a field that can display the unit usage in DBU. Same in this table as well...
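A rough sketch of the kind of query those system tables support (assumes access to the system catalog has been granted in your workspace):

# usage_quantity is expressed in DBUs for rows where usage_unit = 'DBU',
# which gives the per-SKU DBU counts the post is after.
spark.sql("""
    SELECT sku_name,
           usage_date,
           SUM(usage_quantity) AS dbus
    FROM system.billing.usage
    WHERE usage_unit = 'DBU'
    GROUP BY sku_name, usage_date
    ORDER BY usage_date DESC, dbus DESC
""").show(truncate=False)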

Dhruv-22
by Contributor
  • 1659 Views
  • 2 replies
  • 0 kudos

Resolved! Understanding least common type in databricks

I was reading the data type rules and found out about the least common type. I have a doubt: what is the least common type of STRING and INT? The referred link gives the following example saying the least common type is BIGINT. -- The least common type between...
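A quick way to see which least common type the engine actually picks is to probe it with typeof(), as in the docs the post links:

# typeof() returns the result type the engine chose for the expression.
spark.sql("SELECT typeof(coalesce(1, '1'))").show()        # STRING vs INT
spark.sql("SELECT typeof(coalesce(1, 2147483648))").show() # INT vs BIGINT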

Latest Reply
Dhruv-22
Contributor
  • 0 kudos

The question is solved here - link

1 More Reply
Dhruv-22
by Contributor
  • 51 Views
  • 4 replies
  • 2 kudos

Resolved! Least Common Type is different in Serverless and All Purpose Cluster.

The following statement gives different outputs on different computes.

On Databricks 15.4 LTS:

%sql
SELECT typeof(coalesce(5, '6'));
-- Output: string

On Serverless, environment version 4:

%sql
SELECT typeof(coalesce(5, '6'));
-- Output: bigint

There are other cas...

Latest Reply
MuthuLakshmi
Databricks Employee
  • 2 kudos

@Dhruv-22 Regarding your 1st question, I'm not sure. You can refer to https://docs.databricks.com/aws/en/sql/language-manual/parameters/ansi_mode#system-default to understand what happens when ANSI mode is disabled.
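A hedged way to check whether the two computes differ in that setting is to run this on each one:

# Inspect the ANSI mode flag, then re-run the expression from the post;
# the coalesce typing is expected to track this setting.
spark.sql("SET spark.sql.ansi.enabled").show(truncate=False)
spark.sql("SELECT typeof(coalesce(5, '6'))").show()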

3 More Replies
anusha98
by Visitor
  • 40 Views
  • 2 replies
  • 2 kudos

Regarding: How to use row_number() in DLT pipelines

We have two streaming tables: customer_info and customer_info_history. We joined them using a full join to create a temp table in PySpark, and now we want to eliminate the duplicate records from this temp table. Tried using row_number() but facing bel...

  • 40 Views
  • 2 replies
  • 2 kudos
Latest Reply
K_Anudeep
Databricks Employee
  • 2 kudos

Hello @anusha98, you're hitting a real limitation of Structured Streaming: non-time window functions (like row_number() over (...)) aren't allowed on streaming DataFrames. You need to use agg().max() to get the "latest value per key": @dlt.table(name="temp_...
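A minimal sketch of that suggestion in DLT; the key and ordering columns (customer_id, updated_at) are hypothetical:

import dlt
from pyspark.sql import functions as F

@dlt.table(name="customer_latest")
def customer_latest():
    # Reading as a materialized view (dlt.read, not read_stream)
    # sidesteps the streaming restriction on window functions; the
    # max() aggregate keeps the latest record version per key.
    src = dlt.read("customer_info")
    latest = (src.groupBy("customer_id")
                 .agg(F.max("updated_at").alias("updated_at")))
    return src.join(latest, ["customer_id", "updated_at"], "inner")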

1 More Reply
AmarKap
by Visitor
  • 55 Views
  • 1 reply
  • 0 kudos

Lakeflow Pipelines: trying to read an accented file with spark.readStream but it fails

Trying to read an accented file (French characters), but the spark.readStream function is not working and special characters turn into something strange (e.g. �):

spark.readStream
    .format("cloudfiles")
    .option("cloudFiles....

Latest Reply
K_Anudeep
Databricks Employee
  • 0 kudos

Hello @AmarKap, when Spark decodes CP1252 bytes as UTF-8/ISO-8859-1, you'll see the replacement character (�). Can you read the file as:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "text")
      .option("encoding", "windows-1252")...
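Completing that snippet as a sketch; the input path and schema location are placeholders:

df = (spark.readStream
      .format("cloudFiles")
      .option("cloudFiles.format", "text")
      .option("encoding", "windows-1252")  # CP1252 covers the French accented characters
      .option("cloudFiles.schemaLocation", "/Volumes/main/default/_schemas")
      .load("/Volumes/main/default/raw/"))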

Gustavo_Az
by Contributor
  • 1602 Views
  • 1 reply
  • 0 kudos

Doubt about range_join hints optimization, using INSERT INTO REPLACE WHERE

Hello, I'm optimizing a big notebook and have encountered many times the tip from Databricks that says "Unused range join hints". Reading the documentation for reference, I have been able to suppress that warning in almost all cells, but some of them rema...

[Attachment: range_joins.JPG]
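For reference, the documented shape of a range join hint, with made-up table and column names; the "Unused range join hints" tip appears when the hinted relation has no qualifying range condition:

# 10 is the bin size; the join condition must be a range
# (inequality/BETWEEN) over the hinted relation for the hint to apply.
spark.sql("""
    SELECT /*+ RANGE_JOIN(p, 10) */ p.id, r.bucket
    FROM points p
    JOIN ranges r
      ON p.x >= r.lo AND p.x < r.hi
""")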
Latest Reply
Prajapathy_NKR
New Contributor
  • 0 kudos

Hi @Gustavo_Az, try to use EXPLAIN to understand what's happening. https://spark.apache.org/docs/latest/sql-ref-syntax-qry-explain.html
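For example, either form surfaces the plan (df and the table name stand in for whichever statement triggers the warning):

df = spark.table("some_table")  # hypothetical input
df.explain(mode="formatted")   # DataFrame API
spark.sql("EXPLAIN FORMATTED SELECT * FROM some_table").show(truncate=False)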

Jonathan_
by New Contributor II
  • 263 Views
  • 6 replies
  • 6 kudos

Slow PySpark operations after long DAG that contains many joins and transformations

We are using PySpark and notice that when we do many transformations/aggregations/joins of the data, at some point the execution time of simple tasks (count, display, union of 2 tables, ...) becomes very slow even if we have small data (ex...
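One general mitigation (a common technique, not taken from this thread's excerpt) is to truncate the lineage periodically, since each transformation grows the logical plan that every later action must re-analyze:

df = spark.range(1000)
for i in range(50):
    df = df.withColumn(f"c{i % 5}", df["id"] + i)
    if (i + 1) % 10 == 0:
        # localCheckpoint materializes the data and cuts the DAG, so
        # later simple actions plan against a short lineage.
        df = df.localCheckpoint()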

Latest Reply
Jonathan_
New Contributor II
  • 6 kudos

Hi, we forgot to say that we were using a single-node cluster (E class with 16 cores). Often in our projects we need to use libraries that work mainly with data in memory. We also need to remember that here we are not referring to large data. When we...

5 More Replies
Marthinus
by New Contributor III
  • 63 Views
  • 3 replies
  • 1 kudos

[Databricks Asset Bundles] Bug: driver_node_type_id not updated

Working with Databricks Asset Bundles (using the new Python-based definition): if you have a job_cluster defined using driver_node_type_id and then update it to no longer have it defined, but only node_type_id, the driver node type never gets update...

Latest Reply
dkushari
Databricks Employee
  • 1 kudos

Thanks for the details. The way it works is: once you set driver_node_type_id when you first define the job cluster, it does not change. When you later remove driver_node_type_id from the spec (i.e., omit it), Databricks does not automatically revert it to match the nod...
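A sketch of the workaround this implies, in bundle YAML (resource names and node types are made up): keep driver_node_type_id explicit so a deploy actually updates the driver.

resources:
  jobs:
    my_job:
      job_clusters:
        - job_cluster_key: main
          new_cluster:
            spark_version: 15.4.x-scala2.12
            node_type_id: Standard_D4ds_v5
            driver_node_type_id: Standard_D4ds_v5  # set explicitly rather than omitted
            num_workers: 2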

2 More Replies
EndreM
by New Contributor III
  • 2130 Views
  • 1 reply
  • 0 kudos

Replay stream to migrate to liquid cluster

The documentation is sparse about how to migrate a partitioned table to liquid clustering, as the ALTER TABLE suggested in the documentation doesn't work when it's a partitioned table. The comments on this forum suggest replaying the stream, and this is wha...

Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @EndreM, I did some digging internally and came up with some tips to guide you through this issue. Based on your situation, you're encountering several common challenges when migrating a partitioned table to liqu...
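One commonly used path, as a sketch (not necessarily the internal guidance referenced above, and table names are hypothetical): rebuild the partitioned table as a clustered one, since ALTER TABLE ... CLUSTER BY is rejected on partitioned tables.

# Rewrite the partitioned table into a liquid-clustered copy, then
# optimize so existing data gets clustered.
spark.sql("""
    CREATE OR REPLACE TABLE main.default.events_clustered
    CLUSTER BY (event_date)
    AS SELECT * FROM main.default.events_partitioned
""")
spark.sql("OPTIMIZE main.default.events_clustered")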

soumiknow
by Contributor II
  • 2000 Views
  • 1 reply
  • 0 kudos

Unable to create databricks group and add permission via terraform

I have the following Terraform code to create a Databricks group and add permission to a workflow:

resource "databricks_group" "dbx_group" {
  display_name = "ENV_MONITORING_TEAM"
}

resource "databricks_permissions" "workflow_permission" {
  job_id ...

Data Engineering
databricks groups
Terraform
Latest Reply
Louis_Frolio
Databricks Employee
  • 0 kudos

Greetings @soumiknow, I did some digging internally and found something that may help with this Terraform authentication issue...

smoortema
by New Contributor III
  • 173 Views
  • 2 replies
  • 2 kudos

Resolved! How to make FOR cycle and dynamic SQL and variables work together

I am working on a testing notebook where the table to be tested can be given as a widget. I wanted to write it in SQL. The notebook does the following steps in a cycle that should run 10 times: 1. Store the starting version of a delta table in a var...

Latest Reply
smoortema
New Contributor III
  • 2 kudos

Thank you! I realised that the example I gave was bad. However, what I was missing is that I did not know how to set a variable in SQL scripting. Including the SET command within the SQL string does not work; you have to use the EXECUTE IMMEDIATE ...
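A sketch of that pattern, assuming a recent runtime where SQL session variables and EXECUTE IMMEDIATE ... INTO ... USING are available; the table name normally comes from the notebook widget:

table_name = "main.default.my_table"  # hypothetical; read from the widget in practice

spark.sql("DECLARE OR REPLACE VARIABLE row_cnt BIGINT DEFAULT 0")

# SET inside the dynamic SQL string does not work; capture the dynamic
# query's result with INTO instead.
spark.sql(f"""
    EXECUTE IMMEDIATE
      'SELECT count(*) FROM IDENTIFIER(:tbl)'
      INTO row_cnt
      USING '{table_name}' AS tbl
""")
spark.sql("SELECT row_cnt").show()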

1 More Reply
AbhishekNakka
by New Contributor II
  • 56 Views
  • 1 reply
  • 0 kudos

Databricks Professional Data Engineer

Hi, I wanted to know if anyone has taken the Databricks Professional Data Engineer exam recently, after Oct 2025. Has the syllabus been updated or not?

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 0 kudos

Hi @AbhishekNakka, yes, the syllabus has been updated. You can find the current exam objectives at the links below:
Databricks Certified Data Engineer Professional September 2025 - Exam Guide.docx
Databricks Certified Data Engineer Professional | Databricks

DatabricksEngi1
by New Contributor III
  • 133 Views
  • 4 replies
  • 0 kudos

Resolved! Problem in VS Code Extension

Until a few days ago, I was working with Databricks Connect using the VS Code extension, and everything worked perfectly. In my .databrickscfg file, I had authentication configured like this:

[name]
host:
token:

When I ran my code, everything worked fi...

Latest Reply
dkushari
Databricks Employee
  • 0 kudos

Hi @DatabricksEngi1 - Please ensure you have a Python venv set up for each Python version that you use with Databricks Connect. Also, I have given step-by-step ways to debug the issue, clear the cache, etc. [Read the files and instructions carefully b...

3 More Replies
