Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

Gecofer
by Contributor II
  • 1378 Views
  • 2 replies
  • 1 kudos

Resolved! Inconsistent query results between dbt ETL run and SQL editor in Databricks

Hi everyone, I’m running into a strange issue in one of my ETL pipelines using dbt on Databricks, and I’d appreciate any insights or ideas. I have a query that is part of my dbt model. When I run the ETL process, the results from this query are incorr...

Latest Reply
Gecofer
Contributor II
  • 1 kudos

Hi @Isi, thanks so much for your insight! It turned out to be a combination of the two things you mentioned: there was a data masking policy applied to one of the columns, and while I had permissions to view the unmasked data, the service principal runn...

1 More Replies
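For readers hitting the same mismatch, here is a minimal sketch of the Unity Catalog column-mask pattern that can produce it; every object, function, and group name below is a placeholder, not taken from the thread. A dbt run executed by a service principal evaluates the mask under that principal's group membership, so it can receive masked values while an interactive user sees the real ones.

```sql
-- Hypothetical mask: only members of 'unmasked_readers' see real values.
CREATE OR REPLACE FUNCTION main.demo.mask_email(email STRING)
RETURN CASE
  WHEN is_account_group_member('unmasked_readers') THEN email
  ELSE '***MASKED***'
END;

-- Attach the mask to a column (placeholder table and column).
ALTER TABLE main.demo.customers
  ALTER COLUMN email SET MASK main.demo.mask_email;
```

Adding the service principal to the unmasking group (or granting it the same policies as interactive users) is the usual fix, which matches the resolution described above.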
Debashisrajib
by New Contributor II
  • 907 Views
  • 1 replies
  • 0 kudos

Resolved! 65 technical questions.

I recently took the Databricks Data Engineer Professional exam and got 65 really lengthy technical questions. These questions are different from statistical questions. 65 lengthy technical questions for 120 minutes is too much, and this number is not men...

Latest Reply
Advika
Community Manager
  • 0 kudos

Hello @Debashisrajib! I’m sorry to hear that. As outlined in the exam details on Webassessor, Databricks certification exams include 60 scored questions, along with additional unscored questions that appear like regular ones but do not affect your fin...

Abhimanyu
by Databricks Partner
  • 1170 Views
  • 2 replies
  • 0 kudos

Why does df.dropna(how="all") fail when there is a . in a column name?

I'm working in a Databricks notebook and using Spark to query a Delta table. Here's the code I ran: df = spark.sql("select * from catalog.schema.table") df = df.dropna(how="all") display(df). This works fine unless the DataFrame has a column name that ...

Latest Reply
MujtabaNoori
New Contributor III
  • 0 kudos

Hi @Abhimanyu, yes. In Spark, a '.' (dot) in a column reference is used to access nested StructType fields. You can, however, dynamically rename any columns that contain a '.'. Attached a few screenshots for your reference tha...

1 More Replies
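As the reply notes, Spark parses a dot in a column reference as struct-field access. Below is a minimal sketch of the two usual workarounds, backtick-escaping and renaming; the helper names are mine, and the plain-Python string handling mirrors what you would apply to df.columns:

```python
def escape_dots(columns):
    """Wrap names containing a dot in backticks so Spark treats them
    as literal column names instead of struct-field access."""
    return [f"`{c}`" if "." in c else c for c in columns]

def dedot(columns, replacement="_"):
    """Rename columns by replacing dots, e.g. df.toDF(*dedot(df.columns))."""
    return [c.replace(".", replacement) for c in columns]

print(escape_dots(["a.b", "plain"]))   # ['`a.b`', 'plain']
print(dedot(["a.b", "plain"]))         # ['a_b', 'plain']
```

With a renamed DataFrame (df = df.toDF(*dedot(df.columns))), dropna(how="all") should no longer trip over the dotted name.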
chsoni12
by New Contributor II
  • 1273 Views
  • 1 replies
  • 0 kudos

Resolved! Limitation in Managed Volumes Recovery — UNDROP Should Be Supported

Hello Databricks Community, while reviewing the official Databricks documentation and performing a POC on managed volumes, I observed that volumes cannot be recovered with the UNDROP command if accidentally deleted, unlike managed tables. Technically...

Latest Reply
Vidhi_Khaitan
Databricks Employee
  • 0 kudos

Thank you for highlighting this issue! Databricks is already working on implementing this for a future release.

farazahmad372
by New Contributor II
  • 2711 Views
  • 3 replies
  • 0 kudos

TypeError: 'JavaPackage' object is not callable

from pyspark.sql import *

if __name__ == "__main__":
    spark = SparkSession.builder \
        .appName("hello Spark") \
        .master("local[2]") \
        .getOrCreate()
    data_list = [("Ravi", 28), ("David", 45), ("Abd...

Latest Reply
nikhilj0421
Databricks Employee
  • 0 kudos

@farazahmad372 May I know the DBR version and type of cluster? Are you using serverless?

2 More Replies
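"'JavaPackage' object is not callable" when building a SparkSession locally usually means the PySpark Python package could not attach to a matching Spark JVM (version mismatch, or Java/Spark not visible to the process). A hedged local-diagnostic sketch; it only reports the environment and invents no Databricks behavior:

```python
import os

def spark_env_report():
    """Collect the facts usually needed to debug a local
    "'JavaPackage' object is not callable" error."""
    report = {
        "java_home": os.environ.get("JAVA_HOME"),    # JVM the gateway will use
        "spark_home": os.environ.get("SPARK_HOME"),  # explicit Spark install, if any
    }
    try:
        import pyspark
        report["pyspark_version"] = pyspark.__version__
    except ImportError:
        report["pyspark_version"] = None  # pyspark not installed in this env
    return report

print(spark_env_report())
```

Comparing pyspark_version against the Java version and any SPARK_HOME install is typically the first step before suspecting the code itself.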
adurand-accure
by Databricks Partner
  • 3733 Views
  • 5 replies
  • 2 kudos

Serverless job error - spark.rpc.message.maxSize

Hello, I am facing this error when moving a Workflow to serverless mode: ERROR: SparkException: Job aborted due to stage failure: Serialized task 482:0 was 269355219 bytes, which exceeds max allowed: spark.rpc.message.maxSize (268435456 bytes). Consid...

Latest Reply
adurand-accure
Databricks Partner
  • 2 kudos

Hello PiotrMi, we found out that the problem was caused by a collect() and managed to fix it by changing some code. Thanks for your quick replies. Best regards, Antoine

4 More Replies
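The numbers in the error are worth a quick check: spark.rpc.message.maxSize caps serialized task size at 256 MiB, and the failing task was less than 1 MiB over it, which is why removing the driver-side collect() (shrinking what gets shipped with the task) was enough to fix it:

```python
# Values taken directly from the error message above.
max_allowed = 268_435_456   # spark.rpc.message.maxSize cap = 256 MiB
task_size = 269_355_219     # serialized size of task 482:0

assert max_allowed == 256 * 1024 * 1024
overshoot = task_size - max_allowed
print(f"over the cap by {overshoot} bytes (~{overshoot / 2**20:.2f} MiB)")
```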
SakthiGanesh
by New Contributor II
  • 2902 Views
  • 1 replies
  • 0 kudos

Unable to run python script from Azure DevOps git repo in Databricks Workflow job

Hi, I'm getting an issue while running a Python script from an Azure DevOps Git repo in a Databricks Workflow job task. The error states an internal commit path issue, but I set the Source to Azure DevOps Services and gave the branch name when settin...

Latest Reply
niteshm
New Contributor III
  • 0 kudos

@SakthiGanesh This is a known type of issue when running Databricks Workflows with Azure DevOps Git-backed repos. Did you try a workspace path instead of the internal Git path? If possible, use an .ipynb notebook-based task rather than a raw .py script, note...

AgusBudianto
by Contributor
  • 2402 Views
  • 8 replies
  • 1 kudos

Resolved! Is it possible for Stored Procedures to be in Unity Catalog Databricks

I got information that the latest release of Unity Catalog already supports stored procedures, but several sources I searched say that Unity Catalog does not support them, according to the following post: https://community.databricks.c...

Latest Reply
Louis_Frolio
Databricks Employee
  • 1 kudos

Yes, you can attend virtually and it is free. What I don't know is exactly what is free; I believe the keynotes and some sessions are. You should definitely register and check it out.

7 More Replies
CJOkpala
by New Contributor II
  • 1624 Views
  • 4 replies
  • 1 kudos

Databricks DLT execution issue

I am having an issue when trying to do a full refresh of a DLT pipeline. I am getting the following error below: com.databricks.sql.managedcatalog.UnityCatalogServiceException: [RequestId=97d4fe52-b185-4757-b0b1-113cb96ae0bb ErrorClass=TABLE_ALREADY_...

CJOkpala_0-1748441402775.png CJOkpala_0-1748426421678.png
Latest Reply
nikhilj0421
Databricks Employee
  • 1 kudos

Are you facing the same issue if you give the table a different name in the DLT decorator?

3 More Replies
oneill
by New Contributor II
  • 4663 Views
  • 3 replies
  • 0 kudos

SQL - Dynamic overwrite + overwrite schema

Hello, let's say we have an empty table S that represents the schema we want to keep: columns A, B, C, D, E. We have another table T, partitioned by column A, with a schema that depends on the file we loaded into it, say columns A, B, C, F with rows (1, b1, c1, f1) and (2, b2, c2, f2). Now to make T have the same schema...

Latest Reply
oneill
New Contributor II
  • 0 kudos

Hi, thanks for the reply. I've already looked at the documentation on this point, which actually states that dynamic overwrite doesn't work with schema overwrite, while the instructions described above seem to indicate the opposite.

2 More Replies
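For context, the standard Spark SQL form of the dynamic partition overwrite discussed in this thread looks roughly like the sketch below (S and T as in the post). As the reply points out, the documentation states that dynamic partition overwrite is not compatible with schema overwrite, so a schema change needs a separate step.

```sql
-- Only the partitions present in the SELECT output are overwritten;
-- other partitions of T are left untouched.
SET spark.sql.sources.partitionOverwriteMode = dynamic;

INSERT OVERWRITE TABLE T
SELECT * FROM S;
```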
andreapeterson
by Contributor
  • 618 Views
  • 1 replies
  • 0 kudos

Question about which tags appear in drop down

Hi there, I have a question regarding the appearance of tags in the drop-down when adding a tag to a resource (catalog, schema, table, or column level). When does a tag get populated in a drop-down? I noticed when I created a column-level tag, and wan...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 0 kudos

Hello @andreapeterson Yes, your understanding of Databricks tag behavior is correct. In Databricks Unity Catalog, tags follow a hierarchical inheritance pattern: Downward inheritance: tags applied at higher levels (catalog → schema → table) become ava...

sparklez
by New Contributor III
  • 2056 Views
  • 3 replies
  • 2 kudos

Resolved! Creating Cluster configuration with library dependency using DABS

I am trying to create a cluster configuration using DABS and defining library dependencies. My YAML file looks like this:

resources:
  clusters:
    project_Job_Cluster:
      cluster_name: "Project Cluster"
      spark_version: "16.3.x-cpu-ml-scala2.12"
      node_type_id: ...

Latest Reply
lingareddy_Alva
Esteemed Contributor
  • 2 kudos

Hi @sparklez You're encountering this issue because the libraries field is not valid in the cluster configuration. Libraries need to be specified at the job level, not the cluster level. Option 1: Job-Level Libraries (Recommended): Move the libraries sec...

2 More Replies
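A sketch of the job-level placement the reply recommends, as Databricks Asset Bundle YAML; the job, task, cluster-key, path, and package names are placeholders, and the cluster settings echo the ones from the question:

```yaml
resources:
  jobs:
    project_job:                          # placeholder job name
      name: "Project Job"
      job_clusters:
        - job_cluster_key: project_cluster
          new_cluster:
            spark_version: "16.3.x-cpu-ml-scala2.12"
            node_type_id: "Standard_DS3_v2"   # assumed node type
            num_workers: 2
      tasks:
        - task_key: main
          job_cluster_key: project_cluster
          notebook_task:
            notebook_path: ./src/main         # placeholder path
          libraries:                          # libraries live on the task
            - pypi:
                package: "example-package==1.0.0"   # placeholder
```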
Pratikmsbsvm
by Contributor
  • 3606 Views
  • 5 replies
  • 7 kudos

Resolved! Migrating From Azure to Databricks

Hi Techie, could someone please help me with the pros and cons of migrating my real-time streaming solution from Azure to Databricks? Which components can I replace with Databricks, and what benefit can I get out of it? Current architecture attached. Many thanks.

HLD.png
Latest Reply
vaibhavs120
Contributor
  • 7 kudos

I completely agree with @lingareddy_Alva on the costing part. One small point I would like to mention: we should only enable spot instances (60-90% cost savings) in development and non-critical environments. This option works great and is indeed c...

4 More Replies
anil_reddaboina
by New Contributor II
  • 1858 Views
  • 2 replies
  • 0 kudos

Slow running Spark job issue due to unknown Spark stages created by Databricks compute cluster

Hi Team, recently we migrated our Spark jobs from a self-hosted Spark (YARN) cluster to Databricks. Currently we are using Databricks Workflows with job compute clusters and the Spark JAR task type, so when we run the job in Databricks...

databricks_new_stages.png
Latest Reply
anil_reddaboina
New Contributor II
  • 0 kudos

Hey Brahma, thanks for your reply. As a first step I will disable the AQE config and test it. We are using node pools with the job compute cluster type so that it's not spinning up a new cluster for each job. I'm also configuring the below two configs, do ...

1 More Replies
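For reference, the AQE switch mentioned above is a standard Spark setting; a cluster-level Spark config fragment for the A/B test (flip it back after measuring, since AQE is usually beneficial):

```
spark.sql.adaptive.enabled false
```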
chsoni12
by New Contributor II
  • 1330 Views
  • 1 replies
  • 0 kudos

Legacy Autoscaling(workflow) VS Enhanced Autoscaling(DLT)

I conducted a proof of concept (POC) to compare the performance of the DLT pipeline and Databricks Workflow using the same workload, task, code, and cluster configuration. Both configurations were set with autoscaling enabled, with a minimum of 1 wor...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 0 kudos

Hi chsoni12, how are you doing today? That's a great observation, and it's awesome that you're testing performance and cost between DLT and regular workflows. The key difference here lies in how autoscaling works. DLT pipelin...
