Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

briancuster63
by New Contributor II
  • 2769 Views
  • 4 replies
  • 0 kudos

Asset Bundle .py files being converted to notebooks when deployed to Databricks

Hi everyone, I'm running into a particularly frustrating issue whenever I try to run some Python code in an asset bundle on my workspace. The code and notebooks deploy fine, but once deployed, the code files get converted to notebooks and I'm no longer abl...

Latest Reply
olivier-soucy
Contributor
  • 0 kudos

I came here looking for a solution to the opposite problem: I was hoping my .py files would be available as notebooks (without adding extra headers). Unfortunately, this does not seem to be possible with DABs. @facebiranhari, if you have not solved your ...

3 More Replies
Sen
by New Contributor
  • 16815 Views
  • 10 replies
  • 2 kudos

Resolved! Performance enhancement while writing dataframes into Parquet tables

Hi, I am trying to write the contents of a dataframe into a Parquet table using the command below: df.write.mode("overwrite").format("parquet").saveAsTable("sample_parquet_table"). The dataframe contains an extract from one of our source systems, which h...
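Slow Parquet writes like this are often a file-count problem: far too many tiny output files, or a handful of huge ones. A minimal sketch of sizing the write before calling saveAsTable, assuming a SparkSession `spark` and the `df` from the question; the byte estimate is something you supply from your source system:

```python
def target_partitions(total_bytes, target_file_bytes=128 * 1024 * 1024):
    """Ceiling division: how many ~target_file_bytes output files to aim for."""
    return max(1, -(-total_bytes // target_file_bytes))

# Hedged PySpark usage (not run here):
# n = target_partitions(estimated_input_bytes)
# (df.repartition(n)
#    .write.mode("overwrite")
#    .format("parquet")
#    .saveAsTable("sample_parquet_table"))
```

Repartitioning to a deliberate count avoids the default partitioning writing thousands of small files, which is a common cause of slow saveAsTable calls.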

Latest Reply
BobClarke
New Contributor II
  • 2 kudos

I am Bob Clarke marketing manager of virtual assistants Pakistan and I help companies hire amazon virtual assistants who manage product listings order processing and inventory updates. Our trained staff improves efficiency and boosts sales. We suppor...

9 More Replies
shubham7
by New Contributor II
  • 907 Views
  • 2 replies
  • 0 kudos

Reading XML files with multiple rowTags

I have multiple XML files in a folder, and I am reading them into a dataframe in a Databricks cell. Each file has one rootTag and multiple rowTags. Can I read all the rowTags into a single Spark dataframe (PySpark)? Any reference or approach would greatly a...

Latest Reply
shubham7
New Contributor II
  • 0 kudos

You are correct, but I have N different rowTags. How do I read them into a dataframe?
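One way to handle N different rowTags is to read the path once per tag and union the results. A sketch, assuming your runtime has the XML reader available (built into recent Databricks runtimes, or via the spark-xml library) and a SparkSession `spark`:

```python
from functools import reduce

def read_row_tags(spark, path, row_tags):
    """Read one DataFrame per rowTag, then union them all by column name."""
    frames = [
        spark.read.format("xml").option("rowTag", tag).load(path)
        for tag in row_tags
    ]
    # allowMissingColumns=True null-fills columns present under one rowTag
    # but not another, so differently shaped tags can still be combined
    return reduce(lambda a, b: a.unionByName(b, allowMissingColumns=True), frames)
```

If the tags have unrelated schemas, consider adding a literal column per tag (e.g. `withColumn("row_tag", lit(tag))`) before the union so rows remain distinguishable.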

1 More Replies
jordan72
by New Contributor III
  • 1923 Views
  • 8 replies
  • 2 kudos

Resolved! German Umlauts wrong via JDBC

Hi, I have the issue that German umlauts are not retrieved correctly via the JDBC driver. It shows M�nchen instead of München. I load the driver in my Java app via: <groupId>com.databricks</groupId><artifactId>databricks-jdbc</artifactId><version...

Latest Reply
jordan72
New Contributor III
  • 2 kudos

OK, so it seems that it has something to do with the newly introduced native.encoding system property. So in NetBeans you have to provide -Dstdout.encoding=utf-8 to the VM if you are using JDK 21.

7 More Replies
JCooke
by New Contributor II
  • 1865 Views
  • 3 replies
  • 1 kudos

Deploying Metastore with Terraform

My goal is to enable Unity Catalog on a clean Azure deployment of Databricks with absolutely no history of Databricks. I know I need to create a metastore for the Azure region, and to do this I know I need Account Admin from the accounts p...

Latest Reply
szymon_dybczak
Esteemed Contributor III
  • 1 kudos

Hi @JCooke, the first assignment of the Databricks Account Admin role is a bit of a special case. There is always a manual step required to assign the first Account Admin in a new Databricks account on Azure. This step cannot be fully automated via T...

2 More Replies
pooja_bhumandla
by New Contributor III
  • 1451 Views
  • 2 replies
  • 1 kudos

Why is Merge with Deletion Vectors Slower Than Full File Rewrite on the Same Table?

I've run two MERGE INTO operations on the same Delta table: one with deletion vectors enabled (Case 1), and one without (Case 2). In Case 1 (with deletion vectors): executionTimeMs: 106,708; materializeSourceTimeMs: 24,344; numTargetRowsUpdated: 22; nu...

Latest Reply
saurabh18cs
Honored Contributor II
  • 1 kudos

Hi Pooja, let's understand DVs first. They avoid rewriting entire files by marking rows as deleted/updated via a bitmap (the deletion vector), which should, in theory, be faster for small updates. But DVs introduce new overhead: 1) writing and updating t...
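If profiling shows the DV overhead dominating for a workload like this, deletion vectors can be disabled per table so MERGE goes back to rewriting files. A minimal sketch, assuming a SparkSession `spark`; the table name is hypothetical:

```python
def disable_deletion_vectors(spark, table_name):
    """Disable deletion vectors so MERGE rewrites files instead of writing DV bitmaps."""
    spark.sql(
        f"ALTER TABLE {table_name} "
        "SET TBLPROPERTIES ('delta.enableDeletionVectors' = 'false')"
    )

# disable_deletion_vectors(spark, "main.analytics.events")
```

Benchmark both settings on your actual update pattern before committing: DVs tend to win for small, frequent updates and lose when reads must later reconcile many accumulated vectors.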

1 More Replies
dbr_data_engg
by New Contributor III
  • 2954 Views
  • 3 replies
  • 3 kudos

Resolved! Unable to deploy Databricks Asset Bundle

Hi Team, I created a workflow/job and was able to deploy it to the Dev and Prod workspaces. But now I am unable to deploy the job to the "Dev" workspace and am getting the error below (I am also unable to see this job in the Databricks UI): Deploying resources...Updating deployment ...

Latest Reply
fabiobeider
New Contributor II
  • 3 kudos

Hey, I'm facing the same issue. Did you ever get a chance to solve it?

2 More Replies
Yuki
by Contributor
  • 1998 Views
  • 1 replies
  • 0 kudos

Is it possible to retain original deltatable data with Unity Catalog?

Hi everyone, I have a question regarding data retention in Unity Catalog. In the pre-Unity Catalog setup, I believe that even if we dropped an external table, the underlying data files remained intact. However, in the current best practices for Unity C...

Latest Reply
mani_22
Databricks Employee
  • 0 kudos

Hi @Yuki,  If you drop an external table, the underlying data remains accessible even now. Only the table definition is removed from the metastore, while the actual data is retained. The UNDROP command for an EXTERNAL table simply recreates the table...

SakthiGanesh
by New Contributor II
  • 857 Views
  • 2 replies
  • 0 kudos

Delta table partition folder names are getting changed

I am facing an issue where the date partition folders should be named in a format like "campaign_created_date=2024-01-17", but instead they are being written as "ad", "8B", and other random-looking folder names. Usually it will be like below: Now it changed lik...

(screenshots attached)
Latest Reply
Krishnamatta
Contributor
  • 0 kudos

Hi Satish, this is due to column mapping being enabled on the table. From the Databricks docs: "When you enable column mapping for a Delta table, random prefixes replace column names in partition directories for Hive-style partitioning." See Rename and drop colu...

1 More Replies
Ganeshch
by New Contributor III
  • 2072 Views
  • 6 replies
  • 0 kudos

No option to create cluster

I don't see any option to create a cluster under Compute. How do I create a cluster? Please help me.

Latest Reply
nayan_wylde
Esteemed Contributor
  • 0 kudos

Yes, if you are using the legacy Community Edition you will be able to create clusters, but the Free Edition is limited to serverless compute.

5 More Replies
Dewlap
by New Contributor II
  • 1091 Views
  • 1 replies
  • 1 kudos

How to handle exploded records with overwrite-by-key logic in Delta Live Tables

I'm using Delta Live Tables (DLT) with the apply_changes API to manage SCD Type 1 on a source table. However, I've run into a limitation. Context: after apply_changes, I have a derived view that flattens and explodes a JSON array field in the source d...

Latest Reply
Brahmareddy
Esteemed Contributor
  • 1 kudos

Hi Dewlap, how are you doing today? As I understand it, you're right to notice that apply_changes in DLT works best for one-row-per-key updates and doesn't fit well when you need to replace multiple rows for the same key, especially after explodi...

Sreejuv
by New Contributor
  • 1479 Views
  • 1 replies
  • 0 kudos

Lakebridge code conversion

I'm currently working on a proof of concept to convert Oracle and Synapse procedures into Databricks SQL, and none of these are getting converted. I followed the steps mentioned in the documentation. Wanted to check if anyone was able to successfully convert and exe...

Latest Reply
lingareddy_Alva
Honored Contributor III
  • 0 kudos

Hi @Sreejuv, you're encountering a very common challenge. Oracle and Synapse to Databricks SQL procedure conversion is notoriously difficult, and many organizations struggle with it. Common issues with automated conversion: why procedures often fail to...

AdamIH123
by New Contributor II
  • 1984 Views
  • 1 replies
  • 0 kudos

Resolved! Agg items in a map

What is the best way to aggregate a map across rows? In the example below, the aggregated results would be red: 4, green: 7, blue: 10. This can be achieved using explode; wondering if there is a better way. %sql with cte as ( select 1 as id , map('red', 1, 'green...

Latest Reply
SP_6721
Honored Contributor II
  • 0 kudos

Hi @AdamIH123, the explode-based approach is widely used and remains the most reliable and readable method. But if you're looking for an alternative without using explode, you can try the REDUCE + MAP_FILTER approach. It lets you aggregate maps across...
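To make the expected result concrete, the same aggregation can be written in plain Python; the per-row maps below are hypothetical, chosen so the totals match the question, and the loop mirrors what the SQL explode plus SUM ... GROUP BY key computes:

```python
from collections import Counter

# Hypothetical per-row maps, picked so totals come out red: 4, green: 7, blue: 10
rows = [
    {"red": 1, "green": 2, "blue": 3},
    {"red": 3, "green": 5, "blue": 7},
]

totals = Counter()
for m in rows:
    totals.update(m)  # adds counts key by key, like summing exploded (key, value) pairs

print(dict(totals))  # {'red': 4, 'green': 7, 'blue': 10}
```

Whichever Spark form you pick, the semantics are exactly this key-wise sum, so either explode or a higher-order-function rewrite should agree on the result.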

seefoods
by Valued Contributor
  • 2066 Views
  • 1 replies
  • 0 kudos

asset bundle

Hello guys, I built a custom asset bundle config, but I have an issue when I create several subdirectories inside the resources directory. After running the command databricks bundle summary, the Databricks libraries mentioned that resources its...

Latest Reply
Renu_
Valued Contributor II
  • 0 kudos

Hi @seefoods, Databricks Asset Bundles don't automatically detect resources in subdirectories unless they're explicitly listed or a recursive pattern is used in the config. To resolve this, you can update the include section with a pattern like resour...
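A sketch of what that include section might look like in databricks.yml; the recursive glob is an assumption to verify against the DAB documentation for your CLI version:

```yaml
# databricks.yml (fragment)
include:
  # files directly under resources/
  - resources/*.yml
  # files in nested subdirectories of resources/
  - resources/**/*.yml
```

After updating the config, rerun databricks bundle summary and confirm the resources from the subdirectories now appear.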

frosti_pro
by New Contributor II
  • 1159 Views
  • 3 replies
  • 1 kudos

UC external tables to managed tables

Dear community, I would like to know if there is any procedure and/or recommendation for safely and efficiently migrating UC external tables to managed tables (in a production context with a high volume of data)? Thank you for your support!

Latest Reply
ElizabethB
Databricks Employee
  • 1 kudos

Please check out our new docs page! This has some information which may help you, including information about our new SET MANAGED command. We are also looking to make this process smoother over time, so if you have any feedback, please let us know. h...
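A hedged sketch of how the SET MANAGED conversion mentioned above could be invoked; it assumes a SparkSession `spark`, the table name is hypothetical, and the exact syntax and prerequisites should be verified against the docs page linked in the reply before running in production:

```python
def convert_to_managed(spark, table_name):
    """Convert a Unity Catalog external table to a managed table in place."""
    spark.sql(f"ALTER TABLE {table_name} SET MANAGED")

# convert_to_managed(spark, "main.sales.orders")
```

For high-volume production tables, test on a copy first and confirm downstream readers still resolve the table before converting the original.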

2 More Replies
