Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Forum Posts

by AdamIH123 (New Contributor II)
  • 1918 Views
  • 1 reply
  • 0 kudos

Resolved! Agg items in a map

What is the best way to aggregate a map across rows? In the example below, the aggregated results would be red: 4, green: 7, blue: 10. This can be achieved using explode; wondering if there is a better way. %sql with cte as ( select 1 as id , map('red', 1, 'green...

Latest Reply
SP_6721 (Honored Contributor II)

Hi @AdamIH123, the explode-based approach is widely used and remains the most reliable and readable method. But if you're looking for an alternative without using explode, you can try the REDUCE + MAP_FILTER approach. It lets you aggregate maps across...
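
For illustration, a minimal PySpark sketch of the explode-based aggregation both posts refer to; beyond the red: 1 shown in the question, the row values are hypothetical, chosen so the totals match the stated result of red: 4, green: 7, blue: 10.

  from pyspark.sql import functions as F

  # `spark` is the SparkSession predefined in a Databricks notebook.
  # The first row mirrors the question's map('red', 1, ...); the remaining values are hypothetical.
  df = spark.createDataFrame(
      [(1, {"red": 1, "green": 2, "blue": 3}),
       (2, {"red": 3, "green": 5, "blue": 7})],
      "id INT, m MAP<STRING, INT>")

  # Explode each map into (key, value) rows, then sum the values per key across all rows.
  (df.select(F.explode("m").alias("key", "value"))
     .groupBy("key")
     .agg(F.sum("value").alias("total"))
     .show())
  # -> red: 4, green: 7, blue: 10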

by seefoods (Valued Contributor)
  • 2034 Views
  • 1 reply
  • 0 kudos

asset bundle

Hello guys, I built a custom asset bundle config, but I have an issue when I create several subdirectories inside the resources directory. After running the command databricks bundle summary, Databricks libraries mentioned that the resources...

Latest Reply
Renu_ (Valued Contributor II)

Hi @seefoods, Databricks Asset Bundles don't automatically detect resources in subdirectories unless they're explicitly listed or a recursive pattern is used in the config. To resolve this, you can update the include section with a pattern like resour...
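
The include pattern in the reply is cut off; as a sketch (not necessarily the exact pattern suggested), a recursive glob in databricks.yml looks roughly like this:

  # databricks.yml (excerpt) -- picks up YAML files in subdirectories of resources/.
  # Adjust the pattern to match your layout.
  include:
    - resources/**/*.yml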

by frosti_pro (New Contributor II)
  • 1113 Views
  • 3 replies
  • 1 kudos

UC external tables to managed tables

Dear community, I would like to know if there are any procedures and/or recommendations to safely and efficiently migrate UC external tables to managed tables (in a production context with a high volume of data)? Thank you for your support!

Latest Reply
ElizabethB (Databricks Employee)

Please check out our new docs page! This has some information which may help you, including information about our new SET MANAGED command. We are also looking to make this process smoother over time, so if you have any feedback, please let us know. h...
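
The docs link above is truncated; for reference, the SET MANAGED command mentioned in the reply is an ALTER TABLE clause along these lines (table name hypothetical; availability depends on your workspace and the feature's rollout status, so check the documentation first):

  # Hypothetical table name; converts a Unity Catalog external table to a managed table.
  spark.sql("ALTER TABLE main.sales.orders SET MANAGED")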

2 More Replies
by suk (New Contributor II)
  • 1034 Views
  • 1 reply
  • 0 kudos

Databricks pipeline script is not creating schema before creation of table

Hello, we are facing an issue while executing the Databricks pipeline: it takes all the scripts in random sequence, and if no schema was created before the job to create a table is scheduled, it will fail. As an alternative we are executing the schema pipeline fi...

Latest Reply
lingareddy_Alva (Honored Contributor III)

Hi @suk, this is a common issue with Databricks pipelines where dependencies aren't properly managed, causing scripts to execute in random order. Use Databricks Workflows with task dependencies. Configure explicit task dependencies:
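
The reply is cut off before its example; as a hedged sketch of one way to express such dependencies with the Databricks Python SDK (job name and notebook paths are hypothetical), the table-creation task only starts after the schema task succeeds:

  from databricks.sdk import WorkspaceClient
  from databricks.sdk.service import jobs

  w = WorkspaceClient()  # assumes standard authentication (e.g. DATABRICKS_HOST / DATABRICKS_TOKEN)

  # Depending on the workspace, each task may also need compute attached
  # (e.g. existing_cluster_id, a job cluster, or serverless).
  w.jobs.create(
      name="schema-then-tables",  # hypothetical job name
      tasks=[
          jobs.Task(
              task_key="create_schema",
              notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/create_schema"),
          ),
          jobs.Task(
              task_key="create_tables",
              depends_on=[jobs.TaskDependency(task_key="create_schema")],
              notebook_task=jobs.NotebookTask(notebook_path="/Pipelines/create_tables"),
          ),
      ],
  )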

by Ganeshch (New Contributor III)
  • 1291 Views
  • 3 replies
  • 0 kudos

No option to create cluster

I don't see any option to create a cluster inside Compute in Community Edition. Is it disabled? How do I create a cluster? Please help me.

Latest Reply
Ganeshch (New Contributor III)

If I create a notebook and run it, a cluster will not be created explicitly but it will work in the backend, am I right?

2 More Replies
by ghilage (New Contributor III)
  • 1053 Views
  • 4 replies
  • 0 kudos

Not able to write to dbfs from workflow

Hi all, I am facing the below issue while writing to DBFS. I have PySpark code in which I am writing a dataframe to DBFS using the code below: dbfs_path.mkdir(parents=True, exist_ok=True) my_df.write.format("parquet").mode("overwrite").save(f"{dbfs_path}/my_d...

Latest Reply
ghilage (New Contributor III)

Looks like there was some problem within my dataframe itself. If I skip some of the expensive field calculations then it is able to write to DBFS.

3 More Replies
by Davila (New Contributor II)
  • 1211 Views
  • 2 replies
  • 2 kudos

Resolved! Asset Bundle Validation Not Completing – Stuck on files_to_sync

I have a Databricks asset bundle with the following structure:
  bundle:
    name: <some value here>
    uuid: <some value here>
  include:
    - resources/*.yml
  variables:
    catalog_bronze: {}
    catalog_silver: {}
    user_name: {}
  targets:
    dev:
      mode: ...

Latest Reply
Renu_ (Valued Contributor II)

Hi @Davila, Validation can be slow if your bundle root includes a large number of files. However, since your bundle contains only a few files, the delay may be due to the root_path pointing to a broader directory structure in the Databricks workspace...

1 More Reply
by cloudengineer (New Contributor)
  • 785 Views
  • 2 replies
  • 0 kudos
Latest Reply
MadhuB (Valued Contributor)

@cloudengineer By default, workspace admins can create interactive clusters. Non-admin users should either be granted access to a compute policy or be given access to existing clusters. If there is a requirement to enable cluster crea...

1 More Reply
by ashraf1395 (Honored Contributor)
  • 2057 Views
  • 6 replies
  • 2 kudos

Empty Streaming tables in dlt

I want to create empty streaming tables in DLT with only the schema specified. Is it possible? I want to do it in DLT Python.

Latest Reply
brunoillipronti (New Contributor II)

I confirm that ashraf1395's solution works. All the approaches I tried for creating an empty table created a materialized view (which you can't merge into). It's disappointing, though, since there is no quick param in a "create_table" command to create a simple...
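
The accepted solution isn't shown in this excerpt; one way to declare an empty streaming table with only a schema in DLT Python is dlt.create_streaming_table (table name and columns below are hypothetical):

  import dlt
  from pyspark.sql import types as T

  # Hypothetical schema; the table is declared empty and can be populated later,
  # e.g. by an append flow or apply_changes.
  events_schema = T.StructType([
      T.StructField("event_id", T.LongType()),
      T.StructField("event_ts", T.TimestampType()),
      T.StructField("payload", T.StringType()),
  ])

  dlt.create_streaming_table(name="events_empty", schema=events_schema)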

5 More Replies
by Zachary_Higgins (Contributor)
  • 13962 Views
  • 9 replies
  • 13 kudos

'ignoreDeletes' option with Delta Live Table streaming source

We have a Delta streaming source in our Delta Live Table pipelines that may have data deleted from time to time. The error message is pretty self-explanatory: ...from streaming source at version 191. This is currently not supported. If you'd like to i...

Latest Reply
IanB_Argento (New Contributor II)

I had this same issue whilst doing some POC work. I was able to overcome it as follows:
  • Navigate to Workflows | Jobs & pipelines.
  • Select your pipeline.
  • Click the drop-down next to the Start button.
  • Choose "Full refresh all".
That resets it all and fixes t...
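
For completeness, the 'ignoreDeletes' option from the thread title is set on the Delta streaming read itself; a minimal sketch with a hypothetical source table, noting that it only covers deletes at partition boundaries (see skipChangeCommits for arbitrary deletes):

  # Hypothetical source table; ignoreDeletes skips transactions that delete data
  # at partition boundaries instead of failing the stream.
  df = (spark.readStream
        .format("delta")
        .option("ignoreDeletes", "true")
        .table("main.bronze.events"))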

8 More Replies
by Pavankumar7 (New Contributor III)
  • 2582 Views
  • 6 replies
  • 4 kudos

Resolved! Error in connecting serverless compute in free edition

I am unable to connect to serverless compute under the Free Edition of Databricks. Also, in the Compute tab I can see only 3 tabs (SQL warehouses, Vector search, Apps); I am not able to create new compute as we used to in Community Edition.

Latest Reply
Thomas_W (New Contributor III)

@Pavankumar7 - are you experiencing this issue for existing/imported notebooks, or for brand new notebooks too? If it's the former, the notebook may be using an old serverless environment version. When Databricks updates the Serverless environment, ex...

5 More Replies
by pacman (New Contributor)
  • 16013 Views
  • 7 replies
  • 0 kudos

How to run a saved query from a Notebook (PySpark)

Hi Team! Noob to Databricks, so apologies if I ask a dumb question. I have created a relatively large series of queries that fetch and organize the data I want. I'm ready to drive all of these from a Notebook (likely PySpark). An example query is save...

Latest Reply
aethorimn_cgr (New Contributor II)

@uday_satapathy Hi Uday. Do you know if this method works for multiple users, in case I need to share the script so a teammate can use it?
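
The method referenced earlier in the thread isn't visible here; as one common pattern (not necessarily the one described above), the saved query's SQL text can be kept alongside the notebook and executed with spark.sql. The query below is hypothetical:

  # Hypothetical SQL, standing in for a query saved in Databricks SQL.
  query = """
      SELECT order_id, SUM(amount) AS total_amount
      FROM main.sales.orders
      GROUP BY order_id
  """

  df = spark.sql(query)  # runs on the notebook's attached compute
  df.show()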

6 More Replies
by Pratikmsbsvm (Contributor)
  • 1330 Views
  • 2 replies
  • 2 kudos

Resolved! Data Lakehouse architecture with Azure Databricks and Unity Catalog

I am creating a data lakehouse solution on Azure Databricks. Source: SAP, Salesforce, Adobe. Target: Hightouch (external application), Mad Mobile (external application). The data lakehouse also has transactional records which should be stored in ACID pro...

Latest Reply
KaranamS (Contributor III)

Hi @Pratikmsbsvm, from what I understand, you have a lakehouse on Azure Databricks and would like to share this data with another Databricks account or workspace. If Unity Catalog is enabled on your Azure Databricks account, you can leverage Delta S...
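
The reply is truncated at "Delta S...", presumably Delta Sharing; a rough sketch of the Unity Catalog SQL typically involved (share, table, and recipient names are hypothetical, and creating shares and recipients requires the appropriate privileges):

  # All object names are hypothetical.
  spark.sql("CREATE SHARE IF NOT EXISTS external_apps_share")
  spark.sql("ALTER SHARE external_apps_share ADD TABLE main.gold.customer_profiles")
  spark.sql("CREATE RECIPIENT IF NOT EXISTS hightouch_recipient")
  spark.sql("GRANT SELECT ON SHARE external_apps_share TO RECIPIENT hightouch_recipient")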

1 More Reply
by data_learner1 (New Contributor II)
  • 1035 Views
  • 4 replies
  • 1 kudos

Need to track schema changes/column renames/column drops in Databricks Unity Catalog

Hi team, we are getting data from a third-party vendor into the Databricks Unity Catalog. They are doing schema changes frequently and we would like to track that. Just wanted to know if I can do this using the audit table in the system catalog. As we only h...

Latest Reply
CURIOUS_DE (Valued Contributor)

@data_learner1 Unity Catalog logs all data access and metadata operations (including schema changes) into the audit logs, which are stored in system catalog tables such as system.access.audit. You mentioned you only have read access, and likely...
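
As a hedged sketch of what such a query against system.access.audit might look like (the action_name filter values are assumptions; check the distinct values your workspace actually records):

  # The action_name values below are assumptions; run
  # SELECT DISTINCT action_name FROM system.access.audit to see what is logged in your workspace.
  spark.sql("""
      SELECT event_time, user_identity.email AS actor, action_name, request_params
      FROM system.access.audit
      WHERE service_name = 'unityCatalog'
        AND action_name IN ('alterTable', 'createTable', 'deleteTable')
      ORDER BY event_time DESC
  """).show(truncate=False)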

3 More Replies
by NikosLoutas (New Contributor III)
  • 2108 Views
  • 2 replies
  • 0 kudos

Resolved! Databricks Full Refresh of DLT Pipeline

Hello, I have a question regarding the full refresh of a DLT pipeline where the data source is an external table. When running the pipeline without a full refresh, the streaming will pull the data currently present in the external source ...

Latest Reply
seeyesbee (New Contributor II)

Hi @paolajara, in your point 5 you mentioned using Delta Lake for tracking changes. Could you point me to any official docs or examples that walk through enabling CDC / row tracking on a Delta table? I pull data from SharePoint via its REST endpoint,...
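
Not from the thread, but for context on the CDC question: the Delta change data feed is typically enabled and read roughly like this (table name and starting version are hypothetical):

  # Hypothetical table name; enables the change data feed on an existing Delta table.
  spark.sql("""
      ALTER TABLE main.bronze.sharepoint_items
      SET TBLPROPERTIES (delta.enableChangeDataFeed = true)
  """)

  # Read row-level changes; startingVersion must be at or after the version
  # where the change data feed was enabled.
  changes = (spark.read.format("delta")
             .option("readChangeFeed", "true")
             .option("startingVersion", 5)
             .table("main.bronze.sharepoint_items"))
  changes.show(truncate=False)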

1 More Reply
