04-06-2024 01:01 PM
04-06-2024 01:04 PM
Just a note:
I keep getting errors when trying to post properly formatted code. My apologies, but I can't manage to post it correctly at all.
04-16-2024 07:00 AM - edited 04-16-2024 07:08 AM
Hi,
According to "When to partition tables on Databricks":
Instead of partitions, take a look at:
It is quite possible that you do not need to rewrite the whole dataset; you can use the MERGE operation instead.
When running processes that may touch the same data concurrently, make the separation between them as explicit as possible in the operation's condition (see ConcurrentAppendException).
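As a minimal sketch of that advice: the table and column names below (target_events, updates, id, event_date) are hypothetical, but the pattern is the standard Delta Lake MERGE. The extra date predicate in the ON clause makes each job's range explicit, which is what helps concurrent writers avoid ConcurrentAppendException when their conditions are provably disjoint.

```sql
-- Hypothetical names; runs on a Databricks / Delta Lake SQL endpoint.
-- The explicit event_date predicate narrows the files the MERGE can touch,
-- so two jobs merging disjoint date ranges do not conflict.
MERGE INTO target_events AS t
USING updates AS s
  ON  t.id = s.id
  AND t.event_date = s.event_date
  AND t.event_date >= '2024-04-01'   -- explicit separation condition
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
```

If another job merges rows with event_date < '2024-04-01' under the same pattern, the two conditions cannot overlap and the writes can proceed concurrently.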
You can find here a repository of demos with useful hints that you can install in your workspace (the one on Delta Lake is probably the most relevant for you at this stage). Click the "View the Notebooks" button to access the code, and run the pip command to play with the content.
Hope it helps,
Best,
04-16-2024 09:16 AM
Thanks @artsheiko !
Well, I definitely do not meet the requirements for building partitioned tables.
The biggest table I have so far is a miserable ~60 MB for a single partition. It will grow in size and records, but not enough to reach 1 TB or even 1 GB (and this is probably the biggest source I'll have).
So I'll need to revise the approach to avoid partitions.
Working with the MERGE statement seems to be the way to go.
Just a question about the demos you shared:
They're only available in a Databricks environment, right?
Thanks for your help!
04-16-2024 09:21 AM
Hi @databird ,
You can review the code of each demo by opening the content via "View the Notebooks" or by exploring the following repo: https://github.com/databricks-demos (you can search for "merge" to see all the occurrences, for example).
To install a demo, you do indeed need a workspace: the installation process may bring not only notebooks but also workflows, DLT pipelines, and possibly dashboards. Another reason is that each demo is an independent asset, so it operates on top of demo data that is also generated during the installation.
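For reference, the installation described above is driven by the dbdemos package from a Databricks notebook. This is a sketch, not something runnable outside a workspace, and the demo name 'delta-lake' is an assumption; check the catalog for the exact names.

```python
# Run inside a Databricks notebook cell (requires a workspace).
# %pip install dbdemos

import dbdemos

# List the available demos, then install one into the current workspace.
# 'delta-lake' is an assumed demo name; verify it via list_demos().
dbdemos.list_demos()
dbdemos.install('delta-lake')
```

The install step is what generates the sample data and any attached workflows or dashboards mentioned above.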