Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Overwrite still saves numerous parquet files in storage container

Paully
New Contributor

I inherited this environment. We have a job that mines the data lake and creates a table grouped by unit number and its data points. The job runs every 10 minutes. We then connect to that table from Power BI in DirectQuery mode and raise alarms in a model we have built in the app space. We are trying to optimize it. The job writes with an overwrite function, but there are hundreds of Parquet files in the container for the individual job runs, adding up to over 100 GB. Why? Wouldn't overwrite just recreate the same table, or do we need to do a 'drop table if exists' in the script?
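
One possible explanation, assuming the table is stored in Delta format (the post does not say, so treat this as an assumption): a Delta overwrite writes a new set of Parquet files and marks the previous ones as removed in the transaction log, but leaves them in storage until a VACUUM runs past the retention window. The sketch below only builds the SQL statements that would let you check the table history and preview or run a cleanup. The helper name and the table name alarms.unit_data are hypothetical; DESCRIBE HISTORY and VACUUM ... RETAIN ... HOURS [DRY RUN] are standard Delta Lake SQL commands.

```python
def maintenance_statements(table_name: str, retain_hours: int = 168) -> list[str]:
    """Build SQL to inspect and clean up a table, ASSUMING it is a Delta table.

    retain_hours defaults to 168 (7 days), which matches Delta's default
    retention threshold for VACUUM.
    """
    if retain_hours < 0:
        raise ValueError("retain_hours must be non-negative")
    return [
        # One row per commit; an overwrite every 10 minutes shows up here.
        f"DESCRIBE HISTORY {table_name}",
        # Lists the files that would be deleted, without deleting anything.
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS DRY RUN",
        # Deletes unreferenced files older than the retention window.
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


# In a Databricks notebook, where `spark` is predefined, one might run:
#   for stmt in maintenance_statements("alarms.unit_data"):
#       spark.sql(stmt).show(truncate=False)
```

Running the DRY RUN statement first is a cautious way to see how much of the 100 GB is old overwrite output before deleting anything. Vacuumed versions are no longer available for time travel.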

