Overwrite still saves numerous parquet files in storage container


I inherited this environment. We have a job that mines the data lake and creates a table grouped by unit number and its data points. The job runs every 10 minutes. We then connect to that table from Power BI in DirectQuery mode and raise alarms in a model we have built in the app space. We are trying to optimize it. The job writes in overwrite mode, but there are hundreds of Parquet files in the storage container for each individual job run, totaling over 100 GB. Why? Wouldn't overwrite just recreate the same table, or do we need to run a 'drop table if exists' in the script?
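One possible explanation, assuming the table is a Delta table (the post does not say, but that is the default table format on Databricks): an overwrite writes a new set of Parquet files and only marks the previous files as removed in the transaction log. The old files stay in the container until a VACUUM deletes them once they are past the retention period, which defaults to 7 days. The sketch below estimates how many overwritten versions can pile up at a 10-minute cadence and builds the two SQL statements that would inspect and clean the table. The table name `unit_datapoints` and the helper functions are hypothetical, not taken from the original job.

```python
from typing import List

# Hypothetical table name; the original post does not give one.
TABLE_NAME = "unit_datapoints"


def overwrites_in_window(interval_minutes: int, retention_days: int) -> int:
    """Count the job runs (each a full overwrite) that fall inside the retention window."""
    return (retention_days * 24 * 60) // interval_minutes


def maintenance_statements(table_name: str, retain_hours: int = 168) -> List[str]:
    """Build Delta SQL to list the table's versions and to remove unreferenced files.

    168 hours matches the default 7-day retention (an assumption about this table's
    settings). VACUUM only deletes files that the current version no longer references.
    """
    return [
        f"DESCRIBE HISTORY {table_name}",
        f"VACUUM {table_name} RETAIN {retain_hours} HOURS",
    ]


# A job running every 10 minutes with 7 days of retention:
versions = overwrites_in_window(interval_minutes=10, retention_days=7)
statements = maintenance_statements(TABLE_NAME)
```

In a Databricks notebook, each statement could be run with `spark.sql(statement)`. If that is what is happening here, a 'drop table if exists' would probably not be needed; scheduling VACUUM would be the more direct fix. If the table turns out to be plain Parquet rather than Delta, this sketch does not apply.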
