Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta file question

apiury
New Contributor III

Hi! I'm using Auto Loader to ingest binary files into Delta format. I have 7 binary files, but Delta generates 3 files named part-0000, part-0001, and so on. Why are the files generated with the part-000... naming?

image

4 REPLIES

Lakshay
Databricks Employee

Hi @Alejandro Piury Pinzón, Delta manages the size of the files it writes to the table. The number of files written to the Delta table depends on the total volume of data being written, not on the number of files at the source location.

The part-000... names follow Spark's standard output naming: the rows are divided into partitions, and each partition is written out as its own numbered part file.

apiury
New Contributor III

But I don't understand. For example, I have 3 files:

image

When I upload the files using Auto Loader, 3 files are generated:

image

Why doesn't Databricks put them all into 1 file?

Lakshay
Databricks Employee

Spark processes data by dividing it into multiple partitions, so when it writes the data, the number of part files created equals the number of partitions. If you are doing this outside Auto Loader, you can use coalesce to control the number of partitions, but I am not sure whether coalesce can be used with Auto Loader.

However, you can run the OPTIMIZE command on the Delta table to compact the files.
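
To make the two options concrete, here is a minimal sketch, not a definitive implementation. It assumes an active SparkSession passed in as `spark` on a cluster with Delta available. The table name `bronze.binary_files` and the helper names `optimize_statement`, `compact_table`, and `write_single_partition` are hypothetical, chosen only for illustration.

```python
def optimize_statement(table_name: str) -> str:
    """Build an OPTIMIZE statement, backtick-quoting each part of the table name."""
    quoted = ".".join(f"`{part}`" for part in table_name.split("."))
    return f"OPTIMIZE {quoted}"


def compact_table(spark, table_name: str):
    """Ask Delta to rewrite many small part files into fewer, larger ones."""
    # `spark` is assumed to be an existing SparkSession; it is not created here.
    return spark.sql(optimize_statement(table_name))


def write_single_partition(df, path: str) -> None:
    """Batch write outside Auto Loader: reduce to one partition before writing.

    With a single partition, Spark typically writes a single part file.
    Append mode is an assumption made for this sketch.
    """
    df.coalesce(1).write.format("delta").mode("append").save(path)
```

For example, `compact_table(spark, "bronze.binary_files")` would issue OPTIMIZE against that hypothetical table. Note that OPTIMIZE writes new compacted files and the older part files stay in storage until VACUUM removes them, so the directory listing may not shrink right away.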

Anonymous
Not applicable

Hi @Alejandro Piury Pinzón

We haven't heard from you since the last response from @Lakshay Goel, and I was checking back to see if her suggestions helped you.

Otherwise, if you have a solution, please share it with the community, as it can be helpful to others.

Also, please don't forget to click the "Select As Best" button whenever the information provided helps resolve your question.
