Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Order of delta table after read not as expected

felix_counter
New Contributor III

Dear Databricks Community,

I am performing three consecutive 'append' writes to a Delta table, where the first append creates the table. Each append consists of two rows, ordered by column 'id' (see example in the attached screenshot). When I read the table into a DataFrame after all appends have completed, the rows come back in the order '1, 2, 5, 6, 3, 4' with respect to column 'id'. I expected '1, 2, 3, 4, 5, 6', since the original data was ordered by 'id' and the appends to the Delta table happened in the order '1, 2', '3, 4', and '5, 6'.

Is this behavior expected? 

Is there a way to read the data back in the same order in which it was appended to the table?

Thanks a lot for your consideration and help.

1 ACCEPTED SOLUTION

Lakshay
Databricks Employee

@felix_counter 

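When data is read from a Delta table, row order is not preserved, because the underlying data files are read in parallel. This is expected behavior. If you need the data in a specific order, query it with an ORDER BY clause (orderBy or sort in the DataFrame API). Note that in Spark SQL, SORT BY only sorts rows within each partition, so it does not guarantee a total order across the result.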


3 REPLIES

Tharun-Kumar
Databricks Employee

@felix_counter 

Adding to Lakshay's answer, you can sort explicitly when querying the data, for example:

from pyspark.sql.functions import col
df.orderBy(col("id")).show()

felix_counter
New Contributor III

Thanks a lot @Lakshay and @Tharun-Kumar for your valued contributions!
