
What does "Determining location of DBIO file fragments..." mean, and how do I speed it up?

Ajay-Pandey
Esteemed Contributor III

Determining location of DBIO file fragments. This operation can take some time.

What does this mean, and how do I prevent it from having to perform this apparently-expensive operation every time? This happens even when all the underlying tables are Delta tables.

1 ACCEPTED SOLUTION

LandanG
Honored Contributor

Hey @Ajay Pandey,

That message is related to Delta caching. If a cluster is constantly scaling up or down, you may occasionally lose pieces of the Delta cache. "Determining location of DBIO file fragments" is the operation that works out which executors the files are cached on.

Trying a newer DBR, such as 11.3 or 12.x, can help with this. You could also try turning off the cache by setting the configuration below in the notebook and observing the behaviour:

spark.conf.set("spark.databricks.io.cache.enabled", "false")
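
To compare behaviour with the cache off and then put the session back as it was, the setting can be toggled around a single block of code. This is only a minimal sketch: it assumes `spark` is the notebook's active session, and the helper name `cache_disabled` is made up for illustration (it is not a Databricks API).

```python
from contextlib import contextmanager

CACHE_KEY = "spark.databricks.io.cache.enabled"  # key from the line above


@contextmanager
def cache_disabled(spark):
    """Hypothetical helper: switch the cache off for the duration of a block."""
    # Remember the current value; None means it was not set explicitly.
    previous = spark.conf.get(CACHE_KEY, None)
    spark.conf.set(CACHE_KEY, "false")
    try:
        yield
    finally:
        # Restore the session to the state we found it in.
        if previous is None:
            spark.conf.unset(CACHE_KEY)
        else:
            spark.conf.set(CACHE_KEY, previous)


# Usage in a notebook cell (the table name is a placeholder):
# with cache_disabled(spark):
#     spark.table("my_table").count()
```

If the message disappears while the block runs and the query is not noticeably slower, the cache lookup was probably the cost you were seeing.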

You could also try optimizing the table(s):

%sql OPTIMIZE [table name]
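
If several tables are involved, the same command can be issued from a Python cell instead of one `%sql` cell per table. A rough sketch, assuming `spark` is the notebook session; the function name `optimize_tables` and the table names in the usage comment are placeholders, not anything from this thread.

```python
def optimize_tables(spark, table_names):
    """Run OPTIMIZE on each named table and return the statements issued."""
    statements = [f"OPTIMIZE {name}" for name in table_names]
    for statement in statements:
        # Same command as the %sql cell above, sent through spark.sql.
        spark.sql(statement)
    return statements


# Usage in a notebook cell (placeholder table names):
# optimize_tables(spark, ["sales.orders", "sales.customers"])
```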


6 REPLIES

Ajay-Pandey
Esteemed Contributor III

Thanks

AdrianLobacz
Contributor

That is a message about the Delta cache. It determines which executors have which files cached, so that tasks can be routed for the best cache locality. Optimizing your table more frequently, so that there are fewer files, should make this step quicker.

You can try:

%sql Optimize [table name]
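
To see whether a table actually has a lot of files before and after optimizing, the file count can be read from `DESCRIBE DETAIL`, which reports a `numFiles` column for Delta tables. This is a sketch under assumptions: `spark` is the notebook session, the function names are invented for illustration, and the threshold of 1000 files is an arbitrary example rather than a recommended value.

```python
def delta_file_count(spark, table_name):
    """Return the number of data files in the current version of a Delta table."""
    detail = spark.sql(f"DESCRIBE DETAIL {table_name}").first()
    return detail["numFiles"]


def looks_fragmented(num_files, threshold=1000):
    """Arbitrary rule of thumb: flag tables with more files than `threshold`."""
    return num_files > threshold


# Usage in a notebook cell (placeholder table name):
# n = delta_file_count(spark, "sales.orders")
# if looks_fragmented(n):
#     spark.sql("OPTIMIZE sales.orders")
```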

Ajay-Pandey
Esteemed Contributor III

Thanks

Christianben9
New Contributor II

"Determining location of DBIO file fragments" is a message that may be displayed during the boot process of a computer running the NetApp Data ONTAP operating system. This message indicates that the system is identifying and locating the DBIO (Data Block Input/Output) file fragments on the storage system. This process is necessary to ensure that all data on the system is accessible and in a consistent state.

The time it takes to complete this process can depend on several factors, such as the number of disks in the system, the amount of data stored on the disks, and the performance of the disks themselves. However, there are a few things you can do to potentially speed up this process:

  1. Increase the number of spare disks: Adding more spare disks to the system can help to speed up the process, as the system can use these spare disks to rebuild data faster.
  2. Check for disk errors: Make sure that all the disks are functioning properly and there are no errors on them.
  3. Check for firmware updates: Make sure that the firmware of the storage system and the disks is up to date.
  4. Check for performance bottlenecks: Check for any performance bottlenecks on the storage system, such as high CPU or memory usage, and address them if necessary.
  5. Check for any other software issues: Ensure that the software is running smoothly and not having any issues.

Keep in mind that this process is an important step in ensuring data integrity, so it should not be skipped or rushed. It's crucial to be patient and let the process finish.

Ajay-Pandey
Esteemed Contributor III

Thanks
