Databricks Community

Ajay-Pandey · ‎01-16-2023

Determining location of DBIO file fragments. This operation can take some time.

What does this mean, and how do I prevent it from having to perform this apparently-expensive operation every time? This happens even when all the underlying tables are Delta tables.

Ajay Kumar Pandey

LandanG · ‎01-16-2023

Hey @Ajay Pandey ,

That message is related to the Delta cache. If a cluster is constantly scaling up or down, you can occasionally lose pieces of the cache. "Determining location of DBIO file fragments" is the operation that works out which executors the files are cached on.

A newer DBR, such as 11.3 or 12.x, may help. You could also try turning off the cache by setting the configuration below in the notebook and observing the behaviour:

spark.conf.set("spark.databricks.io.cache.enabled", "false")

You could also try optimizing the table(s):

%sql Optimize [table name]
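
To make the "turn it off and observe" test easy to undo, something like the following might help. It is only a sketch: it assumes a Databricks notebook where `spark` is the active session, and `set_io_cache` is a hypothetical helper name, not a Databricks API. Only `spark.conf.get` and `spark.conf.set` are real calls.

```python
# Configuration key taken from the reply above.
CACHE_KEY = "spark.databricks.io.cache.enabled"


def set_io_cache(spark, enabled):
    """Set the cache flag and return its previous value so it can be restored.

    `spark` is assumed to be the notebook's SparkSession.
    """
    # "unset" is a placeholder default used when the key was never set explicitly.
    previous = spark.conf.get(CACHE_KEY, "unset")
    spark.conf.set(CACHE_KEY, "true" if enabled else "false")
    return previous


# Hypothetical usage in a notebook cell:
# previous = set_io_cache(spark, False)
# ... rerun the slow query and check whether the message still appears ...
# if previous != "unset":
#     spark.conf.set(CACHE_KEY, previous)
```

Returning the previous value means the experiment can be reversed in the same session without having to remember what the cluster default was.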

Ajay-Pandey · ‎01-18-2023

Thanks

AdrianLobacz · ‎01-16-2023

That is a message about the Delta cache. It determines which executors have which data cached, so that tasks can be routed for the best cache locality. Optimizing your table more frequently, so that there are fewer files, will make this better.

You can try:

%sql Optimize [table name]
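
To check whether optimizing actually leaves fewer files, a rough sketch along these lines could be used. It assumes a Databricks notebook with `spark` as the active session and Delta tables, where `DESCRIBE DETAIL` reports a `numFiles` column. The function name `optimize_and_report` and the table name in the usage comment are made up for illustration.

```python
def optimize_and_report(spark, table_names):
    """Run OPTIMIZE on each table and return {table: (files_before, files_after)}.

    `spark` is assumed to be the notebook's SparkSession and every name in
    `table_names` is assumed to refer to a Delta table.
    """
    report = {}
    for name in table_names:
        # DESCRIBE DETAIL returns a single row of table metadata.
        files_before = spark.sql(f"DESCRIBE DETAIL {name}").first()["numFiles"]
        spark.sql(f"OPTIMIZE {name}")
        files_after = spark.sql(f"DESCRIBE DETAIL {name}").first()["numFiles"]
        report[name] = (files_before, files_after)
    return report


# Hypothetical usage in a notebook cell:
# print(optimize_and_report(spark, ["my_schema.my_table"]))
```

If the file count barely changes, the table was probably already compacted and the message is more likely down to the cluster scaling than to the number of files.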

Ajay-Pandey · ‎01-18-2023

Thanks

Christianben9 · ‎01-19-2023

"Determining location of DBIO file fragments" is a message that may be displayed during the boot process of a computer running the NetApp Data ONTAP operating system. This message indicates that the system is currently in the process of identifying and locating the DBIO (Data Block Input/Output) file fragments on the storage system. This process is necessary in order to ensure that all data on the system is accessible and in a consistent state.

The time it takes to complete this process can depend on several factors, such as the number of disks in the system, the amount of data stored on the disks, and the performance of the disks themselves. However, there are a few things you can do to potentially speed up this process:

Increase the number of spare disks: Adding more spare disks to the system can help to speed up the process, as the system can use these spare disks to rebuild data faster.
Check for disk errors: Make sure that all the disks are functioning properly and there are no errors on them.
Check for firmware updates: Make sure that the firmware of the storage system and the disks is up to date.
Check for performance bottlenecks: Check for any performance bottlenecks on the storage system, such as high CPU or memory usage, and address them if necessary.
Check for any other software issues: Ensure that the software is running smoothly and not having any issues.

Keep in mind that this process is an important step in ensuring data integrity, so it should not be skipped or rushed. It's crucial to be patient and let the process finish.