Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Delta Sharing - RESOURCE_LIMIT_EXCEEDED (AddFiles, RemoveFiles)

Leszek
Contributor

Hi,

I would like to start using Delta Sharing, but first I need to check whether my tables hit the limitations described in the following article: RESOURCE_LIMIT_EXCEEDED error when querying a Delta Sharing table - Databricks.

Delta Sharing has limits on the metadata size of a shared table.

  • You are limited to 700k AddFiles actions in the DeltaLog. This is how many active files you can have in a shared Delta table.
  • You are limited to 100k RemoveFiles actions in the DeltaLog. This is the number of files that have been deleted. This includes files that have been removed by operations like OPTIMIZE and MERGE.

Basically, the question is: how can I calculate the number of AddFiles / RemoveFiles actions for a Delta table?

1 REPLY

Kaniz_Fatma
Community Manager

Hi @Leszek, Delta Sharing is a powerful protocol that allows secure, real-time exchange of large datasets across different computing platforms.

Let’s dive into how you can calculate the number of AddFiles and RemoveFiles actions for a shared Delta table.

  1. AddFiles Actions:

    • An AddFiles action records a data file being added to the shared Delta table, for example by an append, an ingest, or a rewrite such as OPTIMIZE.
    • For the limit quoted above, what matters is the number of files that are active in the current version of the table, not every file that was ever added over time.
    • On Databricks, the numFiles column returned by DESCRIBE DETAIL reports the number of files in the current version of the table.
  2. RemoveFiles Actions:

    • The RemoveFiles actions represent the removal of files from the shared Delta table. These files might have been deleted due to operations like OPTIMIZE or MERGE.
    • To calculate the number of RemoveFiles actions, count the remove entries recorded in the commits of the table's DeltaLog.
    • Keep in mind that even if a file is removed by an operation like OPTIMIZE, it still counts as a RemoveFiles action.
  3. Total Actions:

    • The AddFiles and RemoveFiles actions in the DeltaLog together make up the metadata size of the shared table.
    • The two limits quoted above apply separately, so compare each count against its own limit; the sum of the two is only a rough indicator of overall metadata size.
  4. Monitoring and Tracking:

    • To monitor these actions, you can query the DeltaLog associated with the shared table (the _delta_log directory under the table location).
    • Databricks provides commands such as DESCRIBE DETAIL and DESCRIBE HISTORY to inspect table metadata and the operations performed on the table.
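
As a rough illustration of the counting described above, here is a minimal Python sketch that replays the JSON commit files of a Delta log and counts active files and remove entries. It makes several assumptions: the _delta_log directory is readable as a local path, every JSON commit since version 0 is still present (checkpoints are not read, so the result is incomplete once old commits have been cleaned up), and the function name count_file_actions is made up for this example. The article does not say exactly how the sharing server counts RemoveFiles, so treat the second number as an estimate rather than the value the server enforces.

```python
import json
from pathlib import Path


def count_file_actions(delta_log_dir):
    """Replay JSON commits in a _delta_log directory (hypothetical helper).

    Returns (active_files, remove_actions):
      active_files   - files added and not removed by a later commit
      remove_actions - total `remove` entries seen in the commits read
    """
    active = set()
    remove_actions = 0
    # Commit files are named with a zero-padded version number, so sorting
    # the names lexicographically also sorts them by version.
    commits = sorted(
        p for p in Path(delta_log_dir).glob("*.json") if p.stem.isdigit()
    )
    for commit in commits:
        added, removed = set(), set()
        with commit.open() as fh:
            for line in fh:
                line = line.strip()
                if not line:
                    continue
                action = json.loads(line)  # one action per line
                if "add" in action:
                    added.add(action["add"]["path"])
                elif "remove" in action:
                    removed.add(action["remove"]["path"])
                    remove_actions += 1
        # Apply removes before adds so that a file removed and re-added
        # in the same commit stays active.
        active = (active - removed) | added
    return len(active), remove_actions
```

Calling count_file_actions on a table's _delta_log path returns a pair such as (active_files, remove_actions). For the active file count alone, the numFiles value from DESCRIBE DETAIL is the simpler source.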

Remember that staying within the specified limits is crucial to ensure the smooth operation of your shared Delta table. If you approach the limits, consider optimizing your data management strategies or archiving older data to maintain a healthy metadata size.
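
To make "approaching the limits" concrete, a small sketch like the following could compare the two counts against the limits quoted in the question (700k AddFiles, 100k RemoveFiles). The function names and the 80% warning threshold are assumptions chosen for illustration, not anything Delta Sharing defines.

```python
# Limits as quoted from the RESOURCE_LIMIT_EXCEEDED article in the question.
ADD_FILES_LIMIT = 700_000
REMOVE_FILES_LIMIT = 100_000


def limit_status(count, limit, warn_ratio=0.8):
    """Classify a count as 'ok', 'warning', or 'exceeded'.

    warn_ratio is an arbitrary headroom threshold (assumption).
    """
    if count > limit:
        return "exceeded"
    if count >= warn_ratio * limit:
        return "warning"
    return "ok"


def check_sharing_limits(active_files, remove_actions, warn_ratio=0.8):
    """Check each count against its own limit; the limits are independent."""
    return {
        "AddFiles": limit_status(active_files, ADD_FILES_LIMIT, warn_ratio),
        "RemoveFiles": limit_status(remove_actions, REMOVE_FILES_LIMIT, warn_ratio),
    }
```

For example, check_sharing_limits(600_000, 10_000) flags AddFiles as "warning" and RemoveFiles as "ok". The counts can come from any source you trust, such as the numFiles value from DESCRIBE DETAIL for active files.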

For more detailed information, refer to the official Delta Sharing documentation and explore the spe...12.

Happy sharing! 🚀

 