Databricks Free Edition Help
Engage in discussions about the Databricks Free Edition within the Databricks Community. Share insights, tips, and best practices for getting started, troubleshooting issues, and maximizing the value of your trial experience to explore Databricks' capabilities effectively.

Delta Shared catalog use cost allocation

PumpItUpper
New Contributor II

Hi. I have the following scenario on AWS Databricks:

I have a catalog that I need to Delta Share to external systems (both other dbks accounts and other non-dbks systems).
All tables on this catalog are MANAGED type, with all the files sitting in the same S3 bucket.
This catalog sees really heavy use internally (inside our dbks account), and we expect heavy external use as well once we start sharing it.

My worry:
I understand that Delta Sharing doesn't incur extra cost by itself, but queries from external systems against this catalog will trigger GetObject actions on the underlying bucket. There will also be network egress costs for each query.
How can I attribute the cost for the use of this catalog? I need to know which is internal use, and be able to identify each of the external systems individually so I can chargeback to the respective teams.

Thanks in advance!

1 REPLY

emma_s
Databricks Employee

Hi

As you've rightly said, the cost of Delta Sharing is not in the compute but in the egress. I would also ignore the GetObject costs, as these will probably be negligible: around $0.0004 per 1,000 GetObject requests.
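To sanity-check the "negligible" claim, here is a rough back-of-the-envelope sketch in Python. It uses the $0.0004 per 1,000 requests figure quoted above. The request volumes are made-up illustrations, and S3 request pricing varies by region, so treat the rate as an assumption to verify against your own bill.

```python
# Rough estimate of S3 GetObject request charges.
# ASSUMPTION: the price per 1,000 GET requests is the figure quoted above;
# check current pricing for your region.
PRICE_PER_1000_GET_USD = 0.0004


def get_request_cost(num_requests: int,
                     price_per_1000: float = PRICE_PER_1000_GET_USD) -> float:
    """Return the estimated USD cost of `num_requests` GetObject calls."""
    return num_requests / 1000 * price_per_1000


# Hypothetical monthly request volumes, for illustration only.
for requests in (1_000_000, 100_000_000):
    print(f"{requests:>11,} requests -> ${get_request_cost(requests):,.2f}")
```

At that rate, even 100 million requests come to about $40, which is why the request charges are unlikely to be worth attributing.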

So the egress charges are where you may need a solution. Has this been rolled out yet, and do you have a sense of the scale of the likely egress bill? You get a certain amount of egress for free, so it may be worth monitoring first to see whether it's actually a sizeable amount. Remember that you'll only pay egress if the data goes to a different cloud or region. If you do need to do this, I think you would query the system audit table for query counts by user and by table. I would then create a reference table of the relative table sizes in my catalog and use it to apportion costs.

Query to get how many times a user has used the Delta Share:

SELECT
  event_time,
  event_date,
  action_name,
  user_identity.email      AS recipient_email,
  source_ip_address,
  request_params['share']  AS share_name,
  request_params['schema'] AS schema_name,
  request_params['name']   AS table_name,
  response.status_code     AS status_code,
  response.error_message   AS error_message
FROM system.access.audit
WHERE action_name LIKE 'deltaSharing%'
  -- Uncomment and set a share name to filter to a specific share:
  -- AND request_params['share'] = '<your_share_name>'
  AND event_date >= current_date() - INTERVAL 30 DAYS
ORDER BY event_time DESC
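
One way to turn the audit rows into a chargeback split, following the approach described above, is to weight each recipient's query count by the size of the table queried. The Python sketch below is only illustrative: the function name, the input shapes, and the sample numbers are all hypothetical. Query count times table size is a rough proxy for bytes transferred, since a filtered query may read only part of a table.

```python
from collections import defaultdict


def apportion_egress(query_counts, table_sizes_gb, total_egress_usd):
    """Split an egress bill across recipients.

    query_counts: {(recipient, table_name): number_of_queries}, e.g. an
        aggregate of the audit query results (hypothetical shape).
    table_sizes_gb: {table_name: size_in_gb}, the reference table of sizes.
    total_egress_usd: the egress charge to distribute.
    Weight = queries * table size, a rough proxy for data transferred.
    """
    weights = defaultdict(float)
    for (recipient, table), count in query_counts.items():
        weights[recipient] += count * table_sizes_gb[table]

    total_weight = sum(weights.values())
    if total_weight == 0:
        # Nothing was queried, so nothing is charged.
        return {recipient: 0.0 for recipient in weights}
    return {r: total_egress_usd * w / total_weight for r, w in weights.items()}


# Illustrative numbers only.
counts = {
    ("team_a@example.com", "orders"): 30,
    ("team_a@example.com", "customers"): 10,
    ("team_b@example.com", "orders"): 10,
}
sizes_gb = {"orders": 50.0, "customers": 5.0}
shares = apportion_egress(counts, sizes_gb, total_egress_usd=410.0)
for recipient, usd in sorted(shares.items()):
    print(f"{recipient}: ${usd:.2f}")
```

With these sample inputs the weights are 1550 and 500, so the $410 bill splits into $310 and $100. If actual bytes served per recipient are available from another source, those would be a better weight than this proxy.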

 

 

The following is a reference of the actions:

[Image: emma_s_0-1782462000790.png (reference of the Delta Sharing audit actions)]

 

I hope this helps.


Thanks,

Emma