yesterday
Hi all - I am trying to create a dashabord where I need to list down all users and service principals along with groups and understand their databricks usages. Is there any table available in Databricks that contains user, service principal details. Please help me on this.
8 hours ago
Hi @Sanjeeb2024
Databricks does not provide a single built in table that lists users and service principals along with their usage, but you can build this by combining a few supported sources. User and service principal details can be retrieved using the system information schema tables such as system information users, system information service principals, and system information group members, which give identities and group mappings. For usage and activity, you can use the system access audit logs tables to understand actions performed by users and service principals, and the system compute usage tables to analyze cluster and SQL usage. By joining identity information from the information schema with audit logs and usage tables, you can build a dashboard that shows users, service principals, their group memberships, and how they are using Databricks resources.
5 hours ago
I already explore this but not getting the desired information.
5 hours ago
Hi,
Hi, there isn't a table which shows mappings of Users to groups, you can only see this through the groups editing. You can, however, check grants on tables to see which groups or users have been granted permission to tables. The following query is an example of how you can see this:
SELECT
'catalog' AS level,
catalog_name,
NULL AS schema_name,
NULL AS table_name,
grantee,
privilege_type
FROM system.information_schema.catalog_privileges
UNION ALL
SELECT
'schema' AS level,
catalog_name,
schema_name,
NULL AS table_name,
grantee,
privilege_type
FROM system.information_schema.schema_privileges
UNION ALL
SELECT
'table' AS level,
table_catalog AS catalog_name,
table_schema AS schema_name,
table_name,
grantee,
privilege_type
FROM system.information_schema.table_privileges
ORDER BY level, catalog_name, schema_name, table_name, grantee, privilege_type;
In terms of analysing system usage by user, here are some queries that should help you:
Analysing Usage For SQL Warehouses, attributing cost based on time
%sql
WITH wh_hourly_usd AS (
SELECT
u.usage_metadata.warehouse_id AS warehouse_id,
date_trunc('hour', u.usage_start_time) AS hour_ts,
SUM(u.usage_quantity * p.pricing.effective_list.default) AS usd_hour
FROM system.billing.usage AS u
JOIN system.billing.list_prices AS p
ON p.sku_name = u.sku_name
AND u.usage_end_time >= p.price_start_time
AND (p.price_end_time IS NULL OR u.usage_end_time < p.price_end_time)
WHERE u.usage_metadata.warehouse_id IS NOT NULL
GROUP BY ALL
),
wh_hourly_load AS (
SELECT
h.compute.warehouse_id AS warehouse_id,
date_trunc('hour', h.start_time) AS hour_ts,
h.executed_by AS user_email,
SUM(h.total_task_duration_ms) AS task_ms
FROM system.query.history AS h
WHERE h.compute.type = 'WAREHOUSE'
AND h.execution_status IN ('FINISHED','FAILED','CANCELED')
GROUP BY ALL
),
user_share AS (
SELECT
l.warehouse_id,
l.hour_ts,
l.user_email,
l.task_ms
/ NULLIF(SUM(l.task_ms) OVER (PARTITION BY l.warehouse_id, l.hour_ts), 0) AS share
FROM wh_hourly_load l
),
attributed AS (
SELECT
s.warehouse_id,
s.user_email,
SUM(s.share * u.usd_hour) AS usd_attributed
FROM user_share s
JOIN wh_hourly_usd u
ON s.warehouse_id = u.warehouse_id AND s.hour_ts = u.hour_ts
GROUP BY s.warehouse_id, s.user_email
)
SELECT warehouse_id, user_email, usd_attributed
FROM attributed
ORDER BY usd_attributed DESC;
Analysing costs for jobs
%sql
-- Estimated $USD per job run_as identity
SELECT
usage.usage_metadata.job_id AS job_id,
usage.usage_metadata.job_run_id AS job_run_id,
usage.identity_metadata.run_as AS run_as_identity,
SUM(usage.usage_quantity * prices.pricing.effective_list.default) AS est_cost_usd
FROM system.billing.usage AS usage
JOIN system.billing.list_prices AS prices
ON prices.sku_name = usage.sku_name
AND usage.usage_end_time >= prices.price_start_time
AND (prices.price_end_time IS NULL OR usage.usage_end_time < prices.price_end_time)
WHERE usage.usage_metadata.job_id IS NOT NULL
GROUP BY ALL
ORDER BY est_cost_usd DESC;
Costs for all purpose clusters, note for all purpose clusters you can only attribute costs to the person who set it up, you may want to thing about a tagging strategy to attribute to teams etc
SELECT
usage.usage_metadata.cluster_id AS cluster_id,
usage.custom_tags['creator'] AS creator_tag,
SUM(usage.usage_quantity) AS dbus
FROM system.billing.usage AS usage
WHERE usage.usage_metadata.cluster_id IS NOT NULL
GROUP BY ALL
ORDER BY dbus DESC;
5 hours ago
Thanks. The cost queries are helpful. However I am looking for a report which will provide me all users and service principals created in Databricks, can you share me how to get that list.
4 hours ago
Hi @Sanjeeb2024
You can get the complete list of users and service principals directly from the system information schema tables. Databricks exposes system information users for all human users and system information service principals for all service principals created in the workspace. These tables provide identity details such as name, email or application id, and status. If you also need group membership, you can join these with system information groups and system information group members to derive user and service principal to group mappings. These system tables are the supported and recommended way to report on all identities created in Databricks.
To list all users in Databricks:
select id, user_name, active
from system.information_schema.users;To list all service principals:
select id, application_id, display_name, active
from system.information_schema.service_principals;To see users and service principals with their group memberships:
select gm.member_id, gm.member_type, g.display_name as group_name
from system.information_schema.group_members gm
join system.information_schema.groups g
on gm.group_id = g.id;You can join these results with users or service principals tables using the member id to build a complete report of all identities and their groups.
4 hours ago
Hi @mukul1409 - I do not find any users table under information_schema under my Databricks account. Can you please validate it at your end and share me a screenshot. Do not send any data, just a catalog screenshot should be ok. I checked Databricks AI assistant, it provide me the same SQL, I think it is pointing to old data.
Regards - San
4 hours ago
Hi @Sanjeeb2024
This is expected in many workspaces and it is not an issue on your side. The system information schema tables such as users and service principals are only available when system tables are enabled for the workspace and the querying user has the required permissions. These tables are not present in older workspaces, workspaces without Unity Catalog enabled, or when system tables have not been turned on by an account admin. In addition, visibility can be limited if you are not an account admin or do not have access to system schemas. That is why you do not see the users table under information schema, and why the AI assistant response may appear outdated. The correct next step is to check whether system tables are enabled for your workspace and confirm permissions with your Databricks account admin.
4 hours ago
Hi @mukul1409 - are you able to validate at your end to ensure you have users table under information_schema. Even if I tried to check in Databricks Free edition where I am the account admin and not able locate this table. Requesting you, please validate once to check this? See the screenshot from Databrick Free Edition.
I am able to see all other tables under system database, so need to understand whether the issue is with my end or these tables are depreciated.
Regards - San
4 hours ago
Another screenshot:
4 hours ago
Hi @Sanjeeb2024
In Databricks Free Edition , the identity related system tables such as users, service principals, groups, and group members are not exposed, even if you are the account admin. These tables are part of Databricks system tables and are only available in paid workspaces with Unity Catalog enabled and system tables turned on at the account level.
4 hours ago
Ok, thanks, let me check with account admin for our Enterprise edition.
3 hours ago
Hi, I can't find any reference to a user system table in our docs. Instead the recommended approach is to use the API to return users, groups and service principals. You can either run this using the Workspace Client if you only have worspace admin permissions or the Account client if you have account admin permissions. Example code below:
from databricks.sdk import AccountClient
# If env vars are set, this picks them up automatically
a = WorkspaceClient()
# List identities
users = list(a.users.list())
groups = list(a.groups.list())
service_principals = list(a.service_principals.list())
print(f"Users: {len(users)}")
for u in users[:10]:
print(f"- {u.user_name}")
print(f"\nGroups: {len(groups)}")
for g in groups[:10]:
print(f"- {g.display_name}")
print(f"\nService Principals: {len(service_principals)}")
for sp in service_principals[:10]:
print(f"- {getattr(sp, 'display_name', getattr(sp, 'application_id', 'unknown'))}")
def get_group_members_by_id(group_id: str):
w = WorkspaceClient()
group = w.groups.get(id=group_id) # SCIM read of the group
members = group.members or []
return members
# List users and service principals in each group
for group in a.groups.list():
print(f"Group: {group.display_name}")
members = list(get_group_members_by_id(group.id))
for member in members:
print(f" - {member.type}: {getattr(member, 'display', getattr(member, 'user_name', 'unknown'))}")
2 hours ago
Thank you @emma_s . Appreciated your help. I will give a try and update.
Passionate about hosting events and connecting people? Help us grow a vibrant local communityโsign up today to get started!
Sign Up Now