article Databricks for Identity Systems - Part 5 (Application Assignments) in Technical Blog

article Databricks for Identity Systems - Part 5 (Application Assignments) in Technical Blog https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185 <P>In the previous section, we discovered how to pull <STRONG>Applications</STRONG> and <STRONG>Application Groups</STRONG> into Databricks. I briefly mentioned that we generally assign users to groups and then assign groups to applications for access in our system. This practice makes it a bit easier to manage access, especially in a complicated and fast growing environment. Later, we can also use these groups in things like <STRONG>Group Rules</STRONG> to assign applications as birthrights, based on user attributes in our directory.</P> <P>Though <STRONG>Group Membership</STRONG> is our preferred method of granting access to applications, it is not the <EM>ONLY</EM> method that Okta provides. Accounts can also be individually assigned to applications, without being in a group. To discover these types of assignments, we'll pull all of the individual users assigned to each app and store that information for analysis and alerting, as necessary. This also helps us to know <EM>HOW</EM> each user is getting access to an application, in the case of multiple potential access groups being assigned to 1 user.</P> <P><FONT size="5">Let's Begin</FONT></P> <P><SPAN>As with each previous Notebook, we'll install our libraries for this Notebook and restart our kernel if any of the modules weren't originally present.</SPAN></P> <LI-CODE lang="python">import importlib.util import sys mods = ['nest_asyncio', 'okta'] restart_required = False for mod in mods: spec = importlib.util.find_spec(str(mod)) if spec is not None: print(f'{mod} already installed') else: %pip install {mod} restart_required=True if restart_required==True: dbutils.library.restartPython()</LI-CODE> <P><FONT size="5">Don't Forget Your Keys!</FONT></P> <P>Check <A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-1-users/ba-p/116122" target="_self">Part 1</A> for tips on how to work with Databricks Secrets. Don't skip this step because you'll need your API key for all operations later. In my case, the decoded key is named <STRONG>okta_key</STRONG>.</P> <P><FONT size="5">"Play Some Jazz"</FONT></P> <P>As I've previously noted, I am not a classically trained Data Engineer or Data Scientist by any means. However, I do understand some data structure and how things can fit together to tell a story. Because of this cursory knowledge, I feel like I can kind of experiment and make some things up a bit. With a bit of creativity and a lot of effort, it might just work out somehow. I call this "playing jazz". Not to downplay Jazz players in any way, as I'm also a musician and previously played in Jazz Band in High School. <span class="lia-unicode-emoji" title=":slightly_smiling_face:">🙂</span>  </P> <P>For this next section, I decided to use the SDK as well as a bit of creative collecting via a custom dictionary. So, instead of staying with the SDK sheet music, I decided to play a little Jazz and improv a bit. Let's define the necessary functions.</P> <LI-CODE lang="python">import okta import nest_asyncio import asyncio from okta.client import Client as OktaClient config = { 'orgUrl': 'https://my-okta-org.okta.com', 'token': okta_key } okta_client = OktaClient(config) async def list_okta_apps(): app_list = [] #use the SDK to collect the applications list apps, resp, err = await okta_client.list_applications() print(f'Got {apps.count}') while True: for app in apps: app_list.append(app) if resp.has_next(): apps, err = await resp.next() else: break return app_list async def list_app_assignments(apps): i = 1 for app in apps: app_users_list = [] if app['status'] == "ACTIVE": query_params = {'limit':'500'} app_users, resp, err = await okta_client.list_application_users(app['id'], query_params) #because we want to combine the application dictionary with the app user dictionary, #we create a new dictionary to combine the two dicts into one and add that to a list. if len(app_users) > 0: for app_u in app_users: app_u = app_u.__dict__ user_obj = {} user_obj['app_id'] = app['id'] #app ID user_obj['app_label'] = app['label'] #app label/displayName user_obj['id'] = app_u['id'] #app user ID user_obj['app_user_scope'] = app_u['scope'] #how are they assigned to the app if app_u['scope'] == 'GROUP': #if assignment by group, what group? user_obj['group_name'] = app_u['links']['group']['name'] else: user_obj['group_name'] = 'N/A' app_users_list.append(user_obj) user_obj = None #reset the custom dict while resp.has_next(): #paginate as needed app_users, err = await resp.next() if len(app_users) > 0: for app_u in app_users: app_u = app_u.__dict__ user_obj = {} user_obj['app_id'] = app['id'] user_obj['app_label'] = app['label'] user_obj['id'] = app_u['id'] user_obj['app_user_scope'] = app_u['scope'] if app_u['scope'] == 'GROUP': user_obj['group_name'] = app_u['links']['group']['name'] else: user_obj['group_name'] = 'N/A' app_users_list.append(user_obj) print(f'Collected: {len(app_users_list)} users') #print to the screen so we see that something is happening i=i+1 #counter for display return app_users_list</LI-CODE> <P><FONT size="5">The Main Event</FONT></P> <P>Here, we'll call our previously defined functions and remove any <STRONG>INACTIVE</STRONG> apps from the results before we start querying the users assigned. Next, we add our <STRONG>as_of_date</STRONG> to the <STRONG>ACTIVE</STRONG> apps in our collection.</P> <LI-CODE lang="python">nest_asyncio.apply() apps = [] apps = asyncio.run(list_okta_apps()) or [] #remove inactive apps new_apps = [] for row in apps: if row.status == 'INACTIVE': print(f'Skipping {row.label} from list of apps') else: new_apps.append(row) #add as_of_date new_coll = [] for app in new_apps: updated_row = app.__dict__.copy() updated_row['as_of_date'] = str(today) new_coll.append(updated_row) new_apps = new_coll</LI-CODE> <P><FONT size="5">Get Some Coffee</FONT></P> <P>Depending on your environment, this next step can take a while. In my environment, it takes roughly 3 hours to complete, due to the the number of users and apps configured. Because this is potentially going to take a while, I recommend allowing this step to process and going to do other things. Maybe revisit one of the previous posts in this series and give me some pointers on how I can improve! <span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:">😄</span> </P> <LI-CODE lang="python">app_group_users = [] #collection of all apps and app users count = 1 for item in new_apps: print(f'Processing app {count} of {len(new_apps)}: {item["label"]}') current_users = asyncio.run(list_app_assignments([item])) app_group_users += current_users count = count + 1 #add datestamp new_coll = [] for row in app_group_users: # Create a copy of the dictionary to avoid modifying the original updated_row = row.copy() # Add the new key-value pair updated_row['as_of_date'] = str(today) # Append the updated dictionary to the list new_coll.append(updated_row) app_groups = new_coll</LI-CODE> <P><FONT size="5">Define the Schema and Write to Tables</FONT></P> <P>Convert the list of dictionaries to a Dataframe and write to our bronze and silver layers. For silver, I also changed <STRONG>id</STRONG> to <STRONG>user_id</STRONG>, because I thought that <STRONG>id</STRONG> just seemed too vague. I feel like this will be helpful later on when we're writing some SQL and making some dashboard.</P> <LI-CODE lang="python">import json from pyspark.sql import SparkSession from pyspark.sql.functions import col from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType, ArrayType # Create a SparkSession spark = SparkSession.builder.appName("OktaAppUsers").getOrCreate() # Define the schema schema = StructType([ StructField("app_id", StringType(), True), StructField("app_label", StringType(), True), StructField("app_user_scope", StringType(), True), StructField("group_name", StringType(), True), StructField("id", StringType(), True), StructField("as_of_date", StringType(), True) ]) df = spark.createDataFrame(app_groups, schema) df_formatted = df.select("app_id", "app_label", col("id").alias("user_id"), "app_user_scope", "group_name", "as_of_date") #write to tables df.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_app_assignments", mode="append") df_formatted.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_app_assignments_formatted", mode="append") </LI-CODE> <P><FONT size="5">Another One Down!</FONT></P> <P>If you've come this far, then you have successfully written your <A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-1-users/ba-p/116122" target="_self">Users</A>, <A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-2-groups-and-group-rules/ba-p/116260" target="_self">Groups</A>, <A href="https://community.databricks.com/t5/blogs/blogworkflowpage/blog-id/technical-blog/article-id/536" target="_self">Group Members</A>, and <A href="https://community.databricks.com/t5/blogs/blogworkflowpage/blog-id/technical-blog/article-id/670" target="_self">Apps</A> tables. We also have relationships between all of the tables in some form! Thanks for sticking around this far!</P> <P>Let's go play with some data, shall we?</P> Wed, 09 Jul 2025 16:19:51 GMT jack_zaldivar 2025-07-09T16:19:51Z Databricks for Identity Systems - Part 5 (Application Assignments) https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185 <P>Collect app assignments for each application in your environment and whether users aer assigned by group or individually</P> Wed, 09 Jul 2025 16:19:51 GMT https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185 jack_zaldivar 2025-07-09T16:19:51Z