<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>article Databricks for Identity Systems - Part 5 (Application Assignments) in Technical Blog</title>
    <link>https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185</link>
    <description>&lt;P&gt;In the previous section, we discovered how to pull &lt;STRONG&gt;Applications&lt;/STRONG&gt; and &lt;STRONG&gt;Application Groups&lt;/STRONG&gt; into Databricks. I briefly mentioned that we generally assign users to groups and then assign groups to applications for access in our system. This practice makes it a bit easier to manage access, especially in a complicated and fast growing environment. Later, we can also use these groups in things like &lt;STRONG&gt;Group Rules&lt;/STRONG&gt; to assign applications as birthrights, based on user attributes in our directory.&lt;/P&gt;
&lt;P&gt;Though &lt;STRONG&gt;Group Membership&lt;/STRONG&gt; is our preferred method of granting access to applications, it is not the &lt;EM&gt;ONLY&lt;/EM&gt; method that Okta provides. Accounts can also be individually assigned to applications, without being in a group. To discover these types of assignments, we'll pull all of the individual users assigned to each app and store that information for analysis and alerting, as necessary. This also helps us to know &lt;EM&gt;HOW&lt;/EM&gt; each user is getting access to an application, in the case of multiple potential access groups being assigned to 1 user.&lt;/P&gt;
&lt;P&gt;&lt;FONT size="5"&gt;Let's Begin&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;&lt;SPAN&gt;As with each previous Notebook, we'll install our libraries for this Notebook and restart our kernel if any of the modules weren't originally present.&lt;/SPAN&gt;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import importlib.util
import sys

mods = ['nest_asyncio', 'okta']
restart_required = False
for mod in mods:
  spec = importlib.util.find_spec(str(mod))
  if spec is not None:
    print(f'{mod} already installed')
  else:
    %pip install {mod}
    restart_required=True

if restart_required==True:
  dbutils.library.restartPython()&lt;/LI-CODE&gt;
&lt;P&gt;&lt;FONT size="5"&gt;Don't Forget Your Keys!&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Check &lt;A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-1-users/ba-p/116122" target="_self"&gt;Part 1&lt;/A&gt; for tips on how to work with Databricks Secrets. Don't skip this step because you'll need your API key for all operations later. In my case, the decoded key is named&amp;nbsp;&lt;STRONG&gt;okta_key&lt;/STRONG&gt;.&lt;/P&gt;
&lt;P&gt;&lt;FONT size="5"&gt;"Play Some Jazz"&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;As I've previously noted, I am not a classically trained Data Engineer or Data Scientist by any means. However, I do understand some data structure and how things can fit together to tell a story. Because of this cursory knowledge, I feel like I can kind of experiment and make some things up a bit. With a bit of creativity and a lot of effort, it might just work out somehow. I call this "playing jazz". Not to downplay Jazz players in any way, as I'm also a musician and previously played in Jazz Band in High School. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt;&amp;nbsp;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;For this next section, I decided to use the SDK as well as a bit of creative collecting via a custom dictionary. So, instead of staying with the SDK sheet music, I decided to play a little Jazz and improv a bit. Let's define the necessary functions.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import okta
import nest_asyncio
import asyncio
from okta.client import Client as OktaClient


config = {
    'orgUrl': 'https://my-okta-org.okta.com',
    'token': okta_key
}

okta_client = OktaClient(config)

async def list_okta_apps():
    app_list = []
    #use the SDK to collect the applications list
    apps, resp, err = await okta_client.list_applications()
    print(f'Got {apps.count}')
    while True:
        for app in apps:
            app_list.append(app)
        if resp.has_next():
            apps, err = await resp.next()
        else:
            break
    return app_list

async def list_app_assignments(apps):
    i = 1
    for app in apps:
        app_users_list = []
        if app['status'] == "ACTIVE":
            query_params = {'limit':'500'}
            app_users, resp, err = await okta_client.list_application_users(app['id'], query_params)
            #because we want to combine the application dictionary with the app user dictionary, 
            #we create a new dictionary to combine the two dicts into one and add that to a list.
            if len(app_users) &amp;gt; 0:
                for app_u in app_users:
                    app_u = app_u.__dict__
                    user_obj = {}
                    user_obj['app_id'] = app['id'] #app ID
                    user_obj['app_label'] = app['label'] #app label/displayName
                    user_obj['id'] = app_u['id'] #app user ID
                    user_obj['app_user_scope'] = app_u['scope'] #how are they assigned to the app
                    if app_u['scope'] == 'GROUP':
                        #if assignment by group, what group?
                        user_obj['group_name'] = app_u['links']['group']['name']
                    else:
                        user_obj['group_name'] = 'N/A'
                    app_users_list.append(user_obj)
                    user_obj = None #reset the custom dict
                while resp.has_next(): #paginate as needed
                    app_users, err = await resp.next()

                    if len(app_users) &amp;gt; 0:
                        for app_u in app_users:
                            app_u = app_u.__dict__
                            user_obj = {}
                            user_obj['app_id'] = app['id']
                            user_obj['app_label'] = app['label']
                            user_obj['id'] = app_u['id']
                            user_obj['app_user_scope'] = app_u['scope']
                            if app_u['scope'] == 'GROUP':
                                user_obj['group_name'] = app_u['links']['group']['name']
                            else:
                                user_obj['group_name'] = 'N/A'
                            app_users_list.append(user_obj)
        print(f'Collected: {len(app_users_list)} users') #print to the screen so we see that something is happening

        i=i+1 #counter for display

    return app_users_list&lt;/LI-CODE&gt;
&lt;P&gt;&lt;FONT size="5"&gt;The Main Event&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Here, we'll call our previously defined functions and remove any &lt;STRONG&gt;INACTIVE&lt;/STRONG&gt; apps from the results before we start querying the users assigned. Next, we add our &lt;STRONG&gt;as_of_date&lt;/STRONG&gt; to the &lt;STRONG&gt;ACTIVE&lt;/STRONG&gt; apps in our collection.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;nest_asyncio.apply()
apps = []
apps = asyncio.run(list_okta_apps()) or []

#remove inactive apps
new_apps = []
for row in apps:
    if row.status == 'INACTIVE':
        print(f'Skipping {row.label} from list of apps')
    else:
        new_apps.append(row)

#add as_of_date
new_coll = []
for app in new_apps:
    updated_row = app.__dict__.copy()
    updated_row['as_of_date'] = str(today)
    new_coll.append(updated_row)

new_apps = new_coll&lt;/LI-CODE&gt;
&lt;P&gt;&lt;FONT size="5"&gt;Get Some Coffee&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Depending on your environment, this next step can take a while. In my environment, it takes roughly 3 hours to complete, due to the the number of users and apps configured. Because this is potentially going to take a while, I recommend allowing this step to process and going to do other things. Maybe revisit one of the previous posts in this series and give me some pointers on how I can improve! &lt;span class="lia-unicode-emoji" title=":grinning_face_with_smiling_eyes:"&gt;😄&lt;/span&gt;&amp;nbsp;&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;app_group_users = [] #collection of all apps and app users
count = 1
for item in new_apps:
    print(f'Processing app {count} of {len(new_apps)}: {item["label"]}')
    current_users = asyncio.run(list_app_assignments([item]))
    app_group_users += current_users
    count = count + 1

#add datestamp
new_coll = []
for row in app_group_users:
    # Create a copy of the dictionary to avoid modifying the original
    updated_row = row.copy()
    # Add the new key-value pair
    updated_row['as_of_date'] = str(today)
    # Append the updated dictionary to the list
    new_coll.append(updated_row)
app_groups = new_coll&lt;/LI-CODE&gt;
&lt;P&gt;&lt;FONT size="5"&gt;Define the Schema and Write to Tables&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;Convert the list of dictionaries to a Dataframe and write to our bronze and silver layers. For silver, I also changed&amp;nbsp;&lt;STRONG&gt;id&lt;/STRONG&gt; to&amp;nbsp;&lt;STRONG&gt;user_id&lt;/STRONG&gt;, because I thought that&amp;nbsp;&lt;STRONG&gt;id&lt;/STRONG&gt; just seemed too vague. I feel like this will be helpful later on when we're writing some SQL and making some dashboard.&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType, ArrayType

# Create a SparkSession
spark = SparkSession.builder.appName("OktaAppUsers").getOrCreate()

# Define the schema
schema = StructType([
  StructField("app_id", StringType(), True),
  StructField("app_label", StringType(), True),
  StructField("app_user_scope", StringType(), True),
  StructField("group_name", StringType(), True),
  StructField("id", StringType(), True),
  StructField("as_of_date", StringType(), True)
])

df = spark.createDataFrame(app_groups, schema)
df_formatted = df.select("app_id", "app_label", col("id").alias("user_id"), "app_user_scope", "group_name", "as_of_date")

#write to tables
df.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_app_assignments", mode="append") 
df_formatted.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_app_assignments_formatted", mode="append") &lt;/LI-CODE&gt;
&lt;P&gt;&lt;FONT size="5"&gt;Another One Down!&lt;/FONT&gt;&lt;/P&gt;
&lt;P&gt;If you've come this far, then you have successfully written your &lt;A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-1-users/ba-p/116122" target="_self"&gt;Users&lt;/A&gt;, &lt;A href="https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-2-groups-and-group-rules/ba-p/116260" target="_self"&gt;Groups&lt;/A&gt;, &lt;A href="https://community.databricks.com/t5/blogs/blogworkflowpage/blog-id/technical-blog/article-id/536" target="_self"&gt;Group Members&lt;/A&gt;, and&amp;nbsp;&lt;A href="https://community.databricks.com/t5/blogs/blogworkflowpage/blog-id/technical-blog/article-id/670" target="_self"&gt;Apps&lt;/A&gt; tables. We also have relationships between all of the tables in some form! Thanks for sticking around this far!&lt;/P&gt;
&lt;P&gt;Let's go play with some data, shall we?&lt;/P&gt;</description>
    <pubDate>Wed, 09 Jul 2025 16:19:51 GMT</pubDate>
    <dc:creator>jack_zaldivar</dc:creator>
    <dc:date>2025-07-09T16:19:51Z</dc:date>
    <item>
      <title>Databricks for Identity Systems - Part 5 (Application Assignments)</title>
      <link>https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185</link>
      <description>&lt;P&gt;Collect app assignments for each application in your environment and whether users aer assigned by group or individually&lt;/P&gt;</description>
      <pubDate>Wed, 09 Jul 2025 16:19:51 GMT</pubDate>
      <guid>https://community.databricks.com/t5/technical-blog/databricks-for-identity-systems-part-5-application-assignments/ba-p/123185</guid>
      <dc:creator>jack_zaldivar</dc:creator>
      <dc:date>2025-07-09T16:19:51Z</dc:date>
    </item>
  </channel>
</rss>

