Now that we have our Users, Groups, and Group Members exported to Databricks, it's time for us to grab the applications that those users and groups are assigned to. Much like the previous walkthroughs, we will focus on how I created my Notebooks to query the Okta APIs for the data that I am interested in.
You may have noticed that I target a specific object type per blog entry. This is actually how I have my Notebooks set up: each object type is its own Notebook, so that I can schedule them independently or combine them into multi-task jobs.
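If you go the multi-task route, the Databricks Python SDK can wire the Notebooks together. Here's a minimal sketch, assuming hypothetical notebook paths, job name, and cluster ID that you'd swap for your own:

# Minimal sketch of a multi-task job via the databricks-sdk package.
# All names, paths, and the cluster ID below are placeholders.
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

w = WorkspaceClient()
w.jobs.create(
    name="okta-export",
    tasks=[
        jobs.Task(
            task_key="okta_users",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/me/okta/users"),
            existing_cluster_id="<your-cluster-id>",
        ),
        jobs.Task(
            task_key="okta_apps",
            notebook_task=jobs.NotebookTask(notebook_path="/Repos/me/okta/apps"),
            existing_cluster_id="<your-cluster-id>",
            # Run the apps export after the users export finishes
            depends_on=[jobs.TaskDependency(task_key="okta_users")],
        ),
    ],
)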
And with that...
Let's begin the operation!
As with our previous Notebook, let's make sure we have the modules/libraries installed for this Notebook and restart our kernel if we had to install any of the modules.
import importlib.util
import sys

mods = ['nest_asyncio', 'okta']
restart_required = False

# Install any modules that aren't already present; %pip installs require a Python restart
for mod in mods:
    spec = importlib.util.find_spec(mod)
    if spec is not None:
        print(f'{mod} already installed')
    else:
        %pip install {mod}
        restart_required = True

if restart_required:
    dbutils.library.restartPython()
Shhh...It's a Secret!
I will continue including a section in each post to remind you about secret management, but I won't go into any more detail than I already have in previous posts. Just...please don't put your secrets in the Notebooks themselves. Make sure you're using Databricks Secret Management instead.
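For reference, pulling the API token out of a secret scope is a one-liner. The scope and key names below are placeholders for whatever you created in your own workspace; this sets the okta_key variable used in the next section.

# Retrieve the Okta API token from a Databricks secret scope.
# 'my-scope' and 'okta-api-token' are placeholder names -- use your own.
okta_key = dbutils.secrets.get(scope="my-scope", key="okta-api-token")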
Let's Get Cooking!
Back to our normal cadence, we'll use the Okta SDK again to collect all of the Applications and the Groups assigned to those applications. We prefer to use group assignment for application access, as it's easier to manage. If you don't do this, you may need to change the script to collect individually assigned users instead (see the sketch after the main script below).
import okta
import nest_asyncio
import asyncio
from okta.client import Client as OktaClient

config = {
    'orgUrl': 'https://my-okta-org.okta.com',
    'token': okta_key  # pulled from Databricks Secret Management (see above)
}
okta_client = OktaClient(config)
async def get_app_groups(apps):
    app_data = []
    for app in apps:
        #print(f'checking {app}')
        app_groups, resp, err = await okta_client.list_application_group_assignments(app.id)
        while True:
            for group in app_groups:
                app_data.append({"app_id": app.id, "group_id": group.id})
            if resp.has_next():
                app_groups, err = await resp.next()
            else:
                break
    return app_data
async def list_okta_apps():
    app_list = []
    #attempting to filter for only ACTIVE apps, but that didn't work...if anyone knows of a better way to do this, please comment and let me know!
    apps, resp, err = await okta_client.list_applications(query_params={'status': 'ACTIVE'})
    #write the count to the screen so we know there's some progress while the script runs
    print(f'Got {len(apps)}')
    while True:
        for app in apps:
            app_list.append(app)
        if resp.has_next():
            apps, err = await resp.next()
        else:
            break
    return app_list
if __name__ == '__main__':
    nest_asyncio.apply()
    #collect all of the apps
    apps = asyncio.run(list_okta_apps())
    #since the URL filter didn't work, remove the INACTIVE apps by looping through all apps and keeping only the ACTIVE ones
    active_apps = []
    for a in apps:
        if a.status == 'ACTIVE':
            active_apps.append(a)
    #get groups assigned to ACTIVE apps
    app_groups = asyncio.run(get_app_groups(active_apps))
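As mentioned above, if your org assigns users to applications directly instead of through groups, a variation on get_app_groups might look like the sketch below. It mirrors the same pagination pattern, swapping in the SDK's list_application_users call; treat it as a starting point rather than a drop-in replacement.

# Sketch: collect individually assigned users instead of group assignments.
async def get_app_users(apps):
    app_data = []
    for app in apps:
        app_users, resp, err = await okta_client.list_application_users(app.id)
        while True:
            for user in app_users:
                app_data.append({"app_id": app.id, "user_id": user.id})
            if resp.has_next():
                app_users, err = await resp.next()
            else:
                break
    return app_data

# Usage: app_users = asyncio.run(get_app_users(active_apps))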
Add your as_of_date!
As previously mentioned, this step is optional, but I like to add today's date to the data. This way, we can always see a snapshot of what the environment looked like on any given day.
from datetime import date

new_coll = []
today = date.today()
for one in app_groups:
    # Create a copy of the dictionary to avoid modifying the original
    updated_one = one.copy()
    # Add the new key-value pair
    updated_one['as_of_date'] = str(today)
    # Append the updated dictionary to the list
    new_coll.append(updated_one)
app_groups = new_coll
Define the Schema for App to Group Mapping
Another simple schema for app groups. We really only need two pieces of information, the app ID and the group ID, which tie together the Groups table (Part 2 of the series) and the Apps table (below).
import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType, ArrayType

# Create a SparkSession
spark = SparkSession.builder.appName("OktaAppGroups").getOrCreate()

# Define the schema
schema = StructType([
    StructField("app_id", StringType(), True),
    StructField("group_id", StringType(), True),
    StructField("as_of_date", StringType(), True)
])

df = spark.createDataFrame(app_groups, schema)
df_formatted = df.select("app_id", "group_id", "as_of_date")
Write to the Table
Since there isn't really much transformation needed here, we'll just write to one table. However, to keep naming consistent, we'll still use the _formatted table name.
df_formatted.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_app_groups_formatted", mode="append")
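After the write, a quick sanity check that today's snapshot actually landed never hurts. A minimal sketch:

# Quick sanity check: count today's rows in the table we just wrote
from datetime import date
count = (spark.table("users.jack_zaldivar.okta_prod_app_groups_formatted")
         .filter(col("as_of_date") == str(date.today()))
         .count())
print(f'{count} app-to-group rows written for {date.today()}')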
Two for One
Since we had to get the Application list in order to get the App Groups assigned to those applications, we should take advantage of that! We have the App list in memory as active_apps, so let's add our as_of_date to that collection next. While running this next cell, I also ran into some issues where an app may not have a defined accessPolicy. Because of this, I added a little check for each app and created a new key in the dictionary to handle null accessPolicies.
from datetime import date

new_coll = []
today = date.today()
for one in active_apps:
    # Create a copy of the dictionary to avoid modifying the original
    updated_one = one.__dict__.copy()
    # Add the access policy ID, if the app has one
    if 'accessPolicy' in one.links:
        #get attached policy by splitting the href and taking the last element
        tmp_pol = str(one.links['accessPolicy']['href'])
        split_pol = tmp_pol.split('/')
        updated_one['access_policy'] = split_pol[-1]
    else:
        updated_one['access_policy'] = ''
    updated_one['as_of_date'] = str(today)
    # Append the updated dictionary to the list
    new_coll.append(updated_one)
apps = new_coll
Define our App Schema
import json
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.sql.types import StructType, StructField, StringType, IntegerType, BooleanType, ArrayType

# Create a SparkSession
spark = SparkSession.builder.appName("OktaApps").getOrCreate()

# Define the schema
schema = StructType([
    StructField("id", StringType(), True),
    StructField("name", StringType(), True),
    StructField("label", StringType(), True),
    StructField("status", StringType(), True),
    StructField("last_updated", StringType(), True),
    StructField("created", StringType(), True),
    StructField("accessibility", StructType([
        StructField("self_service", StringType(), True),
        StructField("error_redirect_url", StringType(), True),
        StructField("login_redirect_url", StringType(), True),
    ]), True),
    StructField("visibility", StructType([
        StructField("auto_submit_toolbar", StringType(), True),
        StructField("hide", StructType([
            StructField("i_os", BooleanType(), True),
            StructField("web", BooleanType(), True),
        ])),
    ])),
    StructField("features", ArrayType(StringType()), True),
    StructField("sign_on_mode", StringType(), True),
    StructField("access_policy", StringType(), True),
    StructField("as_of_date", StringType(), True)
])

df_apps = spark.createDataFrame(apps, schema)
df_apps_formatted = df_apps.select("id", "name", "label", "status", "last_updated", "created", "accessibility.*", "visibility.auto_submit_toolbar", "visibility.hide.*", "features", "sign_on_mode", "access_policy", "as_of_date")
Write to the Apps Table
df_apps.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_apps", mode="append")
df_apps_formatted.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_prod_apps_formatted", mode="append")
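With the apps, app-to-group, and (from Part 2) groups tables in place, you can already start answering questions like "which groups grant access to which apps?" Here's a hedged sketch of that join; the groups table name and its id/name columns are assumptions based on Part 2, so adjust them to whatever you actually named yours.

# Hypothetical example: the groups table name and its 'id'/'name' columns
# are assumptions -- substitute your own from Part 2 of this series.
apps_df = spark.table("users.jack_zaldivar.okta_prod_apps_formatted")
app_groups_df = spark.table("users.jack_zaldivar.okta_prod_app_groups_formatted")
groups_df = spark.table("users.jack_zaldivar.okta_prod_groups_formatted")  # assumed name

(apps_df.filter(apps_df["status"] == "ACTIVE")
    .join(app_groups_df, apps_df["id"] == app_groups_df["app_id"])
    .join(groups_df, groups_df["id"] == app_groups_df["group_id"])
    .select(apps_df["label"].alias("app_label"), groups_df["name"].alias("group_name"))
    .orderBy("app_label", "group_name")
    .show(20, truncate=False))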