Welcome back!
In Part 1 of this series, we walked through the process of exporting our Okta Users to Databricks. In Part 2 of the series, we exported our Okta Groups and Group Rules. In this installment, we'll collect our Group Members, so we can start tying these tables together!
Notebook Setup
Admittedly, the Okta SDK is working out OK so far (apart from the pickling issue in the last Notebook), so I'm going to keep using it. Let's make sure the modules/libraries this Notebook needs are installed, and restart Python if we had to install any of them.
import importlib.util

# Import names of the modules this Notebook needs
mods = ['nest_asyncio', 'okta']
restart_required = False

for mod in mods:
    # find_spec returns None when the module can't be found
    spec = importlib.util.find_spec(mod)
    if spec is not None:
        print(f'{mod} already installed')
    else:
        %pip install {mod}
        restart_required = True

# Restart Python so newly installed packages can be imported
if restart_required:
    dbutils.library.restartPython()
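If you'd rather see which modules are missing before installing anything, the check above can be pulled out into a small helper. This is a minimal sketch: `missing_modules` is a hypothetical name, and it only reports what `importlib.util.find_spec` can't locate, leaving the actual `%pip install` to the Notebook.

```python
import importlib.util


def missing_modules(mods):
    """Return the names in `mods` that can't be found on the current path."""
    # find_spec returns None for a top-level module that isn't installed
    return [mod for mod in mods if importlib.util.find_spec(mod) is None]


# Lists only the packages that still need installing
print(missing_modules(['nest_asyncio', 'okta']))
```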
Don't Forget Your Secret!
I'm not going into detail on secret management anymore, but don't forget that you'll need to retrieve your Okta API token and decode it appropriately (see Part 1 for details). The code below expects it in a variable named okta_key.
Get Your Groups and Members!
To get all group memberships, we first need all the groups, right? I mean, it makes sense to me, at least. So let's do that step first, then iterate through each group to collect its members.
import asyncio

import nest_asyncio
from okta.client import Client as OktaClient

# okta_key holds the API token retrieved from your secret scope (see Part 1)
config = {
    'orgUrl': 'https://my-okta-org.okta.com',
    'token': okta_key
}
okta_client = OktaClient(config)

async def list_okta_groups():
    """Return every group in the org, following pagination."""
    group_list = []
    groups, resp, err = await okta_client.list_groups()
    if err:
        raise RuntimeError(f'Error listing groups: {err}')
    while True:
        group_list.extend(groups)
        if resp.has_next():
            groups, err = await resp.next()
            if err:
                raise RuntimeError(f'Error listing groups: {err}')
        else:
            break
    return group_list

async def get_all_group_memberships(groups):
    """Return one row per (group, member) pair across all groups."""
    group_data = []
    # Walk each group and page through its members
    for group in groups:
        print(f'checking {group.profile.name}: {group.id}')
        members, resp, err = await okta_client.list_group_users(groupId=group.id)
        if err:
            raise RuntimeError(f'Error listing members of {group.id}: {err}')
        while True:
            for member in members:
                group_data.append({
                    "group_id": group.id,
                    "user_id": member.id,
                    "user_login": member.profile.login
                })
            if resp.has_next():
                members, err = await resp.next()
                if err:
                    raise RuntimeError(f'Error listing members of {group.id}: {err}')
            else:
                break
    return group_data

if __name__ == '__main__':
    # Databricks already runs an event loop; nest_asyncio lets asyncio.run() work inside it
    nest_asyncio.apply()
    groups = asyncio.run(list_okta_groups())  # get all groups
    members = asyncio.run(get_all_group_memberships(groups))  # for each group, let's get the members
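Both functions above follow the same pagination pattern: take the first page, then keep calling `resp.next()` while `resp.has_next()` is true. If you want to reuse that loop, it could be factored into one helper. The sketch below is an assumption-based refactor, not part of the Okta SDK: `collect_pages` is a hypothetical name, and `FakeResponse` is a stand-in that only mimics the `has_next()`/`next()` shape the code above relies on, so the example runs without an Okta org.

```python
import asyncio


async def collect_pages(first_call):
    """Drain every page from an Okta-SDK-style (items, resp, err) call."""
    items, resp, err = await first_call
    if err:
        raise RuntimeError(err)
    collected = list(items)
    # Keep fetching while the response reports another page
    while resp.has_next():
        items, err = await resp.next()
        if err:
            raise RuntimeError(err)
        collected.extend(items)
    return collected


class FakeResponse:
    """Hypothetical stand-in for the SDK response object, holding pre-built pages."""

    def __init__(self, pages):
        self._pages = pages

    def has_next(self):
        return bool(self._pages)

    async def next(self):
        return self._pages.pop(0), None


async def fake_list_groups():
    # First page comes back directly; two more pages wait behind resp.next()
    return ["g1", "g2"], FakeResponse([["g3"], ["g4", "g5"]]), None


sample_groups = asyncio.run(collect_pages(fake_list_groups()))
print(sample_groups)  # ['g1', 'g2', 'g3', 'g4', 'g5']
```

In the Notebook you would still call `nest_asyncio.apply()` first, and could pass a call such as `okta_client.list_group_users(groupId=group.id)` as `first_call`.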
Add your as_of_date!
This step is optional, of course, but I like to add today's date to the data, so we can always see a snapshot of what the environment looked like on any given day.
from datetime import date

new_coll = []
today = date.today()
for one in members:
    # Create a copy of the dictionary to avoid modifying the original
    updated_one = one.copy()
    # Add the new key-value pair
    updated_one['as_of_date'] = str(today)
    # Append the updated dictionary to the list
    new_coll.append(updated_one)
members = new_coll
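The loop above copies each dictionary before adding the key, which keeps the original records untouched. The same result can be written as a single comprehension using dictionary unpacking. This is just a stylistic alternative, and `add_as_of_date` is a hypothetical helper name; passing the date in (instead of calling `date.today()` inside) also makes it easy to backfill or test. The sample record is made-up data.

```python
from datetime import date


def add_as_of_date(records, as_of=None):
    """Return new dicts with an 'as_of_date' key; the input dicts are not modified."""
    as_of = as_of or date.today()
    # {**rec, ...} builds a fresh dict, so the originals stay unchanged
    return [{**rec, "as_of_date": str(as_of)} for rec in records]


sample_members = [{"group_id": "00g1", "user_id": "00u1", "user_login": "alice@example.com"}]
snapshot = add_as_of_date(sample_members, date(2024, 5, 1))
print(snapshot[0]["as_of_date"])  # 2024-05-01
```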
Define the Schema
This is probably one of the simplest schemas in the series. There isn't much nested information to extract from these records, so our bronze and silver layers (okta_group_members and okta_group_members_formatted) are basically the same. I think I only kept both tables for consistency. /shrug
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType

# Get the SparkSession (in a Databricks notebook this returns the existing session)
spark = SparkSession.builder.appName("OktaGroupMembers").getOrCreate()

# Define the schema: every field is a nullable string
schema = StructType([
    StructField("group_id", StringType(), True),
    StructField("user_id", StringType(), True),
    StructField("user_login", StringType(), True),
    StructField("as_of_date", StringType(), True)
])

# Bronze: the raw records
df = spark.createDataFrame(members, schema)
# Silver: the same columns, selected explicitly
df_formatted = df.select("group_id", "user_id", "user_login", "as_of_date")
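When `spark.createDataFrame` gets a list of dictionaries plus a schema, it looks values up by field name, so a record with a missing or misspelled key typically ends up as a null in that column rather than raising an error. A strict pre-check in plain Python can catch that before Spark is involved. This is a hypothetical helper (`check_records` is not part of PySpark), and `EXPECTED_FIELDS` simply mirrors the four string columns defined above; the sample records are made up.

```python
EXPECTED_FIELDS = ("group_id", "user_id", "user_login", "as_of_date")


def check_records(records, fields=EXPECTED_FIELDS):
    """Return (index, problem) pairs for records whose keys or value types don't match."""
    problems = []
    for i, rec in enumerate(records):
        missing = [f for f in fields if f not in rec]
        extra = [k for k in rec if k not in fields]
        if missing or extra:
            problems.append((i, f"missing={missing} extra={extra}"))
        elif not all(isinstance(rec[f], str) for f in fields):
            # Every column in the schema above is StringType
            problems.append((i, "non-string value"))
    return problems


records = [
    {"group_id": "g1", "user_id": "u1", "user_login": "a@example.com", "as_of_date": "2024-05-01"},
    {"group_id": "g1", "user_id": "u2", "login": "b@example.com", "as_of_date": "2024-05-01"},
]
print(check_records(records))
```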
Write to the table
As always, our last step is to write our dataframe to a table.
df.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_group_members", mode="append")
df_formatted.write.option("mergeSchema", "true").saveAsTable("users.jack_zaldivar.okta_group_members_formatted", mode="append")
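Because both writes use `mode="append"`, running the Notebook twice on the same day appends that day's snapshot twice. The sketch below models one way to clean that up, keeping the first row per `(group_id, user_id, as_of_date)` key. `dedupe_snapshot` is a hypothetical plain-Python helper for illustration; on an actual DataFrame, `df.dropDuplicates(["group_id", "user_id", "as_of_date"])` expresses the same idea. The rows are made-up sample data.

```python
def dedupe_snapshot(rows, key=("group_id", "user_id", "as_of_date")):
    """Keep the first row for each (group, user, date) key, preserving order."""
    seen = set()
    unique = []
    for row in rows:
        k = tuple(row[f] for f in key)
        if k not in seen:
            seen.add(k)
            unique.append(row)
    return unique


# Two runs on the same day appended the same membership twice
rows = [
    {"group_id": "g1", "user_id": "u1", "user_login": "a@example.com", "as_of_date": "2024-05-01"},
    {"group_id": "g1", "user_id": "u1", "user_login": "a@example.com", "as_of_date": "2024-05-01"},
    {"group_id": "g1", "user_id": "u1", "user_login": "a@example.com", "as_of_date": "2024-05-02"},
]
print(len(dedupe_snapshot(rows)))  # 2
```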
Well done!
You've made it to the end of this installment, and now you've got your Users, Groups, Group Rules, and Group Members all imported into Databricks! Don't forget to create a Schedule so that these Notebooks all run daily. That gives you a daily snapshot of your environment.
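Once a few daily snapshots have accumulated, one thing you can do with them is compare two days to see who joined or left which groups. The sketch below shows that comparison as a set difference on `(group_id, user_id)` pairs. `membership_changes` is a hypothetical helper, and the two snapshots are made-up sample data; in practice you would filter the table by `as_of_date` to get each day's rows.

```python
def membership_changes(previous, current):
    """Compare two snapshots and return (added, removed) sets of (group_id, user_id)."""
    before = {(r["group_id"], r["user_id"]) for r in previous}
    after = {(r["group_id"], r["user_id"]) for r in current}
    return after - before, before - after


day1 = [{"group_id": "g1", "user_id": "u1"}, {"group_id": "g1", "user_id": "u2"}]
day2 = [{"group_id": "g1", "user_id": "u2"}, {"group_id": "g2", "user_id": "u1"}]
added, removed = membership_changes(day1, day2)
print(sorted(added), sorted(removed))  # [('g2', 'u1')] [('g1', 'u1')]
```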