Hi,
I have a dataframe that has name and company
from pyspark.sql import SparkSession
spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()
columns = ["company","name"]
data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "Kim"), ("company3", "Sam"), ("company4", "Jim"), ("company4", "Tony"), ("company5", "Stan"),
]
df = spark.createDataFrame(data=data, schema = columns).show()
Then I have another dataframe that has the company names
columns2 = ["job_comany","num"]
data2 = [("company1",1), ("company2",2), ("company3",3), ("company4",4), ("company5",5),]
df2 = spark.createDataFrame(data=data2, schema = columns2).show()
What I would like to do is use the company names dataframe to search the dataframe with the person names and identify the companies associated with the people and create a dataframe with the company names as columns with a 0 or 1 with if the person is with that company. Here is a picture of what I would like to see as my final dataframe.