
Comparing 2 dataframes and create columns from values within a dataframe

lmcglone
New Contributor II

Hi,

I have a dataframe that holds a name and a company for each person:

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('SparkByExamples.com').getOrCreate()

columns = ["company","name"]

data = [("company1", "Jon"), ("company2", "Steve"), ("company1", "Kim"), ("company3", "Sam"), ("company4", "Jim"), ("company4", "Tony"), ("company5", "Stan")]

# show() returns None, so keep the DataFrame in df and display it separately
df = spark.createDataFrame(data=data, schema=columns)

df.show()

Then I have another dataframe that holds the company names:

columns2 = ["job_company", "num"]

data2 = [("company1", 1), ("company2", 2), ("company3", 3), ("company4", 4), ("company5", 5)]

# show() returns None, so keep the DataFrame in df2 and display it separately
df2 = spark.createDataFrame(data=data2, schema=columns2)

df2.show()

What I would like to do is use the company names dataframe to search the dataframe of person names and identify which company each person is associated with. The result should be a dataframe with the company names as columns, each holding a 1 if the person is with that company and a 0 if not. Here is a picture of what I would like to see as my final dataframe.

image

2 REPLIES

Hubert-Dudek
Esteemed Contributor III

You need to join and pivot

# The outer parentheses let the method chain span several lines
(
    df.join(df2, on=[df.company == df2.job_company])
    .groupBy("company", "name")
    .pivot("job_company")
    .count()
)

image.png

Thanks....that is perfect. 😀

Another question, to take it a step further: how would I alter the names of the columns this code produces? In your example they are company1, company2, etc. Is it possible to change them to company1_a, company2_a, etc.?
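
One possible way to do the renaming, sketched under the assumption that the pivoted result still has company and name as its grouping columns and that every other column is a pivoted company column. The helper add_suffix and its parameters are hypothetical names, not part of PySpark; it relies only on the DataFrame's columns attribute and its toDF method.

```python
def add_suffix(df, keep, suffix="_a"):
    """Return df with `suffix` appended to every column name not in `keep`."""
    new_names = [c if c in keep else f"{c}{suffix}" for c in df.columns]
    # toDF replaces all column names positionally in a single call.
    return df.toDF(*new_names)
```

Calling add_suffix(pivoted, keep={"company", "name"}), where pivoted stands for the dataframe returned by the pivot, would leave company and name alone and turn company1, company2, etc. into company1_a, company2_a, etc.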
