03-30-2022 11:54 AM
Project_Details.csv
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find each employee in the form of list working on each project?
Output:
ProjectNo|employeeNo
100|[1,2]
101|[3,1,4]
03-31-2022 01:06 PM
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))
03-31-2022 01:06 PM
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))
03-31-2022 11:10 PM
Hi @SANJEEV BANDRU , Did you get a chance to try the code provided by @Garren Staubli ?
04-02-2022 09:53 AM
I tried but that was created in pyspark and i'm unable to crack that code into spark Sql
04-02-2022 10:11 AM
@SANJEEV BANDRU , You can persist the data frame in temp view by adding following in the python:
df.createOrReplaceTempView("employees_csv")
then you can select:
select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo
04-13-2022 08:56 AM
@SANJEEV BANDRU You can simply do this
Just change the file path
CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (
path "dbfs:/docs/test.csv",
header "true",
delimiter "|",
mode "FAILFAST"
);
select
ProjectNo,
collect_list(EmployeeNo) Employees
from
readcsv
group by
projectNo
04-26-2022 09:31 AM
Hi @SANJEEV BANDRU , Just a friendly follow-up. Do you still need help? Please let us know.