- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-30-2022 11:54 AM
Project_Details.csv
ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
Find each employee in the form of list working on each project?
Output:
ProjectNo|employeeNo
100|[1,2]
101|[3,1,4]
- Labels:
-
Machine Learning
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-31-2022 01:06 PM
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
03-31-2022 01:06 PM
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-02-2022 09:53 AM
I tried but that was created in pyspark and i'm unable to crack that code into spark Sql
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-02-2022 10:11 AM
@SANJEEV BANDRU , You can persist the data frame in temp view by adding following in the python:
df.createOrReplaceTempView("employees_csv")
then you can select:
select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
04-13-2022 08:56 AM
@SANJEEV BANDRU You can simply do this
Just change the file path
CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (
path "dbfs:/docs/test.csv",
header "true",
delimiter "|",
mode "FAILFAST"
);
select
ProjectNo,
collect_list(EmployeeNo) Employees
from
readcsv
group by
projectNo

