Databricks Community

sannycse · ‎03-30-2022

Project_Details.csv

ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4

How can I get, for each project, a list of the employees working on it?

Output:

ProjectNo|employeeNo
100|[1,2]
101|[3,1,4]

garren_staubli · ‎03-31-2022

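from pyspark.sql import functions as F
# Read the pipe-delimited file, treating the first line as the header
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
# Collect each project's employee numbers into a single array column
display(df.groupBy("ProjectNo").agg(F.collect_list("EmployeeNo").alias("employees")))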
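For anyone who wants to check the expected result without a cluster, here is a minimal plain-Python sketch of the grouping that collect_list performs, run against the sample rows from the question. It is only an illustration, not Spark code: the `SAMPLE` string and the `employees_by_project` helper are hypothetical names made up for this example. The values stay as strings, which matches how Spark reads CSV columns when no schema is supplied or inferred.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from Project_Details.csv in the question.
SAMPLE = """ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
"""


def employees_by_project(text):
    """Group EmployeeNo values by ProjectNo, keeping file order and duplicates."""
    groups = defaultdict(list)
    # DictReader uses the first line as the header, like option("header", "true").
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        groups[row["ProjectNo"]].append(row["EmployeeNo"])
    return dict(groups)


result = employees_by_project(SAMPLE)
for project_no, employees in result.items():
    # Print in the same pipe-delimited shape as the requested output.
    print(f"{project_no}|[{','.join(employees)}]")
```

Note that this sketch keeps the rows in file order, whereas Spark does not guarantee the order of elements inside a collect_list result.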

Kaniz · ‎03-31-2022

Hi @SANJEEV BANDRU, did you get a chance to try the code provided by @Garren Staubli?

sannycse · ‎04-02-2022

I tried it, but that code is written in PySpark and I'm unable to convert it into Spark SQL.

merca · ‎04-02-2022

@SANJEEV BANDRU, you can register the DataFrame as a temporary view by adding the following line to the Python code:

df.createOrReplaceTempView("employees_csv")

Then you can query the view in SQL:

select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo
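If no Spark session is available, the shape of this GROUP BY query can still be tried out locally. The sketch below is a stand-in under stated assumptions, not Spark SQL: it uses Python's built-in sqlite3 module, and because SQLite has no collect_list, it substitutes group_concat, which returns a comma-separated string rather than an array. The table name reuses employees_csv from the view above, and the rows are the sample data from the question.

```python
import sqlite3

# Sample rows from the question, loaded into an in-memory SQLite table.
rows = [
    ("100", "analytics", "1"),
    ("100", "analytics", "2"),
    ("101", "machine learning", "3"),
    ("101", "machine learning", "1"),
    ("101", "machine learning", "4"),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees_csv (ProjectNo TEXT, ProjectName TEXT, EmployeeNo TEXT)"
)
conn.executemany("INSERT INTO employees_csv VALUES (?, ?, ?)", rows)

# group_concat is SQLite's closest analogue to collect_list (assumption:
# used here only to illustrate the grouping, it yields a string, not an array).
query = """
    SELECT ProjectNo, group_concat(EmployeeNo) AS employees
    FROM employees_csv
    GROUP BY ProjectNo
    ORDER BY ProjectNo
"""

# Split the concatenated string back into a list and sort it, since neither
# SQLite nor Spark guarantees the order of values inside each group.
grouped = {
    project_no: sorted(employees.split(","))
    for project_no, employees in conn.execute(query)
}
conn.close()

print(grouped)
```

In Spark SQL itself the query stays as written above, with collect_list returning a real array column.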

User16764241763 · ‎04-13-2022

@SANJEEV BANDRU, you can do this entirely in SQL. Just change the file path:

CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (
  path "dbfs:/docs/test.csv",
  header "true",
  delimiter "|",
  mode "FAILFAST"
);

select
  ProjectNo,
  collect_list(EmployeeNo) Employees
from
  readcsv
group by
  ProjectNo

Kaniz · ‎04-26-2022

Hi @SANJEEV BANDRU, just a friendly follow-up: do you still need help? Please let us know.


read the csv file as shown in description
