topic Re: read the csv file as shown in description in Data Engineering

read the csv file as shown in description

sannycse — Wed, 30 Mar 2022 18:54:53 GMT

Project_Details.csv

ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4

How can I get, for each project, the list of employees working on it?

Output:

ProjectNo|employeeNo
100|[1,2]
101|[3,1,4]
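
To make the expected grouping concrete, here is a minimal plain-Python sketch that involves no Spark at all. It only illustrates the target result on the sample rows from the question; the names `SAMPLE` and `group_employees` are made up for this illustration.

```python
import csv
import io
from collections import defaultdict

# Sample rows copied from Project_Details.csv in the question.
SAMPLE = """ProjectNo|ProjectName|EmployeeNo
100|analytics|1
100|analytics|2
101|machine learning|3
101|machine learning|1
101|machine learning|4
"""


def group_employees(text):
    """Map each ProjectNo to its EmployeeNo values, kept in file order."""
    groups = defaultdict(list)
    # DictReader uses the first line as the header; fields are split on "|".
    for row in csv.DictReader(io.StringIO(text), delimiter="|"):
        groups[row["ProjectNo"]].append(int(row["EmployeeNo"]))
    return dict(groups)


print(group_employees(SAMPLE))  # {'100': [1, 2], '101': [3, 1, 4]}
```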

Re: read the csv file as shown in description

garren_staubli — Thu, 31 Mar 2022 20:06:53 GMT

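from pyspark.sql import functions as F

# Read the pipe-delimited file, treating the first row as the header
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
# Collect the employee numbers of each project into a list
display(df.groupBy("ProjectNo").agg(F.collect_list("EmployeeNo").alias("employees")))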

Re: read the csv file as shown in description

sannycse — Sat, 02 Apr 2022 16:53:40 GMT

I tried it, but that solution is written in PySpark and I'm unable to convert the code to Spark SQL.

Re: read the csv file as shown in description

merca — Sat, 02 Apr 2022 17:11:26 GMT

@SANJEEV BANDRU, you can register the DataFrame as a temporary view by adding the following to the Python code:

df.createOrReplaceTempView("employees_csv")

then you can select:

select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo
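
Putting the read, the temp view, and the query together, a minimal sketch might look like the following. It assumes an existing SparkSession is passed in as `spark` (Databricks notebooks predefine one). The function name `employees_per_project`, the default view name `project_details`, and the `path` argument are placeholders for illustration and do not come from the answers in this thread.

```python
def employees_per_project(spark, path, view_name="project_details"):
    """Return one row per project with its employee numbers in a list.

    `spark` is an existing SparkSession; `path` and `view_name` are
    placeholders chosen for this sketch.
    """
    # Read the pipe-delimited file, using the first row as column names.
    df = (
        spark.read
        .option("sep", "|")
        .option("header", "true")
        .csv(path)
    )
    # Register the DataFrame under a name that SQL queries can reference.
    df.createOrReplaceTempView(view_name)
    # Run the aggregation as Spark SQL instead of the DataFrame API.
    return spark.sql(f"""
        SELECT ProjectNo, collect_list(EmployeeNo) AS EmployeeNo
        FROM {view_name}
        GROUP BY ProjectNo
    """)
```

In a notebook this could be called as `employees_per_project(spark, "/tmp/file.csv").show()`. Two caveats: without a schema or the `inferSchema` option, every CSV column is read as a string, so the lists hold strings rather than integers; and `collect_list` does not guarantee the order of the collected elements after a shuffle.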

Re: read the csv file as shown in description

User16764241763 — Wed, 13 Apr 2022 15:56:47 GMT

@SANJEEV BANDRU You can do this entirely in SQL. Just change the file path:

CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (
  path "dbfs:/docs/test.csv",
  header "true",
  delimiter "|",
  mode "FAILFAST"
);

select
  ProjectNo,
  collect_list(EmployeeNo) Employees
from
  readcsv
group by
  projectNo