cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

read the csv file as shown in description

sannycse
New Contributor II

Project_Details.csv

ProjectNo|ProjectName|EmployeeNo

100|analytics|1

100|analytics|2

101|machine learning|3

101|machine learning|1

101|machine learning|4

Find each employee in the form of list working on each project?

Output:

ProjectNo|employeeNo

100|[1,2]

101|[3,1,4]

1 ACCEPTED SOLUTION

Accepted Solutions

garren_staubli
New Contributor III
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))

View solution in original post

6 REPLIES 6

garren_staubli
New Contributor III
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))

Hi @SANJEEV BANDRU​ , Did you get a chance to try the code provided by @Garren Staubli​ ?

sannycse
New Contributor II

I tried but that was created in pyspark and i'm unable to crack that code into spark Sql

merca
Valued Contributor II

@SANJEEV BANDRU​ , You can persist the data frame in temp view by adding following in the python:

df.createOrReplaceTempView("employees_csv")

then you can select:

select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo

User16764241763
Honored Contributor

@SANJEEV BANDRU​  You can simply do this

Just change the file path

CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (

 path "dbfs:/docs/test.csv",

 header "true",

 delimiter "|",

 mode "FAILFAST"

);

select

 ProjectNo,

 collect_list(EmployeeNo) Employees

from

 readcsv

group by

 projectNo

Kaniz
Community Manager
Community Manager

Hi @SANJEEV BANDRU​ , Just a friendly follow-up. Do you still need help? Please let us know.