cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

read the csv file as shown in description

sannycse
New Contributor II

Project_Details.csv

ProjectNo|ProjectName|EmployeeNo

100|analytics|1

100|analytics|2

101|machine learning|3

101|machine learning|1

101|machine learning|4

Find each employee in the form of list working on each project?

Output:

ProjectNo|employeeNo

100|[1,2]

101|[3,1,4]

1 ACCEPTED SOLUTION

Accepted Solutions

garren_staubli
New Contributor III
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))

View solution in original post

4 REPLIES 4

garren_staubli
New Contributor III
from pyspark.sql import functions as F
df = spark.read.option("sep", "|").option("header", "true").csv("/tmp/file.csv")
display(df.groupBy("projectNo").agg(F.expr("collect_list(EmployeeNo)").alias("employees")))

I tried but that was created in pyspark and i'm unable to crack that code into spark Sql

merca
Valued Contributor II

@SANJEEV BANDRU​ , You can persist the data frame in temp view by adding following in the python:

df.createOrReplaceTempView("employees_csv")

then you can select:

select projectNo, collect_list(EmployeeNo)
from employees_csv
group by projectNo

User16764241763
Honored Contributor

@SANJEEV BANDRU​  You can simply do this

Just change the file path

CREATE TEMPORARY VIEW readcsv USING CSV OPTIONS (

 path "dbfs:/docs/test.csv",

 header "true",

 delimiter "|",

 mode "FAILFAST"

);

select

 ProjectNo,

 collect_list(EmployeeNo) Employees

from

 readcsv

group by

 projectNo

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group