
Load csv file as a dataframe?

Kaniz
Community Manager

Kaniz
Community Manager

Since Spark 2.0, CSV support (formerly the separate spark-csv package) is part of core Spark, so it doesn't require a separate library:

df = spark.read.format("csv").option("header", "true").load("file.csv")

Kaniz
Community Manager

In Scala (this works for any delimited format: set the delimiter to "," for CSV, "\t" for TSV, etc.):

val df = spark.read.format("csv") // Spark 2.0+; older code used sqlContext.read.format("com.databricks.spark.csv")
  .option("delimiter", ",")         // "," for CSV, "\t" for TSV, etc.
  .load("csvfile.csv")

Hubert-Dudek
Esteemed Contributor III

As @Kaniz Fatma wrote, you can use the native functions for it:

df = spark.read.format("csv").option("header", "true").load("file.csv")

A really nice alternative is to use SQL syntax:

%sql
CREATE TEMPORARY VIEW diamonds
USING CSV
OPTIONS (path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv", header "true", mode "FAILFAST")

Here is the Spark documentation:

https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.csv.html?h...

and the Databricks documentation:

https://docs.databricks.com/data/data-sources/read-csv.html
