
How to load CSV file as a DataFrame in Spark?

Kaniz
Community Manager
2 REPLIES

jose_gonzalez
Moderator

Hi,

You can use the following examples:

%scala

val df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("path_to_file_name.csv")

%python

df = spark.read.format('csv').options(header='true', inferSchema='true').load('path_to_file_name.csv')

For more examples, please check our sample notebook here: https://docs.databricks.com/data/data-sources/read-csv.html

SreedharVengala
New Contributor III

You can also run the Scala code Jose provided in a %python cell by just removing val (keep the chained call on one line or wrap it in parentheses).

If you know the schema, it is better to avoid schema inference and pass the schema to DataFrameReader. For example, if you have three columns (integer, double, and string):

from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, IntegerType, StringType
 
schema = StructType([
    StructField("A", IntegerType()),
    StructField("B", DoubleType()),
    StructField("C", StringType())
])
 
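# Spark 2.0+ ships a built-in "csv" source; "com.databricks.spark.csv" is the legacy package name.
df = (
    spark
    .read
    .format("csv")
    .schema(schema)                    # explicit schema, so no inference pass over the file
    .option("header", "true")          # first line holds the column names
    .option("mode", "DROPMALFORMED")   # drop rows that do not match the schema
    .load("some_input_file.csv")
)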
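
As a variation on the explicit-schema read above, here is a minimal sketch that passes the schema as a DDL string, which DataFrameReader.schema() also accepts in Spark 2.3 and later. The helper names (write_sample_csv, read_csv_with_schema), the SCHEMA_DDL constant, and the sample rows are my own for illustration and are not part of any Spark API. It assumes a SparkSession is already available as `spark`, as it is in a Databricks notebook.

```python
import csv

# DDL-style schema string (assumes Spark 2.3+, where schema() accepts a string).
SCHEMA_DDL = "A INT, B DOUBLE, C STRING"


def write_sample_csv(path):
    """Write a small three-column CSV with made-up data and one malformed row."""
    rows = [
        ["A", "B", "C"],
        ["1", "1.5", "x"],
        ["not_an_int", "2.5", "y"],  # malformed: A is not an integer
        ["3", "3.5", "z"],
    ]
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return path


def read_csv_with_schema(spark, path, mode="DROPMALFORMED"):
    """Read a CSV with an explicit schema instead of inferSchema.

    `spark` is an existing SparkSession; `mode` is passed to the CSV reader
    (PERMISSIVE, DROPMALFORMED, or FAILFAST).
    """
    return (
        spark.read
        .format("csv")
        .schema(SCHEMA_DDL)
        .option("header", "true")
        .option("mode", mode)
        .load(path)
    )
```

In a notebook you could call `df = read_csv_with_schema(spark, "some_input_file.csv")` and then `df.show()`. With DROPMALFORMED, rows that do not fit the schema are expected to be dropped once the columns are actually parsed; switching mode to "FAILFAST" should raise an error on the first bad row instead, which can be useful while you are still validating the input.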
