Databricks

Kaniz · 09-22-2021

jose_gonzalez · 09-23-2021

Hi,

You can use the following examples:

%scala

val df = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load("path_to_file_name.csv")

%python

df = spark.read.format('csv').options(header='true', inferSchema='true').load('path_to_file_name.csv')

For more examples, see the sample notebook at https://docs.databricks.com/data/data-sources/read-csv.html

SreedharVengala · 09-24-2021

You can also use the Scala code provided by Jose in a %python cell by just removing val, as long as the chained call stays on one line or is wrapped in parentheses.

If you know the schema, it is better to avoid schema inference and pass it to DataFrameReader. For example, if you have three columns (integer, double, and string):

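from pyspark.sql.types import StructType, StructField
from pyspark.sql.types import DoubleType, IntegerType, StringType

# Explicit schema: one StructField per CSV column, in file order.
schema = StructType([
    StructField("A", IntegerType()),
    StructField("B", DoubleType()),
    StructField("C", StringType())
])

# On Spark 2.0+ the CSV reader is built in, so the spark session and the
# short "csv" format name replace sqlContext and "com.databricks.spark.csv".
df = (
    spark
    .read
    .format("csv")
    .schema(schema)
    .option("header", "true")
    .option("mode", "DROPMALFORMED")  # drop rows that do not match the schema
    .load("some_input_file.csv")
)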


How to load CSV file as a DataFrame in Spark?
