06-18-2021 07:22 AM
06-18-2021 07:26 AM
CSV support is part of core Spark (it was folded into Spark in version 2.0), so the separate spark-csv library is no longer required:
df = spark.read.format("csv").option("header", "true").load("file.csv")
06-18-2021 07:31 AM
In Scala (this works for any delimited format: set the delimiter to "," for CSV, "\t" for TSV, etc.):

val df = sqlContext.read.format("com.databricks.spark.csv")
  .option("delimiter", ",")
  .load("csvfile.csv")
11-18-2021 03:42 AM
As @Kaniz Fatma wrote, you can use the native functions for this. An alternative, really nice way is to use SQL syntax:
%sql
CREATE TEMPORARY VIEW diamonds
USING CSV
OPTIONS (
  path "/databricks-datasets/Rdatasets/data-001/csv/ggplot2/diamonds.csv",
  header "true",
  mode "FAILFAST"
)
Here is spark documentation:
https://spark.apache.org/docs/latest/api/python/reference/api/pyspark.sql.DataFrameReader.csv.html
and databricks documentation:
https://docs.databricks.com/data/data-sources/read-csv.html