Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Reading a csv file

JissMathew
New Contributor II

I am trying to read a CSV file into a DataFrame using the csv file format, but the load fails with formatting and column errors. How should this data be loaded in Databricks? This is the code I used:

df = spark.read.format("csv") \
    .option("header", "true") \
    .option("quote", '"') \
    .option("delimiter", ",") \
    .option("nullValue", "") \
    .option("emptyValue", "NULL") \
    .schema(schema) \
    .load(f"{bronze_folder_path}/Test.csv")

This is the actual data format (shown in the attached image).

1 ACCEPTED SOLUTION

Accepted Solutions

Mike_Szklarczyk
New Contributor III

You can try adding the multiLine option:

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    .option("multiLine", True)  # let a quoted field span several lines
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)

 

https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html

I also encourage you to use the syntax

df = (
    spark.read
    .some_transformation
)

rather than

df = spark.read \
    .some_transformation

because it improves readability and allows you to comment out selected lines.
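To illustrate the point about commenting out lines, here is a minimal sketch in plain Python. The string and the chained methods are hypothetical stand-ins for a Spark reader chain; the same layout applies to any chained calls.

```python
# Hypothetical value standing in for a longer chain such as spark.read...
raw = "  Kochi, Kerala  "

cleaned = (
    raw
    .strip()
    # .upper()  # a step can be disabled here without breaking the expression
    .replace(", ", " / ")
)

print(cleaned)
```

With backslash continuations, a comment cannot follow the backslash and a commented-out line in the middle ends the statement early, so the same edit would raise a syntax error.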


12 REPLIES

MuthuLakshmi
Databricks Employee

@JissMathew What is the error that you are getting when trying to load?

JissMathew
New Contributor II

@MuthuLakshmi Actually, "kochi" should be in the "address" column, but the columns are mismatched and it ends up in the "name" column. That is the error (see Screenshot 2024-11-14 172800.png).

Hi @JissMathew,

Could you also provide a sample CSV file?

Hi @szymon_dybczak, I only have the option to upload files in PNG or JPG format.

 

holly
Databricks Employee

Hey, what's the schema you're referencing? The dates are very inconsistent and unlikely to be loaded in as anything useful. It also looks like the comma delimiter is causing you issues, as commas also appear within the body of the text and are not always quoted. If this is a CSV you want to use for a one-off instance, you could export it to a tab-delimited file (or use another delimiter of your choice), and that should go some way to fixing the issue.

JissMathew
New Contributor II

Hey @holly,

Actually, the .option("quote", '"') option in the code should fix the error, but it's not working! Is there a standard file format for CSV files?

Lakshay
Databricks Employee

Because "kochi" is on a new line, it is causing the issue. Ideally, I would suggest avoiding generating a CSV file that has line breaks within column data. If you do need to handle this scenario, you probably need to enclose each column value in explicit quotes in your file so that a line break inside a value is not interpreted as a new row.
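The effect described above can be reproduced with a small sketch using only Python's standard csv module. The sample rows and names are invented for illustration (only "kochi" comes from this thread), and this is not the Spark reader itself; it only shows why a line-by-line reading breaks while a quote-aware parser keeps the record intact.

```python
import csv
import io

# Hypothetical sample: the address of the first record contains a line break.
raw = 'id,name,address\n1,Jiss,"12 Main Road\nkochi"\n2,Anu,"5 Lake View"\n'

# Naive line-based splitting treats the embedded line break as a new row.
naive_rows = [line.split(",") for line in raw.splitlines()[1:]]

# A quote-aware parser keeps the quoted value, line break included, in one field.
parsed_rows = list(csv.reader(io.StringIO(raw)))[1:]

print(len(naive_rows))   # rows seen when every physical line is a record
print(len(parsed_rows))  # rows seen when quoted line breaks are respected
print(parsed_rows[0])
```

In the naive reading, "kochi" lands at the start of its own row, which resembles the column shift reported in this thread. Quoting alone only helps if the reader is also told that a record may span several lines.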

JissMathew
New Contributor II

Is there an option to handle this scenario through the file format, or do we have to manually edit the source file?

gilt
New Contributor III

test

JissMathew
New Contributor II

@gilt test ????

@Mike_Szklarczyk  Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. 😊
