a week ago
I am trying to read a CSV file into a DataFrame using the csv file format, but the load fails with formatting and column errors. The code I used for
a week ago
@JissMathew What is the error that you are getting when trying to load?
a week ago
@MuthuLakshmi Actually, "kochi" should be in the "address" column, but the columns are mismatched and it ends up in the "name" column. That is the error.
a week ago
Hi @JissMathew ,
Could you also provide a sample CSV file?
Sunday
Monday
Hey, what's the schema you're referencing? The dates are very inconsistent and unlikely to load as anything useful. It also looks like the comma delimiter is causing you issues, since commas also appear within the body of the text and are not always inside quotes. If this is a CSV you want to use as a one-off, you could export it to a tab-delimited file (or use another delimiter of your choice), and that should go some way to fixing the issue.
Monday
hey @holly
Actually, the .option("quote", '"') option in the code should fix the error, but it's not working. Is there any standard file format for CSV files?
Monday
The issue is caused by "kochi" being on a new line. Ideally, I would suggest not generating a CSV file that has line breaks inside column data. But if you want to handle this scenario, you probably need to wrap each column value in explicit quotes in your file so that a line break inside column data is not identified as a new row.
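To illustrate the point about quoting, here is a minimal sketch using Python's standard csv module rather than Spark. The sample data and column names are hypothetical, made up only to mirror the problem described in this thread. It shows how a line break inside a quoted value is kept in one field by a quote-aware parser, while naive line splitting turns it into an extra row. RFC 4180 describes this convention of enclosing fields that contain line breaks or commas in double quotes.

```python
import csv
import io

# Hypothetical sample data: the address value is quoted and spans two lines.
raw = 'id,name,address\n1,Alice,"Street 1\nkochi"\n2,Bob,"Street 2"\n'

# Naive line splitting treats the embedded line break as a new record,
# so "kochi" ends up in the first column of an extra row.
naive_rows = [line.split(",") for line in raw.splitlines()]

# A quote-aware parser keeps the quoted line break inside the address field.
parsed_rows = list(csv.reader(io.StringIO(raw, newline="")))

print(len(naive_rows))   # header + 3 broken rows
print(len(parsed_rows))  # header + 2 records
print(parsed_rows[1])
```

If the source file cannot be regenerated with quotes around such values, a parser has no reliable way to tell an embedded line break from a record boundary.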
Tuesday
Is there an option to handle this scenario through the file format, or do we have to manually edit the source file?
Tuesday
test
Tuesday
@gilt test ????
Wednesday - last edited Wednesday
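You can try adding the multiLine option:
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    # Allow a quoted value to span several lines within one record
    .option("multiline", True)
    # schema and bronze_folder_path are defined elsewhere in the notebook
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)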
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
I also encourage you to use the syntax
df = (
    spark.read
    .some_transformation
)
rather than
df = spark.read \
    .some_transformation \
It improves readability and allows you to comment out selected lines.
Thursday
@Mike_Szklarczyk Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. 😊