2 weeks ago
while try to read a csv file using data frame , read csv using a file format , but fail in case of formatting and column error while loading the code i used for
Wednesday - last edited Wednesday
You can try add multiline option:
df = (
spark.read.format("csv")
.option("header", "true")
.option("quote", '"')
.option("delimiter", ",")
.option("nullValue", "")
.option("emptyValue", "NULL")
.option("multiline", True)
.schema(schema)
.load(f"{bronze_folder_path}/Test.csv"
)
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
I also encourage you to use the syntax
df = (
spark.read
.some_transformation
)
rather than
df=spark.read \
.some_transformation \
it improves readability and allows you to comment out selected lines
2 weeks ago
@JissMathew What is the error that you are getting when trying to load?
2 weeks ago
@MuthuLakshmi actually, In "adreess" column we need "kochi", and column miss match and get into "name" column , that is the error
2 weeks ago
Hi @JissMathew ,
Could you also provide sample csv file?
a week ago
a week ago
Hey, what's the schema you're referencing? The dates are very inconsistent and unlikely to be loaded in as anything useful. It also looks like the delimiter of a comma is causing you issues as it's also within the body of the text without quotes each time. If this is a csv you want to use for a one off instance, you could export it to a tab delimited file (or other delimiter of your choice) and that should go some way to fixing the issue.
a week ago
hey @holly
actually this .option("quote", '"') option in code should have to fix the error but its not working !, is there any standard file format for csv files ?
a week ago
As the "kochi" is in new line, that is causing the issue. Ideally, I would suggest to avoid generating a csv file that has line breaks in a column data. But if you want to handle this scenario, you probably need to put exclusive quotes in your file for each column data so that the line break in a column data are not identified as new row.
a week ago
if there is a option for handle this scenario using a file format for this ? or we have to manually edit in our source file ?
a week ago
test
a week ago
@gilt test ????
Wednesday - last edited Wednesday
You can try add multiline option:
df = (
spark.read.format("csv")
.option("header", "true")
.option("quote", '"')
.option("delimiter", ",")
.option("nullValue", "")
.option("emptyValue", "NULL")
.option("multiline", True)
.schema(schema)
.load(f"{bronze_folder_path}/Test.csv"
)
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
I also encourage you to use the syntax
df = (
spark.read
.some_transformation
)
rather than
df=spark.read \
.some_transformation \
it improves readability and allows you to comment out selected lines
Thursday
@Mike_Szklarczyk Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. 😊
Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.
If there isn’t a group near you, start one and help create a community that brings people together.
Request a New Group