11-14-2024 04:03 AM
I am trying to read a CSV file into a DataFrame using the CSV file format, but with the code I used, the load fails with formatting and column errors.
11-14-2024 05:12 AM
@JissMathew What is the error that you are getting when trying to load?
11-15-2024 02:03 AM
@MuthuLakshmi Actually, the "address" column should contain "kochi", but the columns are mismatched and that value ends up in the "name" column. That is the error.
11-15-2024 10:01 AM
Hi @JissMathew,
Could you also provide a sample CSV file?
a month ago
a month ago
Hey, what schema are you referencing? The dates are very inconsistent and unlikely to load as anything useful. It also looks like the comma delimiter is causing problems, because commas also appear within the body of the text without being consistently quoted. If this is a CSV you want to use as a one-off, you could export it to a tab-delimited file (or use another delimiter of your choice), which should go some way toward fixing the issue.
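To illustrate the tab-delimited suggestion, here is a minimal sketch using only Python's standard csv module. The sample rows are hypothetical, not taken from the file in this thread. It only shows that once the delimiter is a tab, commas inside the text no longer split a value into extra columns.

```python
import csv
import io

# Hypothetical sample rows: the free-text column contains commas.
rows = [
    ["id", "name", "address"],
    ["1", "Asha", "12 Main Road, Kochi, Kerala"],
]

# Write with a tab delimiter so commas in the text are ordinary characters.
buffer = io.StringIO()
csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
tsv_text = buffer.getvalue()

# Read it back with the same delimiter; each record keeps three fields.
parsed = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(parsed[1])
```

On the Spark side, the matching change would presumably be to pass "\t" to the delimiter option instead of ",".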
a month ago
Hey @holly,
Actually, the .option("quote", '"') option in the code should fix the error, but it is not working. Is there a standard file format for CSV files?
a month ago
Because "kochi" is on a new line, it is causing the issue. Ideally, I would suggest not generating a CSV file that has line breaks inside column data. If you do need to handle this scenario, you probably need to enclose each column value in explicit quotes in your file so that a line break inside a value is not treated as a new row.
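The quoting point can be sketched with Python's standard csv module. The record below is hypothetical (only the "kochi" value comes from this thread). It shows that a value with an embedded line break looks like an extra row to naive line splitting, while a quote-aware parser keeps it in one field, provided the value is quoted in the file.

```python
import csv
import io

# Hypothetical record whose address value contains a line break.
rows = [
    ["id", "name", "address"],
    ["1", "Asha", "12 Main Road\nkochi"],
]

# QUOTE_ALL wraps every field in double quotes when writing.
buffer = io.StringIO()
csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n").writerows(rows)
text = buffer.getvalue()

# Naive line splitting sees three physical lines and breaks the record in two.
physical_lines = text.splitlines()

# A quote-aware parser keeps the embedded line break inside one field.
records = list(csv.reader(io.StringIO(text)))

print(len(physical_lines), len(records))
```

A reader that works strictly line by line will still split such a record, even when the quotes are present, unless it is told that quoted values may span lines.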
a month ago
Is there an option to handle this scenario through the file format, or do we have to edit the source file manually?
a month ago
test
a month ago
@gilt test ????
4 weeks ago - last edited 4 weeks ago
You can try adding the multiLine option:
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    .option("multiLine", True)  # let a quoted value span several lines
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
I also encourage you to use the syntax
df = (
    spark.read
    .some_transformation
)
rather than
df = spark.read \
    .some_transformation
It improves readability and allows you to comment out selected lines.
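The point about commenting out lines can be checked without Spark. This is a small sketch on a plain Python string, with hypothetical sample text. The parenthesized chain stays valid when one step is commented out, while the same edit in a backslash-continued chain fails to compile.

```python
# Parenthesized chain: each step sits on its own line, and a step can be
# commented out without breaking the expression.
cleaned = (
    "  Kochi, Kerala  "
    .strip()
    # .upper()  # disabled step; the chain is still valid
    .replace(",", ";")
)

# The same edit in a backslash-continued chain: the comment swallows the
# trailing backslash, so the next line is no longer part of the statement.
backslash_source = 'value = "  Kochi  " \\\n    # .upper() \\\n    .strip()\n'
try:
    compile(backslash_source, "<example>", "exec")
    backslash_ok = True
except SyntaxError:
    backslash_ok = False

print(cleaned, backslash_ok)
```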
4 weeks ago
@Mike_Szklarczyk Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable.