11-14-2024 04:03 AM
While trying to read a CSV file into a DataFrame using the csv file format, the load fails with formatting and column errors. The code I used for
11-20-2024 11:30 PM - edited 11-20-2024 11:31 PM
You can try adding the multiline option:
df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    # let a quoted value span several lines instead of starting a new row
    .option("multiline", True)
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
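To see why the multiline option matters, here is a small sketch that uses only Python's standard csv module rather than Spark. It is an analogy, not a description of Spark's internals, and the sample data and the column names id, name and address are made up for illustration. Splitting the text on line breaks treats the quoted line break as the start of a new row, while a record-aware parser keeps the value together.

```python
import csv
import io

# Hypothetical sample: the quoted address value contains a line break.
raw = 'id,name,address\n1,Asha,"12 Main Road\nkochi"\n2,Anu,"5 Hill Street"\n'

# Naive approach: every line of text is treated as one row,
# similar in spirit to reading the file without the multiline option.
naive_rows = [line.split(",") for line in raw.splitlines()]

# Record-aware approach: the quoted line break stays inside the field.
parsed_rows = list(csv.reader(io.StringIO(raw)))

print(len(naive_rows))   # the first record is split across two rows
print(len(parsed_rows))  # header plus two records
print(parsed_rows[1])
```

Note that this only works because the value containing the line break is enclosed in quotes. An unquoted line break cannot be told apart from a row separator.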
I also encourage you to use the syntax
df = (
    spark.read
    .some_transformation
)
rather than
df = spark.read \
    .some_transformation
It improves readability and allows you to comment out selected lines.
11-14-2024 05:12 AM
@JissMathew What is the error that you are getting when trying to load?
11-15-2024 02:03 AM
@MuthuLakshmi Actually, "kochi" should be in the "address" column, but the columns are mismatched and it ends up in the "name" column. That is the error.
11-15-2024 10:01 AM
Hi @JissMathew ,
Could you also provide a sample CSV file?
11-17-2024 10:06 PM
11-18-2024 03:12 AM
Hey, what's the schema you're referencing? The dates are very inconsistent and unlikely to be loaded in as anything useful. It also looks like the comma delimiter is causing you issues, as commas also appear within the body of the text, and not always inside quotes. If this is a CSV you want to use for a one-off instance, you could export it to a tab-delimited file (or another delimiter of your choice) and that should go some way to fixing the issue.
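As a rough illustration of the tab-delimited suggestion, the sketch below writes and re-reads a small table with Python's standard csv module. The rows and column names are hypothetical, and it assumes you can regenerate the file from the original data rather than from the already broken CSV. With a tab as the delimiter, commas inside the text no longer split the value.

```python
import csv
import io

# Hypothetical rows: the notes value contains commas.
rows = [
    ["id", "name", "notes"],
    ["1", "Asha", "lives in kochi, moved in 2020, prefers email"],
]

# Write the rows as tab-delimited text into an in-memory buffer.
buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
writer.writerows(rows)
tsv_text = buffer.getvalue()

# Reading back with the same delimiter recovers three columns per row.
read_back = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
print(read_back[1])
```

On the Spark side, the matching change would be to pass the tab character to the existing delimiter option, i.e. .option("delimiter", "\t").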
11-18-2024 03:33 AM
hey @holly
Actually, the .option("quote", '"') option in the code should have fixed the error, but it's not working. Is there a standard file format for CSV files?
11-18-2024 08:16 AM
As "kochi" is on a new line, that is causing the issue. Ideally, I would suggest avoiding generating a CSV file that has line breaks inside column data. But if you want to handle this scenario, you probably need to put explicit quotes around each column value in your file so that a line break inside a column value is not identified as a new row.
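If you control how the file is generated, quoting every value can be sketched as follows with Python's standard csv module. The rows and column names here are hypothetical, and your export tool may offer an equivalent setting instead. csv.QUOTE_ALL wraps every field in quotes, so the line break before "kochi" stays inside the address value.

```python
import csv
import io

# Hypothetical rows: the address value contains a line break.
rows = [
    ["id", "name", "address"],
    ["1", "Asha", "12 Main Road\nkochi"],
]

# Quote every field so embedded line breaks are unambiguous.
buffer = io.StringIO()
writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
writer.writerows(rows)
quoted_text = buffer.getvalue()
print(quoted_text)

# A quote-aware reader returns the original rows unchanged.
round_trip = list(csv.reader(io.StringIO(quoted_text)))
print(round_trip == rows)
```

A reader still has to be told that quoted values may span lines, so a file written this way is a prerequisite for correct parsing, not a replacement for a quote-aware reader.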
11-19-2024 12:43 AM
Is there an option to handle this scenario through the file format, or do we have to manually edit our source file?
11-19-2024 01:16 AM
test
11-19-2024 05:35 AM
@gilt test ????
11-21-2024 02:14 AM
@Mike_Szklarczyk Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. 😊