11-14-2024 04:03 AM
While trying to read a CSV file into a DataFrame in Databricks using the CSV file format, the load fails with formatting and column errors. This is the code I used:

This is the actual data format:
Accepted Solutions
11-20-2024 11:30 PM - edited 11-20-2024 11:31 PM
You can try adding the multiLine option:

df = (
    spark.read.format("csv")
    .option("header", "true")
    .option("quote", '"')
    .option("delimiter", ",")
    .option("nullValue", "")
    .option("emptyValue", "NULL")
    .option("multiLine", True)
    .schema(schema)
    .load(f"{bronze_folder_path}/Test.csv")
)
https://spark.apache.org/docs/3.5.1/sql-data-sources-csv.html
I also encourage you to use the syntax

df = (
    spark.read
    .some_transformation
)

rather than

df = spark.read \
    .some_transformation

It improves readability and allows you to comment out selected lines.
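The multiLine option tells Spark that a single record may span several physical lines when a quoted field contains a line break. As a quick, Spark-free sketch of that same parsing behavior, Python's standard csv module handles it the same way (the names and values here are made up for illustration):

```python
import csv
import io

# A CSV where the "address" value contains a line break inside quotes.
raw = 'name,address\nJiss,"kochi\nkerala"\n'

rows = list(csv.reader(io.StringIO(raw)))

# The quoted newline does NOT start a new row: we get a header plus one record.
print(rows)  # [['name', 'address'], ['Jiss', 'kochi\nkerala']]
```

Without the quotes around the value, the same line break would be treated as a row boundary, which is exactly the column-mismatch symptom described in this thread.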
11-14-2024 05:12 AM
@JissMathew What is the error that you are getting when trying to load?
11-15-2024 02:03 AM
@MuthuLakshmi Actually, the "address" column should contain "kochi", but the columns get mismatched and it ends up in the "name" column. That is the error.
11-15-2024 10:01 AM
Hi @JissMathew ,
Could you also provide sample csv file?
11-17-2024 10:06 PM
11-18-2024 03:12 AM
Hey, what's the schema you're referencing? The dates are very inconsistent and unlikely to be loaded as anything useful. It also looks like the comma delimiter is causing you issues, since commas also appear unquoted within the body of the text. If this is a CSV you want to use for a one-off instance, you could export it to a tab-delimited file (or another delimiter of your choice), and that should go some way toward fixing the issue.
11-18-2024 03:33 AM
Hey @holly,
Actually, the .option("quote", '"') option in the code should fix the error, but it's not working! Is there any standard file format for CSV files?
11-18-2024 08:16 AM
As the "kochi" is in new line, that is causing the issue. Ideally, I would suggest to avoid generating a csv file that has line breaks in a column data. But if you want to handle this scenario, you probably need to put exclusive quotes in your file for each column data so that the line break in a column data are not identified as new row.
11-19-2024 12:43 AM
Is there an option to handle this scenario through the file format, or do we have to manually edit the source file?
11-19-2024 01:16 AM
test
11-19-2024 05:35 AM
@gilt test ????
11-21-2024 02:14 AM
@Mike_Szklarczyk Thank you! The issue has been successfully resolved. I sincerely appreciate your guidance and support throughout this process. Your assistance was invaluable. 😊