- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-03-2024 10:02 PM
Hello, I am trying to use from_csv in the sql warehouse, but I am getting unexpected results:
As a small example I am running:
WITH your_table AS (
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)
SELECT from_csv(csv_column, 'a INT, b STRING, c DOUBLE', map('header', 'true','multiLine','true')) AS parsed_csv
FROM your_table
which is returning one row with the object:
object
a: null
b: "b"
c: null
It looks like it's still reading the header and not understanding the linebreak correctly. I would expect it to return an array of objects with two rows following the schema. I've tried a bunch of different combinations of mapping with linesep, header, etc. But I just can not get it to work how I would expect.
Does anyone have any advice on this?
Accepted Solutions
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-05-2024 08:26 PM
Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you.
%sql
WITH your_table AS (
-- Example CSV data
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
),
split_lines AS (
-- Split data into lines
SELECT explode(split(csv_column, '\n')) AS line, row_number() OVER (ORDER BY csv_column) AS row_num
FROM your_table
),
data_without_header AS (
-- Exclude header row
SELECT line
FROM split_lines
WHERE row_num > 1
),
parsed_data AS (
-- Parse CSV format (considering newlines and double quotes)
SELECT
from_csv(
line,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'true', 'quote', '"', 'delimiter', ',')
) AS parsed_row
FROM data_without_header
)
SELECT parsed_row.*
FROM parsed_data;
Takuya Omi (尾美拓哉)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-03-2024 11:47 PM
Hi, @seanstachff
Is the following result what you are expecting?
parsed_csv
{"a":1,"b":"hello, world","c":3.14}
{"a":2,"b":"goodbye, world","c":2.71}
If so, you can achieve it with the SQL below.
%sql
WITH your_table AS (
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)
SELECT
from_csv(
row,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'false')
) AS parsed_csv
FROM your_table
LATERAL VIEW explode(slice(split(csv_column, '\n'), 2, size(split(csv_column, '\n')) - 1)) AS row;
If you’re working with table-formatted data where each column exists as a separate field, there are other methods available as well.
Takuya Omi (尾美拓哉)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-04-2024 01:59 PM
Thanks, I didn't read the docs clear enough and see that csv_data is for one row. I'm a big confused as to why it has a header then. My only issue with this is how could you handle the possibility of it having a linebreak in a string value on the csv row
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-04-2024 05:46 PM
Is this the kind of result you are expecting?
- The data is interpreted as multiple rows.
- The CSV contains multiple data rows (records), each split according to the schema.
- Even when line breaks are part of a column value, they are not mistaken as row delimiters and are handled correctly as part of the dataset.
Is it something like the result shown in the image?
Takuya Omi (尾美拓哉)
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-05-2024 07:22 PM
Yes, you are correct. That is the result I want. The above I showed is just an example however my real source table is a list of files with one column value being the entire contents of a CSV.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Report Inappropriate Content
12-05-2024 08:26 PM
Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you.
%sql
WITH your_table AS (
-- Example CSV data
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
),
split_lines AS (
-- Split data into lines
SELECT explode(split(csv_column, '\n')) AS line, row_number() OVER (ORDER BY csv_column) AS row_num
FROM your_table
),
data_without_header AS (
-- Exclude header row
SELECT line
FROM split_lines
WHERE row_num > 1
),
parsed_data AS (
-- Parse CSV format (considering newlines and double quotes)
SELECT
from_csv(
line,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'true', 'quote', '"', 'delimiter', ',')
) AS parsed_row
FROM data_without_header
)
SELECT parsed_row.*
FROM parsed_data;
Takuya Omi (尾美拓哉)

