12-03-2024 10:02 PM
Hello, I am trying to use from_csv in the sql warehouse, but I am getting unexpected results:
As a small example I am running:
WITH your_table AS (
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)
SELECT from_csv(csv_column, 'a INT, b STRING, c DOUBLE', map('header', 'true','multiLine','true')) AS parsed_csv
FROM your_table
which is returning one row with the object:
object
a: null
b: "b"
c: null
It looks like it's still reading the header and not understanding the linebreak correctly. I would expect it to return an array of objects with two rows following the schema. I've tried a bunch of different combinations of mapping with linesep, header, etc. But I just can not get it to work how I would expect.
Does anyone have any advice on this?
12-05-2024 08:26 PM
Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you.
%sql
WITH your_table AS (
-- Example CSV data
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
),
split_lines AS (
-- Split data into lines
SELECT explode(split(csv_column, '\n')) AS line, row_number() OVER (ORDER BY csv_column) AS row_num
FROM your_table
),
data_without_header AS (
-- Exclude header row
SELECT line
FROM split_lines
WHERE row_num > 1
),
parsed_data AS (
-- Parse CSV format (considering newlines and double quotes)
SELECT
from_csv(
line,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'true', 'quote', '"', 'delimiter', ',')
) AS parsed_row
FROM data_without_header
)
SELECT parsed_row.*
FROM parsed_data;
12-03-2024 11:47 PM
Hi, @seanstachff
Is the following result what you are expecting?
parsed_csv
{"a":1,"b":"hello, world","c":3.14}
{"a":2,"b":"goodbye, world","c":2.71}
If so, you can achieve it with the SQL below.
%sql
WITH your_table AS (
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)
SELECT
from_csv(
row,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'false')
) AS parsed_csv
FROM your_table
LATERAL VIEW explode(slice(split(csv_column, '\n'), 2, size(split(csv_column, '\n')) - 1)) AS row;
If you’re working with table-formatted data where each column exists as a separate field, there are other methods available as well.
12-04-2024 01:59 PM
Thanks, I didn't read the docs clear enough and see that csv_data is for one row. I'm a big confused as to why it has a header then. My only issue with this is how could you handle the possibility of it having a linebreak in a string value on the csv row
12-04-2024 05:46 PM
Is this the kind of result you are expecting?
Is it something like the result shown in the image?
12-05-2024 07:22 PM
Yes, you are correct. That is the result I want. The above I showed is just an example however my real source table is a list of files with one column value being the entire contents of a CSV.
12-05-2024 08:26 PM
Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you.
%sql
WITH your_table AS (
-- Example CSV data
SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
),
split_lines AS (
-- Split data into lines
SELECT explode(split(csv_column, '\n')) AS line, row_number() OVER (ORDER BY csv_column) AS row_num
FROM your_table
),
data_without_header AS (
-- Exclude header row
SELECT line
FROM split_lines
WHERE row_num > 1
),
parsed_data AS (
-- Parse CSV format (considering newlines and double quotes)
SELECT
from_csv(
line,
'a INT, b STRING, c DOUBLE',
map('header', 'false', 'multiLine', 'true', 'quote', '"', 'delimiter', ',')
) AS parsed_row
FROM data_without_header
)
SELECT parsed_row.*
FROM parsed_data;
Passionate about hosting events and connecting people? Help us grow a vibrant local community—sign up today to get started!
Sign Up Now