<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Using FROM_CSV giving unexpected results in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/101150#M40563</link>
    <description>&lt;P&gt;Yes, you are correct. That is the result I want. The above I showed is just an example however my real source table is a list of files with one column value being the entire contents of a CSV.&lt;/P&gt;</description>
    <pubDate>Fri, 06 Dec 2024 03:22:08 GMT</pubDate>
    <dc:creator>seanstachff</dc:creator>
    <dc:date>2024-12-06T03:22:08Z</dc:date>
    <item>
      <title>Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100845#M40438</link>
      <description>&lt;P&gt;Hello, I am trying to use from_csv in the sql warehouse, but I am getting unexpected results:&lt;BR /&gt;&lt;BR /&gt;As a small example I am running:&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;WITH your_table AS (
  SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)

SELECT from_csv(csv_column, 'a INT, b STRING, c DOUBLE', map('header', 'true','multiLine','true')) AS parsed_csv
FROM your_table&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;which is returning one row with the object:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="markup"&gt;object
a: null
b: "b"
c: null&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;It looks like it's still reading the header and not understanding the linebreak correctly. I would expect it to return an array of objects with two rows following the schema. I've tried a bunch of different combinations of mapping with linesep, header, etc. But I just can not get it to work how I would expect.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Does anyone have any advice on this?&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 06:02:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100845#M40438</guid>
      <dc:creator>seanstachff</dc:creator>
      <dc:date>2024-12-04T06:02:22Z</dc:date>
    </item>
    <item>
      <title>Re: Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100860#M40447</link>
      <description>&lt;P&gt;Hi,&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/134883"&gt;@seanstachff&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is the following result what you are expecting?&lt;/P&gt;&lt;P&gt;parsed_csv&lt;BR /&gt;{"a":1,"b":"hello, world","c":3.14}&lt;BR /&gt;{"a":2,"b":"goodbye, world","c":2.71}&lt;/P&gt;&lt;P&gt;If so, you can achieve it with the SQL below.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql

WITH your_table AS (
  SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
)

SELECT 
  from_csv(
    row, 
    'a INT, b STRING, c DOUBLE', 
    map('header', 'false', 'multiLine', 'false')
  ) AS parsed_csv
FROM your_table
LATERAL VIEW explode(slice(split(csv_column, '\n'), 2, size(split(csv_column, '\n')) - 1)) AS row;&lt;/LI-CODE&gt;&lt;P&gt;If you’re working with table-formatted data where each column exists as a separate field, there are other methods available as well.&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 07:47:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100860#M40447</guid>
      <dc:creator>Takuya-Omi</dc:creator>
      <dc:date>2024-12-04T07:47:53Z</dc:date>
    </item>
    <item>
      <title>Re: Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100981#M40497</link>
      <description>&lt;P&gt;Thanks, I didn't read the docs clear enough and see that csv_data is for one row. I'm a big confused as to why it has a header then. My only issue with this is how could you handle the possibility of it having a linebreak in a string value on the csv row&lt;/P&gt;</description>
      <pubDate>Wed, 04 Dec 2024 21:59:22 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100981#M40497</guid>
      <dc:creator>seanstachff</dc:creator>
      <dc:date>2024-12-04T21:59:22Z</dc:date>
    </item>
    <item>
      <title>Re: Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100995#M40507</link>
      <description>&lt;P&gt;Is this the kind of result you are expecting?&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;The data is interpreted as multiple rows.&lt;/LI&gt;&lt;LI&gt;The CSV contains multiple data rows (records), each split according to the schema.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Even when line breaks are part of a column value, they are not mistaken as row delimiters and are handled correctly as part of the dataset.&lt;/STRONG&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;Is it something like the result shown in the image?&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="TakuyaOmi_0-1733363129023.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/13326iCE487FC770C7B3C7/image-size/medium?v=v2&amp;amp;px=400" role="button" title="TakuyaOmi_0-1733363129023.png" alt="TakuyaOmi_0-1733363129023.png" /&gt;&lt;/span&gt;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 05 Dec 2024 01:46:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/100995#M40507</guid>
      <dc:creator>Takuya-Omi</dc:creator>
      <dc:date>2024-12-05T01:46:35Z</dc:date>
    </item>
    <item>
      <title>Re: Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/101150#M40563</link>
      <description>&lt;P&gt;Yes, you are correct. That is the result I want. The above I showed is just an example however my real source table is a list of files with one column value being the entire contents of a CSV.&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 03:22:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/101150#M40563</guid>
      <dc:creator>seanstachff</dc:creator>
      <dc:date>2024-12-06T03:22:08Z</dc:date>
    </item>
    <item>
      <title>Re: Using FROM_CSV giving unexpected results</title>
      <link>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/101152#M40564</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/134883"&gt;@seanstachff&lt;/a&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Here is the code I used to produce the results shown in the image I shared earlier. It's a bit verbose, so I’m not entirely satisfied with it, but I hope it might provide some helpful insights for you.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;%sql
WITH your_table AS (
  -- Example CSV data
  SELECT 'a,b,c\n1,"hello, world",3.14\n2,"goodbye, world",2.71' AS csv_column
),
split_lines AS (
  -- Split data into lines
  SELECT explode(split(csv_column, '\n')) AS line, row_number() OVER (ORDER BY csv_column) AS row_num
  FROM your_table
),
data_without_header AS (
  -- Exclude header row
  SELECT line
  FROM split_lines
  WHERE row_num &amp;gt; 1
),
parsed_data AS (
  -- Parse CSV format (considering newlines and double quotes)
  SELECT
    from_csv(
      line,
      'a INT, b STRING, c DOUBLE',
      map('header', 'false', 'multiLine', 'true', 'quote', '"', 'delimiter', ',')
    ) AS parsed_row
  FROM data_without_header
)
SELECT parsed_row.*
FROM parsed_data;&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Fri, 06 Dec 2024 04:26:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/using-from-csv-giving-unexpected-results/m-p/101152#M40564</guid>
      <dc:creator>Takuya-Omi</dc:creator>
      <dc:date>2024-12-06T04:26:50Z</dc:date>
    </item>
  </channel>
</rss>

