<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Pyspark - how to save the schema of a csv file in a delta table's column in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30665#M22255</link>
    <description>&lt;P&gt;Hi Piper,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, I was not able to test it before I changed to a new employer, so I can no longer test it. However, I think it would work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tiago R.&lt;/P&gt;</description>
    <pubDate>Fri, 04 Mar 2022 18:20:31 GMT</pubDate>
    <dc:creator>tarente</dc:creator>
    <dc:date>2022-03-04T18:20:31Z</dc:date>
    <item>
      <title>Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30657#M22247</link>
      <description>&lt;P&gt;How can I save the schema of a csv file in a delta table's column?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In a previous project implemented in Databricks using Scala notebooks, we stored the schema of csv files as a "json string" in a SQL Server table.&lt;/P&gt;&lt;P&gt;When we needed to read or write the csv and the source dataframe had 0 rows, or the source csv did not exist, we used the schema stored in SQL Server to create either an empty dataframe or an empty csv file.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Now, I would like to implement something similar in Databricks, but using Python notebooks and storing the schema of csv files in a delta table.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Any suggestions?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Tiago.&lt;/P&gt;</description>
      <pubDate>Thu, 27 Jan 2022 18:31:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30657#M22247</guid>
      <dc:creator>tarente</dc:creator>
      <dc:date>2022-01-27T18:31:58Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30658#M22248</link>
      <description>&lt;P&gt;After you read the csv into a dataframe with spark.read.csv(...), there are 3 ways to get at its schema:&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;DataFrame.schema - it is a StructType&lt;/P&gt;&lt;P&gt;DataFrame.printSchema() - it prints the schema as a tree&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;and the 3rd, tricky way is a DDL string:&lt;/P&gt;&lt;P&gt;DataFrame._jdf.schema().toDDL()&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Usually the DDL, as it is a simple string, is the easiest to save somewhere and then reuse. Just insert it into some delta table and then select it when needed.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;</description>
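      <!-- Editorial sketch (not part of the original reply): one way the DDL-string idea above could look in PySpark. Instead of the private DataFrame._jdf handle, it builds the DDL from the public StructType fields. The table name csv_schemas, its columns file_name and schema_ddl, and all function names are hypothetical, and the Spark-dependent helpers assume a Delta-enabled SparkSession and column names without backticks.

```python
def schema_to_ddl(schema) -> str:
    """Render a StructType as a DDL string such as "`ProductId` int, `ProductDesc` string"."""
    return ", ".join(
        f"`{field.name}` {field.dataType.simpleString()}" for field in schema.fields
    )


def save_schema(spark, file_name, schema, table="csv_schemas"):
    """Append one (file_name, schema_ddl) row to a Delta table (hypothetical name)."""
    rows = [(file_name, schema_to_ddl(schema))]
    (
        spark.createDataFrame(rows, "file_name STRING, schema_ddl STRING")
        .write.format("delta")
        .mode("append")
        .saveAsTable(table)
    )


def load_schema_ddl(spark, file_name, table="csv_schemas"):
    """Return the stored DDL string for file_name, or None if no row exists."""
    stored = spark.table(table)
    row = stored.where(stored["file_name"] == file_name).first()
    return row["schema_ddl"] if row is not None else None
```

The string returned by load_schema_ddl can be passed as-is to spark.read.schema(...), which accepts a DDL-formatted string as well as a StructType. -->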
      <pubDate>Thu, 27 Jan 2022 18:41:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30658#M22248</guid>
      <dc:creator>Hubert-Dudek</dc:creator>
      <dc:date>2022-01-27T18:41:03Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30659#M22249</link>
      <description>&lt;P&gt;Hi Hubert,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your answer, but I was not able to make it work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Let me ask the question in a different way.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;I have a &lt;I&gt;csv&lt;/I&gt; file with the following basic structure:&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;ProductId - integer.&lt;/LI&gt;&lt;LI&gt;ProductDesc - string.&lt;/LI&gt;&lt;LI&gt;ProductCost - decimal.&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;In PySpark I would like to:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;Store the file schema in a variable to be used in spark.read.options(**fileOptions).schema(schema).load(...).&lt;/LI&gt;&lt;LI&gt;Be able to store the file schema in a delta table's column.&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;What kind of transformations do I need to apply to the variable in 1. to be able to store it in 2., and vice-versa?&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks in advance,&lt;/P&gt;&lt;P&gt;Tiago R.&lt;/P&gt;</description>
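      <!-- Editorial sketch (not part of the original post): a possible round trip between the variable in 1. and the column in 2., using the JSON form of the schema. StructType.json() and StructType.fromJson() are public PySpark methods; the function names, the constant PRODUCT_SCHEMA_JSON, and the decimal(10,2) precision are assumptions made for illustration, since the post only says "decimal".

```python
import json

# Schema of the CSV described above, in the JSON layout that StructType.json()
# produces. decimal(10,2) is an assumed precision.
PRODUCT_SCHEMA_JSON = json.dumps(
    {
        "type": "struct",
        "fields": [
            {"name": "ProductId", "type": "integer", "nullable": True, "metadata": {}},
            {"name": "ProductDesc", "type": "string", "nullable": True, "metadata": {}},
            {"name": "ProductCost", "type": "decimal(10,2)", "nullable": True, "metadata": {}},
        ],
    }
)


def schema_to_string(schema) -> str:
    """StructType to JSON string that fits in a STRING column of a delta table."""
    return schema.json()


def string_to_schema(schema_json: str):
    """JSON string back to a StructType for spark.read.schema(...)."""
    # Imported lazily so this module still loads where pyspark is not installed.
    from pyspark.sql.types import StructType

    return StructType.fromJson(json.loads(schema_json))


def empty_dataframe(spark, schema_json: str):
    """Build a 0-row DataFrame from a stored schema, e.g. when the csv is missing."""
    return spark.createDataFrame([], string_to_schema(schema_json))
```

With these helpers, the value read back from the delta table's column would go through string_to_schema before being passed to spark.read.options(**fileOptions).schema(schema).load(...), and df.schema would go through schema_to_string before being written. -->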
      <pubDate>Wed, 02 Feb 2022 09:17:26 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30659#M22249</guid>
      <dc:creator>tarente</dc:creator>
      <dc:date>2022-02-02T09:17:26Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30661#M22251</link>
      <description>&lt;P&gt;Hi Kaniz,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your answer, although it did not answer my questions.&lt;/P&gt;</description>
      <pubDate>Mon, 07 Feb 2022 18:17:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30661#M22251</guid>
      <dc:creator>tarente</dc:creator>
      <dc:date>2022-02-07T18:17:57Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30662#M22252</link>
      <description>&lt;P&gt;Hi @Tiago Rente, I hope the code below helps.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper" image-alt="image"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/2145iC72658D3595C8813/image-size/large?v=v2&amp;amp;px=999" role="button" title="image" alt="image" /&gt;&lt;/span&gt;&lt;/P&gt;</description>
      <pubDate>Tue, 22 Feb 2022 12:43:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30662#M22252</guid>
      <dc:creator>RKNutalapati</dc:creator>
      <dc:date>2022-02-22T12:43:49Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30663#M22253</link>
      <description>&lt;P&gt;Hi,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Thanks for your code, I will test it.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tiago.&lt;/P&gt;</description>
      <pubDate>Wed, 23 Feb 2022 09:23:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30663#M22253</guid>
      <dc:creator>tarente</dc:creator>
      <dc:date>2022-02-23T09:23:05Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30664#M22254</link>
      <description>&lt;P&gt;@Tiago Rente​&amp;nbsp;- How did the test go?&lt;/P&gt;</description>
      <pubDate>Tue, 01 Mar 2022 21:47:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30664#M22254</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-01T21:47:49Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30665#M22255</link>
      <description>&lt;P&gt;Hi Piper,&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Unfortunately, I was not able to test it before I changed to a new employer, so I can no longer test it. However, I think it would work.&lt;/P&gt;&lt;P&gt;&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Tiago R.&lt;/P&gt;</description>
      <pubDate>Fri, 04 Mar 2022 18:20:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30665#M22255</guid>
      <dc:creator>tarente</dc:creator>
      <dc:date>2022-03-04T18:20:31Z</dc:date>
    </item>
    <item>
      <title>Re: Pyspark - how to save the schema of a csv file in a delta table's column</title>
      <link>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30666#M22256</link>
      <description>&lt;P&gt;@tarente - Thanks for letting us know. &lt;span class="lia-unicode-emoji" title=":slightly_smiling_face:"&gt;🙂&lt;/span&gt; &lt;/P&gt;</description>
      <pubDate>Sun, 06 Mar 2022 21:35:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/pyspark-how-to-save-the-schema-of-a-csv-file-in-a-delta-table-s/m-p/30666#M22256</guid>
      <dc:creator>Anonymous</dc:creator>
      <dc:date>2022-03-06T21:35:23Z</dc:date>
    </item>
  </channel>
</rss>

