<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do you properly read database-files (.db) with Spark in Python after the JDBC update? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</link>
    <description>&lt;P&gt;I have a set of database files (.db) which I need to read into my Python notebook in Databricks. I managed to do this fairly simply up until July, when an update to the SQLite&amp;nbsp;JDBC library was introduced.&lt;/P&gt;&lt;P&gt;Up until now I have read the files in question with this (modified) code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, after the update the data being read in is completely wrong (e.g. numeric columns that contain only non-negative numbers suddenly contain negative numbers very different from the real values in the files).&lt;/P&gt;&lt;P&gt;Is there a better way to read in the .db files after the SQLite JDBC 3.42.0.0 upgrade?&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2023 13:00:03 GMT</pubDate>
    <dc:creator>jomt</dc:creator>
    <dc:date>2023-08-09T13:00:03Z</dc:date>
    <item>
      <title>How do you properly read database-files (.db) with Spark in Python after the JDBC update?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</link>
      <description>&lt;P&gt;I have a set of database files (.db) which I need to read into my Python notebook in Databricks. I managed to do this fairly simply up until July, when an update to the SQLite&amp;nbsp;JDBC library was introduced.&lt;/P&gt;&lt;P&gt;Up until now I have read the files in question with this (modified) code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, after the update the data being read in is completely wrong (e.g. numeric columns that contain only non-negative numbers suddenly contain negative numbers very different from the real values in the files).&lt;/P&gt;&lt;P&gt;Is there a better way to read in the .db files after the SQLite JDBC 3.42.0.0 upgrade?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2023 13:00:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</guid>
      <dc:creator>jomt</dc:creator>
      <dc:date>2023-08-09T13:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do you properly read database-files (.db) with Spark in Python after the JDBC update?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39515#M5632</link>
      <description>&lt;P&gt;When the numbers in the table are really big (millions and billions) or really small (e.g. 1e-15), the SQLite JDBC driver may struggle to import the correct values. To work around this, a good idea could be to use&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;customSchema&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;in the options to define the schema using decimals with a high precision (or many decimal places when the numbers are really small):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC",
               customSchema="&amp;lt;col1&amp;gt; DECIMAL(38, 0), &amp;lt;col2&amp;gt; DECIMAL(38, 0), &amp;lt;col3&amp;gt; DECIMAL(38, 0)")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Aug 2023 13:36:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39515#M5632</guid>
      <dc:creator>jomt</dc:creator>
      <dc:date>2023-08-10T13:36:48Z</dc:date>
    </item>
  </channel>
</rss>

