<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic How do you properly read database-files (.db) with Spark in Python after the JDBC update? in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</link>
    <description>&lt;P&gt;I have a set of database files (.db) which I need to read into my Python notebook in Databricks. I managed to do this fairly simply up until July, when an update to the SQLite&amp;nbsp;JDBC library was introduced.&lt;/P&gt;&lt;P&gt;Up until now I have read the files in question with this (modified) code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, after the update the data being read in is completely wrong (e.g. numeric columns that contain only non-negative numbers suddenly contain negative numbers very different from the real values in the files).&lt;/P&gt;&lt;P&gt;Is there a better way to read in the .db files after the SQLite JDBC 3.42.0.0 upgrade?&lt;/P&gt;</description>
    <pubDate>Wed, 09 Aug 2023 13:00:03 GMT</pubDate>
    <dc:creator>jomt</dc:creator>
    <dc:date>2023-08-09T13:00:03Z</dc:date>
    <item>
      <title>How do you properly read database-files (.db) with Spark in Python after the JDBC update?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</link>
      <description>&lt;P&gt;I have a set of database files (.db) which I need to read into my Python notebook in Databricks. I managed to do this fairly simply up until July, when an update to the SQLite&amp;nbsp;JDBC library was introduced.&lt;/P&gt;&lt;P&gt;Up until now I have read the files in question with this (modified) code:&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;&lt;P&gt;However, after the update the data being read in is completely wrong (e.g. numeric columns that contain only non-negative numbers suddenly contain negative numbers very different from the real values in the files).&lt;/P&gt;&lt;P&gt;Is there a better way to read in the .db files after the SQLite JDBC 3.42.0.0 upgrade?&lt;/P&gt;</description>
      <pubDate>Wed, 09 Aug 2023 13:00:03 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39444#M5631</guid>
      <dc:creator>jomt</dc:creator>
      <dc:date>2023-08-09T13:00:03Z</dc:date>
    </item>
    <item>
      <title>Re: How do you properly read database-files (.db) with Spark in Python after the JDBC update?</title>
      <link>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39515#M5632</link>
      <description>&lt;P&gt;When the numbers in the table are really big (millions and billions) or really small (e.g. 1e-15), the SQLite JDBC driver may struggle to import the correct values. To work around this, a good idea could be to use&amp;nbsp;&lt;STRONG&gt;&lt;EM&gt;customSchema&lt;/EM&gt;&lt;/STRONG&gt;&amp;nbsp;in the options to define the schema using decimals with a high precision (or many decimal places when the numbers are really small):&lt;/P&gt;&lt;PRE&gt;&lt;CODE&gt;df = (spark.read.format("jdbc")
      .options(url='&amp;lt;url&amp;gt;',
               dbtable='&amp;lt;tablename&amp;gt;',
               driver="org.sqlite.JDBC",
               customSchema="&amp;lt;col1&amp;gt; DECIMAL(38, 0), &amp;lt;col2&amp;gt; DECIMAL(38, 0), &amp;lt;col3&amp;gt; DECIMAL(38, 0)")
      .load())&lt;/CODE&gt;&lt;/PRE&gt;</description>
      <pubDate>Thu, 10 Aug 2023 13:36:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/how-do-you-properly-read-database-files-db-with-spark-in-python/m-p/39515#M5632</guid>
      <dc:creator>jomt</dc:creator>
      <dc:date>2023-08-10T13:36:48Z</dc:date>
    </item>
  </channel>
</rss>

