Schema issue while fetching data from oracle

ashraf1395 — Wed, 21 Aug 2024 14:32:39 GMT

I don't have the complete context of the issue.

But here is what I know. A friend of mine is facing this:
""
I am fetching data from an Oracle database in Databricks using Python, but every time I do it the schema changes.
If a column is of type decimal and the value is 0.125, it is written as 0.125000000.
Another example: 20240821 is returned as 20240821.0000000.
If a column has the value 0, it shows 0E-10. ""

I suggested specifying the data types while reading the data. He said:
- I have 15 tables with 150 columns each, and each table has approximately 20-40 million rows.
- I am casting every column to string in the query before fetching the data, and that takes a huge amount of time.

Re: Schema issue while fetching data from oracle

VZLA — Fri, 27 Dec 2024 18:26:32 GMT

Thanks for your question!
To address schema issues when fetching Oracle data in Databricks, define the column types programmatically at read time (the Spark JDBC reader accepts a customSchema option for this), or batch-cast columns dynamically after loading instead of casting each one to string inside the query. For performance, rely on predicate pushdown and partition the read (partitionColumn, lowerBound, upperBound, numPartitions) so that each query loads less data. If the trailing zeros or scientific notation persist during writes, cast the affected decimal columns explicitly to the precision and scale you need. Hope it helps!
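
A rough sketch of the read-time approach, in case it is useful. It assumes the symptoms come from Oracle NUMBER columns declared without precision or scale, which Spark, as far as I know, reads as decimal(38,10); that would explain both the padded zeros and 0E-10. The helper names, the sample column metadata, and the connection parameters are all hypothetical. The metadata tuples are meant to look like rows selected from Oracle's ALL_TAB_COLUMNS view (COLUMN_NAME, DATA_TYPE, DATA_PRECISION, DATA_SCALE), so the schema string can be generated per table instead of typed by hand for 150 columns.

```python
def build_custom_schema(columns, overrides=None):
    """Build a Spark JDBC customSchema string for NUMBER columns.

    columns:   iterable of (name, data_type, precision, scale) tuples,
               e.g. rows selected from Oracle's ALL_TAB_COLUMNS view.
    overrides: optional {column_name: spark_type} for columns whose
               NUMBER type has no declared precision (hypothetical input).
    """
    overrides = overrides or {}
    parts = []
    for name, data_type, precision, scale in columns:
        if name in overrides:
            parts.append(f"{name} {overrides[name]}")
        elif data_type == "NUMBER" and precision is not None:
            scale = scale or 0
            # Only emit types that Spark's DecimalType can represent.
            if 0 <= scale <= precision <= 38:
                parts.append(f"{name} DECIMAL({precision}, {scale})")
        # Everything else is left out, so Spark keeps its default mapping.
    return ", ".join(parts)


def read_table(spark, url, table, user, password, custom_schema,
               partition_column, lower, upper, num_partitions=8):
    """Partitioned JDBC read; all argument values are placeholders."""
    reader = (
        spark.read.format("jdbc")
        .option("url", url)
        .option("dbtable", table)
        .option("user", user)
        .option("password", password)
        .option("fetchsize", 10000)
        .option("partitionColumn", partition_column)
        .option("lowerBound", lower)
        .option("upperBound", upper)
        .option("numPartitions", num_partitions)
    )
    if custom_schema:  # skip the option when there is nothing to override
        reader = reader.option("customSchema", custom_schema)
    return reader.load()


# Hypothetical metadata for one table.
example_columns = [
    ("TRADE_DATE", "NUMBER", 8, 0),
    ("RATE", "NUMBER", None, None),   # no declared precision
    ("NAME", "VARCHAR2", None, None),
    ("QTY", "NUMBER", 10, None),
]
example_schema = build_custom_schema(
    example_columns, overrides={"RATE": "DECIMAL(38, 3)"}
)
print(example_schema)
```

Columns with a declared precision get an exact decimal type, so a value like 20240821 is no longer padded to scale 10. Columns without a declared precision still need a per-column decision (the overrides argument here), because the metadata does not say how many decimal places the data really uses.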
