Schema issue while fetching data from oracle
08-21-2024 07:32 AM
I don't have the complete context of the issue, but here is what I know. A friend of mine is facing this:

"I am fetching data from Oracle into Databricks using Python, but every time I do it the schema changes.
If a column is of type decimal and the value is 0.125, it is written as 0.125000000.
Another example: 20240821 is returned as 20240821.0000000.
If a column has the value 0, it shows up as 0E-10."

I suggested that he specify the data types while reading, and he said:
- I have 15 tables with about 150 columns each, and each table is roughly 20-40 million rows.
- If I cast every column to string before querying the data, it takes an enormous amount of time.
- Labels:
- Spark
12-27-2024 10:26 AM
Thanks for your question!
The trailing zeros and the 0E-10 output come from how Spark's JDBC reader maps Oracle NUMBER columns: when no precision or scale is declared on the Oracle side, Spark typically falls back to DecimalType(38,10), so 0 is rendered as 0E-10 and other values carry a padded scale. The cleanest fix is to set the exact types on the read itself with the JDBC `customSchema` option, rather than casting every column to string inside the Oracle query. For tables of 20-40 million rows, also partition the read with `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` so the load is spread across parallel JDBC connections, and rely on predicate pushdown so each query pulls only the rows it needs. If a few columns still come back with an oversized scale, cast them explicitly after the load; see the sketch below. Hope it helps!
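Here is a minimal PySpark sketch of that approach. The connection URL, credentials, table name (ORDERS), and column names (ID, ORDER_DATE, AMOUNT) are placeholders I made up for illustration; substitute your own, and adjust the customSchema types per table.

```python
from pyspark.sql import functions as F

# Placeholder connection details -- replace with your own.
jdbc_url = "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB1"

df = (
    spark.read.format("jdbc")          # `spark` is the Databricks-provided SparkSession
    .option("url", jdbc_url)
    .option("dbtable", "ORDERS")       # hypothetical table name
    .option("user", "scott")           # placeholder credentials
    .option("password", "tiger")
    .option("driver", "oracle.jdbc.OracleDriver")
    # Override the default Oracle NUMBER -> DECIMAL(38,10) mapping per column,
    # so values keep the precision/scale you actually want.
    .option("customSchema",
            "ID DECIMAL(18,0), ORDER_DATE DECIMAL(8,0), AMOUNT DECIMAL(10,3)")
    # Split the read into parallel JDBC queries instead of one huge scan.
    .option("partitionColumn", "ID")
    .option("lowerBound", "1")
    .option("upperBound", "40000000")
    .option("numPartitions", "16")
    .load()
)

# If any column still carries extra scale, cast it once after the load;
# this is far cheaper than casting every column to string in the source query.
df = df.withColumn("AMOUNT", F.col("AMOUNT").cast("decimal(10,3)"))
```

Simple filters applied before an action (for example, `df.filter(F.col("ORDER_DATE") >= 20240101)`) are pushed down to Oracle as a WHERE clause by the JDBC source, so only matching rows cross the network.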

