<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/125460#M47450</link>
    <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106217"&gt;@joshuat&lt;/a&gt;&amp;nbsp;Hello. Did you manage to solve this issue? We are having a similar problem, where we can't read Oracle data partitioned, unless the lower and upper bounds are integrers.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Jul 2025 14:51:16 GMT</pubDate>
    <dc:creator>pavlosskev</dc:creator>
    <dc:date>2025-07-16T14:51:16Z</dc:date>
    <item>
      <title>How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76119#M35153</link>
      <description>&lt;P&gt;I'm attempting to fetch an Oracle Netsuite table in parallel via JDBC using the Netsuite Connect JAR, already installed on the cluster and setup correctly.&amp;nbsp;I can do successfully with a single-threaded approach using the `dbtable` option:&lt;/P&gt;&lt;PRE&gt;table = 'Transaction' &lt;BR /&gt;df = spark.read.format("jdbc") \ &lt;BR /&gt;.option("url", jdbc_url) \ &lt;BR /&gt;.option("driver", jdbc_driver) \ &lt;BR /&gt;.option("dbtable", table) \ &lt;BR /&gt;.load()&lt;/PRE&gt;&lt;P&gt;&lt;SPAN&gt;&amp;nbsp;I'd like to partition the fetch on a date field, but this code receives an error - java.sql.SQLSyntaxErrorException: [NetSuite][SuiteAnalytics Connect JDBC Driver][OpenAccess SDK SQL Engine]Syntax Error in the SQL statement.[10104]:&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;&lt;DIV&gt;&lt;PRE&gt;lower_bound = "2024-01-01"&lt;BR /&gt;upper_bound = "2024-03-31"&lt;BR /&gt;query = f"""&lt;BR /&gt;(&lt;BR /&gt;SELECT * FROM {table}&lt;BR /&gt;WHERE TO_DATE(lastmodifieddate, 'yyyy-MM-dd') &amp;gt;= TO_DATE('{lower_bound}', 'yyyy-MM-dd')&lt;BR /&gt;AND TO_DATE(lastmodifieddate, 'yyyy-MM-dd') &amp;lt;= TO_DATE('{upper_bound}', 'yyyy-MM-dd')&lt;BR /&gt;) AS t&lt;BR /&gt;"""&lt;BR /&gt;&lt;SPAN&gt;df&lt;/SPAN&gt; &lt;SPAN&gt;=&lt;/SPAN&gt; &lt;SPAN&gt;spark&lt;/SPAN&gt;&lt;SPAN&gt;.read.format(&lt;/SPAN&gt;&lt;SPAN&gt;"jdbc"&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"url"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;jdbc_url&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"driver"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;jdbc_driver&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"dbtable"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;query&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"partitionColumn"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"lastmodifieddate"&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"lowerBound"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;lower_bound&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"upperBound"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;upper_bound&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"numPartitions"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;10&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"fetchsize"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;1000&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .option(&lt;/SPAN&gt;&lt;SPAN&gt;"sessionInitStatement"&lt;/SPAN&gt;&lt;SPAN&gt;, &lt;/SPAN&gt;&lt;SPAN&gt;"ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'"&lt;/SPAN&gt;&lt;SPAN&gt;) \&lt;/SPAN&gt;&lt;BR /&gt;&lt;SPAN&gt; .load()&lt;/SPAN&gt;&lt;/PRE&gt;&lt;/DIV&gt;&lt;P&gt;I believe this is because the string is not being cast to a date using TO_DATE because of how partitioning works in Spark. Oracle/Netsuite wants the strings cast to dates in the SQL statement. Sure enough, the Spark debug logs show that no TO_DATE is being applied to the WHERE clauses:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;INFO JDBCRelation: Number of partitions: 10, WHERE clauses of these partitions: "lastmodifieddate" &amp;lt; '2024-01-09 23:54:00' or "lastmodifieddate" is null...&lt;/PRE&gt;&lt;P&gt;How do I address this?&lt;/P&gt;</description>
      <pubDate>Fri, 28 Jun 2024 18:48:23 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76119#M35153</guid>
      <dc:creator>joshuat</dc:creator>
      <dc:date>2024-06-28T18:48:23Z</dc:date>
    </item>
    <item>
      <title>Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76536#M35251</link>
      <description>&lt;P&gt;Thank you for your reply. The answers and the Stack Overflow post you link appear to be heading off in the wrong direction. I don't have a named partition in the source Netsuite table. Additionally, the source table isn't partitioned by a date field. Lastly, the answer SQL you and SO provide only covers one hardcoded day of data:&amp;nbsp;&lt;/P&gt;&lt;PRE&gt;PARTITION FOR (DATE '2024-07-24')&lt;/PRE&gt;&lt;P&gt;My time range spans 3 months, so I would need PARTITION FOR parameterized. The Spark logs don't help, and the Netsuite logs merely give this information:&amp;nbsp;&lt;SPAN&gt;"Invalid Month."&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;I'd appreciate it if you could propose a code refactor that fits my use case more closely. Thank you!&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2024 17:28:46 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76536#M35251</guid>
      <dc:creator>joshuat</dc:creator>
      <dc:date>2024-07-02T17:28:46Z</dc:date>
    </item>
    <item>
      <title>Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76544#M35255</link>
      <description>&lt;P&gt;Hello,&lt;/P&gt;&lt;P&gt;To address the issue&lt;/P&gt;&lt;P&gt;Step 1: Load Data Without Partitioning&lt;/P&gt;&lt;LI-CODE lang="python"&gt;table = 'Transaction'
lower_bound = "2024-01-01"
upper_bound = "2024-03-31"

query = f"""
(
SELECT * FROM {table}
WHERE TO_DATE(lastmodifieddate, 'yyyy-MM-dd') &amp;gt;= TO_DATE('{lower_bound}', 'yyyy-MM-dd')
AND TO_DATE(lastmodifieddate, 'yyyy-MM-dd') &amp;lt;= TO_DATE('{upper_bound}', 'yyyy-MM-dd')
) AS t
"""

df = spark.read.format("jdbc") \
 .option("url", jdbc_url) \
 .option("driver", jdbc_driver) \
 .option("dbtable", query) \
 .option("fetchsize", 1000) \
 .option("sessionInitStatement", "ALTER SESSION SET NLS_DATE_FORMAT = 'YYYY-MM-DD'") \
 .load()&lt;/LI-CODE&gt;&lt;P&gt;Step 2: Repartition the Data Within Spark Once you have the DataFrame df loaded, you can repartition it based on the lastmodifieddate field:&lt;/P&gt;&lt;LI-CODE lang="python"&gt;# Convert the 'lastmodifieddate' column to date type if it's not already
from pyspark.sql.functions import col, to_date

df = df.withColumn("lastmodifieddate", to_date(col("lastmodifieddate"), "yyyy-MM-dd"))

# Repartition the DataFrame
num_partitions = 10
df_repartitioned = df.repartition(num_partitions, "lastmodifieddate")

# Now you can proceed with further processing on df_repartitioned
df_repartitioned.show()&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 02 Jul 2024 20:07:42 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/76544#M35255</guid>
      <dc:creator>mtajmouati</dc:creator>
      <dc:date>2024-07-02T20:07:42Z</dc:date>
    </item>
    <item>
      <title>Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/78832#M35610</link>
      <description>&lt;P&gt;&lt;SPAN class=""&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/109839"&gt;@mtajmouati&lt;/a&gt;&lt;/SPAN&gt;&amp;nbsp;I appreciate your response. This approach resulted in a generic "bad SQL" error in Netsuite: "&lt;SPAN&gt;java.sql.SQLSyntaxErrorException: [NetSuite][SuiteAnalytics Connect JDBC Driver][OpenAccess SDK SQL Engine]Syntax Error in the SQL statement.[10104]"&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;My response was delayed because I have since decided to work around this issue and partition the fetch by the long integer ID field, since that doesn't require casting to a DATE data type. Thank you.&amp;nbsp;&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Mon, 15 Jul 2024 16:27:45 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/78832#M35610</guid>
      <dc:creator>joshuat</dc:creator>
      <dc:date>2024-07-15T16:27:45Z</dc:date>
    </item>
    <item>
      <title>Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/125460#M47450</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/106217"&gt;@joshuat&lt;/a&gt;&amp;nbsp;Hello. Did you manage to solve this issue? We are having a similar problem, where we can't read Oracle data partitioned, unless the lower and upper bounds are integrers.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 14:51:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/125460#M47450</guid>
      <dc:creator>pavlosskev</dc:creator>
      <dc:date>2025-07-16T14:51:16Z</dc:date>
    </item>
    <item>
      <title>Re: How to partition JDBC Oracle read query and cast with TO_DATE on partition date field?</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/125516#M47465</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/114440"&gt;@pavlosskev&lt;/a&gt;&amp;nbsp;I did not and have to do partitioned reads via the ID.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 23:21:58 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-partition-jdbc-oracle-read-query-and-cast-with-to-date-on/m-p/125516#M47465</guid>
      <dc:creator>joshuat</dc:creator>
      <dc:date>2025-07-16T23:21:58Z</dc:date>
    </item>
  </channel>
</rss>

