05-13-2025 10:05 AM
Hello,
I am trying to migrate some Spark reads away from JDBC to federated queries based on Unity Catalog.
Here is an example of the spark read command that I want to migrate:
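(The code block appears to have been lost from the post; judging from the option discussed in the replies below, the read presumably looked something like the following sketch, with hypothetical host, table, and credentials.)

```python
# Hypothetical reconstruction of the original read (the actual snippet was
# lost from the post): a plain JDBC read that passes preferTimestampNTZ so
# "timestamp without time zone" columns come back as TIMESTAMP_NTZ.
jdbc_options = {
    "url": "jdbc:postgresql://<host>:5432/<database>",  # placeholder connection
    "dbtable": "postgres_table",                        # placeholder table name
    "user": "<user>",                                   # placeholder credentials
    "password": "<password>",
    "preferTimestampNTZ": "true",
}

# Requires an active SparkSession and a reachable PostgreSQL server:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```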
The closest that I can find is at the bottom of this link here: Run federated queries on PostgreSQL | Databricks Documentation, but that page doesn't explain how the option can be applied.
Here is an example of what I want the future queries to look like:
spark.table("postgres_catalog.postgres_database.postgres_table")
Does anyone know?
Thank You
05-13-2025 11:01 AM
It's a common challenge when transitioning from Spark JDBC reads to Unity Catalog federated tables, especially when dealing with timestamp types.
The preferTimestampNTZ option is specific to the JDBC data source and tells Spark to treat timestamp columns without a time zone as TIMESTAMP_NTZ (no time zone) instead of defaulting to TIMESTAMP (which is UTC by default).
Federated Tables in Unity Catalog
When you migrate to Unity Catalog federated tables, the underlying mechanics change:
-- You're no longer using spark.read.format("jdbc") — Databricks uses SQL pushdown via connectors in foreign catalogs.
-- These are managed metadata tables that automatically sync schemas and push down query execution to the external system (PostgreSQL in your case).
-- Because you're querying them via spark.table("...") or SQL (SELECT * FROM ...), you're using Databricks SQL dialect, not raw JDBC options.
No Option for preferTimestampNTZ:
Currently, Unity Catalog does not expose a documented setting like preferTimestampNTZ=True for federated PostgreSQL connections.
Instead, Databricks does automatic type inference, and by default:
-- PostgreSQL’s timestamp without time zone is mapped to TIMESTAMP (UTC) in Databricks.
-- If your data is not in UTC, this mapping can silently shift your values unless you apply manual conversion logic (e.g., using from_utc_timestamp()).
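To make that silent shift concrete, here is a plain-Python illustration using only the standard library (the timestamp value and zone are made up for the example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A wall-clock value stored in PostgreSQL as "timestamp without time zone",
# e.g. 9:00 AM recorded in New York local time.
naive = datetime(2025, 5, 13, 9, 0, 0)

# If the reader assumes the value is UTC (the federated default), rendering it
# back in New York shifts the clock time by the UTC offset (-4h during DST):
misread = naive.replace(tzinfo=timezone.utc).astimezone(ZoneInfo("America/New_York"))
print(misread)  # four hours earlier than the intended 9:00 AM
```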
Workarounds:
1. Manually adjust time zones in queries
If your timestamps are in a known local timezone (say "America/New_York"), you can do:
In SQL:
SELECT from_utc_timestamp(your_column, 'America/New_York') FROM postgres_catalog.db.table
Or in PySpark:
from pyspark.sql.functions import from_utc_timestamp
df = spark.table("postgres_catalog.db.table").withColumn("local_ts", from_utc_timestamp("your_column", "America/New_York"))
2. Use Views to Abstract the Logic
If this transformation is needed in many places, create views in Unity Catalog that do this adjustment:
CREATE VIEW postgres_catalog.db.adjusted_table AS
SELECT from_utc_timestamp(ts_col, 'America/New_York') as ts_col_local, ...
FROM postgres_catalog.db.raw_table
3. Keep JDBC reads where necessary
If your timestamp logic is critical and cannot be centralized via views, you may need to keep JDBC reads for
certain use cases until Databricks exposes a control for this behavior in Unity Catalog.
Federated tables abstract away many low-level controls (like JDBC options) to simplify cross-source querying
— but that means some fine-grained behaviors are lost, especially those related to timezone-sensitive data types.
Until Databricks adds an equivalent of preferTimestampNTZ to foreign catalog configurations,
you'll need to handle these cases explicitly.
05-14-2025 01:21 AM
Thank you for your reply. I will mark your answer as the accepted solution, but I hope that in the future this option, as you explained, gets added!
05-14-2025 08:46 AM
Thanks @thomas_berry I hope so 😀