topic federated queries on PostgreSQL - TimestampNTZ option in Data Engineering

federated queries on PostgreSQL - TimestampNTZ option

thomas_berry — Tue, 13 May 2025 17:05:33 GMT

Hello,

I am trying to migrate some spark reads away from JDBC into the federated queries based in unity catalog.

Here is an example of the spark read command that I want to migrate:

spark.read.format("jdbc").option("driver", "org.postgresql.Driver").option("url", url).option("user", user).option("password", password).option("query", query).option("preferTimestampNTZ", True).load()That "preferTimestampNTZ" option is really important because I have many table columns in the source that timestamps without timezone and are not in UTC.

Problem is that I cannot find in the documentation on how to set up the catalog to leverage this option.

The closest that I can find is at the bottom of this link here: Run federated queries on PostgreSQL | Databricks Documentation but that doesn't explain how that option can be applied.

Here is an example of how I want the future queries to look like:

spark.table("postgres_catalog.postgres_database.postgres_table")

Does anyone know?

Thank You

Re: federated queries on PostgreSQL - TimestampNTZ option

lingareddy_Alva — Tue, 13 May 2025 18:01:04 GMT

Hi @thomas_berry

It's a common challenge when transitioning from Spark JDBC reads to Unity Catalog federated tables, especially when dealing with timestamp types.
This option is specific to the JDBC data source and tells Spark to treat timestamp columns without timezone as TIMESTAMP_NTZ (no time zone) instead of defaulting to TIMESTAMP (which is UTC by default).

Federated Tables in Unity Catalog
When you migrate to Unity Catalog federated tables, the underlying mechanics change:
-- You're no longer using spark.read.format("jdbc") — Databricks uses SQL pushdown via connectors in foreign catalogs.
-- These are managed metadata tables that automatically sync schemas and push down query execution to the external system (PostgreSQL in your case).
-- Because you're querying them via spark.table("...") or SQL (SELECT * FROM ...), you're using Databricks SQL dialect, not raw JDBC options.

No Option for preferTimestampNTZ:
Currently, Unity Catalog does not expose a documented setting like preferTimestampNTZ=True for federated PostgreSQL connections.
Instead, Databricks does automatic type inference, and by default:
-- PostgreSQL’s timestamp without time zone is mapped to TIMESTAMP (UTC) in Databricks.
-- If your data is not in UTC, this mapping can silently shift your values unless you apply manual conversion logic (e.g., using from_utc_timestamp()).

Workarounds:
1. Manually adjust time zones in queries
If your timestamps are in a known local timezone (say "America/New_York"), you can do:

In SQL:
SELECT from_utc_timestamp(your_column, 'America/New_York') FROM postgres_catalog.db.table

Or in PySpark:
from pyspark.sql.functions import from_utc_timestamp
df = spark.table("postgres_catalog.db.table").withColumn("local_ts", from_utc_timestamp("your_column", "America/New_York"))

2. Use Views to Abstract the Logic
If this transformation is needed in many places, create views in Unity Catalog that do this adjustment:

CREATE VIEW postgres_catalog.db.adjusted_table AS
SELECT from_utc_timestamp(ts_col, 'America/New_York') as ts_col_local, ...
FROM postgres_catalog.db.raw_table

3. Keep JDBC reads where necessary
If your timestamp logic is critical and cannot be centralized via views, you may need to keep JDBC reads for
certain use cases until Databricks exposes a control for this behavior in Unity Catalog.

Federated tables abstract away many low-level controls (like JDBC options) to simplify cross-source querying
— but that means some fine-grained behaviors are lost, especially those related to timezone-sensitive data types.
Until Databricks adds an equivalent of preferTimestampNTZ to foreign catalog configurations,
you'll need to handle these cases explicitly.

Re: federated queries on PostgreSQL - TimestampNTZ option

thomas_berry — Wed, 14 May 2025 08:21:07 GMT

Thank you for your reply. I will mark your answer as accepted solution but I hope that in the future this options, as you explained, gets added!

Re: federated queries on PostgreSQL - TimestampNTZ option

lingareddy_Alva — Wed, 14 May 2025 15:46:10 GMT

Thanks @thomas_berry I hope so 😀