05-13-2025 10:05 AM
Hello,
I am trying to migrate some Spark reads away from JDBC to federated queries based on Unity Catalog.
Here is an example of the spark read command that I want to migrate:
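(The code block appears to have been lost from the post; judging from the option discussed in the replies below, the read presumably looked something like the following sketch, with hypothetical host, table, and credentials.)

```python
# Hypothetical reconstruction of the original read (the actual snippet was
# lost from the post): a plain JDBC read that passes preferTimestampNTZ so
# "timestamp without time zone" columns come back as TIMESTAMP_NTZ.
jdbc_options = {
    "url": "jdbc:postgresql://<host>:5432/<database>",  # placeholder connection
    "dbtable": "postgres_table",                        # placeholder table name
    "user": "<user>",                                   # placeholder credentials
    "password": "<password>",
    "preferTimestampNTZ": "true",
}

# Requires an active SparkSession and a reachable PostgreSQL server:
# df = spark.read.format("jdbc").options(**jdbc_options).load()
```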
The closest that I can find is at the bottom of this link here: Run federated queries on PostgreSQL | Databricks Documentation, but that page doesn't explain how the option can be applied.
Here is an example of what I want the future queries to look like:
spark.table("postgres_catalog.postgres_database.postgres_table")
Does anyone know?
Thank You
05-13-2025 11:01 AM
It's a common challenge when transitioning from Spark JDBC reads to Unity Catalog federated tables, especially when dealing with timestamp types.
The preferTimestampNTZ option is specific to the JDBC data source and tells Spark to treat timestamp columns without a time zone as TIMESTAMP_NTZ (no time zone) instead of defaulting to TIMESTAMP (which is UTC by default).
Federated Tables in Unity Catalog
When you migrate to Unity Catalog federated tables, the underlying mechanics change:
-- You're no longer using spark.read.format("jdbc") — Databricks uses SQL pushdown via connectors in foreign catalogs.
-- These are managed metadata tables that automatically sync schemas and push down query execution to the external system (PostgreSQL in your case).
-- Because you're querying them via spark.table("...") or SQL (SELECT * FROM ...), you're using Databricks SQL dialect, not raw JDBC options.
No Option for preferTimestampNTZ:
Currently, Unity Catalog does not expose a documented setting like preferTimestampNTZ=True for federated PostgreSQL connections.
Instead, Databricks does automatic type inference, and by default:
-- PostgreSQL’s timestamp without time zone is mapped to TIMESTAMP (UTC) in Databricks.
-- If your data is not in UTC, this mapping can silently shift your values unless you apply manual conversion logic (e.g., using from_utc_timestamp()).
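To make that silent shift concrete, here is a plain-Python illustration using only the standard library (the timestamp value and zone are made up for the example):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# A wall-clock value stored in PostgreSQL as "timestamp without time zone",
# e.g. 9:00 AM recorded in New York local time.
naive = datetime(2025, 5, 13, 9, 0, 0)

# If the reader assumes the value is UTC (the federated default), rendering it
# back in New York shifts the clock time by the UTC offset (-4h during DST):
misread = naive.replace(tzinfo=timezone.utc).astimezone(ZoneInfo("America/New_York"))
print(misread)  # four hours earlier than the intended 9:00 AM
```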
Workarounds:
1. Manually adjust time zones in queries
If your timestamps are in a known local timezone (say "America/New_York"), you can do:
In SQL:
SELECT from_utc_timestamp(your_column, 'America/New_York') FROM postgres_catalog.db.table
Or in PySpark:
from pyspark.sql.functions import from_utc_timestamp
df = spark.table("postgres_catalog.db.table").withColumn("local_ts", from_utc_timestamp("your_column", "America/New_York"))
2. Use Views to Abstract the Logic
If this transformation is needed in many places, create views in Unity Catalog that do this adjustment:
CREATE VIEW postgres_catalog.db.adjusted_table AS
SELECT from_utc_timestamp(ts_col, 'America/New_York') as ts_col_local, ...
FROM postgres_catalog.db.raw_table
3. Keep JDBC reads where necessary
If your timestamp logic is critical and cannot be centralized via views, you may need to keep JDBC reads for
certain use cases until Databricks exposes a control for this behavior in Unity Catalog.
Federated tables abstract away many low-level controls (like JDBC options) to simplify cross-source querying
— but that means some fine-grained behaviors are lost, especially those related to timezone-sensitive data types.
Until Databricks adds an equivalent of preferTimestampNTZ to foreign catalog configurations,
you'll need to handle these cases explicitly.
05-14-2025 01:21 AM
Thank you for your reply. I will mark your answer as the accepted solution, but I hope that in the future this option, as you explained, gets added!
05-14-2025 08:46 AM
Thanks @thomas_berry I hope so 😀