Hi team,
In Databricks, I need to run a query against a Postgres source, something like:
select * from postgres_tbl where id in (select id from df)
where df is loaded from a Hive table. If I use the JDBC driver and do:
query = '(select * from postgres_tbl) as t'
src_df = spark.read.format("postgresql").option("dbtable", query)....
If I then join src_df with df, the join does not seem to be pushed down to Postgres.
I know I can filter by converting the ids in df into a literal list and building a SQL string from it, but if df has many rows, the query is going to be very long. Is there a good way to get pushdown here, or to run this kind of federated query efficiently?
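To make the workaround concrete, here is a rough sketch of the string-building approach, split into chunks so no single query gets too long. It assumes the ids are integers; the helper names (chunked, build_pushdown_queries) and the chunk size of 1000 are made up for illustration and not tuned.

```python
from typing import Iterable, Iterator, List


def chunked(ids: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield consecutive lists of at most `size` ids."""
    batch: List[int] = []
    for i in ids:
        # int() rejects non-numeric values, so nothing unexpected ends up in the SQL text
        batch.append(int(i))
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_pushdown_queries(ids: Iterable[int],
                           table: str = "postgres_tbl",
                           size: int = 1000) -> List[str]:
    """Return one `dbtable` subquery string per chunk of ids (hypothetical helper)."""
    return [
        f"(select * from {table} where id in ({', '.join(map(str, chunk))})) as t"
        for chunk in chunked(ids, size)
    ]
```

Each returned string would be passed as the dbtable option in a separate read, and the resulting DataFrames unioned. That still means collecting every id to the driver first, which is the part I would like to avoid.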
Thanks
Brad