topic how read through jdbc from postgres to databricks a particular data type in Data Engineering

how read through jdbc from postgres to databricks a particular data type

jeremy98 — Fri, 07 Feb 2025 22:59:03 GMT

Hi Community,

I need to load data from PostgreSQL into Databricks through JDBC without changing the data type of a VARCHAR[]column in PostgreSQL, which should remain as an array of strings in Databricks.

Previously, I used psycopg2, and it worked, but I had to first convert the data in Pandas and then in Spark before using it in Databricks. I’d like to switch to JDBC for a more direct approach.

However, when I load the data, the VARCHAR[] column appears as a string in the format:
"{LIKE, COME, "LIKE_COME"}
Similarly, arrays of BIGINT appear as:
"{45,46}".

I understand that PostgreSQL array types are not natively supported in Databricks, but what steps should I take to retain the same data type after loading via JDBC?

Any guidance would be appreciated!

Re: how read through jdbc from postgres to databricks a particular data type

jeremy98 — Sat, 08 Feb 2025 14:12:04 GMT

Hi community,

Yesterday, I found a solution. This is to query through jdbc from postgres creating two columns that are manageable in databricks. Here the code:

query = f"""(SELECT *, array_to_string(columns_to_export, ',') AS columns_to_export_string, array_to_string(ups, ',') AS ups_to_string from {pg_table_name}) as data_to_fetch""" postgres_df = spark.read.format("jdbc") \ .option("url", connection_properties["jdbc_url"]) \ .option("dbtable", query) \ .option("user", connection_properties["db_username"]) \ .option("password", connection_properties["db_password"]) \ .option("fetchsize", FETCH_SIZE) \ .load() \ .withColumn("columns_to_export", split(col("columns_to_export_string"), ",")) \ .withColumn("ups", split(col("ups_to_string"), ",").cast("array<bigint>")) \ .drop("columns_to_export_string", "ups_to_string")