
How to read a particular data type from Postgres into Databricks through JDBC

jeremy98
Contributor III

Hi Community,

I need to load data from PostgreSQL into Databricks through JDBC without changing the data type of a VARCHAR[] column in PostgreSQL; it should remain an array of strings in Databricks.

Previously, I used psycopg2, and it worked, but I had to first convert the data in Pandas and then in Spark before using it in Databricks. I’d like to switch to JDBC for a more direct approach.

However, when I load the data, the VARCHAR[] column appears as a plain string in the format:
"{LIKE,COME,LIKE_COME}"
Similarly, arrays of BIGINT appear as:
"{45,46}"

I understand that PostgreSQL array types are not natively mapped by the JDBC reader in Databricks, but what steps should I take to retain the same data types after loading via JDBC?

Any guidance would be appreciated!

ACCEPTED SOLUTION

jeremy98
Contributor III

Hi community,

Yesterday I found a solution: query Postgres through JDBC with a wrapping subquery that converts each array column to a delimited string with array_to_string, then split those strings back into arrays in Spark. Here is the code:

   query = f"""(SELECT *, array_to_string(columns_to_export, ',') AS columns_to_export_string, 
      array_to_string(ups, ',') AS ups_to_string from {pg_table_name}) as data_to_fetch"""
    postgres_df = spark.read.format("jdbc") \
      .option("url", connection_properties["jdbc_url"]) \
      .option("dbtable", query) \
      .option("user", connection_properties["db_username"]) \
      .option("password", connection_properties["db_password"]) \
      .option("fetchsize", FETCH_SIZE) \
      .load() \
      .withColumn("columns_to_export", split(col("columns_to_export_string"), ",")) \
      .withColumn("ups", split(col("ups_to_string"), ",").cast("array<bigint>")) \
      .drop("columns_to_export_string", "ups_to_string")

