02-07-2025 02:59 PM
Hi Community,
I need to load data from PostgreSQL into Databricks through JDBC without changing the data type of a VARCHAR[] column in PostgreSQL, which should remain an array of strings in Databricks.
Previously I used psycopg2, and it worked, but I had to convert the data first to Pandas and then to Spark before using it in Databricks. I'd like to switch to JDBC for a more direct approach.
However, when I load the data, the VARCHAR[] column appears as a string in the format:
"{LIKE, COME, "LIKE_COME"}
Similarly, arrays of BIGINT appear as:
"{45,46}".
I understand that PostgreSQL array types are not natively supported in Databricks, but what steps should I take to retain the same data type after loading via JDBC?
Any guidance would be appreciated!
Accepted Solutions
02-08-2025 06:12 AM
Hi community,
Yesterday I found a solution: in the JDBC query against PostgreSQL, convert the array columns to delimited strings, which are manageable in Databricks, and then split them back into arrays on the Spark side. Here is the code:
from pyspark.sql.functions import col, split

# Convert the PostgreSQL array columns to comma-separated strings in the pushed-down query
query = f"""(SELECT *,
                    array_to_string(columns_to_export, ',') AS columns_to_export_string,
                    array_to_string(ups, ',') AS ups_to_string
             FROM {pg_table_name}) AS data_to_fetch"""

# Read via JDBC, then split the delimited strings back into arrays and drop the helper columns
postgres_df = spark.read.format("jdbc") \
    .option("url", connection_properties["jdbc_url"]) \
    .option("dbtable", query) \
    .option("user", connection_properties["db_username"]) \
    .option("password", connection_properties["db_password"]) \
    .option("fetchsize", FETCH_SIZE) \
    .load() \
    .withColumn("columns_to_export", split(col("columns_to_export_string"), ",")) \
    .withColumn("ups", split(col("ups_to_string"), ",").cast("array<bigint>")) \
    .drop("columns_to_export_string", "ups_to_string")
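One caveat with the comma-delimited approach is that it breaks if an array element itself contains a comma. A variant that avoids this, sketched below under the same connection_properties / pg_table_name assumptions as above, serialises the arrays to JSON in PostgreSQL and parses them back with from_json in Spark:
from pyspark.sql.functions import col, from_json

# Serialise the array columns to JSON text on the PostgreSQL side
query = f"""(SELECT *,
                    to_json(columns_to_export)::text AS columns_to_export_json,
                    to_json(ups)::text AS ups_json
             FROM {pg_table_name}) AS data_to_fetch"""

postgres_df = spark.read.format("jdbc") \
    .option("url", connection_properties["jdbc_url"]) \
    .option("dbtable", query) \
    .option("user", connection_properties["db_username"]) \
    .option("password", connection_properties["db_password"]) \
    .load() \
    .withColumn("columns_to_export", from_json(col("columns_to_export_json"), "array<string>")) \
    .withColumn("ups", from_json(col("ups_json"), "array<bigint>")) \
    .drop("columns_to_export_json", "ups_json")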