jeremy98
Honored Contributor

Hi community,

Yesterday I found a solution: query Postgres through JDBC, but convert the two array columns to comma-separated strings inside the query itself so that Spark receives column types it can handle, then split them back into arrays on the Databricks side. Here is the code:

    from pyspark.sql.functions import col, split

    # Turn the Postgres array columns into comma-separated strings inside the
    # pushed-down query, since the JDBC reader cannot map them directly.
    query = f"""(SELECT *,
        array_to_string(columns_to_export, ',') AS columns_to_export_string,
        array_to_string(ups, ',') AS ups_to_string
        FROM {pg_table_name}) AS data_to_fetch"""

    postgres_df = spark.read.format("jdbc") \
        .option("url", connection_properties["jdbc_url"]) \
        .option("dbtable", query) \
        .option("user", connection_properties["db_username"]) \
        .option("password", connection_properties["db_password"]) \
        .option("fetchsize", FETCH_SIZE) \
        .load() \
        .withColumn("columns_to_export", split(col("columns_to_export_string"), ",")) \
        .withColumn("ups", split(col("ups_to_string"), ",").cast("array<bigint>")) \
        .drop("columns_to_export_string", "ups_to_string")
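The snippet assumes that pg_table_name, FETCH_SIZE, and connection_properties are already defined in the notebook. A minimal sketch of what that setup could look like is below; the table name, host, port, database, and secret scope/key names are placeholders, not values from my actual environment:

    # Hypothetical setup; adjust to your own Postgres instance and secrets.
    pg_table_name = "public.my_table"   # placeholder table name
    FETCH_SIZE = 10000                  # rows fetched per JDBC round trip

    connection_properties = {
        "jdbc_url": "jdbc:postgresql://<host>:5432/<database>",
        "db_username": dbutils.secrets.get(scope="postgres", key="username"),
        "db_password": dbutils.secrets.get(scope="postgres", key="password"),
    }

Note that columns_to_export stays an array<string>, while the extra .cast("array<bigint>") on ups converts each split element back to a bigint so the column matches its original Postgres integer array type.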
