02-08-2025 06:12 AM
Hi community,
Yesterday I found a solution: query Postgres through JDBC and have the query itself convert the array columns into strings, which Databricks can then split back into proper array columns. Here is the code:
from pyspark.sql.functions import col, split

# pg_table_name, connection_properties and FETCH_SIZE are defined earlier in the notebook.
# Serialise the Postgres array columns to comma-separated strings inside the
# pushed-down query, so the JDBC reader can handle them as plain strings.
query = f"""(SELECT *,
        array_to_string(columns_to_export, ',') AS columns_to_export_string,
        array_to_string(ups, ',') AS ups_to_string
        FROM {pg_table_name}) AS data_to_fetch"""

postgres_df = spark.read.format("jdbc") \
    .option("url", connection_properties["jdbc_url"]) \
    .option("dbtable", query) \
    .option("user", connection_properties["db_username"]) \
    .option("password", connection_properties["db_password"]) \
    .option("fetchsize", FETCH_SIZE) \
    .load() \
    .withColumn("columns_to_export", split(col("columns_to_export_string"), ",")) \
    .withColumn("ups", split(col("ups_to_string"), ",").cast("array<bigint>")) \
    .drop("columns_to_export_string", "ups_to_string")