Databricks Community

kp12 · ‎09-26-2023

Hello,

I'm trying to write to Azure PostgreSQL flexible database from Azure Databricks, using PostgreSQL connector in Databricks Runtime in 12.2LTS.

I'm using df.write.format("postgresql").save() to write to PostgreSQL database, but getting the following error:

column "id" is of type uuid but expression is of type character varying.

df has an id column that contains a guid, but is of string type in dataframe and of type uuid in PG database.

As there is no guid or uuid type in Databricks, whet is the best way to insert into an uuid type column?

Thanks,

Kalyani

Priyanka_Biswas · ‎10-02-2023

Hi @kp12 The error you are encountering is because the PostgreSQL UUID data type is not directly compatible with the string type in Spark DataFrame. You can try casting the string to binary type before writing to the database. Here is how you can do it:

python
from pyspark.sql.functions import expr

df = df.withColumn("id", expr("uuid(id)"))

df.write.format("jdbc").option("url", "jdbc:postgresql://<hostname>:<port>/<database>").option("dbtable", "<table>").option("user", "<username>").option("password", "<password>").save()

In the above code, replace <hostname>, <port>, <database>, <table>, <username>, and <password> with your actual PostgreSQL database details. Please note that you need to use "jdbc" as the format in the write method, not "postgresql". The "uuid()" function in the expr method is used to convert the string type to binary type which is compatible with the UUID type in PostgreSQL.

noname123 · ‎03-25-2024

This is completely wrong. Answer looks like chat gpt generated. Chat gpt always provide answers with jdbc connector type. Author is asking about PostgreSQL connector which is better than jdbc.

uuid(id) function don't accept arguments. You can do uuid() and it will generate random guid which will be of type string in dataframe. You provided function is just what chat gpt made up.

noname123 · ‎03-25-2024

Had same problem. You need to add option "stringtype": "unspecified".

Example:

df.write.format("postgresql").mode("overwrite").option("truncate", "true").option("stringtype", "unspecified").option("dbtable", table).option("host", host).option("database", database).option("user", username).option("password", password).save()