Get Started Discussions
Start your journey with Databricks by joining discussions on getting started guides, tutorials, and introductory topics. Connect with beginners and experts alike to kickstart your Databricks experience.

column "id" is of type uuid but expression is of type character varying.

kp12
New Contributor II

Hello,

I'm trying to write to an Azure PostgreSQL flexible server database from Azure Databricks, using the PostgreSQL connector in Databricks Runtime 12.2 LTS.

I'm using df.write.format("postgresql").save() to write to the PostgreSQL database, but I'm getting the following error:

column "id" is of type uuid but expression is of type character varying.

df has an id column that contains a GUID, but it is of string type in the dataframe and of type uuid in the PG database.

As there is no guid or uuid type in Databricks, what is the best way to insert into a uuid-typed column?

Thanks,

Kalyani
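For context, a quick way to rule out malformed data (as opposed to a type mismatch) is to check that the string values actually parse as UUIDs before the write. This is a hedged, standard-library-only sketch, not part of the original question; the sample values are hypothetical:

```python
# Hypothetical pre-write sanity check: confirm each value in the string
# "id" column parses as a canonical UUID, so a failed insert points to
# the column type mismatch rather than bad data. Stdlib only.
import uuid

def is_valid_uuid(value: str) -> bool:
    """Return True if `value` parses as a UUID string."""
    try:
        uuid.UUID(value)
        return True
    except (ValueError, AttributeError, TypeError):
        return False

ids = ["123e4567-e89b-12d3-a456-426614174000", "not-a-guid"]
print([is_valid_uuid(v) for v in ids])  # [True, False]
```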

4 REPLIES

Priyanka_Biswas
Databricks Employee

Hi @kp12, the error you are encountering is because the PostgreSQL UUID data type is not directly compatible with the string type in a Spark DataFrame. You can try casting the string to binary type before writing to the database. Here is how you can do it:

python
from pyspark.sql.functions import expr

df = df.withColumn("id", expr("uuid(id)"))

df.write.format("jdbc") \
    .option("url", "jdbc:postgresql://<hostname>:<port>/<database>") \
    .option("dbtable", "<table>") \
    .option("user", "<username>") \
    .option("password", "<password>") \
    .save()

In the above code, replace <hostname>, <port>, <database>, <table>, <username>, and <password> with your actual PostgreSQL database details. Please note that you need to use "jdbc" as the format in the write method, not "postgresql". The "uuid()" function in the expr method is used to convert the string type to binary type which is compatible with the UUID type in PostgreSQL.

This is completely wrong. The answer looks ChatGPT-generated. ChatGPT always provides answers using the JDBC connector type, but the author is asking about the PostgreSQL connector, which is better than JDBC.

The uuid() function doesn't accept arguments. You can call uuid() with no arguments, and it will generate a random GUID, which will be of string type in the dataframe. The function you provided is just something ChatGPT made up.

noname123
New Contributor III

Had the same problem. You need to add the option "stringtype": "unspecified".

Example:

df.write.format("postgresql") \
    .mode("overwrite") \
    .option("truncate", "true") \
    .option("stringtype", "unspecified") \
    .option("dbtable", table) \
    .option("host", host) \
    .option("database", database) \
    .option("user", username) \
    .option("password", password) \
    .save()

Student-Learn
New Contributor II

Yes, this Stack Overflow answer was my reference too, and adding the option below made the load succeed with no error on the UUID data type in the PostgreSQL column:

.option("stringtype", "unspecified")

https://stackoverflow.com/questions/40973909/how-to-write-into-postgresql-hstore-using-spark-dataset...
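Pulling the thread's fix together: with "stringtype" set to "unspecified", the PostgreSQL driver sends string values as untyped literals, letting the server cast them to the target uuid column. A hedged sketch of a small helper that assembles the writer options (the helper name and connection values are hypothetical, not from the thread):

```python
# Hypothetical helper assembling the option map for
# df.write.format("postgresql"). The key fix from this thread is
# "stringtype": "unspecified", which defers typing of string columns
# (such as a GUID-in-a-string "id") to PostgreSQL itself.
def postgres_write_options(host, database, table, user, password):
    """Build the options dict for the Databricks PostgreSQL connector."""
    return {
        "host": host,
        "database": database,
        "dbtable": table,
        "user": user,
        "password": password,
        "stringtype": "unspecified",  # let PostgreSQL cast strings to uuid
    }

opts = postgres_write_options("myhost", "mydb", "events", "me", "secret")
# Usage (requires a live database, so left commented out):
# df.write.format("postgresql").mode("append").options(**opts).save()
```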
