cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

Migrating jsonb data from Postgresql database to Databricks

Karene
New Contributor

Hi Team,

I am trying to create a pipeline to incrementally ingest data from an RDS postgresql database which contains tables that have some columns of jsonb data type. I am currently using AWS DMS with CDC to first load the data into an S3 bucket as csv files, and then using Databricks Autoloader to ingest the files into a streaming delta table.

Currently, the json data is being stored as a string data type, whereas I would like it to be stored as a struct data type so that it can be queried.

What is the best way to achieve this with Autoloader so that the ingested data has the jsonb columns as struct data types? This is the code I am using to ingest the data -

 

spark.readStream.format("cloudFiles")
.option("cloudFiles.format", "csv")
.option("cloudFiles.inferSchema", "true")
.option("cloudFiles.inferColumnTypes", "true")
.load("s3://path/to/bucket")
 
Thanks in advance!
1 REPLY 1

BR_DatabricksAI
Contributor

Hello Karene, 

You can do the transformation in following manner from string to struct and refer to the example below:

 

data =[('001','{"name":"bhupendra","zipcode":"260100"}')]
schema = ['id','propertytype']

df = spark.createDataFrame(data,schema)
df.show(truncate=False)
df.printSchema()
from pyspark.sql.functions import from_json
from pyspark.sql.types import StructType, StructField,StringType

structTypeSchema = StructType([\
    StructField('name',StringType()),\
    StructField('zipcode',StringType())])

df1 = df.withColumn('propertystructtype', from_json(df.propertytype, structTypeSchema))
df1.show(truncate=False)
df1.printSchema()

 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group