Hi Team,
I am trying to create a pipeline to incrementally ingest data from an RDS postgresql database which contains tables that have some columns of jsonb data type. I am currently using AWS DMS with CDC to first load the data into an S3 bucket as csv files, and then using Databricks Autoloader to ingest the files into a streaming delta table.
Currently, the json data is being stored as a string data type, whereas I would like it to be stored as a struct data type so that it can be queried.
What is the best way to achieve this with Autoloader so that the ingested data has the jsonb columns as struct data types? This is the code I am using to ingest the data -
spark.readStream.format("cloudFiles")
.option("cloudFiles.format", "csv")
.option("cloudFiles.inferSchema", "true")
.option("cloudFiles.inferColumnTypes", "true")
.load("s3://path/to/bucket")
Thanks in advance!