Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
Connecting to Serverless Redshift from a Databricks Notebook

arunak
New Contributor

Hello Experts, 

A new Databricks user here. I am trying to access a Redshift Serverless table from a Databricks notebook.
Here is what happens when I run the code below:

 
df = spark.read.format("redshift")\
.option("dbtable", "public.customer")\
.option("tempdir", "s3://BLAH/rs-temp/")\
.option("url", "jdbc:redshift://BLAH:5439/dev")\
.option("user", "user")\
.option("password", "password")\
.load()
df.show(10,False)

It fails with the error below:

IllegalArgumentException:
requirement failed: You must specify a method for authenticating Redshift's connection to S3 (aws_iam_role, forward_spark_s3_credentials, or temporary_aws_*. For a discussion of the differences between these options, please see the README.

If I change the format to "jdbc", it works with no issue. I am on 13.3 LTS (includes Apache Spark 3.4.1, Scala 2.12).
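For reference, this is roughly the jdbc-format read that works for me (same host, database, and table placeholders as above; no tempdir, since nothing is staged in S3 with the plain JDBC source):

# Same connection details, but using the generic JDBC source instead of "redshift";
# rows come straight over the JDBC connection rather than being unloaded to S3.
df_jdbc = (spark.read.format("jdbc")
    .option("url", "jdbc:redshift://BLAH:5439/dev")
    .option("dbtable", "public.customer")
    .option("user", "user")
    .option("password", "password")
    .load())
df_jdbc.show(10, False)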

I don't have an instance profile role. Why doesn't format("redshift") use the provided username and password to connect to Redshift? What config should I be using?


1 REPLY

shan_chandra
Esteemed Contributor

@arunak - you need to set forward_spark_s3_credentials to true on the read. Spark then forwards the credentials it uses to authenticate to the S3 tempdir bucket to Redshift, and those same credentials are used for the read.
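A minimal sketch of the read with that option added (host, database, table, and tempdir placeholders are the ones from your post):

# Same read as before, plus forward_spark_s3_credentials so the connector
# passes Spark's S3 credentials to Redshift for the tempdir staging step.
df = (spark.read.format("redshift")
    .option("url", "jdbc:redshift://BLAH:5439/dev")
    .option("dbtable", "public.customer")
    .option("tempdir", "s3://BLAH/rs-temp/")
    .option("user", "user")
    .option("password", "password")
    .option("forward_spark_s3_credentials", "true")
    .load())
df.show(10, False)

If you later attach an instance profile / IAM role to the cluster, aws_iam_role is the alternative named in the same error message.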
