08-12-2022 10:37 AM
Happy Friday afternoon, fellow Bricksters! Got another question for you. I have a PySpark notebook that reads from Redshift into a DataFrame, does some 'stuff', then writes back to Redshift. All good here. What I'm trying to do, with no luck yet, is first run DROP TABLE IF EXISTS and then follow that with CREATE TABLE IF NOT EXISTS, but I can't seem to figure out how. Any suggestions are welcome! Thanks.
Labels: Pyspark
Accepted Solutions
08-12-2022 12:07 PM
Answered my own question! Check this out:
dropSQL = "DROP TABLE IF EXISTS <tablename>;"  # note the semicolon at the end!
createSQL = "CREATE TABLE IF NOT EXISTS <tablename> (field1 int, field2 date, etc...);"
preActionsSQL = dropSQL + createSQL  # two statements, separated by the semicolon
Then, when writing your DataFrame to a Redshift table, include this:
.option("preactions", preActionsSQL)
Side note: if you use .mode("overwrite"), I *think* it either drops or truncates the target table first anyway, but I haven't confirmed this. I was too excited to share my findings with y'all!
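For anyone landing here later, here is a minimal sketch of how the pieces above might fit together in one place. The helper names (build_preactions, write_with_preactions), the connector format string, the credential option, and the example table and column names are all assumptions for illustration and are not from the original notebook, so adjust them to your own setup.

```python
def build_preactions(table, column_defs):
    """Return a ';'-separated DROP + CREATE string for the `preactions` option.

    `table` and `column_defs` are hypothetical inputs, e.g.
    "my_table" and ["field1 int", "field2 date"].
    """
    drop_sql = f"DROP TABLE IF EXISTS {table};"  # trailing semicolon separates the statements
    create_sql = f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(column_defs)});"
    return drop_sql + create_sql


def write_with_preactions(df, table, column_defs, jdbc_url, temp_dir):
    """Append `df` to `table`, asking the connector to run DROP/CREATE first.

    Assumes the spark-redshift connector is installed on the cluster;
    `jdbc_url` and `temp_dir` (an S3 path) are placeholders you must supply.
    """
    (
        df.write
        .format("com.databricks.spark.redshift")  # assumed connector name
        .option("url", jdbc_url)
        .option("dbtable", table)
        .option("tempdir", temp_dir)
        .option("forward_spark_s3_credentials", "true")  # assumed auth method
        .option("preactions", build_preactions(table, column_defs))
        .mode("append")
        .save()
    )


# Building the SQL needs no Spark session, so it is easy to inspect:
pre_sql = build_preactions("my_table", ["field1 int", "field2 date"])
print(pre_sql)
```

Keeping the SQL construction in its own function means you can print and check the statement before any write is attempted. The write itself only runs when you call write_with_preactions with a real DataFrame, JDBC URL, and S3 temp directory.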

