08-09-2022 11:54 PM
I am using the code below to create the Spark session and load the CSV file. The Spark session and the CSV load both run fine; however, the SQL query raises a ParseException.
%python
from pyspark.sql import SparkSession

# Create a SparkSession
spark = (SparkSession
         .builder
         .appName("SparkSQLExampleApp")
         .getOrCreate())

# Path to data set
csv_file = "dbfs:/mnt/Testing.csv"

# Read and create a temporary view
# Infer schema (note that for larger files you
# may want to specify the schema)
df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .load(csv_file))
df.createOrReplaceTempView("US_CPSC_AEP_TBL")
spark.sql("""select sum(cast(enrollment as float)), sum(cast(growth as float)), [plan type], [Parent Organization], state, [Special Needs Plan], [Plan Name Sec A],
CASE when [Plan ID] between '800' and '899' then '899'
else '1'
END as plan_id
FROM US_CPSC_AEP_TBL
WHERE [Plan Name Sec A] is not null
group by [Parent Organization],[plan type], state, [Special Needs Plan], [Plan Name Sec A],
CASE when [Plan ID] between '800' and '899' then '899'
else '1'
END
having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0""")
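As the comment in the code above notes, inferring the schema can be slow on larger files; the schema can also be supplied explicitly. A minimal sketch, assuming column names and types based on the query (the real CSV may differ):

from pyspark.sql.types import StructType, StructField, StringType

# Assumed schema derived from the columns referenced in the query; adjust to the actual file
manual_schema = StructType([
    StructField("enrollment", StringType(), True),
    StructField("growth", StringType(), True),
    StructField("plan type", StringType(), True),
    StructField("Parent Organization", StringType(), True),
    StructField("state", StringType(), True),
    StructField("Special Needs Plan", StringType(), True),
    StructField("Plan Name Sec A", StringType(), True),
    StructField("Plan ID", StringType(), True),
])

df = (spark.read.format("csv")
      .option("header", "true")
      .schema(manual_schema)   # no inferSchema pass needed when the schema is supplied
      .load(csv_file))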
08-10-2022 10:49 PM
This is resolved. The query below works fine now:
sqldf = spark.sql("""
    select sum(cast(enrollment as float)), sum(cast(growth as float)),
           `plan type`, `Parent Organization`, state, `Special Needs Plan`, `Plan Name Sec A`,
           CASE when `Plan ID` between '800' and '899' then '899' else '1' END as plan_id
    from US_CPSC_AEP_TBL
    WHERE `Plan Name Sec A` is not null
    group by `Parent Organization`, `plan type`, state, `Special Needs Plan`, `Plan Name Sec A`,
             CASE when `Plan ID` between '800' and '899' then '899' else '1' END
    having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0
""")
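To verify the result, the returned DataFrame can be previewed and counted, for example:

sqldf.show(20, truncate=False)   # preview the first 20 grouped rows
print(sqldf.count())             # number of groups with zero total enrollment and growth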
08-10-2022 12:38 AM
Hi @Abhishek Jain,
Can you post the exact error as well? One thing to try: use backticks (`) instead of square brackets to enclose your field names, e.g., `plan type` instead of [plan type].
Thanks,
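For reference, Spark SQL delimits identifiers that contain spaces with backticks rather than T-SQL-style square brackets, so the fix is to change the quoting; a minimal sketch against the same temp view:

# Raises ParseException: square brackets are not identifier delimiters in Spark SQL
# spark.sql("select [plan type] from US_CPSC_AEP_TBL")

# Works: backtick-quote column names that contain spaces
spark.sql("select `plan type` from US_CPSC_AEP_TBL").show(5)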