I am getting a ParseException error while running a Spark SQL query

AJ270990
Contributor II

I am using the code below to create the Spark session and load the CSV file. Creating the Spark session and loading the CSV both work fine; however, the SQL query raises a ParseException.

%python

from pyspark.sql import SparkSession

# Create a SparkSession
spark = (SparkSession
         .builder
         .appName("SparkSQLExampleApp")
         .getOrCreate())

# Path to data set
csv_file = "dbfs:/mnt/Testing.csv"

# Read and create a temporary view
# Infer schema (note that for larger files you
# may want to specify the schema)
df = (spark.read.format("csv")
      .option("inferSchema", "true")
      .option("header", "true")
      .load(csv_file))

df.createOrReplaceTempView("US_CPSC_AEP_TBL")

spark.sql("""select sum(cast(enrollment as float)), sum(cast(growth as float)), [plan type], [Parent Organization], state, [Special Needs Plan], [Plan Name Sec A],
       CASE when [Plan ID] between '800' and '899' then '899'
          else '1'
       END as plan_id
       FROM US_CPSC_AEP_TBL
       WHERE [Plan Name Sec A] is not null
       group by [Parent Organization], [plan type], state, [Special Needs Plan], [Plan Name Sec A],
       CASE when [Plan ID] between '800' and '899' then '899'
          else '1'
       END
       having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0""")


3 REPLIES

Zair
New Contributor II

Hi @Abhishek Jain,

Can you post the exact error as well? Also, try one thing: use backticks (`) to enclose your fields, e.g., [`plan type`].

Thanks,
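
For illustration, a minimal sketch of the backtick quoting being suggested here (the toy_tbl view and its single column are made up for this example, not taken from the original post):

# Toy example only: Spark SQL rejects square-bracket identifiers with a
# ParseException, but backtick-quoted identifiers work for names with spaces.
toy = spark.createDataFrame([("HMO",), ("PPO",)], ["plan type"])
toy.createOrReplaceTempView("toy_tbl")

spark.sql("SELECT `plan type` FROM toy_tbl").show()   # works
# spark.sql("SELECT [plan type] FROM toy_tbl")        # raises ParseException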

AJ270990
Contributor II

@Agha Zair Ali Thanks for looking into this. Below is the error screenshot. I also added backticks (`), but had no success.

[error screenshot: image.png]

AJ270990
Contributor II

This is resolved. The query below works fine now:

sqldf = spark.sql("""select sum(cast(enrollment as float)), sum(cast(growth as float)), `plan type`, `Parent Organization`, state, `Special Needs Plan`, `Plan Name Sec A`,
       CASE when `Plan ID` between '800' and '899' then '899'
          else '1'
       END as plan_id
       FROM US_CPSC_AEP_TBL
       WHERE `Plan Name Sec A` is not null
       group by `Parent Organization`, `plan type`, state, `Special Needs Plan`, `Plan Name Sec A`,
       CASE when `Plan ID` between '800' and '899' then '899'
          else '1'
       END
       having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0""")
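
As a possible follow-up (not part of the original answer): the same logic can be expressed with the DataFrame API, which sidesteps identifier quoting entirely. This is a rough sketch assuming the same column names as the query above; the total_enrollment and total_growth aliases are made up for readability:

from pyspark.sql import functions as F

# Hypothetical DataFrame-API equivalent of the working SQL above
# (column names taken from the query; aggregate aliases are invented).
plan_id = (F.when(F.col("Plan ID").between("800", "899"), "899")
             .otherwise("1")
             .alias("plan_id"))

result = (df.where(F.col("Plan Name Sec A").isNotNull())
            .groupBy("Parent Organization", "plan type", "state",
                     "Special Needs Plan", "Plan Name Sec A", plan_id)
            .agg(F.sum(F.col("enrollment").cast("float")).alias("total_enrollment"),
                 F.sum(F.col("growth").cast("float")).alias("total_growth"))
            .where((F.col("total_enrollment") == 0) & (F.col("total_growth") == 0)))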
