cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

I am getting ParseException: error while running the spark SQL query

AJ270990
Contributor II

I am using below code to create the Spark session and also loading the csv file. Spark session and loading csv is running well. However SQL query is generating the Parse Exception.

%python

from pyspark.sql import SparkSession     

# Create a SparkSession

spark = (SparkSession

 .builder

 .appName("SparkSQLExampleApp")

 .getOrCreate())

# Path to data set

csv_file = "dbfs:/mnt/Testing.csv"

# Read and create a temporary view

# Infer schema (note that for larger files you 

# may want to specify the schema)

df = (spark.read.format("csv")

   .option("inferSchema", "true")

   .option("header", "true")

   .load(csv_file))

df.createOrReplaceTempView("US_CPSC_AEP_TBL")

spark.sql("""select sum(cast(enrollment as float)), sum(cast(growth as float)), [plan type], [Parent Organization], state, [Special Needs Plan], [Plan Name Sec A],

       CASE when [Plan ID] between '800' and '899' then '899'

          else '1'

       END as plan_id

       FROM US_CPSC_AEP_TBL

       WHERE [Plan Name Sec A] is not null

       group by [Parent Organization],[plan type], state, [Special Needs Plan], [Plan Name Sec A], 

       CASE when [Plan ID] between '800' and '899' then '899'

          else '1'

       END

       having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0""")

1 ACCEPTED SOLUTION

Accepted Solutions

AJ270990
Contributor II

This is resolved. Below query works fine now

sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)),`plan type`,`Parent Organization`,state,`Special Needs Plan`,`Plan Name Sec A`, CASE when `Plan ID` between '800' and '899' then '899' else '1' END as plan_id from US_CPSC_AEP_TBL WHERE `Plan Name Sec A` is not null group by `Parent Organization`,`plan type`, state, `Special Needs Plan`, `Plan Name Sec A`, CASE when `Plan ID` between '800' and '899' then '899' else '1' END having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0")

View solution in original post

3 REPLIES 3

Zair
New Contributor III

Hi @Abhishek Jain​,

can you post the exact error as well, try one thing use ` and ` to inclose your fields e.g., [`plan type`]

Thanks,

AJ270990
Contributor II

@Agha Zair Ali​ Thanks for looking into this. Below is the error screenshot. I also added  ` but no success

image.png

AJ270990
Contributor II

This is resolved. Below query works fine now

sqldf = spark.sql("select sum(cast(enrollment as float)), sum(cast(growth as float)),`plan type`,`Parent Organization`,state,`Special Needs Plan`,`Plan Name Sec A`, CASE when `Plan ID` between '800' and '899' then '899' else '1' END as plan_id from US_CPSC_AEP_TBL WHERE `Plan Name Sec A` is not null group by `Parent Organization`,`plan type`, state, `Special Needs Plan`, `Plan Name Sec A`, CASE when `Plan ID` between '800' and '899' then '899' else '1' END having sum(cast(enrollment as float)) = 0 and sum(cast(growth as float)) = 0")

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group