NameError: name 'col' is not defined
08-21-2019 03:15 PM
I am running the code below in a Python notebook, and the col() function is not being recognized.
I want to know whether col() belongs to a specific DataFrame library or Python library. I don't want to use the PySpark API; I would like to write the code using the SQL DataFrame API.
Running the code below gives the error: NameError: name 'col' is not defined
peopleDF = spark.read.parquet("/mnt/training/dataframes/people-10m.parquet")
peopleDF.printSchema()
peopleDF.show()
peopleDF.select(col("firstName")).filter(col("firstName"))=="An"
As per the Spark documentation:
https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.Column
df("columnName") // On a specific `df` DataFrame.
col("columnName") // A generic column not yet associated with a DataFrame.
col("columnName.field") // Extracting a struct field
col("`a.column.with.dots`") // Escape `.` in column names.
$"columnName" // Scala short hand for a named column.
08-22-2019 02:18 AM
As the documentation describes, col() refers to a generic column that is not yet associated with a DataFrame. Please refer to the code below.
display(peopleDF.select("firstName").filter("firstName = 'An'"))