3 weeks ago
This code fails with exception:
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
File <command-4420517954891674>, line 7
      4 spark = DatabricksSession.builder.getOrCreate()
      6 df = spark.read.table("samples.nyctaxi.trips")
----> 7 df.select(lit(5).alias('height')).show()
from databricks.connect import DatabricksSession
from pyspark.sql.functions import lit
spark = DatabricksSession.builder.getOrCreate()
df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
Can you confirm this is a bug?
3 weeks ago
Hi @TWib, The error message you're encountering, [NOT_COLUMN_OR_STR] Argument 'col' should be a Column or str, got Column, indicates that there's an issue with the select operation in your code.
Let's break it down:
The problem occurs in this line:
df.select(lit(5).alias('height')).show()
You're trying to create a new column named "height" with a constant value of 5 using the lit(5) function. However, the error suggests that the argument passed to select is not a valid column or string.
The issue lies in the use of lit(5).alias('height'). The lit(5) creates a literal value (in this case, the integer 5), but it's not a valid column expression. The .alias('height') part is an attempt to rename the column, but it's not working as expected.
To fix this, you can directly create a new column with a constant value using the lit function without the alias:
df.withColumn('height', lit(5)).show()
This will add a new column named "height" with a value of 5 to all rows in your DataFrame.
If you encounter any other issues or need further assistance, feel free to ask!
3 weeks ago
I am sorry, but this is not helpful. ChatGPT does not work well for PySpark code.
3 weeks ago
Hi @TWib,
I apologize if my previous response didn't meet your expectations. If you have any other questions or need further assistance, feel free to ask. Have a great day!
3 weeks ago - last edited 3 weeks ago
It's an official example from the PySpark documentation:
It works on older runtimes, and it still worked a week ago. Please fix your internal Databricks Connect on the latest runtimes.
3 weeks ago
We don't understand the issue because it appeared suddenly, but we fixed it by migrating to 15.2.
Maybe Databricks released some 15.1.XXXX update that broke things?
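In case it helps others compare environments, here is a minimal sketch that prints the client-side package versions so they can be checked against the cluster runtime. The package names `databricks-connect` and `pyspark` are an assumption about what is installed locally, and `installed_versions` is just an illustrative helper, not part of any Databricks API.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Assumed package names; either one may be absent in a given environment.
    for name, ver in installed_versions(["databricks-connect", "pyspark"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

This only reports what the client has installed. It does not tell you which runtime patch the cluster is on, so that still has to be read from the cluster configuration.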
3 weeks ago
Hi, I'm having the same problem using the 14.3 LTS runtime.
The error just appeared yesterday. Before that, everything was working fine.
3 weeks ago
We are also seeing this error in 14.3 LTS from a simple example:
from pyspark.sql.functions import col
df = spark.table('things')
things = df.select(col('thing_id')).collect()
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
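The message says a Column was expected and a Column was received, which hints that two different classes with the same name might be involved. That is a guess, not a confirmed diagnosis. A small sketch for checking it is to print the fully qualified type of the object being passed to select; `qualified_name` is a hypothetical helper written for this post.

```python
def qualified_name(obj):
    """Return 'module.ClassName' for the type of the given object."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


# Usage on the failing session (not run here, it needs a live Spark session):
#   from pyspark.sql.functions import col
#   print(qualified_name(col("thing_id")))
#   print(qualified_name(df))
```

If the column's module and the DataFrame's module come from different parts of the package, that would be worth including in a bug report.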