05-24-2024 03:40 AM
This code fails with exception:
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
File <command-4420517954891674>, line 7
      4 spark = DatabricksSession.builder.getOrCreate()
      6 df = spark.read.table("samples.nyctaxi.trips")
----> 7 df.select(lit(5).alias('height')).show()
from databricks.connect import DatabricksSession
from pyspark.sql.functions import lit
spark = DatabricksSession.builder.getOrCreate()
df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
Can you confirm this is a bug?
05-27-2024 05:09 AM
Hi @TWib, The error message you’re encountering, [NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column, indicates that there’s an issue with the select operation in your code.
Let’s break it down:
The problem occurs in this line:
df.select(lit(5).alias('height')).show()
You’re trying to create a new column named ‘height’ with a constant value of 5 using the lit(5) function. However, the error suggests that the argument passed to select is not a valid column or string.
The issue lies in the use of lit(5).alias('height'). The lit(5) creates a literal value (in this case, the integer 5), but it’s not a valid column expression. The .alias('height') part is an attempt to rename the column, but it’s not working as expected.
To fix this, you can directly create a new column with a constant value using the lit function without the alias:
df.withColumn('height', lit(5)).show()
This will add a new column named ‘height’ with a value of 5 to all rows in your DataFrame.
If you encounter any other issues or need further assistance, feel free to ask! 😊
05-27-2024 10:33 PM
I am sorry, but this is not helpful. ChatGPT does not work well for PySpark code.
05-28-2024 12:13 AM
Hi @TWib,
I apologize if my previous response didn’t meet your expectations. If you have any other questions or need further assistance, feel free to ask. Have a great day! 😊
05-27-2024 07:17 AM - edited 05-27-2024 07:20 AM
It's an official example from pyspark documentation:
It works on older runtimes, and it worked one week ago. Please fix your internal Databricks Connect on the latest runtimes.
05-27-2024 10:35 PM
We don't understand the issue because it suddenly appeared, but we fixed it by migrating to 15.2.
Maybe Databricks released some 15.1.XXXX update that broke stuff?
05-29-2024 08:27 AM
Hi, I'm having the same problem using the 14.3 LTS runtime.
The error just appeared yesterday. Before that, everything was working fine.
05-29-2024 02:48 PM
We are also seeing this error in 14.3 LTS from a simple example:
from pyspark.sql.functions import col
df = spark.table('things')
things = df.select(col('thing_id')).collect()
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
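A note on why the message can read "should be a Column ..., got Column": in plain Python, this happens whenever two different classes share the same short name and the error text prints only `type(x).__name__`. The sketch below reproduces that effect with two stand-in classes. Everything in it (`demo_classic`, `demo_connect`, `check_col`, `describe`) is hypothetical and made up for illustration; it is not PySpark's actual code, and it is not a confirmed diagnosis of the failure reported in this thread.

```python
# Stand-in classes: both are literally named "Column" but live in different
# (hypothetical) modules, so Python treats them as unrelated types.
ClassicColumn = type("Column", (), {"__module__": "demo_classic"})
ConnectColumn = type("Column", (), {"__module__": "demo_connect"})


def check_col(col, expected=(ClassicColumn, str)):
    """Mimic a validator whose error message shows only the short class name."""
    if not isinstance(col, expected):
        raise TypeError(
            "[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, "
            f"got {type(col).__name__}."
        )
    return col


def describe(obj):
    """Return the fully qualified class name, which tells the two apart."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


if __name__ == "__main__":
    value = ConnectColumn()
    try:
        check_col(value)
    except TypeError as exc:
        print(exc)              # message ends with "got Column."
    print(describe(value))      # shows which "Column" the object really is
```

If something similar is going on in a real session, printing the fully qualified type of the object passed to `select` (as `describe` does here) may show whether it comes from a different module than the one the check expects.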
a week ago
I can see this issue in 13.3 LTS. Our production code is still running on 11.3 LTS, but upgrading to a higher LTS DBR version gives this error. I believe you should fix it or provide a migration guide from one DBR version to the next.