DatabricksSession broken for 15.1
05-24-2024 03:40 AM
This code fails with exception:
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
File <command-4420517954891674>, line 7
      4 spark = DatabricksSession.builder.getOrCreate()
      6 df = spark.read.table("samples.nyctaxi.trips")
----> 7 df.select(lit(5).alias('height')).show()
from databricks.connect import DatabricksSession
from pyspark.sql.functions import lit
spark = DatabricksSession.builder.getOrCreate()
df = spark.range(1)
df.select(lit(5).alias('height'), df.id).show()
Can you confirm this is a bug?
05-27-2024 10:33 PM
I am sorry, but this is not helpful. ChatGPT does not work well for PySpark code.
05-27-2024 07:17 AM - edited 05-27-2024 07:20 AM
It's an official example from the PySpark documentation:
It works on older runtimes, and it worked a week ago. Please fix your internal Databricks Connect on the latest runtimes.
05-27-2024 10:35 PM
We don't understand the issue because it appeared suddenly, but we fixed it by migrating to 15.2.
Maybe Databricks released some 15.1.XXXX update that broke things?
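When something breaks without a code change, it can help to record the installed package versions before and after, so a silent update becomes visible. Below is a minimal sketch using only the standard library. The helper name `installed_versions` is hypothetical, and the package names in the usage comment are the PyPI names `pyspark` and `databricks-connect`. Whether a runtime patch actually changed them here is not confirmed in this thread.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(package_names):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in package_names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The distribution is not installed in this environment.
            found[name] = None
    return found


# Example (run on the cluster or in the local Databricks Connect environment):
# print(installed_versions(["pyspark", "databricks-connect"]))
```

Logging this dictionary at the start of a job gives you something concrete to compare when the same code starts failing.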
05-29-2024 08:27 AM
Hi, I'm having the same problem using the 14.3 LTS runtime.
The error just appeared yesterday. Before that, everything was working fine.
05-29-2024 02:48 PM
We are also seeing this error in 14.3 LTS from a simple example:
from pyspark.sql.functions import col
df = spark.table('things')
things = df.select(col('thing_id')).collect()
[NOT_COLUMN_OR_STR] Argument `col` should be a Column or str, got Column.
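One way to narrow this down is to print the fully qualified class of the session, DataFrame, and column involved. The odd-looking message ("should be a Column or str, got Column") would be consistent with two different Column classes being mixed, for example a classic `pyspark.sql.column.Column` passed to a Spark Connect DataFrame, but that is an assumption and not something confirmed in this thread. The helper names below (`qualified_class_name`, `is_connect_object`, `describe`) are hypothetical, and the module-prefix check is only a heuristic.

```python
def qualified_class_name(obj):
    """Return 'module.ClassName' for the type of obj."""
    cls = type(obj)
    return f"{cls.__module__}.{cls.__qualname__}"


def is_connect_object(obj):
    """Heuristic: assume Spark Connect classes live under pyspark.sql.connect."""
    return type(obj).__module__.startswith("pyspark.sql.connect")


def describe(**named_objects):
    """Map each keyword name to the qualified class name of its value."""
    return {name: qualified_class_name(obj) for name, obj in named_objects.items()}


# Example, in a notebook where `spark` and `df` already exist:
# from pyspark.sql.functions import col
# print(describe(session=spark, dataframe=df, column=col("thing_id")))
# print(is_connect_object(df), is_connect_object(col("thing_id")))
```

If the DataFrame and the column report classes from different module trees, that points to a mismatch between the session type and the functions module in use, which is worth including in a support ticket.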
09-10-2024 05:51 PM
I can see this issue in 13.3 LTS. Our production code is still running on 11.3 LTS, but upgrading to a higher LTS DBR version gives this error. I believe you should fix it or provide a migration guide from one DBR version to the next.
12-16-2024 08:13 AM
I also get the same for runtime 13.3 LTS. The same code with 15.2 LTS seems to work.
from pyspark.sql.functions import concat, lit
df.withColumn("new_col", concat("col1", lit("-"), "col2"))
12-16-2024 10:47 AM
So actually it isn't working on runtime 15.
It works on a shared cluster with runtime 15.4, but I also need to use RDDs for something, and that fails on shared clusters.