How to check the types of databricks-connect objects
While using `databricks-sdk` in my code, I've found that checking the types of PySpark objects is no longer reliable.
I used to do the following:
from pyspark.sql import Column, DataFrame, SparkSession
isinstance(spark, SparkSession)
isinstance(a_df, DataFrame)
isinstance(a_col, Column)
I've since found that this isn't reliable, as it depends on the context in which I'm running Spark.
That is to say:
type(spark) -> pyspark.sql.(connect).session.SparkSession
type(a_df) -> pyspark.sql.(connect).dataframe.DataFrame
type(a_col) -> pyspark.sql.(connect).column.Column
Depending on the context, Databricks may add its `connect` module to the fully qualified type of the object in question.
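For example, inspecting the class's module makes the difference visible (a quick sketch, assuming `spark`, `a_df` and `a_col` already exist in the session):
# The module of the concrete class shows whether you got a Spark Connect object:
# "connect" appears in the path under databricks-connect, but not with classic PySpark.
print(type(spark).__module__)   # pyspark.sql.connect.session  vs  pyspark.sql.session
print(type(a_df).__module__)    # pyspark.sql.connect.dataframe vs  pyspark.sql.dataframe
print(type(a_col).__module__)   # pyspark.sql.connect.column    vs  pyspark.sql.column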
Am I not supposed to check types when using PySpark in Databricks?
Is there a reference for this change that I can follow?
And more practically, how do I check if the dataframe is a dataframe?
The only way I can think of at the moment is:
from pyspark.sql import SparkSession, Column, DataFrame
from pyspark.sql.connect import session, dataframe, column
isinstance(spark, (SparkSession, session.SparkSession))
isinstance(a_df, (DataFrame, dataframe.DataFrame))
isinstance(a_col, (Column, column.Column))
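Since `pyspark.sql.connect` may not even be importable in every environment (I'm assuming a plain PySpark install without the Connect extras, where the import can fail), the check ends up wrapped in a guarded helper along these lines (the `*_TYPES` tuples and `is_dataframe` are just my own names):
from pyspark.sql import Column, DataFrame, SparkSession

try:
    # Only available when the Spark Connect dependencies are installed.
    from pyspark.sql.connect.column import Column as ConnectColumn
    from pyspark.sql.connect.dataframe import DataFrame as ConnectDataFrame
    from pyspark.sql.connect.session import SparkSession as ConnectSparkSession

    SESSION_TYPES = (SparkSession, ConnectSparkSession)
    DATAFRAME_TYPES = (DataFrame, ConnectDataFrame)
    COLUMN_TYPES = (Column, ConnectColumn)
except ImportError:
    # Assuming a missing Connect dependency surfaces as an ImportError;
    # fall back to the classic classes only.
    SESSION_TYPES = (SparkSession,)
    DATAFRAME_TYPES = (DataFrame,)
    COLUMN_TYPES = (Column,)


def is_dataframe(obj) -> bool:
    # True for both classic and Spark Connect DataFrames.
    return isinstance(obj, DATAFRAME_TYPES)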
Is there a more natural way to do this type of checking?

