08-21-2025 01:44 AM
I have found a workaround for this issue. Basically, I create a dummy DataFrame and then check whether the DataFrame I want to validate has the same type as that dummy DataFrame.
from pyspark.sql import DataFrame, SparkSession


def get_dummy_df() -> DataFrame:
    """
    Generates a dummy DataFrame with a range of integers.

    This function creates a DataFrame containing integers starting from 0 up to
    (but not including) 2, using the current Spark session.

    Returns:
        DataFrame: A Spark DataFrame containing a single column with the values [0, 1].
    """
    spark_session = SparkSession.builder.appName("dummy_df").getOrCreate()
    return spark_session.range(0, 2)
def is_spark_df(df_to_check: DataFrame) -> bool:
    """
    Checks if the provided object is a Spark DataFrame.

    This function compares the type of the provided object with a dummy DataFrame created
    using the `get_dummy_df()` function. This is necessary because in Databricks, depending
    on the cluster configuration, the DataFrame type can vary. If you import
    `pyspark.sql.dataframe`, your type check may fail because Databricks can provide
    `pyspark.sql.connect.dataframe`.

    Parameters:
        df_to_check (DataFrame): The object to check.

    Returns:
        bool: True if the object is a Spark DataFrame, False otherwise.

    For more information on this issue, please see:
    https://community.databricks.com/t5/data-engineering/pyspark-sql-connect-dataframe-dataframe-vs-pyspark-sql-dataframe/td-p/71055
    """
    return type(df_to_check) == type(get_dummy_df())
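For example, the check can be used like this (a minimal sketch that assumes the two functions above are already defined in the notebook; `my_df` is only an illustrative name, not from the original code):

# Hypothetical usage of is_spark_df; my_df is an illustrative name.
my_df = SparkSession.builder.getOrCreate().range(0, 10)

print(is_spark_df(my_df))   # True on both classic and Spark Connect clusters
print(is_spark_df([0, 1]))  # False: a plain Python list is not a DataFrame

Because the dummy DataFrame is built from the same active session as the object being checked, both resolve to the same class (either pyspark.sql.dataframe.DataFrame or pyspark.sql.connect.dataframe.DataFrame), so the comparison holds regardless of the cluster configuration.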
Regards,
Gleydson C.