Pass Databricks' Spark session to a user-defined module

viniaperes
New Contributor II

Hello everyone,

I have a .py file (not a notebook) containing a class with the following constructor:

from pyspark.sql import DataFrame, SparkSession


class DataQualityChecker:
    def __init__(self, spark_session: SparkSession, df: DataFrame, quality_config_filepath: str) -> None:
        self.__spark = spark_session
        self.__df = df
        self.__quality_config = self.__import_quality_config(quality_config_filepath)
        self.__error_constraints = None

I'm trying to use this class in a notebook as follows:

from data_quality import DataQualityChecker

quality_checker = DataQualityChecker(spark, df, quality_config)

However, I am receiving the following error from the code snippet above:

TypeError: __init__() takes 3 positional arguments but 4 were given

Does anyone have any idea what it could be? I suspect the problem is passing the spark variable as a parameter to the class constructor, but I don't know how to solve it.
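One way to narrow this down is to print the signature of the class the notebook actually imported: if it shows fewer parameters than the source file defines, the notebook is holding a cached, older copy of the module, and detaching/reattaching the notebook or calling `importlib.reload` on the module refreshes it. A minimal sketch (the class here is a stand-in for the real import, kept only so the snippet runs on its own):

```python
import inspect

# Stand-in with the same constructor shape as the .py file; in the
# notebook you would inspect the real imported DataQualityChecker.
class DataQualityChecker:
    def __init__(self, spark_session, df, quality_config_filepath) -> None:
        self.spark_session = spark_session

# Shows exactly which parameters the imported definition accepts.
print(inspect.signature(DataQualityChecker.__init__))
```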

1 ACCEPTED SOLUTION

Accepted Solutions

Kaniz
Community Manager
Community Manager

Hi @viniaperes, it looks like you're passing all the required arguments to your DataQualityChecker constructor, but with purely positional arguments a mismatch between the call site and the constructor's parameter list is easy to miss.

To make the mapping explicit, pass each argument by name as a keyword argument:

 
quality_checker = DataQualityChecker(spark_session=spark, df=df, quality_config_filepath=quality_config)

By specifying the parameter names as keywords, Python matches each argument to its constructor parameter by name rather than by position.
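For illustration, here is a minimal stand-in (no Spark required, and the placeholder values are hypothetical) showing keyword arguments binding to the constructor parameters regardless of order:

```python
class DataQualityChecker:
    """Stand-in mirroring the constructor shape of the original class."""
    def __init__(self, spark_session, df, quality_config_filepath):
        self.spark = spark_session
        self.df = df
        self.config_path = quality_config_filepath

# Keyword arguments bind by name, so argument order no longer matters.
checker = DataQualityChecker(
    df=[("a", 1)],                       # placeholder for a real DataFrame
    quality_config_filepath="quality.yml",
    spark_session=None,                  # placeholder for the real SparkSession
)
print(checker.config_path)  # quality.yml
```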

