URGENT HELP NEEDED: Python functions deployed in the cluster throwing an error

Rajaniesh
New Contributor III

Hi,

I have created a Python wheel containing the following code; the package name is rule_engine:

"""

The entry point of the Python Wheel

"""

import sys

from pyspark.sql.functions import expr, col

def get_rules(tag):

 """

  loads data quality rules from a table

  :param tag: tag to match

  :return: dictionary of rules that matched the tag

 """

  

 rules = {}

 df = spark.read.table("rules")

 for row in df.filter(col("tag") == tag).collect():

  rules[row['name']] = row['constraint']

 return rules

def get_quarantine_rules(tag):

 """

  loads data quality rules from a table

  :param tag: tag to match

  :return: dictionary of rules that matched the tag

 """

 all_rules_in_tags=get_rules(tag)

 qurantine_rule="NOT({0})".format(" AND ".join(all_rules_in_tags.values()))

 return qurantine_rule
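For reference, the quarantine-expression construction can be exercised without a cluster. A minimal stand-alone sketch, with a hard-coded rules dictionary standing in for the "rules" table (the rule names and constraints below are illustrative):

```python
def build_quarantine_rule(rules):
    # Negate the conjunction of all constraints: a row is quarantined
    # when it violates at least one rule.
    return "NOT({0})".format(" AND ".join(rules.values()))

# Two hypothetical constraints, as they might come back from get_rules():
rules = {
    "valid_id": "id IS NOT NULL",
    "valid_count": "count > 0",
}
print(build_quarantine_rule(rules))
# NOT(id IS NOT NULL AND count > 0)
```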

After installing it on a Databricks cluster, I import it and call the function defined in it:

import rule_engine

rule_dict = rule_engine.get_quarantine_rules("maintained")

It throws this error:

NameError                                 Traceback (most recent call last)
<command-502204870200978> in <cell line: 2>()
      1 import rule_engine
----> 2 rule_dict=rule_engine.get_quarantine_rules("maintained")

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/rule_engine/functions.py in get_quarantine_rules(tag)
     27     :return: dictionary of rules that matched the tag
     28     """
---> 29     all_rules_in_tags=get_rules(tag)
     30     qurantine_rule="NOT({0})".format(" AND ".join(all_rules_in_tags.values()))
     31     return qurantine_rule

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.9/site-packages/rule_engine/functions.py in get_rules(tag)
     15     """
     16     rules = {}
---> 17     df = spark.read.table("rules")
     18     for row in df.filter(col("tag") == tag).collect():
     19         rules[row['name']] = row['constraint']

NameError: name 'spark' is not defined

Regards

Rajaniesh


Anonymous
Not applicable

Hi @Rajaniesh Kaushikk,

Great to meet you, and thanks for your question!

Let's see if your peers in the community have an answer to your question. Thanks.

Kaniz
Community Manager

Hi @Rajaniesh, it seems you’re encountering a NameError related to the ‘spark’ object.

Let’s address this issue.

In PySpark, the ‘spark’ object is a SparkSession that is created automatically in the Spark shell, the PySpark shell, and Databricks notebooks. However, a module installed as a wheel does not inherit the notebook’s globals, so inside your .py file you need to obtain the SparkSession explicitly using the builder.
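The error itself is plain Python name resolution: a function looks up unqualified names in its own module’s global namespace, not in the notebook’s. A minimal reproduction without Spark (the names here are only a stand-in for the wheel’s code):

```python
# A stand-in for the wheel's module: 'spark' is never defined here.
def use_spark():
    # The name 'spark' is resolved in this module's globals at call time,
    # so calling this without defining 'spark' raises a NameError.
    return spark.read

try:
    use_spark()
except NameError as e:
    print(e)  # name 'spark' is not defined
```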

Here’s how you can resolve the error:

  1. Import the necessary modules:

    • Make sure you have imported the required PySpark modules at the beginning of your script. You can add the following lines to your code:
      from pyspark.sql import SparkSession
      
  2. Create the SparkSession:

    • Explicitly create a SparkSession object using the builder. You can do this by adding the following lines before using the ‘spark’ object:
      spark = SparkSession.builder \
          .appName("YourAppName") \
          .getOrCreate()
      
  3. Check for other issues:

    • Ensure that you have installed PySpark and that your environment is set up correctly.
    • If you encounter a ‘No module named pyspark’ error, consider using the findspark library to set up the environment. You can install it using pip install findspark.

Here’s an example of how to create the SparkSession:

# Import PySpark and create a SparkSession
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("YourAppName") \
    .getOrCreate()

# Now you can use the 'spark' object in your code
# ...

# Stop the SparkSession when a standalone script is done.
# (On a Databricks cluster, getOrCreate() attaches to the cluster's
# existing session, which you should not stop.)
spark.stop()

Make sure to adjust the appName according to your application’s name. Once you’ve made these changes, your code should work without the ‘spark’ object error. Happy coding! 😊
