Data Engineering
Problem with the Visual Studio plugin and custom modules

VtotheG
New Contributor

We are using the Databricks Visual Studio plugin to write our Python/Spark code. We use the plugin's upload-file-to-Databricks functionality because our organisation has turned Unity Catalog off.

We are now running into a weird bug with custom modules. I have written a custom Python module that uses Spark's parallelisation functionality. As long as I have not built and uploaded a .whl of the module, my main code does not execute properly: the workers apparently cannot find my module code in the directory. The weird thing is that the driver does see the code and is able to run the static methods from my module just fine (as long as only the driver is involved). To reproduce this error I have set up a small project (see code below):

 

#################
###File 1########
#################

# The main file (file 1)
from testbug import *
data = [2, 4, 6, 8, 10]

# This works fine (runs on the driver only)
DBMultiplyTest.multiply_value(6)

# This fails with an error: module testbug not found
# (`spark` is the SparkSession that Databricks provides)
rdd = spark.sparkContext.parallelize(data)
result = rdd.map(DBMultiplyTest.multiply_value).collect()
print(result)

#################
###File 2########
#################

## This is the init file, located at testbug/__init__.py (file 2)
from .multiply import DBMultiplyTest

__all__ = ["DBMultiplyTest"]

#################
###File 3########
#################

## Just a simple multiply class, located at testbug/multiply.py (file 3)
class DBMultiplyTest:

    @staticmethod
    def multiply_value(x):
        return x * 2
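A likely explanation, not confirmed against the plugin's internals: PySpark serializes the function passed to `rdd.map` with cloudpickle, and cloudpickle stores functions that live in an importable module by reference (module name plus qualified name), not by value. The driver can resolve `testbug` because the plugin presumably puts the synced folder on the driver's `sys.path`; each worker then has to run `import testbug.multiply` on its own, which fails if that folder is not on the worker's path. The sketch below reproduces this mechanism with the standard-library `pickle` module (which also pickles module-level functions by reference) and a subprocess standing in for a worker. It does not use Spark, and the helper names `build_testbug_package`, `unpickle_in_fresh_process` and `run_demo` are hypothetical.

```python
import importlib
import os
import pickle
import subprocess
import sys
import tempfile
from typing import Optional

# Contents of files 2 and 3 from the post, used to rebuild the package locally.
INIT_PY = 'from .multiply import DBMultiplyTest\n__all__ = ["DBMultiplyTest"]\n'
MULTIPLY_PY = (
    "class DBMultiplyTest:\n"
    "    @staticmethod\n"
    "    def multiply_value(x):\n"
    "        return x * 2\n"
)


def build_testbug_package(root: str) -> None:
    """Write the testbug package (hypothetical helper) under root."""
    pkg = os.path.join(root, "testbug")
    os.makedirs(pkg, exist_ok=True)
    with open(os.path.join(pkg, "__init__.py"), "w") as f:
        f.write(INIT_PY)
    with open(os.path.join(pkg, "multiply.py"), "w") as f:
        f.write(MULTIPLY_PY)


def unpickle_in_fresh_process(payload: bytes, cwd: str,
                              extra_path: Optional[str] = None) -> subprocess.CompletedProcess:
    """Stand-in for a worker: a new interpreter that unpickles the function and calls it."""
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if extra_path is not None:
        env["PYTHONPATH"] = extra_path  # make the package folder importable
    code = "import pickle, sys; f = pickle.loads(sys.stdin.buffer.read()); print(f(6))"
    return subprocess.run([sys.executable, "-c", code], input=payload,
                          capture_output=True, cwd=cwd, env=env)


def run_demo():
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as empty:
        build_testbug_package(root)
        sys.path.insert(0, root)  # the "driver" can import testbug
        try:
            importlib.invalidate_caches()
            func = importlib.import_module("testbug").DBMultiplyTest.multiply_value
            payload = pickle.dumps(func, protocol=4)
        finally:
            sys.path.remove(root)
        # Worker without the package folder on its path vs. worker with it.
        missing = unpickle_in_fresh_process(payload, cwd=empty)
        present = unpickle_in_fresh_process(payload, cwd=empty, extra_path=root)
    return payload, missing, present


payload, missing, present = run_demo()
print(b"testbug.multiply" in payload)  # True: only module name + qualname are stored
print(missing.returncode != 0, b"No module named 'testbug'" in missing.stderr)  # True True
print(present.stdout.decode().strip())  # 12
```

If this is indeed the cause, the fix is about making `testbug` importable on the workers, not about the code itself; building and installing the .whl, as described above, does exactly that.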

 

When I upload this code to my repo and run it from the Databricks web interface, it runs fine and I get the expected results. So there seems to be a difference between how the Visual Studio plugin executes the code and how the Databricks web interface does.
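A possible workaround, assuming the cause is that the workers cannot import `testbug`, is to ship the package to them yourself instead of building a .whl. PySpark's `SparkContext.addPyFile` accepts a `.py`, `.zip` or `.egg` file and makes it importable in tasks. The sketch below covers only the part that can be shown without a cluster: zipping the `testbug` folder so that `testbug/` sits at the archive root (the layout `zipimport` expects), then checking that a fresh interpreter with just the zip on its path can import it. `zip_package` and `demo` are hypothetical helpers, not part of any Databricks or Spark API.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import zipfile

# Contents of files 2 and 3 from the post, used to rebuild the package locally.
INIT_PY = 'from .multiply import DBMultiplyTest\n__all__ = ["DBMultiplyTest"]\n'
MULTIPLY_PY = (
    "class DBMultiplyTest:\n"
    "    @staticmethod\n"
    "    def multiply_value(x):\n"
    "        return x * 2\n"
)


def zip_package(project_root: str, package: str, out_dir: str) -> str:
    """Zip <project_root>/<package> so that '<package>/' is at the archive root.

    Returns the path of the created .zip (hypothetical helper).
    """
    base_name = os.path.join(out_dir, package)  # make_archive appends ".zip"
    return shutil.make_archive(base_name, "zip", root_dir=project_root, base_dir=package)


def demo():
    with tempfile.TemporaryDirectory() as project, tempfile.TemporaryDirectory() as out:
        pkg_dir = os.path.join(project, "testbug")
        os.makedirs(pkg_dir)
        with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
            f.write(INIT_PY)
        with open(os.path.join(pkg_dir, "multiply.py"), "w") as f:
            f.write(MULTIPLY_PY)

        zip_path = zip_package(project, "testbug", out)
        with zipfile.ZipFile(zip_path) as zf:
            names = zf.namelist()

        # A fresh interpreter whose only extra path entry is the zip, roughly what
        # a worker sees after addPyFile has distributed the archive.
        env = dict(os.environ, PYTHONPATH=zip_path)
        code = ("from testbug import DBMultiplyTest; "
                "print([DBMultiplyTest.multiply_value(x) for x in [2, 4, 6, 8, 10]])")
        result = subprocess.run([sys.executable, "-c", code], capture_output=True,
                                text=True, cwd=out, env=env)
    return names, result


names, result = demo()
print(names)  # includes 'testbug/__init__.py' and 'testbug/multiply.py'
print(result.stdout.strip())  # [4, 8, 12, 16, 20]
```

On the cluster you would then call `spark.sparkContext.addPyFile(zip_path)` once before `rdd.map(DBMultiplyTest.multiply_value)`. Where the plugin places the synced files on the driver is an assumption you would need to check (for example by printing `os.getcwd()` or `sys.path` from the main file); `project_root` must point at the folder that contains `testbug/`.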

Does anyone have an idea how to fix this, or is this just a bug in the plugin?

