Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How do I import a python module when deploying with DAB?

kenny_hero
New Contributor

This is what the folder structure of my project looks like:

resources/
  |- etl_event/
     |- etl_event.job.yml
src/
  |- pipeline/
     |- etl_event/
        |- transformers/
           |- transformer_1.py
  |- utils/
     |- logger.py
databricks.yml

I created a Python module, logger.py, that I want to reuse in many transformers, so I put it in the utils folder. I want transformer_1.py to import the get_logger() function from that module:

# transformer_1.py
from pyspark import pipelines as dp
from utils.logger import get_logger

@dp.table
def load_events():
    logger = get_logger(__name__)
    logger.info('Loading events...')
    ...

After I deploy the project using DAB, running the etl_event job fails because the utils module is not found.

From researching the documentation, I learned the following:

  1. PYTHONPATH is not set to the root of the bundle.
  2. One suggestion is to call sys.path.append() before the import (sketched below). This feels like a hack that I have to remember to apply in every file before importing.
  3. Another suggestion is to build a Python wheel file. I'm not sure that applies to my project, because the module (utils/logger.py) is already included in the deployed bundle; I shouldn't need to build a separate wheel for it.
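
For reference, this is roughly what the sys.path workaround from point 2 would look like at the top of transformer_1.py. It's only a sketch based on my folder layout above: the three ".." components climb from transformers/ up to src/, and it assumes __file__ points at the deployed source file:

# transformer_1.py
import os
import sys

# Workaround: put the bundle's src/ directory on sys.path so that `utils`
# becomes importable. The three ".." components match the layout above
# (transformers/ -> etl_event/ -> pipeline/ -> src/).
sys.path.append(
    os.path.abspath(os.path.join(os.path.dirname(__file__), "..", "..", ".."))
)

from pyspark import pipelines as dp
from utils.logger import get_logger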

My question is: what is the proper way to configure my project/bundle so that I can have reusable Python modules that my pipeline transformers can import?

 
