3 weeks ago
Hi,
I am familiar with object-oriented programming and cannot really get my head around the philosophy of coding in Databricks. My natural approach, creating classes and instantiating objects, does not seem to be the right one.
Can someone explain how Databricks is expected to be used and, if it is possible, how to create a simple 'foo' class that I can instantiate outside of the notebook where it is defined?
Thanks -
3 weeks ago - last edited 3 weeks ago
Hi @Alex79 ,
Typically you define all the logic in plain old Python modules, where you can have classes and functions. For example, you can define the following Python module (in this case I'm defining a simple function, but it could just as well be a class; it doesn't matter for the sake of the example).
My notebook can then act as a client of that module and import it.
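Since the original question asked about a 'foo' class, here is a minimal sketch of how this could look. The file name `foo.py`, the class `Foo`, and its `greet` method are all made up for illustration, and the sketch assumes the module is saved in the same directory as the notebook. Both parts are shown in one block so it runs as-is; in a real workspace the class would live in the module file and the notebook cell would use the commented-out import.

```python
# --- foo.py (hypothetical module saved next to the notebook) ---
class Foo:
    """Tiny example class; the name and method are illustrative only."""

    def __init__(self, name: str) -> None:
        self.name = name

    def greet(self) -> str:
        return f"Hello from {self.name}"


# --- notebook cell (the client of the module) ---
# In a real workspace the class lives in foo.py, so the notebook would start with:
# from foo import Foo
foo = Foo("Databricks")
print(foo.greet())
```

The notebook only instantiates and calls; the class definition stays in the module, so it can be reused from other notebooks or from plain .py files.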
I would say that is the most common approach: all the logic is defined in Python modules that you can reuse, and those modules are imported in notebooks (so the notebooks act as orchestrators of the code).
But you can also use Python wheel files if you want, or stick to .py files only and orchestrate those. You don't have to use notebooks at all.
3 weeks ago
OK, makes sense to me, thanks!
Perfect!
3 weeks ago
No problem @Alex79. If the answer was helpful to you, please consider marking it as the accepted solution.
3 weeks ago - last edited 3 weeks ago
Cool @szymon_dybczak!
Thanks for this. Will we have to define absolute/relative paths in the "from __ import ___" statement if the module doesn't live in the same directory as the notebook? What's the normal practice for this in production, for instance if you want to share this module (which will likely be updated) with your colleagues? I need to look into what Python wheel files offer.
All the best,
BS
3 weeks ago
Hi @BS_THE_ANALYST ,
If the modules don't live in the same directory as the notebook, you need to add the directory that contains your module to sys.path. You can specify directories using a relative path, as in the following example:
import sys
import os

# Resolve the parent directory to an absolute path and add it to the module search path
sys.path.append(os.path.abspath('..'))
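Notebook cells often get re-run, and each run of the snippet above appends the same directory again. A small variation that avoids duplicate entries might look like the sketch below; the helper name `add_to_sys_path` is made up for illustration, and `'..'` is just the relative path from the example above.

```python
import os
import sys


def add_to_sys_path(directory: str) -> str:
    """Append a directory to sys.path once and return its absolute path."""
    path = os.path.abspath(directory)
    if path not in sys.path:
        sys.path.append(path)
    return path


# Make modules in the parent directory importable from this notebook
module_dir = add_to_sys_path("..")
```

After this, a regular `import` of a module stored in that directory should resolve, assuming the file really is there.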
Here's a documentation entry where all of this is well described:
Work with Python and R modules | Databricks Documentation
If you need to share the module with colleagues, then I guess a wheel file is a perfect fit for that purpose 🙂
3 weeks ago
No problem, happy that I could clarify some things for you 🙂