03-23-2023 11:10 AM
Hi All,
Could you please suggest the best way to write PySpark code for Databricks?
I don't want to write my code in a Databricks notebook; instead, I'd like to create Python files (a modular project) in VS Code and call only the primary function from the notebook (the rest of the logic will live in the Python files).
What is the best way to achieve this?
Thanks,
Deepak
03-24-2023 11:57 PM
@Deepak Bhatt :
Yes, you can write your PySpark code in modular Python files outside of Databricks and then call them from a Databricks notebook. Here are the steps you can follow:
1. Write your PySpark logic in .py files in VS Code, with a primary function that encapsulates the entry point.
2. Make the files available to Databricks, for example by syncing them into a Repo or uploading them to your workspace.
3. Import your module in the notebook:
import my_pyspark_code
4. Call the main function in your Python file from the Databricks notebook. For example, if your main function is named run_spark_job(), you can call it like this:
my_pyspark_code.run_spark_job()
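Putting it together, a minimal sketch of what the notebook cell could look like (the Repo path, the module name my_pyspark_code, and the function run_spark_job() are assumptions carried over from the example above):

    import os
    import sys

    # Make the folder containing my_pyspark_code.py importable.
    # The Repos path below is a hypothetical example; adjust to your own.
    sys.path.append(os.path.abspath('/Workspace/Repos/my_user/my_repo'))

    import my_pyspark_code

    my_pyspark_code.run_spark_job()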
By following these steps, you can write your PySpark code in a modular and maintainable way outside of Databricks, and then easily call it from a Databricks notebook.
03-25-2023 10:48 PM
Hi @Deepak Bhatt
Hope all is well! Just wanted to check in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.
We'd love to hear from you.
Thanks!
12-17-2023 07:30 AM
Hi all,
I have a very similar problem. I can write code in my GitHub Repos here without issues, but when I try to access it through an import statement I get the error:
ModuleNotFoundError: No module named 'module_name'
When I check my environment with os.getcwd(), I see that my default path is /databricks/driver. However, when I copy the path of my current file and pass it to os.chdir(), I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: "/Repos/repo_name"
Is there a quick fix for this? I usually don't run into problems like this in VS Code or Jupyter notebooks.
Thanks!
12-17-2023 07:50 AM
I understand the error now. It was quite easy, actually: for me it was just a matter of attaching the .py script to the same cluster as the notebook.
Now it's working fine.
12-17-2023 11:52 AM
Sorry, I think I actually got it wrong in the comment above. It worked, but I also had to upload the .py file to the DBFS file system. I'm still looking for a faster way to solve this.
01-13-2024 11:31 AM
Hi @ThiagoLDC ,
In order to import a user-defined module, the .py file either needs to be in the same directory as the notebook, or you can place the file in a Repo and import it from there.
In the notebook, when importing the code from a Repo, you can do it like this:
import sys
import os
sys.path.append(os.path.abspath('<module-path>'))
from <pyfilename> import <class/function>
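As a concrete illustration of that pattern (the Repo path, the file name utils.py, and the function clean_data are all hypothetical):

    import os
    import sys

    # Hypothetical Repo location; adjust to your own user and repo names.
    sys.path.append(os.path.abspath('/Workspace/Repos/my_user/my_repo'))

    # utils.py in that folder is assumed to define clean_data().
    from utils import clean_data

    clean_data(spark.table('my_table'))  # 'my_table' is a placeholder table name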
For detailed documentation, refer to:
https://docs.databricks.com/en/delta-live-tables/import-workspace-files.html
01-17-2024 05:33 AM
Certainly! To write PySpark code for Databricks while maintaining a modular project in VS Code, organize your PySpark code into Python files, with a primary function encapsulating the main logic. Then upload these files to Databricks, create a Databricks notebook, and call the primary function from there: import the module if it is a plain .py file, or use the %run magic command if the shared code lives in another notebook. This lets you keep the core logic outside of Databricks notebooks for better organization and reusability.
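For reference, a rough sketch of the two variants (all names are placeholders; note that %run targets notebooks and must sit alone in its own cell, whereas plain .py files are brought in with a regular import):

    # Cell 1 -- if the shared code lives in another *notebook*, %run it
    # (the magic must be the only content of the cell):
    # %run ./shared_setup

    # Cell 2 -- if the shared code lives in a plain .py file, import it:
    from my_pyspark_code import run_spark_job

    run_spark_job()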
Best wishes, Zpak