Writing modular code in Databricks

Mr__D
New Contributor II

Hi All,

Could you please suggest the best way to write PySpark code in Databricks?

I don't want to write my code in a Databricks notebook. Instead, I want to create Python files (a modular project) in VS Code and call only the primary function from the notebook (the rest of the logic will live in the Python files).

Could you please let me know the best way to achieve it?

Thanks,

Deepak

Accepted Solutions

Anonymous
Not applicable

@Deepak Bhatt​ :

Yes, you can write your PySpark code in modular Python files outside of Databricks and then call them from a Databricks notebook. Here are the steps you can follow:

  1. Create a Python file in your local development environment (e.g., VS Code) and write your PySpark code in it. You can define a main function in this file which will be called from the Databricks notebook.
  2. Save the Python file to a Git repository or a cloud storage service such as Azure Blob Storage or Amazon S3.
  3. In Databricks, clone the Git repository into your workspace or mount the cloud storage service so that the notebook can access the Python file.
  4. Import the Python file in your notebook using the Python import statement. Python can only find the file if its folder is on sys.path, so if the import fails, add that folder to sys.path first. For example, if your Python file is named my_pyspark_code.py, you can import it like this:

     import my_pyspark_code

  5. Call the main function in your Python file from the Databricks notebook. For example, if your main function is named run_spark_job(), you can call it like this:

     my_pyspark_code.run_spark_job()
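As a rough sketch of the import and call steps above: the helper below puts the folder that holds my_pyspark_code.py on sys.path and then imports the module. The name load_job_module and the path in the comments are hypothetical placeholders, not part of any Databricks API. Depending on how the files reach your workspace, the folder may already be importable, in which case a plain import is enough.

```python
import importlib
import sys
from pathlib import Path


def load_job_module(module_dir, module_name="my_pyspark_code"):
    """Put module_dir on sys.path (only once) and import module_name from it."""
    path = str(Path(module_dir))
    if path not in sys.path:
        # Searched first, so this folder wins over same-named modules elsewhere.
        sys.path.insert(0, path)
    # Make sure files created after interpreter start-up are picked up.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# In a notebook cell (placeholder path, replace with your own folder):
# job = load_job_module("/path/to/folder/containing/the/module")
# job.run_spark_job()
```

Keeping the path handling in one small helper means the notebook itself stays a thin entry point, which is the layout the question asks for.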

By following these steps, you can write your PySpark code in a modular and maintainable way outside of Databricks, and then easily call it from a Databricks notebook.

Anonymous
Not applicable

Hi @Deepak Bhatt​ 

Hope all is well! Just checking in: were you able to resolve your issue? If so, would you be happy to share the solution or mark an answer as best? Otherwise, please let us know if you need more help.

We'd love to hear from you.

Thanks!