09-11-2024 10:19 PM
I am trying to create a Spark SQL function in a particular schema, i.e.:
spark.sql("CREATE OR REPLACE FUNCTION <spark_catalog>.<schema_name>.<function_name>() RETURNS STRING RETURN <value>")
This works perfectly fine on Databricks using notebooks.
But I need to use the same thing in my project, which has to run on my local machine (VSCode), and I am facing issues. Please advise me on how to implement this on my local machine.
09-12-2024 12:18 AM
Hi @badari_narayan ,
In VSCode, install the Databricks extension and then connect it to your existing Databricks workspace and cluster.
This setup allows you to run Databricks notebooks and scripts from your local VSCode, while using the Spark context of the connected Databricks cluster.
Explanation:
Databricks Extension in VSCode: The Databricks extension for VSCode allows you to connect to your Databricks workspace, access notebooks, and run code directly from VSCode.
Use of Spark Context: When connected to a Databricks cluster, the code you execute from VSCode uses the cluster's Spark context. This means computations and data processing are performed on the Databricks cluster, not on your local machine.
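As a quick hedged sketch (the catalog, schema, and function names below are placeholders, and the exact builder call depends on your databricks-connect version), a local script using Databricks Connect would look roughly like this, with the SQL executing on the remote cluster:

# Sketch: run a local script against a remote Databricks cluster via Databricks Connect.
# Assumes databricks-connect is installed and authentication is configured
# (e.g. a profile in ~/.databrickscfg). Catalog/schema/function names are placeholders.
from databricks.connect import DatabricksSession

spark = DatabricksSession.builder.getOrCreate()

# The statement runs on the cluster, so Unity Catalog three-part names resolve normally.
spark.sql("""
    CREATE OR REPLACE FUNCTION my_catalog.my_schema.greeting()
    RETURNS STRING
    RETURN 'hello'
""")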
09-12-2024 02:26 AM
Hi @filipniziol,
Can you provide any sample code snippets or a tutorial video, so that I can make sure I am not making any mistakes on my side?
09-12-2024 02:34 AM
Sure, check this:
Databricks Extension for VS Code: A Hands-On Tutorial (youtube.com)
09-16-2024 02:04 AM
Hi @filipniziol ,
Thanks for the response. In this tutorial the code is run by uploading it or submitting it as workflows to Databricks, which is suitable for a single file. In my case, I need to run an entire project, I don't want it to be uploaded to Databricks, and I just need it to run in local VSCode.
Do you have any ideas for this?
09-23-2024 03:39 AM - edited 09-23-2024 03:49 AM
To run locally in VSCode, install PySpark and then start a local Spark session:
from pyspark.sql import SparkSession

# Create a Spark session that runs entirely on the local machine.
spark = SparkSession.builder \
    .master("local[*]") \
    .getOrCreate()
This setup should let the project run locally without Databricks.
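One hedged note on the original goal: whether CREATE OR REPLACE FUNCTION ... RETURNS ... RETURN works against this local session depends on the Apache Spark version you install (SQL-body functions are a newer open-source feature), and the <spark_catalog>.<schema_name> three-part name from Databricks will not resolve without a catalog configured locally. A minimal local check, using placeholder names, could look like this:

# Sketch: try the SQL function against the purely local session.
# Unqualified names are used because Databricks catalogs are not available here;
# if your Spark version rejects the RETURN syntax, a Python UDF is the usual fallback.
spark.sql("SELECT 1").show()  # confirm the local session works

spark.sql("""
    CREATE OR REPLACE FUNCTION greeting()
    RETURNS STRING
    RETURN 'hello'
""")
spark.sql("SELECT greeting()").show()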
09-16-2024 02:35 AM
Hi @badari_narayan ,
In general you may run a PySpark project locally, but with limitations.
from pyspark.sql import SparkSession

# Start a local Spark session.
spark = SparkSession.builder \
    .appName("LocalSparkApp") \
    .master("local[*]") \
    .getOrCreate()
To make this work locally you will have limitations and you will need to set things up properly.
For example, you will first need to create your Spark session like above (in a Databricks workspace it is already available). Also, you will be able to run the Spark SQL available in PySpark, but not Databricks proprietary SQL.
Locally, you also do not have a Unity Catalog configured in your workspace.
Recently the Databricks team made Unity Catalog open source, so you may check the pages below:
https://www.unitycatalog.io/
https://github.com/unitycatalog/unitycatalog
Still, if you do not want to connect to a Databricks workspace, you will need to set up a local Unity Catalog and then run the code that creates the function against that local Unity Catalog.
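If you go that route, the open-source Unity Catalog server exposes a REST endpoint that Spark can register as a catalog. This is only a rough sketch (the package coordinates, config keys, catalog name, and localhost URL are assumptions based on the open-source Unity Catalog docs, so please verify them against the project documentation):

from pyspark.sql import SparkSession

# Sketch: point a local Spark session at a locally running open-source Unity Catalog
# server. Package version, catalog name ("unity"), and URI are assumptions; check the
# unitycatalog documentation for the exact values matching your setup.
spark = (
    SparkSession.builder
        .appName("LocalUnityCatalog")
        .master("local[*]")
        .config("spark.jars.packages", "io.unitycatalog:unitycatalog-spark_2.12:0.2.0")
        .config("spark.sql.catalog.unity", "io.unitycatalog.spark.UCSingleCatalog")
        .config("spark.sql.catalog.unity.uri", "http://localhost:8080")
        .getOrCreate()
)

# Once the catalog is registered, three-part names similar to Databricks become available,
# e.g. spark.sql("SHOW SCHEMAS IN unity").show()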