11-25-2024 09:04 AM
Hi Team,
In a Streamlit app (in Databricks), I am getting the error below while creating the Spark session. This happens when running the app via its web link.
"[JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number"
Below is the code used in app.py:
import os
import streamlit as st
import pandas as pd
from databricks import sql
import pytz
from datetime import datetime
from pyspark.sql import SparkSession
from pyspark import SparkContext
from pyspark import SparkConf
# Create SparkSession
try:
    spark = SparkSession.builder \
        .appName("Streamlit App") \
        .config("spark.driver.memory", "4g") \
        .config("spark.executor.memory", "4g") \
        .getOrCreate()
    st.write("Spark Session created successfully!")
    st.write("Spark Version:", spark.version)
except Exception as e:
    st.write(f"Error creating Spark session: {e}")
# Create a sample DataFrame
data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David'],
    'Age': [25, 30, 35, 40],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston']
}
df = pd.DataFrame(data)
# Display the DataFrame in the Streamlit app
st.title('Sample DataFrame')
st.write(df)
# Convert the edited DataFrame to a Spark DataFrame
spark_df = spark.createDataFrame(df)
st.write(spark_df)
11-25-2024 03:54 PM
The error you're encountering, "[JAVA_GATEWAY_EXITED] Java gateway process exited before sending its port number", typically occurs when there are issues with the Java configuration or when PySpark is unable to establish a connection with the Java gateway process. Here are some potential solutions to address this issue:
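One quick check, as a rough sketch rather than a definitive diagnosis: print what the app's environment actually exposes for Java before building the session. The helper name `java_diagnostics` is made up for this illustration, and the `bin/java` layout assumes a Linux-style JDK install.

```python
import os
import shutil


def java_diagnostics(env=None):
    """Collect the facts PySpark needs in order to launch a local JVM.

    `env` defaults to the real process environment; passing a dict makes
    the check easy to try out in isolation.
    """
    env = os.environ if env is None else env
    java_home = env.get("JAVA_HOME")
    return {
        "JAVA_HOME": java_home,
        # True only if JAVA_HOME is set and contains bin/java (Linux layout).
        "java_home_has_binary": bool(java_home)
        and os.path.isfile(os.path.join(java_home, "bin", "java")),
        # Full path of a `java` executable found on PATH, or None.
        "java_on_path": shutil.which("java", path=env.get("PATH")),
    }
```

Calling `st.write(java_diagnostics())` at the top of app.py shows the result in the browser. If both `java_home_has_binary` is false and `java_on_path` is empty, there is no JVM for PySpark to start in that environment, which would match the gateway error.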
11-26-2024 12:22 AM
I tried setting JAVA_HOME explicitly, but it did not work. The issue occurs only when launching the application via the web link; the same code runs fine in a notebook. Please let me know the fix.
4 weeks ago
Hey @roshan_robert, I am facing a similar issue. We are trying to launch a Streamlit app from Databricks Apps that requires Spark, but I'm getting the same error. I have yet to find a way to configure Spark to work from my app's cluster. Did you ever resolve your problem?
3 weeks ago
I did not get a solution.
3 weeks ago
We got our app working by using databricks-connect and creating a Spark session on a shared compute cluster, separate from the cluster my app runs on. I set the env variable DATABRICKS_CLUSTER_ID to the ID of the cluster I want to use. Then, on that cluster (and any catalog you need to access), you can set permissions for your app's service principal. You can find the name of your app's service principal on the app's overview screen. The SP needs permission to attach to the cluster. You must also make sure that the Databricks version your cluster is running and the version of databricks-connect you are using are compatible (I just used version 14.* for both).
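A minimal sketch of that setup, in case it helps. It assumes databricks-connect is installed in the app, that the workspace host and the service principal's credentials are picked up from the app's environment by the default authentication, and that DATABRICKS_CLUSTER_ID is set as described. The function names `resolve_cluster_id` and `get_remote_spark` are just illustrative.

```python
import os


def resolve_cluster_id(env=None):
    """Read the target cluster ID from DATABRICKS_CLUSTER_ID, failing early if unset."""
    env = os.environ if env is None else env
    cluster_id = env.get("DATABRICKS_CLUSTER_ID", "").strip()
    if not cluster_id:
        raise RuntimeError("DATABRICKS_CLUSTER_ID is not set")
    return cluster_id


def get_remote_spark():
    """Create a Spark session that runs on the remote cluster, not in the app."""
    # Imported lazily so this module still imports where databricks-connect
    # is not installed.
    from databricks.connect import DatabricksSession

    # Assumption: host and credentials come from the environment.
    return DatabricksSession.builder.clusterId(resolve_cluster_id()).getOrCreate()
```

In app.py this would replace the `SparkSession.builder ... getOrCreate()` call, e.g. `spark = get_remote_spark()`, after which `spark.createDataFrame(df)` runs against the shared cluster instead of trying to start a local JVM.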
2 weeks ago
Having looked into this internally, I can see that Spark (a SparkContext) is not available in Apps.
The recommended way is to use our available SDKs and connect to clusters or DBSQL. No Spark context is available; Apps are meant to defer processing to other compute they can connect to via the SDK, SQL connector, Spark Connect, jobs, etc.
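As one illustration of the SQL connector route, here is a rough sketch that runs a query on a SQL warehouse and returns a pandas DataFrame for Streamlit to display. The environment variable names and the helper names `load_sql_settings` and `query_to_pandas` are assumptions made for this example, and token-based auth is used only to keep it short; your app may authenticate differently.

```python
import os

# Assumed variable names; adjust to however your app is configured.
REQUIRED_VARS = (
    "DATABRICKS_SERVER_HOSTNAME",
    "DATABRICKS_HTTP_PATH",
    "DATABRICKS_TOKEN",
)


def load_sql_settings(env=None):
    """Collect connection settings from the environment, listing any that are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing settings: " + ", ".join(missing))
    return {
        "server_hostname": env["DATABRICKS_SERVER_HOSTNAME"],
        "http_path": env["DATABRICKS_HTTP_PATH"],
        "access_token": env["DATABRICKS_TOKEN"],
    }


def query_to_pandas(query, settings=None):
    """Run `query` on the warehouse and return the result as a pandas DataFrame."""
    # Imported lazily so the settings helper works without the connector installed.
    from databricks import sql

    settings = settings or load_sql_settings()
    with sql.connect(**settings) as connection:
        with connection.cursor() as cursor:
            cursor.execute(query)
            # fetchall_arrow() returns a pyarrow Table.
            return cursor.fetchall_arrow().to_pandas()
```

With this in place, something like `st.write(query_to_pandas("SELECT 1 AS ok"))` keeps all processing on the warehouse, so the app itself never needs a Spark context.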