08-19-2024 01:44 AM
Hi,
I want to access the stage and job information (usually available through the Spark UI) through the REST API provided by Spark: http://<server-url>:18080/api/v1/applications/[app-id]/stages. More information can be found at the following link: https://spark.apache.org/docs/latest/monitoring.html#rest-api
To access this API, I need the server URL, but I am having trouble finding it. A similar discussion on this forum suggested that I can obtain it by copying the URL shown when the Spark UI is open.
Please let me know how these APIs can be accessed through Databricks. Thanks in advance.
08-21-2024 08:31 AM
Hi @prathameshJoshi, the URL mentioned by @Retired_mod seems to be the correct one.
You can use this script to interact with the monitoring REST API.
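from dbruntime.databricks_repl_context import get_context
import requests
# Notebook context: provides the workspace host, cluster ID, and API token
ctx = get_context()
host = ctx.browserHostName
cluster_id = ctx.clusterId
# Base URL of the Spark monitoring REST API, reached via the driver proxy
spark_ui_api_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"
endpoint = 'applications'
# Authenticate with the notebook's token and decode the JSON response
requests.get(spark_ui_api_url + endpoint, headers={"Authorization": f"Bearer {ctx.apiToken}"}).json()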
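The original question asked about jobs and stages, so the script above could be extended along the following lines. This is only a sketch: the helper names (build_spark_api_url, get_json, list_jobs_and_stages) are hypothetical, it uses the standard library's urllib instead of requests, and it assumes the first application returned by the API is the one of interest. The /applications/[app-id]/jobs and /applications/[app-id]/stages paths come from the Spark monitoring documentation linked in the question.

```python
import json
import urllib.request


def build_spark_api_url(host: str, cluster_id: str, port: int = 40001) -> str:
    """Assemble the driver-proxy base URL in the same form as the script above."""
    return f"https://{host}/driver-proxy-api/o/0/{cluster_id}/{port}/api/v1/"


def get_json(url: str, token: str):
    """Send a GET request with a bearer token and decode the JSON body."""
    request = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def list_jobs_and_stages(base_url: str, token: str):
    """Return (jobs, stages) for the first application the API reports.

    Assumption: the first entry of /applications is the application of interest.
    """
    apps = get_json(base_url + "applications", token)
    app_id = apps[0]["id"]
    jobs = get_json(f"{base_url}applications/{app_id}/jobs", token)
    stages = get_json(f"{base_url}applications/{app_id}/stages", token)
    return jobs, stages
```

In a notebook, the host, cluster ID, and token would come from get_context() as in the script above, for example list_jobs_and_stages(build_spark_api_url(host, cluster_id), get_context().apiToken).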
08-19-2024 06:22 AM
Hi @prathameshJoshi,
You can find this kind of information when you go to compute and click advanced options:
08-20-2024 02:29 AM
@szymon_dybczak I have tried using it directly, as well as the HTTP path shown in the image you posted. I have also tried the Spark UI URL and a URL containing only the cluster ID. Nothing has worked for me. If possible, could you show me a dummy URL that works with the Spark REST API for accessing jobs and stages?
For example, sending a request to this URL - https://adb-1234.0.azuredatabricks.net/api/v1/applications - yields the following error:
08-20-2024 05:53 AM
Hi @prathameshJoshi,
I was able to get the API URL using this piece of code, and it works in my browser.
I am not sure how to authenticate when making calls programmatically.
from databricks_api import DatabricksAPI
from dbruntime.databricks_repl_context import get_context
# Authenticate against the workspace with the notebook's own URL and token
ctx = get_context()
databricks_api_instance = DatabricksAPI(
    host=ctx.apiUrl,
    token=ctx.apiToken,
)
host = ctx.browserHostName
cluster_id = ctx.clusterId
# Look up the Spark context ID of the current cluster
spark_context_id = databricks_api_instance.cluster.get_cluster(cluster_id)['spark_context_id']
spark_ui_api_url = f"https://{host}/sparkui/{cluster_id}/driver-{spark_context_id}/api/v1/"
endpoint = 'applications'
print(spark_ui_api_url + endpoint)
08-20-2024 11:24 PM
Thanks for providing a starting point. I tried the URL you provided, but it does not work when I send a request to it. I passed the access token as a bearer token, but the request returns a login HTML page. Please find the output attached, and let me know if there is any way to fix this.
Thanks in advance.
<!doctype html>
<html>
<head>
<meta charset="utf-8">
<meta http-equiv="Content-Language" content="en">
<title>Databricks - Sign In</title>
<meta name="viewport" content="width=960">
<link rel="icon" type="image/png" href="https://databricks-ui-assets.azureedge.net/favicon.ico">
<meta http-equiv="content-type" content="text/html; charset=UTF8">
<script id="__databricks_react_script"></script>
<script>window.__DATABRICKS_SAFE_FLAGS__={
"databricks.infra.showErrorModalOnFetchError": true,
"databricks.fe.infra.useReact18": true,
"databricks.fe.infra.useReact18NewAPI": false,
"databricks.fe.infra.fixConfigPrefetch": true
},window.__DATABRICKS_CONFIG__={
"isCuttingEdge": false,
"publicPath": {
"mlflow": "https://databricks-ui-assets.azureedge.net/",
"dbsql": "https://databricks-ui-assets.azureedge.net/",
"feature-store": "https://databricks-ui-assets.azureedge.net/",
"monolith": "https://databricks-ui-assets.azureedge.net/",
"jaws": "https://databricks-ui-assets.azureedge.net/"
}
}</script>
<link rel="icon" href="https://databricks-ui-assets.azureedge.net/favicon.ico">
<script>
function setNoCdnAndReload() {
document.cookie = `x-databricks-cdn-inaccessible=true; path=/; max-age=86400`;
const metric = 'cdnFallbackOccurred';
const browserUserAgent = navigator.userAgent;
const browserTabId = window.browserTabId;
const performanceEntry = performance.getEntriesByType('resource').filter(e => e.initiatorType === 'script').slice(-1)[
0
]
sessionStorage.setItem('databricks-cdn-fallback-telemetry-key', JSON.stringify({ tags: { browserUserAgent, browserTabId
}, performanceEntry
}));
window.location.reload();
}
</script>
<script>
// Set a manual timeout for dropped packets to CDN
function loadScriptWithTimeout(src, timeout) {
return new Promise((resolve, reject) => {
const script = document.createElement('script');
script.defer = true;
script.src=src;
script.onload = resolve;
script.onerror = reject;
document.head.appendChild(script);
setTimeout(() => {
reject(new Error('Script load timeout'));
}, timeout);
});
}
loadScriptWithTimeout('https://databricks-ui-assets.azureedge.net/static/js/login/login.acadbe8a.js', 10000).catch(setNoCdnAndReload);
</script>
</head>
<body class="light-mode">
<uses-legacy-bootstrap>
<div id="login-page"></div>
</uses-legacy-bootstrap>
<script>const telemetryEndpoint="/telemetry-unauth?t=",uiModuleName="workspaceLogin";function shouldIgnoreError(e){return!1
}function generateUuidV4(){const e=window.crypto?.randomUUID?.();return e||"xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy
]/g,(e=>{const n=16*Math.random()|0;return("x"===e?n: 3&n|8).toString(16)
}))
}function networkConnectivityTags(){const e=window.navigator.onLine,n=window.navigator.connection?.rtt??-1,t=window.navigator.connection?.downlink??-1;return{browserNavigatorOnline:e,browserConnectionEstimatedRtt:n,browserConnectionEstimatedDownlink:t,browserConnected:e&&n>0&&t>0
}
}function createTelemetryRequestBody(e,n={},t=null){const o=Math.round(Date.now()/1e3),r={eventId:generateUuidV4(),metric:e,tags: {...n,...networkConnectivityTags(),browserTabId:window.browserTabId,browserUserAgent:navigator.userAgent
},ts:o
};return t&&(r.blob=t),JSON.stringify({uploadTime:o,items: [JSON.stringify(r)
]
})
}function recordTelemetry(e,n={},t=""){const o={method: "POST",credentials: "include",body:createTelemetryRequestBody(e,n,t)
};fetch(telemetryEndpoint+Date.now(),o)
}window.__databricks_networkConnectivityTags=networkConnectivityTags,Object.defineProperty(window,
"browserTabId",
{value:generateUuidV4()
}),window.recordTelemetry=recordTelemetry,recordTelemetry("uiInit",
{uiModule:uiModuleName,eventId: "init",eventClientSource:uiModuleName,eventType: "init"
});let logCount=0;function error_handler(e,n,t,o,r){logCount++>4||shouldIgnoreError(e)||recordTelemetry("uncaughtJsException",
{eventType: "jsExceptionV3",jsExceptionMessage:e,jsExceptionSource:n,jsExceptionLineno:t,jsExceptionColno:o,jsExceptionBeforeInit:!0
},r&&r.stack&&r.stack.toString())
}function sendBeaconOnPageExit(e){if(navigator.sendBeacon){const n=e&&e.type||"unknown",t=(Math.round(Date.now()/1e3),createTelemetryRequestBody("uiInit",
{eventType: "pageExitBeforeAppInitComplete",eventName:n,eventClientSource:uiModuleName
}));navigator.sendBeacon(telemetryEndpoint+Date.now(),t)
}
}window.onerror=error_handler,window.onunhandledrejection=function(e){error_handler(String(e.reason),
null,
null,
null,e.reason)
},window.addEventListener("beforeunload",sendBeaconOnPageExit),window.addEventListener("unload",sendBeaconOnPageExit),window.addEventListener("pagehide",sendBeaconOnPageExit),window.cleanupAfterAppInit=()=>{window.removeEventListener("beforeunload",sendBeaconOnPageExit),window.removeEventListener("unload",sendBeaconOnPageExit),window.removeEventListener("pagehide",sendBeaconOnPageExit)
}</script>
</body>
</html>
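The output above is the Databricks sign-in page rather than JSON, which means the request was not accepted as authenticated at that URL. When calling the API from code, it may help to detect this case explicitly instead of letting JSON decoding fail with an unrelated message. A minimal sketch, where parse_api_response is a hypothetical helper and the HTML check is only a heuristic based on the Content-Type header and the start of the body:

```python
import json


def parse_api_response(content_type: str, body: str):
    """Decode a JSON body, or raise a clear error if an HTML page came back."""
    # Heuristic: treat the response as HTML if the header says so
    # or the body starts with an HTML doctype.
    is_html = (
        "text/html" in content_type.lower()
        or body.lstrip().lower().startswith("<!doctype html")
    )
    if is_html:
        raise ValueError(
            "Received an HTML page (likely a sign-in redirect) instead of JSON; "
            "check the URL and the token."
        )
    return json.loads(body)
```

With requests, the two arguments would be response.headers.get("Content-Type", "") and response.text.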
08-20-2024 11:26 PM
Hi Kaniz,
The solution posted by @menotron is giving me some errors. Once those are fixed, I will mark the appropriate response as accepted solution.
Thank You
08-23-2024 02:40 AM
Hi @Retired_mod and @menotron ,
Thanks a lot; your solutions are working. I apologise for the delay, as I had some issue logging in.