Data Engineering

How to obtain the server url for using spark's REST API

prathameshJoshi
New Contributor III

Hi,

I want to access the stage and job information (usually available through the Spark UI) through the REST API provided by Spark: http://<server-url>:18080/api/v1/applications/[app-id]/stages. More information can be found at the following link: https://spark.apache.org/docs/latest/monitoring.html#rest-api

To access this API, we need the server URL, but I am having trouble finding it. A similar discussion on this forum suggested that I can obtain it by copying the URL shown when the Spark UI is open.

Please let me know how these APIs can be accessed through Databricks. Thanks in advance.

2 ACCEPTED SOLUTIONS

Kaniz_Fatma
Community Manager

Hi @prathameshJoshi, try this. It should look something like this:

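# Databricks API token, workspace host, and cluster ID (placeholders)
ADMIN_TOKEN="dapi9___________________________"
WORKSPACE="my_workspace.cloud.databricks.com"
CLUSTER="__________"
# Driver port used in the proxy URL
MPORT="40001"

PREFIX="https://${WORKSPACE}/driver-proxy-api/o/0/${CLUSTER}/${MPORT}"

# Append an API path such as /api/v1/applications to ${PREFIX} as needed
curl -L -H "Authorization: Bearer ${ADMIN_TOKEN}" -X GET "${PREFIX}"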
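
For anyone who prefers Python over curl, here is a minimal sketch of the same request. It only prepares the request and does not send it. The helper names `driver_proxy_prefix` and `build_request` are hypothetical, the workspace, cluster, and token values are placeholders, and the `/api/v1/applications` path is an assumption taken from the Spark monitoring API mentioned in the question.

```python
import urllib.request


def driver_proxy_prefix(workspace: str, cluster: str, port: str = "40001") -> str:
    """Build the driver-proxy prefix used in the shell snippet above."""
    return f"https://{workspace}/driver-proxy-api/o/0/{cluster}/{port}"


def build_request(workspace: str, cluster: str, token: str,
                  path: str = "/api/v1/applications") -> urllib.request.Request:
    """Prepare (but do not send) an authenticated GET request."""
    url = driver_proxy_prefix(workspace, cluster) + path
    return urllib.request.Request(
        url, headers={"Authorization": f"Bearer {token}"}, method="GET"
    )


# Placeholder values, as in the shell version; replace them with real ones.
req = build_request("my_workspace.cloud.databricks.com", "CLUSTER_ID", "dapiXXXX")
print(req.full_url)
# To actually send it: urllib.request.urlopen(req).read()
```

Keeping the URL construction separate from the network call makes it easy to print and inspect the URL before sending anything.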


menotron
Valued Contributor

Hi @prathameshJoshi, the URL mentioned by @Kaniz_Fatma seems to be the correct one.
You can use this script to interact with the monitoring REST API from a notebook.

from dbruntime.databricks_repl_context import get_context
import requests

# Read the notebook context once: workspace host, cluster ID, and API token
ctx = get_context()
host = ctx.browserHostName
cluster_id = ctx.clusterId

# Same driver-proxy URL as above, with the Spark REST API path appended
spark_ui_api_url = f"https://{host}/driver-proxy-api/o/0/{cluster_id}/40001/api/v1/"
endpoint = 'applications'

requests.get(spark_ui_api_url + endpoint, headers={"Authorization": f"Bearer {ctx.apiToken}"}).json()
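
Since the original question was about job and stage information, the script above could be extended roughly as follows. This is a sketch, not a tested Databricks recipe: `stage_and_job_urls` and `collect_jobs_and_stages` are hypothetical helpers, and the `/applications/[app-id]/jobs` and `/applications/[app-id]/stages` paths come from the Spark monitoring documentation linked in the question. The HTTP call is passed in as `fetch`, so the example below runs with canned data instead of a real cluster.

```python
from typing import Callable


def stage_and_job_urls(base_url: str, app_id: str) -> dict:
    """Return the jobs and stages URLs for one application."""
    base = base_url.rstrip("/")
    return {
        "jobs": f"{base}/applications/{app_id}/jobs",
        "stages": f"{base}/applications/{app_id}/stages",
    }


def collect_jobs_and_stages(base_url: str, fetch: Callable[[str], list]) -> dict:
    """For every application, fetch its jobs and stages via fetch(url)."""
    result = {}
    for app in fetch(base_url.rstrip("/") + "/applications"):
        urls = stage_and_job_urls(base_url, app["id"])
        result[app["id"]] = {name: fetch(url) for name, url in urls.items()}
    return result


# Canned responses standing in for the real HTTP calls (illustrative only).
base = "https://host/driver-proxy-api/o/0/cluster/40001/api/v1/"
canned = {
    base + "applications": [{"id": "app-1"}],
    base + "applications/app-1/jobs": [{"jobId": 0}],
    base + "applications/app-1/stages": [{"stageId": 0}],
}
summary = collect_jobs_and_stages(base, canned.__getitem__)
print(summary)
```

In a notebook, `fetch` could be something like `lambda url: requests.get(url, headers=headers).json()`, with `headers` holding the same bearer token as in the script above.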


9 REPLIES

szymon_dybczak
Contributor III

Hi @prathameshJoshi,

You can find this kind of information by going to Compute and clicking Advanced options:

[Screenshot: Slash_0-1724073743871.png]

@szymon_dybczak I have tried using it directly, as well as the HTTP path shown in the image you posted. I have also tried the Spark UI URL and a URL containing only the cluster ID. Nothing has worked for me. If possible, could you show me a dummy URL that works with the Spark REST API for accessing jobs and stages?

For example, sending a request to https://adb-1234.0.azuredatabricks.net/api/v1/applications yields the following error:

{
    "error": "Bad Target: /api/v1/applications"
}

Even if we add the HTTP path mentioned, as in https://adb-1234.azuredatabricks.net/sql/protocolv1/o/4567/cluster_id/api/v1/applications, we get this error:

Path must be of form /sql/protocolv1/o/<orgId>/<clusterIdent>

The errors themselves are clear, but we don't know which URL to use to avoid them.

Hi @prathameshJoshi,

I was able to get the API URL using this piece of code, and it works in my browser.
I am not sure how to authenticate when making calls programmatically.

from databricks_api import DatabricksAPI
from dbruntime.databricks_repl_context import get_context

# Read the notebook context once and reuse it
ctx = get_context()

databricks_api_instance = DatabricksAPI(
    host=ctx.apiUrl,
    token=ctx.apiToken,
)
host = ctx.browserHostName
cluster_id = ctx.clusterId
# The Spark UI path includes the cluster's current Spark context ID
spark_context_id = databricks_api_instance.cluster.get_cluster(cluster_id)['spark_context_id']

spark_ui_api_url = f"https://{host}/sparkui/{cluster_id}/driver-{spark_context_id}/api/v1/"
endpoint = 'applications'

print(spark_ui_api_url + endpoint)

 

 

Thanks for providing a starting point. I tried the URL you provided, but it is not working when I send a request to it. I passed the access token as a bearer token, but the request returns a login HTML page. Please find the output below, and let me know if there is any way to fix it.

Thanks in advance.

<!doctype html>
<html>
 <head>
  <meta charset="utf-8">
  <meta http-equiv="Content-Language" content="en">
  <title>Databricks - Sign In</title>
  <meta name="viewport" content="width=960">
  <link rel="icon" type="image/png" href="https://databricks-ui-assets.azureedge.net/favicon.ico">
  <meta http-equiv="content-type" content="text/html; charset=UTF8">
  <script id="__databricks_react_script"></script>
  <script>window.__DATABRICKS_SAFE_FLAGS__={
    "databricks.infra.showErrorModalOnFetchError": true,
    "databricks.fe.infra.useReact18": true,
    "databricks.fe.infra.useReact18NewAPI": false,
    "databricks.fe.infra.fixConfigPrefetch": true
},window.__DATABRICKS_CONFIG__={
    "isCuttingEdge": false,
    "publicPath": {
        "mlflow": "https://databricks-ui-assets.azureedge.net/",
        "dbsql": "https://databricks-ui-assets.azureedge.net/",
        "feature-store": "https://databricks-ui-assets.azureedge.net/",
        "monolith": "https://databricks-ui-assets.azureedge.net/",
        "jaws": "https://databricks-ui-assets.azureedge.net/"
    }
}</script>
  <link rel="icon" href="https://databricks-ui-assets.azureedge.net/favicon.ico">
  <script>
  function setNoCdnAndReload() {
      document.cookie = `x-databricks-cdn-inaccessible=true; path=/; max-age=86400`;
      const metric = 'cdnFallbackOccurred';
      const browserUserAgent = navigator.userAgent;
      const browserTabId = window.browserTabId;
      const performanceEntry = performance.getEntriesByType('resource').filter(e => e.initiatorType === 'script').slice(-1)[
        0
    ]
      sessionStorage.setItem('databricks-cdn-fallback-telemetry-key', JSON.stringify({ tags: { browserUserAgent, browserTabId
        }, performanceEntry
    }));
      window.location.reload();
}
</script>
  <script>
  // Set a manual timeout for dropped packets to CDN
  function loadScriptWithTimeout(src, timeout) {
     return new Promise((resolve, reject) => {
        const script = document.createElement('script');
          script.defer = true;
          script.src=src;
          script.onload = resolve;
          script.onerror = reject;
          document.head.appendChild(script);
          setTimeout(() => {
              reject(new Error('Script load timeout'));
        }, timeout);
    });
}
  loadScriptWithTimeout('https: //databricks-ui-assets.azureedge.net/static/js/login/login.acadbe8a.js', 10000).catch(setNoCdnAndReload);
</script>
 </head>
 <body class="light-mode">
  <uses-legacy-bootstrap>
   <div id="login-page"></div>
  </uses-legacy-bootstrap>
  <script>const telemetryEndpoint="/telemetry-unauth?t=",uiModuleName="workspaceLogin";function shouldIgnoreError(e){return!1
}function generateUuidV4(){const e=window.crypto?.randomUUID?.();return e||"xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx".replace(/[xy
    ]/g,(e=>{const n=16*Math.random()|0;return("x"===e?n: 3&n|8).toString(16)
    }))
}function networkConnectivityTags(){const e=window.navigator.onLine,n=window.navigator.connection?.rtt??-1,t=window.navigator.connection?.downlink??-1;return{browserNavigatorOnline:e,browserConnectionEstimatedRtt:n,browserConnectionEstimatedDownlink:t,browserConnected:e&&n>0&&t>0
    }
}function createTelemetryRequestBody(e,n={},t=null){const o=Math.round(Date.now()/1e3),r={eventId:generateUuidV4(),metric:e,tags: {...n,...networkConnectivityTags(),browserTabId:window.browserTabId,browserUserAgent:navigator.userAgent
        },ts:o
    };return t&&(r.blob=t),JSON.stringify({uploadTime:o,items: [JSON.stringify(r)
        ]
    })
}function recordTelemetry(e,n={},t=""){const o={method: "POST",credentials: "include",body:createTelemetryRequestBody(e,n,t)
    };fetch(telemetryEndpoint+Date.now(),o)
}window.__databricks_networkConnectivityTags=networkConnectivityTags,Object.defineProperty(window,
"browserTabId",
{value:generateUuidV4()
}),window.recordTelemetry=recordTelemetry,recordTelemetry("uiInit",
{uiModule:uiModuleName,eventId: "init",eventClientSource:uiModuleName,eventType: "init"
});let logCount=0;function error_handler(e,n,t,o,r){logCount++>4||shouldIgnoreError(e)||recordTelemetry("uncaughtJsException",
    {eventType: "jsExceptionV3",jsExceptionMessage:e,jsExceptionSource:n,jsExceptionLineno:t,jsExceptionColno:o,jsExceptionBeforeInit:!0
    },r&&r.stack&&r.stack.toString())
}function sendBeaconOnPageExit(e){if(navigator.sendBeacon){const n=e&&e.type||"unknown",t=(Math.round(Date.now()/1e3),createTelemetryRequestBody("uiInit",
        {eventType: "pageExitBeforeAppInitComplete",eventName:n,eventClientSource:uiModuleName
        }));navigator.sendBeacon(telemetryEndpoint+Date.now(),t)
    }
}window.onerror=error_handler,window.onunhandledrejection=function(e){error_handler(String(e.reason),
    null,
    null,
    null,e.reason)
},window.addEventListener("beforeunload",sendBeaconOnPageExit),window.addEventListener("unload",sendBeaconOnPageExit),window.addEventListener("pagehide",sendBeaconOnPageExit),window.cleanupAfterAppInit=()=>{window.removeEventListener("beforeunload",sendBeaconOnPageExit),window.removeEventListener("unload",sendBeaconOnPageExit),window.removeEventListener("pagehide",sendBeaconOnPageExit)
}</script>
 </body>
</html>
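
The output above is a sign-in page rather than JSON, which seems to suggest that the `/sparkui/...` URL did not accept the bearer token for this request. A small guard like the one below can make that failure explicit instead of letting a JSON parser choke on HTML. This is a sketch only: `parse_api_response` is a hypothetical helper, and the inputs in the example are illustrative, not real responses.

```python
import json


def parse_api_response(content_type: str, body: str):
    """Parse a JSON body, or raise if the server sent an HTML page instead."""
    looks_like_html = (
        "text/html" in content_type.lower()
        or body.lstrip().lower().startswith("<!doctype html")
    )
    if looks_like_html:
        raise ValueError(
            "Received an HTML page (possibly a sign-in page) instead of JSON"
        )
    return json.loads(body)


# Illustrative inputs only; real values would come from the HTTP response.
print(parse_api_response("application/json", '[{"id": "app-1"}]'))
try:
    parse_api_response("text/html", "<!doctype html><html></html>")
except ValueError as err:
    print(err)
```

With `requests`, the two arguments would typically be `response.headers.get("Content-Type", "")` and `response.text`.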

 

 

 

Kaniz_Fatma
Community Manager

Hi @prathameshJoshi, Thanks for reaching out! Please review the responses and let us know which best addresses your question. Your feedback is valuable to us and the community.

 

If the response resolves your issue, kindly mark it as the accepted solution. This will help close the thread and assist others with similar queries.

 

We appreciate your participation and are here if you need further assistance!

Hi Kaniz,

The solution posted by @menotron is giving me some errors. Once those are fixed, I will mark the appropriate response as the accepted solution.

Thank You

prathameshJoshi
New Contributor III

Hi @Kaniz_Fatma and @menotron,

Thanks a lot; your solutions are working. I apologise for the delay, as I had some trouble logging in.
