Administration & Architecture
Explore discussions on Databricks administration, deployment strategies, and architectural best practices. Connect with administrators and architects to optimize your Databricks environment for performance, scalability, and security.

Running Browser Based Agentic Applications on Databricks

abhijit007
Databricks Partner

Hi,
We are evaluating whether it is possible to host a browser‑based agentic application on Databricks.
Our application performs frontend UI automation using the browser-use Python library and also exposes FastAPI endpoints to drive the UI.

Application Overview:
Uses browser-use, which is built on Playwright
Requires OS‑level browser dependencies
Runs headless Chrome via the Chrome DevTools Protocol (CDP)
Uses random, short‑lived local ports for browser communication
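The port handling described above can be sketched in Python. This is a minimal illustration, not browser-use internals: the helper names are hypothetical, and the Chrome launch flags in the comments assume a locally installed Chromium.

```python
import socket

def pick_ephemeral_port() -> int:
    """Ask the OS for a currently free ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]

def cdp_endpoint(port: int) -> str:
    """Build the local CDP URL that a Playwright-based tool would attach to."""
    return f"http://127.0.0.1:{port}"

# Chrome would then be launched with something like:
#   chromium --headless --remote-debugging-port=<port>
# and Playwright attaches via chromium.connect_over_cdp(cdp_endpoint(port)).
port = pick_ephemeral_port()
print(cdp_endpoint(port))
```

It is exactly these short-lived localhost ports that the restricted environments below can interfere with.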

Challenges Encountered:
1. Databricks Serverless / Databricks Apps
Not supported for this use case
No access to OS‑level dependencies or browser binaries required by Playwright

2. Legacy Compute with Init Scripts
Browsers (e.g., Chrome) can be installed via init scripts
However, browser-use fails to connect to headless Chrome
Possible causes include:
Restrictions on dynamic or ephemeral localhost ports
Networking limitations
Constraints imposed by the Databricks runtime or base image

3. Legacy Compute with Custom Docker Images
Ideally, we would like to use a custom Docker image (e.g., python:3.11-slim) with all required browser dependencies preinstalled
Databricks currently does not allow using non‑Databricks Runtime base images for compute

Question:
Is there any supported way on Databricks to:

Run custom Docker images that are not based on the Databricks Runtime, or
Use another Databricks‑supported service or pattern that would allow running browser‑based automation workloads (Playwright or headless Chrome or CDP‑based tools)?

Any guidance and help would be greatly appreciated.

2 ACCEPTED SOLUTIONS


Lu_Wang_ENB_DBX
Databricks Employee

TLDR: Databricks Apps/serverless won’t support this pattern; classic compute with Databricks Container Services is your only real option on Databricks, and even that has trade‑offs. For serious browser automation, run it off‑platform and integrate with Databricks by API.

1. Databricks Apps / Serverless

  • Apps run in a locked‑down serverless container:
    • No root, no apt-get/yum/apk, no system‑level packages or browsers.
    • App must bind only to 0.0.0.0:<DATABRICKS_APP_PORT>; the reverse proxy terminates TLS and forwards traffic.
  • That makes Playwright/headless Chrome/CDP inside an App essentially unsupported.
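For contrast, the Apps binding contract can be sketched with the standard library. DATABRICKS_APP_PORT is the environment variable the Apps runtime injects; the fallback port and helper name here are assumptions for illustration.

```python
import os

def app_bind_address(default_port: int = 8000) -> tuple[str, int]:
    """Return the (host, port) a Databricks App must listen on.

    Apps may bind only to 0.0.0.0 on the port injected via
    DATABRICKS_APP_PORT; the platform's reverse proxy handles TLS.
    The default_port fallback is only for local development.
    """
    port = int(os.environ.get("DATABRICKS_APP_PORT", default_port))
    return ("0.0.0.0", port)

host, port = app_bind_address()
print(host, port)  # e.g. pass these to uvicorn.run(app, host=host, port=port)
```

Nothing in this model gives the app a place to spawn a browser process or open arbitrary localhost ports, which is why CDP-based tools do not fit.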

2. Classic compute with true custom Docker images

Yes, but with constraints:

  • Use Databricks Container Services on classic compute to specify a Docker image for the cluster.
  • The image does not have to be based on a Databricks Runtime image; you can build it “from scratch” (e.g. starting from python:3.11-slim) as long as the image runs Linux (typically Ubuntu) and includes the required system tools (JDK 8u191+, bash, coreutils, procps, sudo, etc.).
  • At cluster launch, Databricks:
    • Pulls your image, starts a container, then copies Databricks Runtime code into it.
  • In that container you can pre‑install Chrome + Playwright/browser‑use and use localhost ephemeral ports for CDP; there is no documented block on local ephemeral ports for classic compute.
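A hedged sketch of such an image follows. The base image, package names, and versions are assumptions; check the Databricks Container Services documentation for the current list of required system tools before building.

```dockerfile
# Sketch only -- base image, package list, and versions are assumptions.
FROM ubuntu:22.04

# System tools Databricks Container Services expects inside the container.
RUN apt-get update && apt-get install -y \
    openjdk-8-jdk bash coreutils procps sudo iproute2 \
    python3 python3-pip && rm -rf /var/lib/apt/lists/*

# Browser stack: Playwright pulls its own Chromium plus OS dependencies.
RUN pip3 install playwright browser-use \
    && playwright install --with-deps chromium
```

You would then point the cluster's Docker settings at this image when creating classic compute.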

Caveats:

  • Container Services is not supported in some scenarios (for example certain access modes, ML runtimes, FGAC on dedicated compute).
  • You own all OS‑level hardening and must thoroughly test your image.

3. Recommendation

Given how specialized browser automation is, the most robust pattern is:

  • Run Playwright/headless Chrome in your own infra (Kubernetes, ECS, VMs) where you fully control the OS and networking.
  • Use Databricks purely for data/AI:
    • Call Model Serving / foundation model APIs for LLMs.
    • Use Jobs / SQL / Delta for data work.
    • Optionally front this with a Databricks App or MCP server that calls out to your browser‑automation service.
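The "external service" pattern in the last bullet can be sketched with the standard library. The endpoint URL and payload shape are hypothetical stand-ins for whatever API your browser-automation service exposes.

```python
import json
import urllib.request

def build_automation_request(endpoint: str, task: str) -> urllib.request.Request:
    """Build a POST request asking an external browser-automation
    service (running on infra you control) to execute a task.

    The {"task": ...} payload shape is an assumption for illustration.
    """
    payload = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_automation_request("https://automation.example.com/run", "fill signup form")
print(req.get_method(), req.full_url)
# From a Databricks job or notebook you would then call
# urllib.request.urlopen(req) and land the response in Delta or Lakebase.
```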

Summary:

  • Supported on Databricks?
    • Apps/serverless: effectively no for this use case.
    • Classic compute: yes, via Databricks Container Services with a custom Docker image that you harden and that still hosts Databricks Runtime.
  • Best practice: treat browser automation as an external service and integrate with Databricks over APIs.


Thanks @Lu_Wang_ENB_DBX for your detailed reply; it helps. As I mentioned, I tried classic compute with a custom image, then called the AI browser API on a custom port from a notebook and saved the output in Lakebase...

