
Running Browser Based Agentic Applications on Databricks

abhijit007
Databricks Partner

Hi,
We are evaluating whether it is possible to host a browser-based agentic application on Databricks.
Our application performs frontend UI automation using the browser-use Python library and also exposes FastAPI endpoints to drive a UI.

Application Overview-
Uses browser-use, which is built on Playwright
Requires OS-level browser dependencies
Runs headless Chrome via the Chrome DevTools Protocol (CDP)
Uses random, short-lived local ports for browser communication

Challenges Encountered-
1. Databricks Serverless / Databricks Apps
Not supported for this use case
No access to OS-level dependencies or browser binaries required by Playwright

2. Legacy Compute with Init Scripts
Browsers (e.g., Chrome) can be installed via init scripts
However, browser-use fails to connect to headless Chrome
Possible causes include:
Restrictions on dynamic or ephemeral localhost ports
Networking limitations
Constraints imposed by the Databricks runtime or base image

3. Legacy Compute with Custom Docker Images
Ideally, we would like to use a custom Docker image (e.g., python:3.11-slim) with all required browser dependencies preinstalled
Databricks currently does not allow using non-Databricks Runtime base images for compute

Question-
Is there any supported way on Databricks to:

Run custom Docker images that are not based on the Databricks Runtime, or
Use another Databricks-supported service or pattern that would allow running browser-based automation workloads (Playwright or headless Chrome or CDP-based tools)?

Any guidance and help would be greatly appreciated.

1 REPLY

Lu_Wang_ENB_DBX
Databricks Employee

TLDR: Databricks Apps/serverless won't support this pattern; classic compute with Databricks Container Services is your only real option on Databricks, and even that has trade-offs. For serious browser automation, run it off-platform and integrate with Databricks by API.

1. Databricks Apps / Serverless

  • Apps run in a locked-down serverless container:
    • No root, no apt-get/yum/apk, no system-level packages or browsers.
    • App must bind only to 0.0.0.0:<DATABRICKS_APP_PORT>; the reverse proxy terminates TLS and forwards traffic.
  • That makes Playwright/headless Chrome/CDP inside an App essentially unsupported.
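
As a rough illustration of the single-port constraint above, the sketch below resolves the bind address from the environment. It assumes the platform exposes the port through an environment variable named DATABRICKS_APP_PORT, as the placeholder above suggests. The helper names `resolve_bind_address` and `serve`, and the fallback port 8000, are illustrative assumptions and not part of any Databricks API.

```python
import os
from http.server import BaseHTTPRequestHandler, HTTPServer


def resolve_bind_address(env=None, default_port=8000):
    """Return the (host, port) pair the app should listen on.

    The fallback port is an assumption for local runs only.
    """
    env = os.environ if env is None else env
    port = int(env.get("DATABRICKS_APP_PORT", default_port))
    # Listen on all interfaces so the reverse proxy can reach the app.
    return "0.0.0.0", port


class HealthHandler(BaseHTTPRequestHandler):
    """Tiny handler that answers every GET with a plain-text 'ok'."""

    def do_GET(self):
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve():
    """Start the server on the resolved address (blocks until interrupted)."""
    host, port = resolve_bind_address()
    HTTPServer((host, port), HealthHandler).serve_forever()
```

With FastAPI, the same pair could be passed to `uvicorn.run(app, host=host, port=port)`. Note that this only covers the app's own listener; it does nothing to make a browser binary available inside the App container.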

2. Classic compute with true custom Docker images

Yes, but with constraints:

  • Use Databricks Container Services on classic compute to specify a Docker image for the cluster.
  • The image does not have to be based on a Databricks Runtime image; you can build it "from scratch" (e.g. starting from python:3.11-slim) as long as you:
    • Run (typically Ubuntu) Linux and include required system tools (JDK 8u191, bash, coreutils, procps, sudo, etc.).
  • At cluster launch, Databricks:
    • Pulls your image, starts a container, then copies Databricks Runtime code into it.
  • In that container you can pre-install Chrome + Playwright/browser-use and use localhost ephemeral ports for CDP; there is no documented block on local ephemeral ports for classic compute.
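
If you want to rule out the ephemeral-port theory before debugging Chrome itself, a minimal sketch like the following can be run on the driver. It only uses the Python standard library; the function name `loopback_ephemeral_port_ok` is made up for this example, and a passing result shows only that loopback TCP on an OS-assigned port works, not that Chrome or browser-use will.

```python
import socket


def loopback_ephemeral_port_ok(timeout=2.0):
    """Bind to an OS-assigned port on 127.0.0.1 and push one message through it.

    Returns (ok, port); port is None if the check failed before binding.
    """
    payload = b"cdp-probe"
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        server.settimeout(timeout)
        # Port 0 asks the OS to pick a free ephemeral port.
        server.bind(("127.0.0.1", 0))
        server.listen(1)
        port = server.getsockname()[1]
        with socket.create_connection(("127.0.0.1", port), timeout=timeout) as client:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                client.sendall(payload)
                received = conn.recv(len(payload))
        return received == payload, port
    except OSError:
        # Covers refused connections, timeouts, and bind failures.
        return False, None
    finally:
        server.close()
```

If this succeeds but browser-use still cannot attach, the next thing worth checking is whether Chrome's DevTools HTTP endpoint (`/json/version` on the debugging port) responds from inside the same container.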

Caveats:

  • Container Services is not supported in some scenarios (for example certain access modes, ML runtimes, FGAC on dedicated compute).
  • You own all OS-level hardening and must thoroughly test your image.

3. Recommendation

Given how specialized browser automation is, the most robust pattern is:

  • Run Playwright/headless Chrome in your own infra (Kubernetes, ECS, VMs) where you fully control the OS and networking.
  • Use Databricks purely for data/AI:
    • Call Model Serving / foundation model APIs for LLMs.
    • Use Jobs / SQL / Delta for data work.
    • Optionally front this with a Databricks App or MCP server that calls out to your browser-automation service.
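
To make the "calls out to your browser-automation service" part concrete, here is a hedged sketch of the Databricks-side client. Everything about the external service is hypothetical: the `/tasks` route, the `{"task": ...}` payload, the bearer-token auth, and the JSON response are placeholders for whatever contract your own service defines.

```python
import json
import urllib.request


def build_task_request(base_url, task, token):
    """Build (but do not send) a POST to a hypothetical automation service."""
    body = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/tasks",  # hypothetical route
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
    )


def submit_task(base_url, task, token, timeout=30):
    """Send the request and decode the (assumed) JSON response."""
    req = build_task_request(base_url, task, token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A Databricks App or job would call something like `submit_task(service_url, "open the dashboard and export the report", token)`, so the browser, its OS dependencies, and its CDP ports all stay on infrastructure you control.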

Summary:

  • Supported on Databricks?
    • Apps/serverless: effectively no for this use case.
    • Classic compute: yes, via Databricks Container Services with a custom Docker image that you harden and that still hosts Databricks Runtime.
  • Best practice: treat browser automation as an external service and integrate with Databricks over APIs.