Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

How to Play or Stream MP4 Videos from Unity Catalog Volumes in Databricks (Flask/Dash)?

GergoBo
New Contributor

Hello Databricks Community,

I am working on a Dash dashboard (Python/Flask backend) deployed on Databricks, and I need to play or stream MP4 video files stored in a Unity Catalog Volume. I have tried accessing these files both from a Databricks notebook and from Flask, but I am unable to play the videos in either environment.

  • In notebooks, I can read the files, but I cannot play or display the videos directly in the output cells.
  • In Flask/Dash, I cannot serve the files as public URLs, and the backend cannot access the volume as a file path for streaming.

Has anyone found a way to play or stream MP4 videos stored in Unity Catalog Volumes, either in a Databricks notebook or via a web app (Flask/Dash) running on Databricks?
Are there any recommended approaches, workarounds, or best practices for enabling video playback from Volumes?

Any advice or documentation links would be greatly appreciated!

Thank you!


Raman_Unifeye
Honored Contributor III

@GergoBo - Notebook output cells cannot stream a video directly from the volume's file system, so you can embed the video as a Base64-encoded string instead.

I tried the code below and it works well in a notebook: it plays the video directly in the output cell.

 

import base64
from IPython.display import HTML

video_path = "/Volumes/workspace/ingestion/vol1/mp4/file_example_MP4_480_1_5MG.mp4"

with open(video_path, "rb") as f:
    data = f.read()

b64_video = base64.b64encode(data).decode()

HTML(f"""
<video width="640" height="480" controls>
  <source src="data:video/mp4;base64,{b64_video}" type="video/mp4">
</video>
""")

This works best for small files (<50MB) as it loads the whole video into memory.

Please try the above. The Dash app scenario is more involved; I will look into it and try to get back to you.


RG #Driving Business Outcomes with Data Intelligence

SteveOstrowski
Databricks Employee

Hi @GergoBo,

There are a few approaches depending on whether you need playback in a notebook or within a Dash/Flask web app. Here is a breakdown of each scenario.

OPTION 1: VIDEO PLAYBACK IN A DATABRICKS NOTEBOOK

For notebooks, you can read the MP4 file from the volume path and embed it as a base64-encoded data URI. This works well for files under ~50 MB:

import base64
from IPython.display import HTML

video_path = "/Volumes/<catalog>/<schema>/<volume>/path/to/video.mp4"

with open(video_path, "rb") as f:
  data = f.read()

b64_video = base64.b64encode(data).decode()

HTML(f"""
<video width="640" height="480" controls>
<source src="data:video/mp4;base64,{b64_video}" type="video/mp4">
</video>
""")

This loads the entire file into memory, so it is best suited for smaller files.
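Because the whole file is read into memory and base64-encoded, a quick size guard before embedding can prevent accidentally inlining a file too large to play. A minimal sketch; the helper name and the 50 MB cutoff are illustrative assumptions, not a Databricks limit:

```python
import base64
import os

MAX_EMBED_BYTES = 50 * 1024 * 1024  # rule-of-thumb cutoff for inline playback


def video_data_uri(path, max_bytes=MAX_EMBED_BYTES):
    """Return a base64 data URI for an MP4, refusing files over max_bytes."""
    size = os.path.getsize(path)
    if size > max_bytes:
        raise ValueError(f"{path} is {size} bytes; too large to embed inline")
    with open(path, "rb") as f:
        b64 = base64.b64encode(f.read()).decode()
    return f"data:video/mp4;base64,{b64}"
```

The returned string can be passed straight into the `<source src=...>` attribute shown above.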

OPTION 2: DASH/FLASK APP RUNNING AS A DATABRICKS APP

If you are building your Dash app as a Databricks App, you can serve videos directly from Unity Catalog Volumes by creating a Flask route that streams the file content. Here is the approach:

1. Add a Unity Catalog Volume as a resource in your app.yaml configuration. This gives your app's service principal permission to read from the volume.

2. Use the Databricks SDK (databricks-sdk) to download the file and stream it back to the browser. This avoids loading the entire file into memory at once.

Example Flask/Dash route for streaming:

from flask import Response
from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

@app.server.route("/video/<path:filename>")
def serve_video(filename):
  volume_path = f"/Volumes/<catalog>/<schema>/<volume>/{filename}"

  def generate():
      resp = w.files.download(volume_path)
      with resp.contents as f:
          while True:
              chunk = f.read(8192)
              if not chunk:
                  break
              yield chunk

  return Response(generate(), mimetype="video/mp4")

Then in your Dash layout, reference the video with a standard HTML5 video element:

from dash import html  # dash_html_components is deprecated; html ships with Dash 2+

html.Video(
  src="/video/path/to/video.mp4",
  controls=True,
  width="640",
  height="480"
)

This streams the video in 8 KB chunks, so it handles larger files without memory issues.

3. Make sure your requirements.txt includes databricks-sdk.

For the app.yaml resource configuration, you would add something like:

resources:
- name: video-volume
  type: unity-catalog-volume
  path: /Volumes/<catalog>/<schema>/<volume>
  permission: READ

Documentation reference for Databricks Apps resources:
https://docs.databricks.com/aws/en/dev-tools/databricks-apps/resources

OPTION 3: FLASK/DASH ON A CLUSTER (NOT A DATABRICKS APP)

If you are running your Flask/Dash server directly on a cluster (e.g., via a notebook or driver proxy), you can read from the volume using the standard file system path since the cluster has direct FUSE access to /Volumes/:

from flask import Response

@app.server.route("/video/<path:filename>")
def serve_video(filename):
  volume_path = f"/Volumes/<catalog>/<schema>/<volume>/{filename}"

  def generate():
      with open(volume_path, "rb") as f:
          while True:
              chunk = f.read(8192)
              if not chunk:
                  break
              yield chunk

  return Response(generate(), mimetype="video/mp4")

The driver proxy URL would be something like:
https://<workspace-url>/driver-proxy/o/<org-id>/<cluster-id>/<port>/video/my_video.mp4
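The URL pattern above can be assembled with a small helper so the pieces are easy to substitute. A sketch only; all argument values below are illustrative placeholders for your workspace's real org ID, cluster ID, and app port:

```python
def driver_proxy_url(workspace_url, org_id, cluster_id, port, route):
    """Build the driver-proxy URL for a route served by a Flask/Dash app
    listening on `port` on the cluster driver."""
    return (
        f"https://{workspace_url}/driver-proxy/o/{org_id}"
        f"/{cluster_id}/{port}/{route.lstrip('/')}"
    )
```

For the proxy to reach the app, the server must bind to `0.0.0.0` on the chosen port (e.g. `app.run_server(host="0.0.0.0", port=8050)` for Dash).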

SUPPORTING RANGE REQUESTS (SEEK/SCRUB)

For a better user experience with video scrubbing (seeking to specific positions), you can add HTTP Range request support to your Flask route. This lets the browser request specific byte ranges instead of downloading the entire file:

import os
from flask import Response, request

@app.server.route("/video/<path:filename>")
def serve_video(filename):
  volume_path = f"/Volumes/<catalog>/<schema>/<volume>/{filename}"
  file_size = os.path.getsize(volume_path)

  range_header = request.headers.get("Range")
  if range_header:
      range_spec = range_header.replace("bytes=", "")
      start_str, _, end_str = range_spec.partition("-")
      byte_start = int(start_str)
      # Honor an explicit end position when the client sends one;
      # otherwise cap the window at ~1 MB
      byte_end = min(int(end_str), file_size - 1) if end_str else min(byte_start + 1024 * 1024, file_size - 1)
      content_length = byte_end - byte_start + 1

      def generate():
          with open(volume_path, "rb") as f:
              f.seek(byte_start)
              yield f.read(content_length)

      return Response(
          generate(),
          status=206,
          mimetype="video/mp4",
          headers={
              "Content-Range": f"bytes {byte_start}-{byte_end}/{file_size}",
              "Accept-Ranges": "bytes",
              "Content-Length": content_length,
          },
      )

  def generate():
      with open(volume_path, "rb") as f:
          while True:
              chunk = f.read(8192)
              if not chunk:
                  break
              yield chunk

  return Response(generate(), mimetype="video/mp4")

Note: Range request support works most naturally with direct file access (cluster or Databricks App with FUSE mount). If using the SDK download method, you would need to handle partial reads differently since the SDK streams the full file.
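Pulling the Range parsing out into a standalone function (a sketch; the function name is illustrative) makes the route easier to test and reuse. It returns `None` for absent or malformed headers so the caller can fall back to streaming the whole file, and it caps open-ended ranges at the same ~1 MB window used in the route above:

```python
import re


def parse_range(range_header, file_size, max_chunk=1024 * 1024):
    """Parse an HTTP Range header into an inclusive (start, end) byte pair.

    Returns None when the header is missing, malformed, or out of bounds,
    so the caller can serve the full file instead.
    """
    if not range_header:
        return None
    m = re.match(r"bytes=(\d+)-(\d*)$", range_header.strip())
    if not m:
        return None
    start = int(m.group(1))
    if start >= file_size:
        return None
    if m.group(2):  # explicit end requested by the client
        end = min(int(m.group(2)), file_size - 1)
    else:           # open-ended range: cap the window size
        end = min(start + max_chunk, file_size - 1)
    return start, end
```

A route would then do `rng = parse_range(request.headers.get("Range"), file_size)` and branch on whether `rng` is `None`.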

RELEVANT DOCUMENTATION

- Unity Catalog Volumes overview: https://docs.databricks.com/aws/en/connect/unity-catalog/volumes
- Working with files in volumes: https://docs.databricks.com/aws/en/volumes/volume-files
- Databricks Apps overview: https://docs.databricks.com/aws/en/dev-tools/databricks-apps/index
- Databricks Apps resources: https://docs.databricks.com/aws/en/dev-tools/databricks-apps/resources
- Databricks SDK for Python: https://docs.databricks.com/aws/en/dev-tools/sdk-python

* This reply was drafted with an agent system I built, which researches responses against the documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance something is inaccurate, especially if you are experimenting with brand-new features.

If this answer resolves your question, could you mark it as "Accept as Solution"? That helps other users quickly find the correct fix.