NickKarpov

In this post we’ll walk through getting started with Meta’s Segment Anything Model 2 (SAM2) on Databricks. We’ll cover experimentation with SAM2 in a Databricks Notebook and expand on the default examples with custom notebook cells for visualization.

 

Want to skip straight to the code? Check out the complete notebook to get started.

What is SAM2?

SAM2 is a unified model for segmenting objects across both images and videos. It accepts point, box, and mask prompts to select an object in any image or video frame, and shows significant improvements over the original SAM in speed and accuracy, with less manual intervention, in both video and image tasks. More details are available in the original research paper.

Why do segmentation?

Segmentation has many applications across various industries. Some examples include object detection in autonomous vehicles, medical image analysis for disease diagnosis, and content moderation in social media platforms.

While many hosted services offer segmentation capabilities as part of their machine learning offerings, self-deploying a segmentation model like SAM2 can provide significant advantages: greater control over the segmentation process, the ability to customize the model for specific use cases, and potential cost savings for high-volume applications. Self-deployment also enables full self-governance, with better data privacy and security, since sensitive images or videos never need to be sent to third-party services for processing.

SAM2 on Databricks

To get started we’ll need a running cluster with GPU support. We’ve chosen a g4dn.xlarge (T4) with Databricks Runtime 15.4 LTS ML, but you can choose a beefier setup for bigger workloads.

Next, check out this notebook from the Databricks Dev Rel samples repository. You can run through the notebook directly in Databricks, but we’ll highlight some of the key sections below.

Setup

The easiest way to run SAM2 on Databricks is to create a new Databricks Git folder configured with the SAM2 GitHub repository (https://github.com/facebookresearch/segment-anything-2) and to import the notebook above into the existing notebooks folder within the created Git folder, i.e. segment-anything-2/notebooks. The paths in the notebook assume it sits in that location within the repository.

We’ll install the SAM2 library directly from GitHub and use the download_ckpts.sh script to download the model binaries. By default the script downloads all sizes of the model, but you can modify the file directly to download a specific size. The available sizes are tiny, small, base_plus, and large.

 

# install the sam2 pip package directly from the Databricks Git folder
%sh pip install ../../segment-anything-2

# make sure to build extensions
%sh cd ../ && python setup.py build_ext --inplace

# download the model binaries
%sh cd ../checkpoints && ./download_ckpts.sh
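
If you want to confirm the download before moving on, you can list the checkpoint files from the notebook; the relative path assumes the Git folder layout described above.

# optional: verify that the model checkpoints were downloaded
%sh ls -lh ../checkpoints/*.pt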

 

You can use any video you like, but in this post we’ll use the Day 1 Keynote from the 2024 Data + AI Summit. We’ll need to preprocess the video by breaking it up into individual frames and uploading them directly to Databricks. We’ll use FFmpeg and a Databricks Volume in this example, but neither is a requirement.

 

# use yt-dlp to get the keynote video from youtube
%pip install yt-dlp
%sh yt-dlp -o ./videos/keynote/keynote.mp4 -f "bestvideo[height<=480]" -u "username" -p "password" "https://www.youtube.com/watch?v=-6dt7eJ3cMs"

# use ffmpeg to split the video into frames and place them in a Volume
%sh ffmpeg -ss 00:00:15 -i ./videos/keynote/keynote.mp4 -t 00:00:10 -q:v 2 -start_number 0 /Volumes/sam/default/frames/'%05d.jpg'
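
As a quick sanity check, you can count how many frames landed in the Volume; the path matches the ffmpeg command above.

# quick sanity check: count the extracted frames in the Volume
%sh ls /Volumes/sam/default/frames | wc -l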

 

Segmentation

You can use a combination of Python I/O and matplotlib to visualize the frames directly in the notebook. The snippet below is adapted directly from Meta’s SAM2 examples.

 

import os
import matplotlib.pyplot as plt
from PIL import Image

# `video_dir` is a directory of JPEG frames with filenames like `<frame_index>.jpg`
video_dir = "/Volumes/sam/default/frames"

# scan all the JPEG frame names in this directory
frame_names = [
    p for p in os.listdir(video_dir)
    if os.path.splitext(p)[-1] in [".jpg", ".jpeg", ".JPG", ".JPEG"]
]
frame_names.sort(key=lambda p: int(os.path.splitext(p)[0]))

# take a look at the first video frame
frame_idx = 0
plt.figure(figsize=(12, 8))
plt.title(f"frame {frame_idx}")
plt.imshow(Image.open(os.path.join(video_dir, frame_names[frame_idx])))

 

[Screenshot: the first video frame displayed in the notebook]
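
The predictor used in the next cells is the SAM2 video predictor. Here’s a minimal sketch of building it, following the pattern in Meta’s video example; the checkpoint and config names below assume the large model from download_ckpts.sh, so adjust them to the size you downloaded.

from sam2.build_sam import build_sam2_video_predictor

# build the SAM2 video predictor from a downloaded checkpoint and its matching config
# (file names below assume the "large" checkpoint; swap in the size you downloaded)
sam2_checkpoint = "../checkpoints/sam2_hiera_large.pt"
model_cfg = "sam2_hiera_l.yaml"
predictor = build_sam2_video_predictor(model_cfg, sam2_checkpoint, device="cuda")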

With the predictor built, you need to initialize the inference state ahead of time. This processes all the frames in the Volume that we created with ffmpeg:

 

inference_state = predictor.init_state(video_path=video_dir)

 

You can start segmenting with just a single point. In the image above, matplotlib’s axes give us a coordinate system, so we can see that the head is at roughly (350, 100).

To improve segmentation results you can register as many points as you want, with positive or negative labels, or even supply a bounding box.

 

points = np.array([[350, 100]], dtype=np.float32)
...
_, out_obj_ids, out_mask_logits = predictor.add_new_points(
    inference_state=inference_state,
    points=points,
    ...
)
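
For reference, here is a fuller version of that call, following Meta’s video example; the frame index, object id, and positive label used here are illustrative.

import numpy as np

# a single positive click (label 1) at (350, 100) on the first frame, tracked as object id 1
points = np.array([[350, 100]], dtype=np.float32)
labels = np.array([1], dtype=np.int32)
_, out_obj_ids, out_mask_logits = predictor.add_new_points(
    inference_state=inference_state,
    frame_idx=0,
    obj_id=1,
    points=points,
    labels=labels,
)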

 

[Screenshot: the segmentation mask produced by the single point prompt]

After propagating the original detection through the remaining video, we can see that it successfully follows the head.
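
Here’s a sketch of that propagation step, again following Meta’s video example; thresholding the mask logits at zero to get boolean masks is illustrative.

# propagate the prompt through the rest of the video and collect a boolean mask per frame
video_segments = {}
for out_frame_idx, out_obj_ids, out_mask_logits in predictor.propagate_in_video(inference_state):
    video_segments[out_frame_idx] = {
        out_obj_id: (out_mask_logits[i] > 0.0).cpu().numpy()
        for i, out_obj_id in enumerate(out_obj_ids)
    }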

[Screenshot: the mask propagated through later frames of the video]

Finally, we can use IPython widgets together with the Databricks notebook displayHTML function to build custom views for stepping through our frames. In the linked notebook we demonstrate a small JavaScript snippet that displays the coordinates of the mouse as it moves around the frame, which helps with finding the coordinates of objects.
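
As a simplified sketch of that idea, the cell below renders one frame with displayHTML and a mousemove handler that reports the cursor’s pixel coordinates; the HTML and JavaScript here are illustrative rather than the exact snippet from the linked notebook.

import base64

# render a single frame with a mousemove handler that reports the cursor's pixel coordinates
frame_path = os.path.join(video_dir, frame_names[0])
with open(frame_path, "rb") as f:
    encoded = base64.b64encode(f.read()).decode()

displayHTML(f"""
<img src="data:image/jpeg;base64,{encoded}"
     onmousemove="document.getElementById('coords').innerText = event.offsetX + ', ' + event.offsetY" />
<p id="coords">move the mouse over the frame</p>
""")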

[GIF: mouse coordinates displayed while hovering over a frame]

Conclusion

In this blog we’ve introduced the SAM2 on Databricks example and shown how to get started with segmentation using the features of Databricks Notebooks. You can take this further by combining SAM2 with object detection models that find the initial coordinates of objects, for a truly hands-off segmentation pipeline. You can find the latest models in the Hugging Face Models directory.