robert_runkle
Databricks Employee

The Challenge: Scaling ML for Personalized E-commerce

Shutterfly, a leader in online photo printing and personalized e-commerce, faced significant scalability challenges with their machine learning infrastructure. The Data Science & Growth Analytics team managed extensive collections of embeddings from text and images, alongside feature stores containing thousands of data points.

Traditional tools proved insufficient for handling this volume effectively. The Spark implementation of XGBoost was particularly restrictive and slow, creating bottlenecks in model development. Furthermore, the team needed a solution that could scale Python-based workloads efficiently while supporting multiple frameworks including PyTorch, TensorFlow, and XGBoost.

The Solution: Ray Framework on Databricks

Shutterfly adopted Ray, an open source framework for scaling Python applications, and ran it on Databricks, which provided orchestration, resource management, and governance through Unity Catalog. This combination addressed their core requirements for distributed computing and seamless workflow integration.

The primary use case focused on customer propensity modeling across multiple product categories, leveraging both structured data and images. Ray enabled the team to process tens of millions of customer records and images for training, while handling billions of records for inference across dozens of different product categories. 
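To make the per-category fan-out concrete, here is a minimal sketch of how one independent fit per product category might be dispatched as Ray tasks. It is illustrative only and not Shutterfly's pipeline: the category names, the sample rows, and the helpers `purchase_rate` and `fit_all` are hypothetical, and a simple purchase rate stands in for a real propensity model. The Ray path is opt-in (`use_ray=True`) so the sketch also runs where Ray is not installed.

```python
from typing import Dict, List


def purchase_rate(category: str, rows: List[dict]) -> float:
    """Toy stand-in for a propensity model: observed purchase rate."""
    outcomes = [r["purchased"] for r in rows if r["category"] == category]
    return sum(outcomes) / len(outcomes) if outcomes else 0.0


def fit_all(categories: List[str], rows: List[dict],
            use_ray: bool = False) -> Dict[str, float]:
    """Fit one toy model per category, optionally as parallel Ray tasks."""
    if not use_ray:
        # Sequential baseline: one category after another.
        return {c: purchase_rate(c, rows) for c in categories}

    import ray  # imported lazily so the sketch runs without Ray installed

    ray.init(ignore_reinit_error=True)
    rows_ref = ray.put(rows)  # place the data in the object store once
    remote_fit = ray.remote(purchase_rate)
    # One task per category; Ray schedules them across available workers.
    futures = [remote_fit.remote(c, rows_ref) for c in categories]
    return dict(zip(categories, ray.get(futures)))


# Hypothetical sample data, for illustration only.
rows = [
    {"category": "photo_books", "purchased": 1},
    {"category": "photo_books", "purchased": 0},
    {"category": "cards", "purchased": 1},
    {"category": "cards", "purchased": 1},
]
rates = fit_all(["photo_books", "cards", "mugs"], rows)
print(rates)
```

Because each category is fitted independently, the work parallelizes naturally: adding categories adds tasks rather than lengthening a single job. In a real workload the function body would train an XGBoost or PyTorch model instead of computing a rate.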

Key Features and Implementation

Distributed Processing Capabilities: Ray on Databricks simplified deployment and management of distributed Python workloads, including deep learning and batch inference jobs. The platform provided native support for Ray clusters, making it easy to spin up and manage distributed compute resources.

Multi-Framework Support: The solution supported PyTorch, TensorFlow, and XGBoost, allowing the team to select the optimal tool for each specific task. This flexibility proved crucial for handling diverse ML workloads across personalization, search, and image processing applications.

Automated Resource Management: Databricks clusters automatically scaled to meet demand, ensuring efficient resource utilization without manual intervention. This automation allowed data scientists to focus on model development rather than infrastructure management.

Business Impact and Results

The implementation delivered substantial performance improvements and business value. XGBoost training and inference times fell by a factor of at least 20, and embedding-based models now process billions of records in minutes rather than hours.

Over the past two years, the team nearly tripled model development throughput. This acceleration enabled numerous new use cases and resulted in millions of dollars in incremental value while substantially enhancing personalized experiences for customers.

The scalability improvements allowed Shutterfly to handle tens of millions of training records and billions of inference events seamlessly. This capability supports real-time personalization across product recommendations, marketing communications, and content optimization.

Future Expansion Plans

Shutterfly continues to expand its use of Ray into more advanced applications and is currently experimenting with vision-language models (VLMs) for high-throughput workloads. Initial results have been encouraging and suggest that Ray is well suited to more sophisticated AI applications.

The company plans to continue collaborating with Databricks to explore new integrations and features while further optimizing resource utilization and cost management.

Ready to scale your machine learning workloads like Shutterfly? Discover how Ray on Databricks can transform your data science workflows. Start with a free trial of Databricks to experience distributed Python computing capabilities, or connect with our solutions team to discuss your specific ML scaling challenges. Explore our comprehensive documentation and tutorials to begin implementing Ray for your personalization and computer vision use cases today.
