The Challenge: Scaling ML for Personalized E-commerce
Shutterfly, a leader in online photo printing and personalized e-commerce, faced significant scalability challenges with its machine learning infrastructure. The Data Science & Growth Analytics team managed extensive collections of text and image embeddings, alongside feature stores containing thousands of data points.
Traditional tools proved insufficient for handling this volume effectively. The Spark implementation of XGBoost was particularly restrictive and slow, creating bottlenecks in model development. Furthermore, the team needed a solution that could scale Python-based workloads efficiently while supporting multiple frameworks including PyTorch, TensorFlow, and XGBoost.
The Solution: Ray Framework on Databricks
Shutterfly adopted Ray, an open source framework for scaling Python applications, and ran it on Databricks, which provided orchestration, resource management, and governance through Unity Catalog. This combination addressed the team's core requirements for distributed computing and seamless workflow integration.
The primary use case focused on customer propensity modeling across multiple product categories, leveraging both structured data and images. Ray enabled the team to process tens of millions of customer records and images for training, while handling billions of records for inference across dozens of different product categories.
Key Features and Implementation
Distributed Processing Capabilities: Ray on Databricks simplified deployment and management of distributed Python workloads, including deep learning and batch inference jobs. The platform provided native support for Ray clusters, making it easy to spin up and manage distributed compute resources.
Multi-Framework Support: The solution supported PyTorch, TensorFlow, and XGBoost, allowing the team to select the optimal tool for each specific task. This flexibility proved crucial for handling diverse ML workloads across personalization, search, and image processing applications.
Automated Resource Management: Databricks clusters automatically scaled to meet demand, ensuring efficient resource utilization without manual intervention. This automation allowed data scientists to focus on model development rather than infrastructure management.
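To make the distributed batch-inference pattern described above more concrete, here is a minimal sketch of fanning out propensity scoring over batches of customer records for several product categories. It is an illustration only, not Shutterfly's implementation: the category names, weights, feature rows, and function names are all hypothetical, and a standard-library thread pool stands in for a Ray cluster so the example runs anywhere. In Ray the same shape is typically expressed with remote tasks (`@ray.remote` functions collected with `ray.get`), which spread the work across cluster nodes rather than local threads.

```python
from concurrent.futures import ThreadPoolExecutor
from math import exp
from typing import Dict, List

# Hypothetical per-category weights standing in for trained propensity models.
CATEGORY_WEIGHTS: Dict[str, List[float]] = {
    "photo_books": [0.8, -0.2],
    "cards": [0.1, 0.5],
}


def score_batch(category: str, batch: List[List[float]]) -> List[float]:
    """Score one batch of feature rows with a logistic model for one category."""
    weights = CATEGORY_WEIGHTS[category]
    scores = []
    for row in batch:
        z = sum(w * x for w, x in zip(weights, row))
        scores.append(1.0 / (1.0 + exp(-z)))
    return scores


def make_batches(rows: List[List[float]], batch_size: int) -> List[List[List[float]]]:
    """Split the rows into consecutive batches of at most batch_size rows."""
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


def score_all(rows: List[List[float]], batch_size: int = 2,
              max_workers: int = 4) -> Dict[str, List[float]]:
    """Score every row for every category, one task per (category, batch) pair."""
    batches = make_batches(rows, batch_size)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every (category, batch) pair before collecting any result.
        pending = {
            category: [pool.submit(score_batch, category, b) for b in batches]
            for category in CATEGORY_WEIGHTS
        }
        # Futures are kept in batch order, so scores line up with input rows.
        return {
            category: [s for f in futures for s in f.result()]
            for category, futures in pending.items()
        }


# Hypothetical feature rows, one per customer.
customers = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, -1.0]]
propensities = score_all(customers)
```

Threads in CPython do not parallelize CPU-bound scoring, so this sketch shows only the structure of the job: independent batches, one task per batch and category, and results regrouped in input order. That independence between batches is what lets a distributed scheduler scale the same job out across many machines.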
Business Impact and Results
The implementation delivered substantial performance improvements and business value. XGBoost model training and inference ran at least 20 times faster, and embedding-based models now process billions of records in minutes rather than hours.
Over the past two years, the team nearly tripled model development throughput. This acceleration enabled numerous new use cases and resulted in millions of dollars in incremental value while substantially enhancing personalized experiences for customers.
The scalability improvements allowed Shutterfly to handle tens of millions of training records and billions of inference events seamlessly. This capability supports real-time personalization across product recommendations, marketing communications, and content optimization.
Future Expansion Plans
Shutterfly continues to expand its use of Ray into more advanced applications and is currently experimenting with vision-language models (VLMs) for high-throughput workloads. Initial results have been highly encouraging, demonstrating Ray's potential for sophisticated AI applications.
The company plans to continue collaborating with Databricks to explore new integrations and features while further optimizing resource utilization and cost management.
Ready to scale your machine learning workloads like Shutterfly? Discover how Ray on Databricks can transform your data science workflows. Start with a free trial of Databricks to experience distributed Python computing capabilities, or connect with our solutions team to discuss your specific ML scaling challenges. Explore our comprehensive documentation and tutorials to begin implementing Ray for your personalization and computer vision use cases today.