Cornell Bovi-Analytics Research Team
Authors: Drs. Enhong Liu and Miel Hostens
How Collaboration Between the Bovi-Analytics Lab and Databricks Advances AI Innovation in Dairy Farming
Summary
At Cornell Bovi-Analytics Lab, we transform complex farm data into practical insights that improve animal health, boost efficiency, and advance sustainability. Guided by a vision of a smarter and more sustainable dairy industry, our lab develops AI-driven tools, fosters cross-disciplinary collaboration with industry and academic partners, and trains the next generation of dairy scientists to be digitally empowered, data-informed, and globally impactful. By bridging agriculture and technology, Bovi-Analytics Lab aims to make advanced analytics both scalable and accessible, reimagining the future of dairy farming through data science and artificial intelligence.
Why AI + Dairy Farming
The dairy sector represents both substantial scale and significant economic opportunity. In 2024, the U.S. dairy industry was valued at approximately $120 billion and is projected to exceed $167 billion by 2052. Leveraging on-farm data to optimize diet formulation has been shown to reduce feed costs by $31 per cow annually while cutting nitrogen excretion by 5.5 kg per cow per year. Despite these opportunities, adoption of advanced digital technologies in dairy farming remains limited due to several systemic barriers. Chief among these is the fragmentation and heterogeneity of data: farm- and research-generated information is often siloed across diverse systems, stored in incompatible formats, and difficult to integrate at scale. Concerns about privacy and data ownership among farmers and researchers further discourage data sharing. Without robust governance mechanisms, regional or global data aggregation remains largely infeasible, which severely constrains the development of state-of-the-art AI models in the dairy sector, since such models require large-scale, curated datasets. Beyond data integration and governance, a disconnect persists between academic research and practical farm applications. Even after considerable effort, academic innovations often remain confined to repositories, with limited pathways to translation into usable tools that change farming practices. Educational gaps compound these challenges. Traditional agricultural science programs have historically emphasized biology and production, with limited training in data governance, AI development, or digital agriculture systems. Consequently, graduates often enter the workforce underprepared to lead in increasingly data-intensive agricultural environments.
These challenges reinforce one another in a negative cycle: fragmented and inaccessible data constrains research, innovation struggles to translate into tools that address the practical challenges farmers face, farmers who see little benefit from research remain reluctant to share data, and students who graduate without the proper skills are unable to disrupt the cycle. Breaking it will require coordinated, interdisciplinary efforts that integrate expertise from animal science and computer science, alongside strategic partnerships that can provide user-friendly infrastructure and the resources necessary to support sustained innovation.
Featured Lab Work with Databricks
The Bovi-Analytics Lab, in collaboration with Databricks, is breaking this negative cycle by developing integrated, secure, and practical solutions that empower researchers, farmers, and students simultaneously.
Fostering a more collaborative research community. Our $3.4M project, funded by the Global Methane Hub, demonstrates how global research data can be harmonized securely across institutions. Leveraging Unity Catalog, we are building a centralized database of methane emissions across dairy and beef farming systems, with contributions from over ten institutions, including Cornell University, UC Davis, ETH Zurich, and many other partners across North America, Europe, and Australia. Collaborators securely stream data from their local machines to Azure Blob Storage governed by Unity Catalog. Each contributor retains full control and access rights over their own datasets while granting selective permissions for shared use. This structure ensures safe data sharing, supports large-scale model training, and enhances both regional and global impact. An especially innovative feature is the integration of Genie, Databricks’ conversational AI interface, which translates natural language queries into SQL over governed data. Once datasets are registered in Unity Catalog as tables or views, researchers can query and explore data conversationally, which lowers technical barriers, broadens participation, and accelerates the pace of scientific discovery.
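To make the idea of "selective permissions for shared use" concrete, here is a minimal sketch of how a contributor might prepare Unity Catalog grants for one of their tables. The `GRANT ... ON ... TO ...` statements follow standard Unity Catalog SQL, but the catalog, schema, table, and group names are hypothetical placeholders, and the helper `build_grant_statements` is an illustration rather than the project's actual tooling.

```python
from typing import List


def build_grant_statements(catalog: str, schema: str, table: str,
                           principal: str) -> List[str]:
    """Build the SQL needed to give one principal read-only access to one table.

    In Unity Catalog, reading a table requires USE CATALOG and USE SCHEMA on
    its parents in addition to SELECT on the table itself.
    """
    return [
        f"GRANT USE CATALOG ON CATALOG {catalog} TO `{principal}`",
        f"GRANT USE SCHEMA ON SCHEMA {catalog}.{schema} TO `{principal}`",
        f"GRANT SELECT ON TABLE {catalog}.{schema}.{table} TO `{principal}`",
    ]


# Hypothetical names, for illustration only.
statements = build_grant_statements(
    catalog="methane_hub",
    schema="cornell",
    table="emissions",
    principal="partner-researchers",
)

for stmt in statements:
    # In a Databricks notebook, each statement could be run with spark.sql(stmt).
    print(stmt)
```

Because access is granted table by table, a contributor can share one dataset with a partner group while keeping the rest of their schema private.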
Developing tools that practically benefit farmers. Through a $1M NSF-funded testbed, we are developing a system that enables personalized model training on individual farms while offering farmers an intuitive interface to interact with their own data. Built on federated learning and enhanced by open-source large language models, this system allows farmers to engage directly with multimodal data (e.g., production records, health metrics, scientific literature, and government reports) and with models trained on their own operations. Databricks supports this effort by providing the infrastructure needed for development prior to real-world testing: Unity Catalog for secure data governance, collaborative workspaces and notebooks for data preprocessing and reproducible model pipelines within the team, MLflow for experiment tracking, and serving endpoints for streamlined model deployment. Flexible compute options, from low-memory T4 GPUs that mimic resource-limited farm environments to high-performance H100 GPUs for computer-vision tasks, allow us to design and test AI models under a range of realistic conditions. Databricks’ adaptable environment also allows us to rapidly experiment with emerging open-source large language models from Hugging Face and orchestration frameworks such as LangChain, so that we can identify the most efficient and practical solutions for farmers.
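For readers unfamiliar with federated learning, the sketch below shows federated averaging (FedAvg), one common aggregation scheme: each farm trains locally and shares only model weights, which a coordinator combines in proportion to each farm's sample count. This is an assumption-laden illustration, not a description of the testbed's actual algorithm; the function name, weight layout, and farm data are all hypothetical.

```python
from typing import Dict, List, Tuple

# A model is represented here as named lists of floats (hypothetical layout).
Weights = Dict[str, List[float]]


def federated_average(updates: List[Tuple[Weights, int]]) -> Weights:
    """Combine per-farm weights into global weights, weighted by sample count.

    Each update is (weights, n_samples). Raw farm records never leave the
    farm; only the locally trained weights are shared.
    """
    if not updates:
        raise ValueError("at least one farm update is required")
    total_samples = sum(n for _, n in updates)
    if total_samples <= 0:
        raise ValueError("total sample count must be positive")

    first_weights, _ = updates[0]
    averaged: Weights = {}
    for name, values in first_weights.items():
        averaged[name] = [
            sum(w[name][i] * n for w, n in updates) / total_samples
            for i in range(len(values))
        ]
    return averaged


# Two hypothetical farms with different herd record counts.
farm_a = ({"coef": [1.0, 2.0], "bias": [0.0]}, 100)
farm_b = ({"coef": [3.0, 4.0], "bias": [1.0]}, 300)

global_weights = federated_average([farm_a, farm_b])
print(global_weights)  # {'coef': [2.5, 3.5], 'bias': [0.75]}
```

Weighting by sample count means the farm with more records has more influence on the shared model, while smaller farms still contribute without exposing their data.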
Preparing students for careers in data-driven agriculture. At Bovi-Analytics, we are dedicated to training the next generation of scientists who will lead agriculture’s digital transformation. Databricks supports this mission by lowering barriers to data-intensive skill development. Students begin with the Databricks Free Edition, where they gain hands-on experience in data analysis, coding, and statistical modeling while becoming familiar with core platform components such as compute, workspaces, notebooks, and naming conventions. This free environment allows students to explore and learn without the risk of unexpected costs. As they advance, students are granted controlled access to our lab’s paid Databricks workspace, where they work with real-world dairy datasets under secure governance managed by Unity Catalog. This staged approach provides a safe, scalable pathway for students to grow from foundational learning to applied research, equipping them with the skills needed for leadership in data-driven agriculture.
Throughout the process, the Databricks University Alliance has been an invaluable partner. They have facilitated our use of the Databricks platform, engaged in collaborative discussions to identify the most suitable services for building a feasible, high-impact architecture and technical roadmap, and provided technical guidance and support to help accelerate our progress.
Looking Ahead
We are expanding collaborations with other partners such as NVIDIA to integrate edge computing into centrally governed cloud infrastructure, enabling field testing of models trained within the Databricks environment. With sustained funding from the National Science Foundation, the Global Methane Hub, and the Bezos Earth Fund, combined with Databricks’ technical capabilities, we are building an end-to-end ecosystem that transforms scattered data into actionable insights, turns isolated AI models into deployable solutions, and translates academic research into real-world impact. This integrated approach will not only accelerate innovation within dairy science but also contribute to the broader sustainability and digital transformation of agriculture.
More about the authors:
Miel Hostens:
Dr. Miel Hostens is the Robert and Anne Everett Associate Professor of Digital Dairy Management and Data Analytics in the Department of Animal Science at Cornell University. With over 15 years of experience in dairy science and precision agriculture, he develops innovative methodologies leveraging precision dairy farming to monitor and improve sustainable food production systems globally. Miel is deeply committed to advancing data-driven dairy science through both research and industry collaborations. He has led numerous international projects in data-driven agriculture and precision dairy farming, helping to transform practices in multiple countries worldwide.
Enhong Liu:
Dr. Enhong Liu is a senior researcher at Cornell’s Bovi-Analytics Lab. He holds advanced degrees in both Animal Science and Data Science and is a Databricks-certified Generative AI Engineer. His work focuses on enhancing the field adoption and usability of large language models through techniques such as fine-tuning, model quantization, and the development of agentic AI to improve tool accessibility for stakeholders in animal health, livestock management, and sustainability. With extensive experience at the intersection of AI and animal sciences, he brings cutting-edge technology to the dairy and broader agricultural sectors.