<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic DAIS Community Virtual Challenge 2026: LEGO Value Engine - using Data and AI to Find the Best LEGO in Community Articles</title>
    <link>https://community.databricks.com/t5/community-articles/dais-community-virtual-challenge-2026-lego-value-engine-using/m-p/158017#M1225</link>
    <description>&lt;P&gt;&lt;FONT size="4"&gt;Hey everyone!&lt;BR /&gt;For the DAIS 2026 Community Virtual Challenge, I built a LEGO Value Engine using Databricks Free Edition.&lt;BR /&gt;This is a passion project that combines my interests in both LEGO and data engineering.&lt;BR /&gt;When a new LEGO set is released, it can be hard to determine whether it is actually worth buying.&lt;BR /&gt;Some sets cost hundreds of dollars and are heavily marketed, while others receive much less attention. As a collector, it can be difficult to tell whether you're paying for genuine value or simply for branding and popularity.&lt;BR /&gt;I wanted to use data to answer that question.&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;The Problem&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;When people evaluate LEGO sets, they usually look at one factor at a time:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Price&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Piece Count&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Average Rating&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Number of Ratings&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;FONT size="4"&gt;The challenge is that none of these metrics tells the full story on its own.&lt;BR /&gt;A set might have a great rating but be extremely expensive. Another might have thousands of pieces but poor reviews. 
I wanted a way to evaluate all of these factors together and identify which sets provide the most value for the money.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;The Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I created an end-to-end analytics pipeline using the Medallion Architecture.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Bronze Layer:&lt;/STRONG&gt; Stores the raw LEGO catalog dataset exactly as received, as a Delta table.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Silver Layer:&lt;/STRONG&gt; Cleans the data and calculates three custom scoring metrics using Spark SQL:&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Smart Value Index:&lt;/STRONG&gt; (Piece Count * Average Rating) / Price. This is the primary metric, estimating how much value a customer receives per dollar spent.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Collector Potential Score:&lt;/STRONG&gt; (Average Rating * Number of Reviews) / Price. This highlights sets backed by broad community consensus rather than just a few high ratings.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Price Efficiency:&lt;/STRONG&gt; Piece Count / Price. 
Number of LEGO pieces per dollar.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Gold Layer:&amp;nbsp;&lt;/STRONG&gt;Generates business insights, including:&lt;UL&gt;&lt;LI&gt;Theme rankings based on value (Star Wars, Marvel, DC, etc.)&lt;/LI&gt;&lt;LI&gt;The top 10 most undervalued sets&lt;/LI&gt;&lt;LI&gt;Sets priced over $200&amp;nbsp;that justify their price&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Databricks Features Used&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="4"&gt;This project uses the following:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Medallion Architecture&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Delta Time Travel&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Databricks Dashboards&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;PySpark Machine Learning&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Databricks Genie AI&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Delta Time Travel&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;Because LEGO prices change frequently due to promotions and holiday sales, I used Delta Time Travel. By querying historical versions of the Delta tables, a user can compare previous and current prices to see how a price change affects a set's value score over time.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Machine Learning&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;To add another layer of analysis, I used PySpark to build a KMeans clustering model. Instead of relying on manual categories, the unsupervised model automatically grouped the catalog based on price, ratings, piece count, and overall value. 
The clustering produced four distinct market segments: Premium Collectors, High Value Sets, Casual Buyers, and Budget-Friendly Sets.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Dashboards &amp;amp; Genie AI&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;To make the results easy to explore, I built a dashboard. It allows users to see which themes provide the most value, analyze the relationship between price and ratings, and view the machine learning segments across the catalog.&lt;/P&gt;&lt;P&gt;Finally, I connected the Gold tables to Genie AI so users can ask for recommendations in plain language. A user can simply type "Show me the top 5 undervalued sets under $100" and immediately receive set recommendations.&lt;/P&gt;&lt;P&gt;&lt;FONT size="5"&gt;&lt;STRONG&gt;Why It Matters&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;While this project focuses on LEGO sets, the idea applies to many industries. Consumers and businesses frequently need to determine which products deliver the best value, which are overpriced, and how inventory naturally groups into different market segments.&lt;/P&gt;&lt;P&gt;This project demonstrates how Databricks can turn raw data into actionable insights using data engineering, machine learning, and AI.&lt;/P&gt;&lt;P&gt;Thanks for reading, and I’d love to hear any feedback on what I can add or improve!&lt;/P&gt;</description>
    <pubDate>Mon, 01 Jun 2026 04:33:49 GMT</pubDate>
    <dc:creator>ashish51</dc:creator>
    <dc:date>2026-06-01T04:33:49Z</dc:date>
    <item>
      <title>DAIS Community Virtual Challenge 2026: LEGO Value Engine - using Data and AI to Find the Best LEGO</title>
      <link>https://community.databricks.com/t5/community-articles/dais-community-virtual-challenge-2026-lego-value-engine-using/m-p/158017#M1225</link>
      <description>&lt;P&gt;&lt;FONT size="4"&gt;Hey everyone!&lt;BR /&gt;For the DAIS 2026 Community Virtual Challenge, I built a LEGO Value Engine using Databricks Free Edition.&lt;BR /&gt;This is a passion project that combines my interests in both LEGO and data engineering.&lt;BR /&gt;When a new LEGO set is released, it can be hard to determine whether it is actually worth buying.&lt;BR /&gt;Some sets cost hundreds of dollars and are heavily marketed, while others receive much less attention. As a collector, it can be difficult to tell whether you're paying for genuine value or simply for branding and popularity.&lt;BR /&gt;I wanted to use data to answer that question.&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="4"&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;The Problem&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;When people evaluate LEGO sets, they usually look at one factor at a time:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Price&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Piece Count&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Average Rating&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Number of Ratings&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;FONT size="4"&gt;The challenge is that none of these metrics tells the full story on its own.&lt;BR /&gt;A set might have a great rating but be extremely expensive. Another might have thousands of pieces but poor reviews. 
I wanted a way to evaluate all of these factors together and identify which sets provide the most value for the money.&lt;/FONT&gt;&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;The Solution&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;I created an end-to-end analytics pipeline using the Medallion Architecture.&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Bronze Layer:&lt;/STRONG&gt; Stores the raw LEGO catalog dataset exactly as received, as a Delta table.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Silver Layer:&lt;/STRONG&gt; Cleans the data and calculates three custom scoring metrics using Spark SQL:&lt;UL&gt;&lt;LI&gt;&lt;STRONG&gt;Smart Value Index:&lt;/STRONG&gt; (Piece Count * Average Rating) / Price. This is the primary metric, estimating how much value a customer receives per dollar spent.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Collector Potential Score:&lt;/STRONG&gt; (Average Rating * Number of Reviews) / Price. This highlights sets backed by broad community consensus rather than just a few high ratings.&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Price Efficiency:&lt;/STRONG&gt; Piece Count / Price. 
Number of LEGO pieces per dollar.&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;LI&gt;&lt;STRONG&gt;Gold Layer:&amp;nbsp;&lt;/STRONG&gt;Generates business insights, including:&lt;UL&gt;&lt;LI&gt;Theme rankings based on value (Star Wars, Marvel, DC, etc.)&lt;/LI&gt;&lt;LI&gt;The top 10 most undervalued sets&lt;/LI&gt;&lt;LI&gt;Sets priced over $200&amp;nbsp;that justify their price&lt;/LI&gt;&lt;/UL&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Databricks Features Used&lt;/FONT&gt;&lt;/STRONG&gt;&lt;/P&gt;&lt;P&gt;&lt;FONT size="4"&gt;This project uses the following:&lt;BR /&gt;&lt;/FONT&gt;&lt;/P&gt;&lt;UL&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Medallion Architecture&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Delta Time Travel&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Databricks Dashboards&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;PySpark Machine Learning&lt;/FONT&gt;&lt;/LI&gt;&lt;LI&gt;&lt;FONT size="4"&gt;Databricks Genie AI&lt;/FONT&gt;&lt;/LI&gt;&lt;/UL&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Delta Time Travel&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;Because LEGO prices change frequently due to promotions and holiday sales, I used Delta Time Travel. By querying historical versions of the Delta tables, a user can compare previous and current prices to see how a price change affects a set's value score over time.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Machine Learning&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;To add another layer of analysis, I used PySpark to build a KMeans clustering model. Instead of relying on manual categories, the unsupervised model automatically grouped the catalog based on price, ratings, piece count, and overall value. 
The clustering produced four distinct market segments: Premium Collectors, High Value Sets, Casual Buyers, and Budget-Friendly Sets.&lt;/P&gt;&lt;P&gt;&lt;STRONG&gt;&lt;FONT size="5"&gt;Dashboards &amp;amp; Genie AI&amp;nbsp;&lt;/FONT&gt;&lt;/STRONG&gt;&lt;BR /&gt;To make the results easy to explore, I built a dashboard. It allows users to see which themes provide the most value, analyze the relationship between price and ratings, and view the machine learning segments across the catalog.&lt;/P&gt;&lt;P&gt;Finally, I connected the Gold tables to Genie AI so users can ask for recommendations in plain language. A user can simply type "Show me the top 5 undervalued sets under $100" and immediately receive set recommendations.&lt;/P&gt;&lt;P&gt;&lt;FONT size="5"&gt;&lt;STRONG&gt;Why It Matters&lt;/STRONG&gt;&lt;/FONT&gt;&lt;BR /&gt;While this project focuses on LEGO sets, the idea applies to many industries. Consumers and businesses frequently need to determine which products deliver the best value, which are overpriced, and how inventory naturally groups into different market segments.&lt;/P&gt;&lt;P&gt;This project demonstrates how Databricks can turn raw data into actionable insights using data engineering, machine learning, and AI.&lt;/P&gt;&lt;P&gt;Thanks for reading, and I’d love to hear any feedback on what I can add or improve!&lt;/P&gt;</description>
      <pubDate>Mon, 01 Jun 2026 04:33:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/community-articles/dais-community-virtual-challenge-2026-lego-value-engine-using/m-p/158017#M1225</guid>
      <dc:creator>ashish51</dc:creator>
      <dc:date>2026-06-01T04:33:49Z</dc:date>
    </item>
  </channel>
</rss>

