balajij8
Contributor III

@AanchalSoni 

Capturing the columns as Primary key helps users and tools understand relationships in the data. You can create Primary Key with RELY for optimization in some cases by skipping redundant operations.

  • Distinct Elimination
    When you apply a DISTINCT operator to a column marked as a PRIMARY KEY with RELY, the optimizer knows every value is already unique. It skips the expensive shuffle and sort required to get it.

    SELECT DISTINCT p_product_id FROM products will be treated as a SELECT p_product_id FROM products, removing the overhead

  • Join Elimination
    In a LEFT OUTER JOIN where you only select columns from the Fact table, the optimizer can eliminate the join if it knows the Dimension table is joined on a unique primary key.

    SELECT SUM(oi_sales_price) 
    FROM order_items oi
    LEFT JOIN promotions p ON oi.promo_id = p.p_promo_id;
    If p_promo_id is a RELY primary key, the optimizer realizes the join doesn't change the SUM result. It executes only the scan on the Fact table - SELECT SUM(oi_sales_price) FROM order_items oi;

     

View solution in original post