cancel
Showing results for 
Search instead for 
Did you mean: 
Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
cancel
Showing results for 
Search instead for 
Did you mean: 
shyam_rao
Databricks Employee
Databricks Employee

shyam_rao_0-1746119066033.png

The Presentation Layer is the final stage in the Lakehouse data architecture. This layer contains curated, user-friendly data for business analysts, decision-makers, and BI applications. The data is generally organized into star schemas, aggregated summary tables, business views, and APIs, enabling integration with BI tools such as Databricks AI/BI, Tableau, and Power BI for interactive analysis and reporting.

Key Features

  • User-Centric: Tailored for analysts and decision-makers, providing clean and accessible data.
  • Abstracts Complexity: Simplifies raw data into enriched, pre-aggregated views, encapsulating business logic.
  • Supports Reporting: Optimizes for fast querying, visualizations, and calculated measures.
  • Secure: Enables governed, masked, and role-based access to sensitive data.

Best Practices

  1. Simplify Data: Use business-friendly schemas with clear relationships.
  2. Optimize Performance: Precompute aggregations and materialize views for efficiency.
  3. Enforce Security: Implement role-based access control, data masking, and audit trails.
  4. Monitor Usage: Track query performance and usage patterns for optimization.

The Presentation Layer bridges technical complexity and business needs, ensuring fast, secure, and actionable data for decision-making.

How Databricks Fits into the Presentation Layer

Databricks has several tools to implement the presentation layer:

  1. Delta Universal Format: Store pre-aggregated or denormalized data for BI consumption.
  2. Materialized Views: Precompute results of complex queries and aggregations for optimized performance.
  3. Databricks SQL: Create queries, dashboards, and visualizations directly in Databricks.
  4. Integration with BI Tools: Connect Databricks directly with tools like Power BI, Tableau, or Looker for seamless visualization and reporting.
  5. Unity Catalog (UC):  Unified data and AI governance for enhanced security and compliance.

 

How do I build my presentation layer?

The presentation layer comprises queryable objects such as tables, materialized views, and views.

Materialized Views

What Are Materialized Views in Databricks?

In Databricks SQL, materialized views are UC-managed tables that allow users to precompute results based on the latest version of data in source tables. CREATE and REFRESH operations on materialized views are powered by serverless DLT pipelines.

Why Use Materialized Views?

Performance Optimization
  • Reduced Query Time: Materialized views precompute and store the results of complex or computationally expensive queries. When users query the materialized view, they access the precomputed data, leading to faster query responses.
  • Efficient Aggregations: Instead of recalculating aggregations (for example, sums, averages) each time, a materialized view can store these precomputed results. For repetitive queries, materialized views reduce compute overhead and optimize resources.
Simplified Querying
  • Abstracts Complexity: Users can query the materialized view without worrying about the underlying data's complexity or transformations. Transformations like joins, aggregations, and filters are encapsulated in the view.
  • Business-Friendly Data Models: Materialized views can expose preprocessed data tailored for business use cases.
Incremental Updates
  • Many queries can be incrementally refreshed when running updates on materialized views using serverless pipelines. Incremental refreshes save compute costs by detecting changes in the data sources used to define the materialized view and incrementally computing the result.
Compatibility with BI Tools
  • Materialized views integrate well with BI tools (for example, Power BI, Tableau). They offer pre-aggregated or preprocessed datasets that speed up dashboard loading times.
Cost Savings
  • Lower Compute Costs: Precomputing results in materialized views reduces the need for repetitive expensive computations, saving on Databricks compute costs.
  • Resource Efficiency: The cluster resources can be allocated more efficiently because fewer resources are needed to serve end-user queries.

Limitations of Materialized Views

  • Not real-time: Materialized views are not updated in real-time. The frequency of updates depends on your refresh schedule.
  • Storage-overhead: Since materialized views store pre-computed data, they consume additional storage.

How to Create and Refresh Materialized Views in Databricks

Create a Materialized View (AWS|Azure|GCP)
  1. Write a query for your materialized view (for example, aggregations, joins).
    CREATE MATERIALIZED VIEW sales_by_region_mv
    AS
    SELECT region, customer, item, SUM(sales) AS total_sales
    FROM transactions
    GROUP BY region
    CLUSTER BY AUTO;

Note: Row-Level Security and Column-Level Masking can be defined on MVs.

Refresh a Materialized View (AWS|Azure|GCP)
  1. Refresh the materialized view periodically using a Databricks Workflows Job:
    REFRESH MATERIALIZED VIEW sales_by_region_mv;
  2. Configure the schedule with the SCHEDULE clause during CREATE or ALTER
    ... SCHEDULE CRON '0 0 0 * * ? * ...
    ... SCHEDULE EVERY 2 HOURS ...

Note: Refresh can be performed in either SYNC or ASYNC mode.

Monitoring Maintenance Activities
SELECT
 *
FROM
 event_log(TABLE(catalog.schema.table))
WHERE event_type in ('maintenance_progress', 'background_operation')
ORDER BY
 timestamp desc;

Improving Performance Using Materialized Views

  1. Use Liquid Clustering to optimize the data layout and maximize query performance. Take advantage of Predictive Optimization (AWS|Azure|GCP) and Auto Clustering.
  2. Creating a wide table, for example, including attributes from a dimension, can result in better data skipping based on column statistics.
  3. To generate optimized plans, specify constraints such as PRIMARY KEY..RELY and column constraints such as NOT NULL.

Views

Enhance your dashboarding and reporting processes by creating and exposing views on top of dimension and fact tables. Create a view on a table or a materialized view.

Views can be used to:

  1. Tailor the presentation: Expose only essential attributes (for example, primary address).
  2. Simplify attribute representation: ARRAY and STRUCT (and VARIANT) types might not be compatible with all BI tools.
  3. Create role-playing dimensions: For example, shipping facility vs. receiving facility.
  4. Use consistent formulas: Encapsulate business logic.
  5. Make extensible: Easily include new calculated measures based on user request.
  6. Focus on relevant data: Use filters to include only the necessary data. For instance, exclude soft-deleted records or outdated versions based on your specific use case.

Agile Data Modeling with Views

Views are beneficial in an agile environment, where business users frequently request new metrics. A well-designed Lakehouse architecture enables simple modifications, such as adding calculated columns in views. You also have the freedom to change the data representation in the backing table or materialized view. I have benefited from this design on multiple occasions.

Query Optimization Tip

When querying data, retrieve only the columns essential for analysis to optimize performance using predictive I/O and reduce fetch times.

 

Conclusion

By following these guidelines, you can effectively design and optimize your Star Schema (or other models such as Business Views) in Databricks SQL, leveraging advanced features and best practices to enable high performance and scalability. Get Started!

1 Comment