Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
Anton-Dusak
Databricks Employee

Mastering Data Governance: Implementing a Responsible Federation in the Insurance Industry with Unity Catalog

In the insurance industry, data governance is critical for more than just regulatory compliance and data protection—it plays a key role in driving operational efficiency, improving data quality, and fostering collaboration across business units. With the growing complexity of data landscapes, particularly in organizations that span multiple regions or are expanding through mergers and acquisitions, the ability to balance centralized governance with decentralized autonomy is essential. Unity Catalog offers a powerful solution that allows insurance companies to implement a responsible, federated data governance model, ensuring the right balance between control, flexibility, and scalability as they grow. 

This blog explores how Unity Catalog can help insurance organizations like ProInsure optimize their data management strategy, promoting seamless collaboration while maintaining compliance across diverse teams and geographies. We'll show how Unity Catalog serves as a blueprint for scaling governance, especially when multiple business units and regions need to collaborate while adhering to regulatory and compliance requirements.

Addressing Data Governance Challenges in a Siloed Environment

In the absence of a centralized governance solution, managing access control and data isolation across various workspaces, subsidiaries, and business domains becomes fragmented and overly manual. For an insurance company operating across multiple regions, the lack of a unified governance solution like Unity Catalog makes access control, monitoring, and auditing time-consuming and error-prone. Each workspace requires independent configuration, leading to several challenges:

  • There's no holistic view across enterprise data assets, making it difficult to manage and fully leverage the organization's data. 
  • Setting up separate policies for each environment duplicates operational and administrative effort, and inconsistencies in access management increase security risk.
  • Discovering and reusing valuable data assets from other parts of the organization becomes challenging, leading to missed opportunities.
  • Identifying the root cause of an issue or assessing its impact on dependent data assets and applications is slow, inefficient, and error-prone.
  • There's limited oversight of how data and resources are being used across various business units and subsidiaries, making cost tracking and optimization difficult.

Unity Catalog: A Scalable Governance Model

Unity Catalog provides a unified data governance model that eliminates the inefficiencies of managing siloed environments, teams, and data. It allows organizations to centrally manage access controls and data lineage across all workspaces, while maintaining necessary isolation and autonomy between subsidiaries and business units. This ensures compliance with regional data localization requirements and offers a consistent and scalable model for expanding the platform as the organization grows.

The underlying infrastructure for Unity Catalog is a metastore, which organizes metadata, enabling structured data access, governance, and management across the organization.

Managing Data Ecosystems with Triangular Governance

The relationship between workspaces, metastores, and user groups forms a triangular structure that gives an organization's data platform both flexibility and control. Here's how the components interact:

  • Workspace: Manages compute resources such as clusters and jobs, enabling users to build various workloads.
  • Metastore: Underpins Unity Catalog and acts as the top-level container for information management. Its three-level namespace (catalogs, schemas, and data objects such as tables, views, volumes, functions, and models) enables organizations to design and organize their data according to specific needs. This structure is essential in shaping access control strategies, ensuring permissions are assigned with the appropriate level of granularity.
  • User Groups: Must be granted appropriate access to both workspaces and the data within the metastore, based on their roles and responsibilities.

Unity Catalog leverages this relationship by implementing both Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC), offering a granular and flexible security model. RBAC allows organizations to assign permissions based on user roles, while ABAC (currently in Private Preview) enables more dynamic control, with access decisions driven by object attributes. This setup delivers comprehensive security control, ensuring only authorized users can access specific resources. Additionally, Unity Catalog provides robust usage management and audit capabilities, allowing organizations to monitor resource utilization, track data access, and ensure compliance with internal and external regulations.

Unity Catalog also enables workspace-specific catalog bindings, ensuring that only authorized workspaces have access to the corresponding data. For instance, binding a production catalog to a production workspace ensures that development teams do not accidentally access or modify production data. This helps prevent unauthorized data movement and reduces risks.

User groups must have the necessary privileges to access a workspace and its resources.
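The triangular relationship can be illustrated with a small, purely conceptual Python model. This is a sketch rather than the Databricks API: the class `GovernanceModel`, the group names, and the workspace and catalog names are all hypothetical. It only demonstrates the rule described above, namely that a group reaches data when it has access to the workspace, the catalog is bound to that workspace, and the group holds the privilege on the catalog.

```python
from dataclasses import dataclass, field
from typing import Dict, Set, Tuple


@dataclass
class GovernanceModel:
    """Toy model of the workspace / metastore / user-group triangle."""

    # group name -> workspaces the group is assigned to
    workspace_access: Dict[str, Set[str]] = field(default_factory=dict)
    # catalog name -> workspaces the catalog is bound to
    catalog_bindings: Dict[str, Set[str]] = field(default_factory=dict)
    # (group, catalog) -> privileges granted on that catalog
    grants: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)

    def can_use(self, group: str, workspace: str, catalog: str, privilege: str) -> bool:
        """Return True only when all three sides of the triangle line up."""
        in_workspace = workspace in self.workspace_access.get(group, set())
        catalog_bound = workspace in self.catalog_bindings.get(catalog, set())
        has_privilege = privilege in self.grants.get((group, catalog), set())
        return in_workspace and catalog_bound and has_privilege


# Hypothetical setup: the prod catalog is bound only to the prod workspace.
model = GovernanceModel(
    workspace_access={"data_engineers": {"dev_ws", "prod_ws"}, "analysts": {"dev_ws"}},
    catalog_bindings={"prod": {"prod_ws"}, "dev": {"dev_ws"}},
    grants={
        ("data_engineers", "prod"): {"SELECT", "MODIFY"},
        ("analysts", "prod"): {"SELECT"},
        ("analysts", "dev"): {"SELECT", "MODIFY"},
    },
)

# Engineers hold MODIFY on prod, but the binding blocks them from the dev workspace.
print(model.can_use("data_engineers", "prod_ws", "prod", "MODIFY"))  # True
print(model.can_use("data_engineers", "dev_ws", "prod", "MODIFY"))   # False
```

In this sketch the binding check is what keeps a development workspace away from production data, even for a group that holds privileges on the production catalog.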

ProInsure: A Real-World Example

ProInsure Group is a global insurance company operating across multiple regions, with a number of regional teams and acquired subsidiaries. The Group’s Center of Excellence (CoE) team needs to implement a global platform blueprint that can be adopted and utilized by these teams. Each team is at a different stage of technical maturity. 

Before implementing Unity Catalog, ProInsure faced fragmented data governance and access control across its regional teams and subsidiaries. Managing policies, tracking data usage, and ensuring consistent security across multiple workspaces were labor-intensive and prone to errors, making it difficult to maintain a holistic view of their data assets and optimize operations.

The diagram below shows how implementing Unity Catalog can support the diverse needs of each team.

Flowchart - KK With UC v2.png

  • Regional Team in the EU: The team functions at a lower technical maturity level and depends on the Group's Center of Excellence (CoE) for technical support. While they utilize the Group's infrastructure for large-scale solution deployment and operations, they require a dedicated workspace for research and exploration. To facilitate this, the Group provides a sandbox environment, allowing regional users to access group data and integrate their own for custom solution development. Their data setup is streamlined, with a single catalog they fully own and manage. They have full write access within their catalog and read access to relevant Group catalogs. The Group monitors the usage cost of the sandbox workspace, which can be charged back to the regional team for accountability purposes.
  • ReInsurance Subsidiary: In contrast, the ReInsurance subsidiary is highly established, with a higher level of technical expertise, and is fully self-sufficient in developing and deploying solutions as per their roadmap while managing their own release cycles. They maintain autonomous development, testing, and production workspaces, and their catalogs are similarly organized by environment, with multiple schemas for domains such as policy, claims, and underwriting. Although they collaborate with the Group on best practices, compliance, and asset sharing, they operate as a fully federated environment. Catalog ownership remains within the subsidiary, and they have their own metastore administrator to independently manage information governance. As with the regional team, the cost of their workspace usage is tracked and charged back by the Group.
  • Group’s Center of Excellence (CoE): The Group’s CoE operates as a tech enabler for the wider enterprise and sets compliance standards and best practices across the organization. They set up a workspace for each environment, but catalogs are organized by domain. Each domain-level catalog further segregates data into schemas based on subdomains. This arrangement allows the Group to manage access permissions for external teams at both the domain and subdomain levels. For example, the Group can grant the regional team access to the Policy catalog while restricting access to the Finance catalog. Alternatively, permissions can be set at the subdomain level, offering more granular control over data access. Supported by a chargeback model, they monitor the costs attributed to each team across the organization and take responsibility for account billing.
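The domain-level and subdomain-level permissions described for the CoE could be expressed as Unity Catalog `GRANT` statements. The helper below is a minimal sketch that only builds the SQL text. The group name `eu_regional_team`, the catalog name `policy`, and the subdomain schema `motor` are hypothetical, and the statements would still need to be run by a principal allowed to grant on those objects.

```python
from typing import List, Optional


def quote(identifier: str) -> str:
    """Backtick-quote an identifier, doubling any embedded backticks."""
    return "`" + identifier.replace("`", "``") + "`"


def read_grants(group: str, catalog: str, schema: Optional[str] = None) -> List[str]:
    """Build GRANT statements giving a group read access to a catalog or one schema."""
    if schema is None:
        # Domain level: privileges granted on the catalog apply to its schemas and tables.
        return [
            f"GRANT USE CATALOG, USE SCHEMA, SELECT ON CATALOG {quote(catalog)} TO {quote(group)}"
        ]
    # Subdomain level: the group may enter the catalog but can read one schema only.
    return [
        f"GRANT USE CATALOG ON CATALOG {quote(catalog)} TO {quote(group)}",
        f"GRANT USE SCHEMA, SELECT ON SCHEMA {quote(catalog)}.{quote(schema)} TO {quote(group)}",
    ]


# Domain-level access to the (hypothetical) policy catalog.
for statement in read_grants("eu_regional_team", "policy"):
    print(statement)

# Subdomain-level access to a single (hypothetical) schema.
for statement in read_grants("eu_regional_team", "policy", "motor"):
    print(statement)
```

No statement is generated for the Finance catalog. Under this sketch the regional team simply holds no privileges there, which is how the restriction in the example above is achieved.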

Note: When there are cross-regional teams, there will be one metastore instance per region to deliver performance. The data sharing requirements between cross-regional metastores are met by the Delta Sharing protocol. However, the implementation details of this approach are not covered in this blog.

Federated Architecture with Unity Catalog

While Unity Catalog serves as a centralized governance solution, it also enables responsible federation, helping to avoid bottlenecks often faced by resource-constrained central admin teams. This approach provides business units with the autonomy they need to operate efficiently, while still maintaining control over key elements such as standards, data quality, and architecture. Ultimately, this balance enhances the overall data maturity of the organization, allowing for both governance and flexibility.

Additionally, Lakehouse Federation allows for seamless querying of external legacy systems, enabling teams to access data across multiple sources without the need for complex data migrations, enhancing flexibility and accelerating insights.

Achieving the balance between central governance and business-unit autonomy is critical, and Databricks provides several isolation techniques that enable organizations like ProInsure to blend governance with flexibility. Below, we explore three key isolation techniques that support this federated model: isolation by environment, by workload, and by storage.

1. Isolation by Environment 

One of the key requirements for large organizations is to separate resources based on SDLC environments (e.g., development, staging, production). With Unity Catalog, workspaces can be structured to align with these specific environments, and catalogs can be organized accordingly. For example, you can create a catalog for each environment (such as a development catalog, staging catalog, or production catalog).

Unity Catalog enables environment-specific catalog bindings, ensuring that only authorized workspaces have access to the corresponding data. Binding a production catalog to a production workspace ensures that development teams do not accidentally access or modify production data, maintaining data governance without restricting operational efficiency.
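As a rough illustration of an environment-based layout, the sketch below generates the DDL for one catalog per environment, each with a schema per domain. The catalog names `dev`, `staging`, and `prod` and the domain list are assumptions chosen for the example. Binding each catalog to its workspace is a separate configuration step that is not shown here.

```python
from typing import List, Sequence

# Hypothetical environment and domain names for illustration only.
ENVIRONMENTS = ["dev", "staging", "prod"]
DOMAINS = ["policy", "claims", "underwriting"]


def environment_ddl(environments: Sequence[str], domains: Sequence[str]) -> List[str]:
    """Build CREATE statements for one catalog per environment and one schema per domain."""
    statements = []
    for env in environments:
        statements.append(f"CREATE CATALOG IF NOT EXISTS {env}")
        for domain in domains:
            # Three-level namespace: catalog.schema, with tables living below the schema.
            statements.append(f"CREATE SCHEMA IF NOT EXISTS {env}.{domain}")
    return statements


for statement in environment_ddl(ENVIRONMENTS, DOMAINS):
    print(statement)
```

With this layout, a table such as `prod.claims.<table>` is distinguishable from its development counterpart by the catalog name alone, which keeps privileges and bindings easy to reason about.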

2. Isolation by Workloads

Workspaces within Databricks support multi-tenancy with cluster isolation, meaning that different business units can share a workspace while keeping their workloads completely isolated. This isolation is achieved through the use of separate clusters, governed by team-specific cluster policies. It ensures that each team's compute resources remain segregated, improving both security and performance.
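A team-specific cluster policy is defined as a JSON document that constrains cluster attributes. The sketch below builds such a definition in Python. The tag key `team`, the team name, and the limits are illustrative assumptions, and the attribute paths should be checked against the compute policy reference before use.

```python
import json
from typing import Any, Dict


def team_cluster_policy(team: str, max_workers: int = 8) -> Dict[str, Dict[str, Any]]:
    """Build a policy definition that tags and limits every cluster a team creates."""
    return {
        # Force a team tag onto each cluster so usage can be attributed later.
        "custom_tags.team": {"type": "fixed", "value": team},
        # Cap idle time so forgotten clusters shut themselves down.
        "autotermination_minutes": {"type": "range", "maxValue": 60, "defaultValue": 30},
        # Limit how far a cluster may scale out.
        "autoscale.max_workers": {"type": "range", "maxValue": max_workers},
    }


# Hypothetical policy for a claims team, serialized as the JSON definition.
print(json.dumps(team_cluster_policy("claims"), indent=2))
```

Fixing the tag in the policy, rather than relying on users to set it, is a design choice that keeps cost attribution consistent across every cluster created under the policy.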

Moreover, usage costs can be tracked at the cluster level, allowing organizations to monitor and allocate costs based on the clusters assigned to each team within a shared workspace. Unity Catalog’s system tables enable precise tracking of compute usage, making it easier for teams to optimize resources and manage their budgets effectively. Additionally, serverless compute options in Databricks provide on-demand scalability, allowing teams to avoid managing infrastructure while ensuring cost-efficient and rapid access to compute resources. 
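To make the chargeback idea concrete, the sketch below aggregates usage per team from rows shaped like records in the billing usage system table. The field names `custom_tags` and `usage_quantity` follow that table as commonly documented but should be verified against your workspace. The tag key `team` and the sample rows are invented for illustration.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, Mapping


def dbus_by_team(rows: Iterable[Mapping[str, Any]], tag_key: str = "team") -> Dict[str, float]:
    """Sum usage quantity per team tag; rows without the tag go to 'untagged'."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        tags = row.get("custom_tags") or {}
        team = tags.get(tag_key, "untagged")
        totals[team] += row["usage_quantity"]
    return dict(totals)


# Invented sample rows; in practice these would come from a query on the usage table.
sample_rows = [
    {"custom_tags": {"team": "claims"}, "usage_quantity": 10.0},
    {"custom_tags": {"team": "claims"}, "usage_quantity": 2.5},
    {"custom_tags": {"team": "policy"}, "usage_quantity": 4.0},
    {"custom_tags": None, "usage_quantity": 1.0},
]

print(dbus_by_team(sample_rows))
```

Keeping an explicit `untagged` bucket makes gaps in tagging visible, so usage that cannot be charged back to a team is not silently dropped.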

Finally, Databricks workspaces integrate with Git providers through repositories, further isolating source code across users and teams. This ensures that teams can manage their own version control, enabling independent development and reducing the risk of code conflicts.

3. Isolation by Storage

Unity Catalog also supports federated governance by allowing organizations to organize their data according to business unit needs. Catalogs can be structured based on environment, core domains (e.g., policy, claims) or combinations of both, and further subdivided into subdomains or products. This hierarchical structure provides the granularity required for effective privilege and security management.

Each catalog can be backed by its own isolated storage account, ensuring physical separation of data. This not only enhances data security but also supports compliance with regulatory requirements such as data localization.
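Storage isolation per catalog can be sketched with the `MANAGED LOCATION` clause of `CREATE CATALOG`. The helper below only builds the statement. The catalog name and the storage URL are hypothetical, and the sketch assumes the path is already covered by an external location that the creating principal is allowed to use.

```python
def create_catalog_sql(catalog: str, managed_location: str) -> str:
    """Build a CREATE CATALOG statement that pins managed data to its own storage path."""
    # Reject quote characters rather than guessing at SQL escaping rules.
    if "'" in managed_location or "`" in catalog:
        raise ValueError("unexpected quote character in input")
    return (
        f"CREATE CATALOG IF NOT EXISTS `{catalog}` "
        f"MANAGED LOCATION '{managed_location}'"
    )


# Hypothetical catalog backed by a dedicated storage container.
print(
    create_catalog_sql(
        "reinsurance_prod",
        "abfss://reinsurance-prod@examplestorage.dfs.core.windows.net/managed",
    )
)
```

Placing each catalog on storage in the required region is one way such a layout could support data localization, since managed tables in the catalog are then written under that path.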

Achieving Balance Between Control and Autonomy

Unity Catalog allows organizations like ProInsure to centralize governance while maintaining the flexibility necessary to support diverse business units. With its robust security features, cost tracking, and compliance capabilities, Unity Catalog provides a scalable solution that evolves with the organization. By leveraging its isolation techniques—by environment, workload, and storage—and using Governance Tags to categorize data assets with domain-specific information for simplified discovery, insurance companies can achieve a balance between autonomy and control, driving their overall data maturity forward.