Mastering Data Governance: Implementing a Responsible Federation in the Insurance Industry with Unity Catalog
In the insurance industry, data governance is critical for more than just regulatory compliance and data protection—it plays a key role in driving operational efficiency, improving data quality, and fostering collaboration across business units. With the growing complexity of data landscapes, particularly in organizations that span multiple regions or are expanding through mergers and acquisitions, the ability to balance centralized governance with decentralized autonomy is essential. Unity Catalog offers a powerful solution that allows insurance companies to implement a responsible, federated data governance model, ensuring the right balance between control, flexibility, and scalability as they grow.
This blog explores how Unity Catalog can help Insurance organizations like ProInsure optimize their data management strategy, promoting seamless collaboration while maintaining compliance across diverse teams and geographies. We'll show how Unity Catalog serves as a blueprint for scaling governance, especially when multiple business units and regions need to collaborate while adhering to regulatory and compliance requirements.
Addressing Data Governance Challenges in a Siloed Environment
In the absence of a centralized governance solution, managing access control and data isolation across various workspaces, subsidiaries, and business domains becomes fragmented and overly manual. For an insurance company operating across multiple regions, the lack of a unified governance solution like Unity Catalog makes handling access controls, monitoring and auditing a time-consuming task, and prone to errors. Each workspace requires independent configuration, leading to several challenges:
Unity Catalog: A Scalable Governance Model
Unity Catalog provides a unified data governance model that eliminates the inefficiencies of managing siloed environments, teams, and data. It allows organizations to centrally manage access controls and data lineage across all workspaces, while maintaining necessary isolation and autonomy between subsidiaries and business units. This ensures compliance with regional data localization requirements and offers a consistent and scalable model for expanding the platform as the organization grows.
The underlying infrastructure for Unity Catalog is a metastore, which organizes metadata, enabling structured data access, governance, and management across the organization.
Managing Data Ecosystems with Triangular Governance
The relationship between workspaces, metastores, and user groups forms a triangular structure that plays a crucial role in providing the flexibility and the control for an organization's data platform. Here's how the components interact:
Unity Catalog leverages this relationship by implementing both Role-Based Access Control (RBAC) and Attribute-Based Access Control (ABAC), offering a granular and flexible security model. RBAC allows organizations to assign permissions based on user roles, while ABAC (currently in Private Preview) enables more dynamic control, with access decisions driven by object attributes. This setup delivers comprehensive security control, ensuring only authorized users can access specific resources. Additionally, Unity Catalog provides robust usage management and audit capabilities, allowing organizations to monitor resource utilization, track data access, and ensure compliance with internal and external regulations.
Unity Catalog also enables workspace-specific catalog bindings, ensuring that only authorized workspaces have access to the corresponding data. For instance, binding a production catalog to a production workspace ensures that development teams do not accidentally access or modify production data. This helps prevent unauthorized data movement and reduces risks.
User groups must have the necessary privileges to access a workspace and its resources.
ProInsure: A Real-World Example
ProInsure Group is a global insurance company operating across multiple regions, with a number of regional teams and acquired subsidiaries. The Group’s Center of Excellence (CoE) team needs to implement a global platform blueprint that can be adopted and utilized by these teams. Each team is at a different stage of technical maturity.
Before implementing Unity Catalog, ProInsure faced fragmented data governance and access control across its regional teams and subsidiaries. Managing policies, tracking data usage, and ensuring consistent security across multiple workspaces were labor-intensive and prone to errors, making it difficult to maintain a holistic view of their data assets and optimize operations.
The below diagram shows how implementing UC can support the diverse needs of each team.
Note: When there are cross regional teams, there will b- one metastore instance per region to deliver performance. The data sharing requirements between cross-regional metastores are met by Delta sharing protocol. However, the implementation details of this approach are not covered in this blog.
While Unity Catalog serves as a centralized governance solution, it also enables responsible federation, helping to avoid bottlenecks often faced by resource-constrained central admin teams. This approach provides business units with the autonomy they need to operate efficiently, while still maintaining control over key elements such as standards, data quality, and architecture. Ultimately, this balance enhances the overall data maturity of the organization, allowing for both governance and flexibility.
Additionally, Lakehouse Federation allows for seamless querying of external legacy systems, enabling teams to access data across multiple sources without the need for complex data migrations, enhancing flexibility and accelerating insights.
Ultimately, this balance enhances the overall data maturity of the organization, allowing for both governance and flexibility. Achieving this balance is critical, and Databricks provides several isolation techniques that enable organizations like ProInsure to blend governance with flexibility. Below, we explore three key isolation techniques that support this federated model: Isolation by Environment, Workloads, and Storage.
One of the key requirements for large organizations is to separate resources based on SDLC environments (e.g., development, staging, production). With Unity Catalog, workspaces can be structured to align with these specific environments, and catalogs can be organized accordingly. For example, you can create a catalog for each environment (such as a development catalog, staging catalog, or production catalog).
Unity Catalog enables environment-specific catalog bindings, ensuring that only authorized workspaces have access to the corresponding data. Binding a production catalog to a production workspace ensures that development teams do not accidentally access or modify production data, maintaining data governance without restricting operational efficiency.
Workspaces within Databricks support multi-tenancy with cluster isolation, meaning that different business units can share a workspace while keeping their workloads completely isolated. This isolation is achieved through the use of separate clusters, governed by team-specific cluster policies. It ensures that each team's compute resources remain segregated, improving both security and performance.
Moreover, usage costs can be tracked at the cluster level, allowing organizations to monitor and allocate costs based on the clusters assigned to each team within a shared workspace. Unity Catalog’s system tables enable precise tracking of compute usage, making it easier for teams to optimize resources and manage their budgets effectively. Additionally, serverless compute options in Databricks provide on-demand scalability, allowing teams to avoid managing infrastructure while ensuring cost-efficient and rapid access to compute resources.
Finally, Databricks workspaces integrate seamlessly with Git clients through repositories, further isolating source code across users and teams. This ensures that teams can manage their own version control, enabling independent development and reducing the risk of code conflicts.
Unity Catalog also supports federated governance by allowing organizations to organize their data according to business unit needs. Catalogs can be structured based on environment, core domains (e.g., policy, claims) or combinations of both, and further subdivided into subdomains or products. This hierarchical structure provides the granularity required for effective privilege and security management.
Each catalog can be backed by its own isolated storage account, ensuring physical separation of data. This not only enhances data security but also supports compliance with regulatory requirements such as data localisation.
Unity Catalog allows organizations like ProInsure to centralize governance while maintaining the flexibility necessary to support diverse business units. With its robust security features, cost tracking, and compliance capabilities, Unity Catalog provides a scalable solution that evolves with the organization. By leveraging its isolation techniques—by environment, workload, and storage—and using Governance Tags to categorize data assets with domain-specific information for simplified discovery, insurance companies can achieve a balance between autonomy and control, driving their overall data maturity forward.
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.