Technical Blog
Explore in-depth articles, tutorials, and insights on data analytics and machine learning in the Databricks Technical Blog. Stay updated on industry trends, best practices, and advanced techniques.
arijitm
Databricks Employee

Introduction

Every hour, global manufacturing plants across multiple regions generate gigabytes of telemetry data from sensors, machines, and production lines. This vast amount of data holds immense potential for predictive maintenance, operational optimization, and efficiency improvements. Manufacturing organizations face a uniquely complex data governance landscape, as they often span multiple regions with different infrastructure capabilities and local compliance requirements. 
Manufacturing telemetry data is not just large in volume, it is highly diverse: ranging from time-series sensor readings and batch logs to equipment configuration files. Much of this data is sensitive, containing proprietary machine configurations or insights into operator behavior, and is often needed in real time by cross-functional teams spanning engineering, data science, and operations.
Traditional data platforms struggle to scale governance across such diverse formats, use cases, and regions without creating bottlenecks. Databricks offers a unified and scalable approach to this problem. Its lakehouse architecture combines the reliability and governance of data warehouses with the openness and flexibility of data lakes. It supports structured and unstructured data at scale, enables fine-grained access controls across clouds and regions, and facilitates secure data sharing and real-time processing, making it particularly well suited to the unique demands of modern manufacturing data governance.

In this blog post, we explore best practices for implementing such a governance model on Databricks:

Defining and partitioning tables at a plant level

  • Using Delta tables across multiple plants to efficiently store and process telemetry data from various manufacturing plants.
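As a sketch of this layout, the bronze telemetry table can be declared as a Delta table partitioned by a plant identifier. The table name telemetry_measurements reappears later in this post, but the column list here is illustrative; the only structural assumption is that each row carries the plant_id of the originating plant.

```sql
-- Illustrative bronze-layer telemetry table; sensor columns are assumptions.
create table if not exists telemetry_measurements (
  plant_id int,          -- plant code, used later for row-level filtering
  sensor_id string,
  event_ts timestamp,
  metric_name string,
  metric_value double
)
using delta
partitioned by (plant_id);
```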

Structuring teams and access in Unity Catalog

A well-defined team topology ensures that data access aligns with organizational roles by using account-level groups, which can be synchronized from the organization's identity provider (such as Azure AD, Okta, or others).

  • Global Teams – Require access to all data across regions.
  • Operating Plant Teams – Need access only to data from their respective plants.
  • Regional Teams – Require access to data across multiple plants within a specific region.
  • HR Teams – Require access to sensitive data generated from ERP platforms.

By structuring access in this way, organizations can maintain a scalable governance model that supports innovation while enforcing least-privilege access principles.
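To make this concrete, a minimal sketch of least-privilege grants against these account-level groups might look like the following. The catalog and schema names (manufacturing, bronze) are hypothetical; the privilege names are standard Unity Catalog SQL.

```sql
-- Regional analysts can browse the catalog and query telemetry.
grant use catalog on catalog manufacturing to `group_us_east_1`;
grant use schema on schema manufacturing.bronze to `group_us_east_1`;
grant select on table manufacturing.bronze.telemetry_measurements to `group_us_east_1`;

-- HR analysts can query only the ERP-sourced worker table.
grant use catalog on catalog manufacturing to `group_global_hr`;
grant use schema on schema manufacturing.bronze to `group_global_hr`;
grant select on table manufacturing.bronze.worker to `group_global_hr`;
```

Note that SELECT on a table only grants access to rows and columns not restricted by the row filters and column masks described below, so the two mechanisms compose cleanly.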

Implementing fine-grained governance with row filters, tags, and mapping tables

To enforce fine-grained access control, organizations can combine row filters, metadata tags, and mapping tables to define dynamic governance rules. Mapping tables serve as a central component for encoding governance logic — specifying which users, roles, or account-level groups (synced from the identity provider) are authorized to access specific subsets of data. This approach ensures the principle of least privilege is consistently applied, granting teams access only to the data relevant to their role. Additionally, metadata tagging enhances data discoverability and facilitates lifecycle management.

To understand how these capabilities apply in real-world manufacturing environments, let's first examine the typical data setup in these plants.

Current Setup

Manufacturing plants often employ a diverse range of data solutions to handle telemetry and operational data. These setups typically include:

  • Time-series telemetry data - continuous data streams from machines and sensors capturing operational metrics, stored in custom edge databases or other bespoke implementations.
  • Asset models and asset hierarchy - knowledge graphs for contextualizing asset relationships, or custom relational database implementations. These store virtual representations of machines and equipment.
  • ERP integration - shift schedules, working hours, and data elements critical for business operations.

The next section explores key challenges and best practices for optimizing data management while balancing flexibility and compliance.

Challenges

  • No centralized governance - In a manufacturing environment without proper data governance, unrestricted access to all data elements may be prevalent. This is particularly concerning as manufacturing data models often contain sensitive information. For instance, tables with shift schedules and working hours of shop floor operators may include personally identifiable information. Without strict governance, such exposure poses significant privacy and security risks.

  • Data Silos - Global manufacturing setups often suffer from data silos, where critical information is isolated across departments and systems. This fragmentation leads to inefficiencies, poor collaboration, and obstructed decision-making. A lack of real-time visibility into production workflows, supply chains, and customer demand makes it challenging to optimize operations and adapt to dynamic market conditions.

Without structured governance, sensitive data remains vulnerable, and collaboration across regions is restricted. These challenges highlight the need for a structured governance framework that balances security with operational flexibility.

Solution

Databricks enables a scalable governance model by defining clear access privileges for different teams while maintaining compliance. The following matrix outlines the levels of access required for various operational teams.

 

| Databricks Account Groups | Telemetry data | ERP data |
| --- | --- | --- |
| Operating Plant Data Analysts | Access to only specific rows for their plant | No access to PII data |
| Regional Data Analysts | Access to multiple plants in the region | No access to PII data |
| Global Data Analytics Teams | Access to all plant telemetry data | No access to PII data |
| Global HR Teams | No access to telemetry data | Can access PII data |

To translate the access matrix above into enforceable ACLs for each group, we use mapping tables. The diagram below outlines the following:

  • Telemetry data from different plants is ingested into a centralized lakehouse and stored as a Delta table partitioned by plant code in the Bronze layer.
  • ERP data is also ingested into Delta tables in the Bronze layer. The ERP data contains sensitive PII columns whose access should be delegated according to the principle of least privilege, as outlined in the access definition matrix above.
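One hedged sketch of the Bronze ingestion step, assuming raw telemetry files land as JSON in a cloud storage path (the landing path and file format are assumptions, not part of the reference architecture), uses COPY INTO for incremental loading:

```sql
-- Incrementally load newly arrived telemetry files into the bronze table.
copy into telemetry_measurements
from '/mnt/landing/telemetry/'        -- hypothetical landing path
fileformat = JSON
format_options ('inferSchema' = 'true')
copy_options ('mergeSchema' = 'true');
```

COPY INTO tracks files it has already loaded, so re-running the statement picks up only new arrivals from each plant.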

[Diagram: Empowering Industry 4.0 - Balancing Data Access and Governance]

The mapping table as outlined in the diagram stores the access control lists for both:

  • Access management for telemetry data using row filters
  • Access management for PII data using column masking

Mapping table structure and data

The mapping table has the following columns:

  • group_name: Organization persona-based group defined at the Databricks account level. This column is of type string.
  • entity_type: The corresponding entity or business domain the group belongs to. This column is of type string.
  • entity_ids: An array of integers containing the specific plant_id values the group should have access to.
  • pii_access: Indicates whether the group should have access to sensitive PII columns.

The following naming conventions are used when defining the persona-based groups:

  • Plant group naming convention: group_${region}_${plant_id}
  • Regional group naming convention: group_${region}
  • Global group naming convention: group_global
  • Global HR group: group_global_hr

Execute the DDL and DML statements below to create the ACL_MAPPING table and insert sample data.
Create the table:

create or replace table acl_mapping (
   group_name string,
   entity_type string,
   entity_ids array<int>,
   pii_access string
);

Insert sample data into the table:

insert into acl_mapping values ('group_us_east_1_101', 'plant', array(101), 'no');
insert into acl_mapping values ('group_us_east_1', 'region', array(101, 102), 'no');
insert into acl_mapping values ('group_global', 'global', array(), 'no');
insert into acl_mapping values ('group_global_hr', 'hr', array(), 'yes');

The table below shows the sample rows inserted into the ACL_MAPPING mapping table.

ACL_MAPPING Data:

| group_name | entity_type | entity_ids | pii_access |
| --- | --- | --- | --- |
| group_us_east_1_101 | plant | [101] | no |
| group_us_east_1 | region | [101, 102] | no |
| group_global | global | [] | no |
| group_global_hr | hr | [] | yes |

Row Filter function:

To define row-based access control, we use a row filter function. The function accepts plant_id as a parameter, which binds to the plant_id column in the telemetry table; plant_id identifies the plant that produced each telemetry row. The function returns true only for rows the current user's groups are entitled to see.

create or replace function telemetry_rls_mapping(plant_id int)
returns boolean
return
  -- Global analytics teams see every plant's rows.
  is_account_group_member('group_global')
  or exists (
    select 1
    from acl_mapping mt
    where is_account_group_member(mt.group_name)
      and mt.entity_type in ('plant', 'region')
      and array_contains(mt.entity_ids, plant_id)
  );

Once defined, the row filter can be applied to the telemetry table using an ALTER statement, as shown below:

alter table telemetry_measurements
set row filter telemetry_rls_mapping on (plant_id);

Data Masking function

To define column masks for PII columns, we use a column mask function. The function below references the mapping table to determine whether the current user's group should have access to a particular PII-sensitive column, based on the pii_access flag. If the group should not have access to PII columns, users in that group see the masked value '********'.

create or replace function pii_mask(pii_column_value string)
returns string
return case
  -- Reveal the value only if the user belongs to a group flagged for PII access.
  when exists (
    select 1
    from acl_mapping mt
    where is_account_group_member(mt.group_name)
      and mt.pii_access = 'yes'
  ) then pii_column_value
  else '********'
end;

The above masking function can be attached to the worker table's phone_number column using an ALTER statement, as shown below.

alter table worker alter column phone_number set mask pii_mask;

Please note that if a user belongs only to groups with no entry in the ACL_MAPPING table and queries a table with this masking function applied, they will see masked values.
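If governance rules change, both controls can be detached without touching the underlying data. These are standard Databricks SQL statements, shown here against the same tables used above:

```sql
-- Detach the row filter from the telemetry table.
alter table telemetry_measurements drop row filter;

-- Remove the column mask from the worker table's phone number column.
alter table worker alter column phone_number drop mask;
```

Because the governance logic lives in the ACL_MAPPING table, the more common change is simply inserting or updating mapping rows, with no ALTER statements needed at all.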

With these governance controls in place, organizations can ensure that manufacturing telemetry and ERP data remain secure and compliant while enabling structured, role-based access. By leveraging Databricks’ row filtering and column masking capabilities, businesses can maintain a scalable, region-aware access model that enhances operational agility and mitigates risks associated with data silos and unauthorized access.

Conclusion

In this blog, we outlined a structured approach to governing telemetry and ERP data in manufacturing environments. We examined how to implement team-based access control using mapping tables, row-level security, and column masking in Databricks. A well-defined team topology plays a crucial role in this governance model, ensuring that access is aligned with organizational roles and responsibilities. By applying these best practices, organizations can strengthen data governance, ensure compliance, and unlock deeper insights from their manufacturing data. This approach not only protects sensitive information but also empowers teams to make data-driven decisions with confidence.

Here are some related links for your reference:

Filter sensitive table data using row filters and column masks

Support and Limitations of Row filters and column masks