<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Data Masking Techniques and Issues with Creating Tables in Get Started Discussions</title>
    <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88746#M4244</link>
    <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/118852"&gt;@weilin0323&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Good morning.&lt;/P&gt;&lt;P&gt;To avoid multiple results with masking, try using a &lt;STRONG&gt;hash function&lt;/STRONG&gt; like sha2 in your function instead of partial masking. Because the hash is deterministic and collisions are practically negligible, each distinct value gets its own masked value. When &lt;STRONG&gt;joining tables&lt;/STRONG&gt;, apply the same hash to both columns so you can match them without revealing sensitive data. For &lt;STRONG&gt;filters&lt;/STRONG&gt;, using hashing or encryption will give you more accurate results than partial matching. Also, consider using &lt;STRONG&gt;views with dynamic masking&lt;/STRONG&gt; to consistently apply masks across queries. This approach will help maintain security and integrity in your data.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
    <pubDate>Thu, 05 Sep 2024 14:09:49 GMT</pubDate>
    <dc:creator>Brahmareddy</dc:creator>
    <dc:date>2024-09-05T14:09:49Z</dc:date>
    <item>
      <title>Data Masking Techniques and Issues with Creating Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/87791#M4213</link>
      <description>&lt;P&gt;Hello Databricks Team,&lt;/P&gt;&lt;P&gt;I understand that the mask function can be used to mask columns, but I have a few questions:&lt;/P&gt;&lt;OL&gt;&lt;LI&gt;&lt;P&gt;When users with access use a masked TABLE to create a downstream TABLE, the downstream TABLE does not inherit the mask function directly, so the data remains unmasked. In this case, do we need to apply the mask function again to the columns that require masking when creating the downstream TABLE?&lt;/P&gt;&lt;/LI&gt;&lt;LI&gt;&lt;P&gt;If users without access to the original data use a masked TABLE to create a downstream TABLE, the masked columns in the downstream TABLE will display the masked values, but it seems that this masking is hard-coded rather than applied through the mask function. How can this be addressed during querying? For example, if a user wants to query data for id=1234, but the id appears as '1**4' after masking, will it be impossible to use id=1234 as a query condition?&lt;/P&gt;&lt;/LI&gt;&lt;/OL&gt;&lt;P&gt;We would greatly appreciate any guidance on how to resolve these issues. Thank you for your assistance.&lt;/P&gt;</description>
      <pubDate>Tue, 03 Sep 2024 07:57:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/87791#M4213</guid>
      <dc:creator>weilin0323</dc:creator>
      <dc:date>2024-09-03T07:57:21Z</dc:date>
    </item>
    <item>
      <title>Re: Data Masking Techniques and Issues with Creating Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88087#M4224</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/118852"&gt;@weilin0323&lt;/a&gt;, How are you doing today?&lt;/P&gt;&lt;P&gt;As I understand it, when creating a downstream table from a masked table, you’ll need to &lt;STRONG&gt;reapply the mask function&lt;/STRONG&gt; to the necessary columns in the new table if you want to maintain the same level of data protection. For users without access to the original data, the masked values in the downstream table are indeed &lt;STRONG&gt;hard-coded&lt;/STRONG&gt;. This means that if they try to query using an original value like id=1234, it won't work because the data is already masked. To address this, consider applying the mask function during the query process, or provide a separate mechanism for those users to query based on the masked values.&lt;/P&gt;&lt;P&gt;Give it a try and let me know.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Tue, 03 Sep 2024 16:12:32 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88087#M4224</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2024-09-03T16:12:32Z</dc:date>
    </item>
    <item>
      <title>Re: Data Masking Techniques and Issues with Creating Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88196#M4227</link>
      <description>&lt;P&gt;Thank you for your response.&lt;/P&gt;&lt;P&gt;I tried to create a function that can be used in a query:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;CREATE OR REPLACE FUNCTION idno_filter_test(idno STRING, filter_value STRING)
RETURNS BOOLEAN
RETURN IF(current_user() = "admin@test.com", idno = filter_value,
    concat(left(idno, 2), "**", right(idno, 2)) = concat(left(filter_value, 2), "**", right(filter_value, 2)))&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;However, this can return incorrect matches. For example, with the following condition:&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT *
FROM working.idno_masked_downstream
WHERE idno_filter_test(memidno, "123456")&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Both "123456" and "124356" match the condition id=12**56, so it may return two results.&lt;/P&gt;&lt;P&gt;Additionally, if I want to join two tables using idno as the key, how should I configure the masked table to ensure it can be matched with other tables based on the key value?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;LI-CODE lang="python"&gt;SELECT aa.order_number, aa.IDNO, bb.MEMIDNO
FROM processed.order_integrate AS aa
INNER JOIN working.idno_masked_downstream AS bb
ON bb.MEMIDNO = aa.IDNO&lt;/LI-CODE&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;I would greatly appreciate any guidance on how to resolve these issues. Thank you for your assistance.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 04 Sep 2024 05:53:31 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88196#M4227</guid>
      <dc:creator>weilin0323</dc:creator>
      <dc:date>2024-09-04T05:53:31Z</dc:date>
    </item>
    <item>
      <title>Re: Data Masking Techniques and Issues with Creating Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88746#M4244</link>
      <description>&lt;P&gt;Hi&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/118852"&gt;@weilin0323&lt;/a&gt;,&lt;/P&gt;&lt;P&gt;Good morning.&lt;/P&gt;&lt;P&gt;To avoid multiple results with masking, try using a &lt;STRONG&gt;hash function&lt;/STRONG&gt; like sha2 in your function instead of partial masking. Because the hash is deterministic and collisions are practically negligible, each distinct value gets its own masked value. When &lt;STRONG&gt;joining tables&lt;/STRONG&gt;, apply the same hash to both columns so you can match them without revealing sensitive data. For &lt;STRONG&gt;filters&lt;/STRONG&gt;, using hashing or encryption will give you more accurate results than partial matching. Also, consider using &lt;STRONG&gt;views with dynamic masking&lt;/STRONG&gt; to consistently apply masks across queries. This approach will help maintain security and integrity in your data.&lt;/P&gt;&lt;P&gt;Regards,&lt;/P&gt;&lt;P&gt;Brahma&lt;/P&gt;</description>
      <pubDate>Thu, 05 Sep 2024 14:09:49 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/88746#M4244</guid>
      <dc:creator>Brahmareddy</dc:creator>
      <dc:date>2024-09-05T14:09:49Z</dc:date>
    </item>
    <item>
      <title>Re: Data Masking Techniques and Issues with Creating Tables</title>
      <link>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/89397#M4278</link>
      <description>&lt;P&gt;&lt;SPAN&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/102548"&gt;@Brahmareddy&lt;/a&gt;&amp;nbsp;,&lt;/SPAN&gt;&lt;/P&gt;&lt;P&gt;&lt;SPAN&gt;Thank you so much. I'll try it.&lt;/SPAN&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 11 Sep 2024 02:39:08 GMT</pubDate>
      <guid>https://community.databricks.com/t5/get-started-discussions/data-masking-techniques-and-issues-with-creating-tables/m-p/89397#M4278</guid>
      <dc:creator>weilin0323</dc:creator>
      <dc:date>2024-09-11T02:39:08Z</dc:date>
    </item>
  </channel>
</rss>

