<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Can I see whether a table column is read in databricks? in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124455#M47196</link>
    <description>&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Would I have to do that 1 table at a time though?&lt;/P&gt;&lt;P&gt;Lineage is useful but it only shows which tables used the table. It doesn't actually show the column used unless you go into the notebook. Unless I am missign something here?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
    <pubDate>Tue, 08 Jul 2025 12:47:33 GMT</pubDate>
    <dc:creator>dpc</dc:creator>
    <dc:date>2025-07-08T12:47:33Z</dc:date>
    <item>
      <title>Can I see whether a table column is read in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124427#M47185</link>
      <description>&lt;P&gt;Hello&lt;/P&gt;&lt;P&gt;Historically, we had a number of tables that have been extracted from source and loaded to databricks using 'select *'.&lt;/P&gt;&lt;P&gt;As a result some columns that have been loaded never get used.&lt;/P&gt;&lt;P&gt;I'd like to tidy this and remove redundant columns.&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Is there a way I can script whether columns in a table are referenced (or preferably not) in any downstream databricks notebooks?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 09:55:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124427#M47185</guid>
      <dc:creator>dpc</dc:creator>
      <dc:date>2025-07-08T09:55:35Z</dc:date>
    </item>
    <item>
      <title>Re: Can I see whether a table column is read in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124429#M47187</link>
      <description>&lt;P&gt;Hey&amp;nbsp;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/108745"&gt;@dpc&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;I don't know if you can do it without customizing a query parser... I don't have this exact script but maybe this could help. First step would be to find out which tables haven't been queried a lot in the past days (in this case, 90 days).&amp;nbsp;&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;
&lt;PRE class="reader-text-block__code-block"&gt;&lt;CODE&gt;with used_tables as (
select
  source_table_catalog,
  source_table_schema,
  source_table_name,
  count(distinct created_by) as downstream_users,
  count(*) as downstream_dependents
from
  system.access.table_lineage
where
  true
  and source_table_full_name is not null
  and event_time &amp;gt;= date_add(now(), -90)
group by
  all
)

select
  tabelas.table_catalog
  ,tabelas.table_schema
  ,tabelas.table_name
  ,tabelas.table_type
  ,tabelas.table_owner
  ,tabelas.`comment` as table_comment
  ,tabelas.created as table_created_at
  ,tabelas.created_by as table_created_by
  ,tabelas.last_altered as table_last_update_at
  ,tabelas.last_altered_by as table_last_altered_by
from
  system.information_schema.`tables` as tabelas
  left join used_tables as ut on ut.source_table_catalog = tabelas.table_catalog and ut.source_table_schema = tabelas.table_schema and ut.source_table_name = tabelas.table_name
where
  true
  and ut.downstream_dependents is null
  and tabelas.table_catalog != "system"
  and tabelas.table_catalog != "__databricks_internal"
  and tabelas.table_schema != "information_schema"&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;You could then go deeper into the SQL queries, parsing the statement to see whether people are using "*" or specific columns.&amp;nbsp;&amp;nbsp;&lt;BR /&gt;&lt;BR /&gt;&lt;/P&gt;
&lt;PRE class="reader-text-block__code-block"&gt;&lt;CODE&gt;SELECT 
       l.source_table_full_name,
       l.entity_type,
       q.statement_text,
       q.executed_by,
       q.end_time
  FROM system.access.table_lineage  l
  JOIN system.query.history q
  ON l.entity_run_id = q.statement_id
WHERE source_table_full_name = '&amp;lt; insert table name &amp;gt;'&lt;/CODE&gt;&lt;/PRE&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 10:05:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124429#M47187</guid>
      <dc:creator>mhiltner</dc:creator>
      <dc:date>2025-07-08T10:05:57Z</dc:date>
    </item>
    <item>
      <title>Re: Can I see whether a table column is read in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124431#M47188</link>
      <description>&lt;P&gt;Thanks. I'll take a look&lt;/P&gt;&lt;P&gt;The other point here though is that it's mainly ETL notebooks that access the landed data i.e. the silver layer&lt;/P&gt;&lt;P&gt;So, it's really a case of establishing whether the landed column has been used in the creation of a reporting table&lt;/P&gt;&lt;P&gt;If not, we don't need to extract the column&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 11:06:40 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124431#M47188</guid>
      <dc:creator>dpc</dc:creator>
      <dc:date>2025-07-08T11:06:40Z</dc:date>
    </item>
    <item>
      <title>Re: Can I see whether a table column is read in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124433#M47189</link>
      <description>&lt;P&gt;Are you using Unity Catalog? Because you could leverage the automatic lineage in this case. It shows everything that comes downstream from this silver layer, you could easily check which columns are being used and the lineage.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 11:28:21 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124433#M47189</guid>
      <dc:creator>mhiltner</dc:creator>
      <dc:date>2025-07-08T11:28:21Z</dc:date>
    </item>
    <item>
      <title>Re: Can I see whether a table column is read in databricks?</title>
      <link>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124455#M47196</link>
      <description>&lt;P&gt;Thanks&lt;/P&gt;&lt;P&gt;Would I have to do that 1 table at a time though?&lt;/P&gt;&lt;P&gt;Lineage is useful but it only shows which tables used the table. It doesn't actually show the column used unless you go into the notebook. Unless I am missign something here?&lt;/P&gt;&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 12:47:33 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/can-i-see-whether-a-table-column-is-read-in-databricks/m-p/124455#M47196</guid>
      <dc:creator>dpc</dc:creator>
      <dc:date>2025-07-08T12:47:33Z</dc:date>
    </item>
  </channel>
</rss>

