<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to extract read path from notebooks. Especially from the autoloader in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/150164#M53281</link>
    <description>Re: How to extract read path from notebooks. Especially from the autoloader in Data Engineering</description>
    <pubDate>Sun, 08 Mar 2026 07:19:53 GMT</pubDate>
    <dc:creator>SteveOstrowski</dc:creator>
    <dc:date>2026-03-08T07:19:53Z</dc:date>
    <item>
      <title>How to extract read path from notebooks. Especially from the autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145583#M52538</link>
      <description>&lt;P&gt;I am trying to figure out how to extract source paths from read statements or Auto Loader paths. I need to know my source locations across a lot of notebooks. How can I extract this from Databricks? Can the Databricks SDK do this?&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 16:44:27 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145583#M52538</guid>
      <dc:creator>ajay_wavicle</dc:creator>
      <dc:date>2026-01-28T16:44:27Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract read path from notebooks. Especially from the autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145592#M52540</link>
      <description>&lt;P&gt;You can get metadata information for input files with the &lt;STRONG&gt;_metadata&lt;/STRONG&gt; column. For the file path you would use &lt;STRONG&gt;_metadata.file_path&lt;/STRONG&gt;.&lt;/P&gt;&lt;LI-CODE lang="python"&gt;df = spark.read \
  .format("csv") \
  .schema(schema) \
  .load("dbfs:/tmp/*") \
  .select("*", "_metadata.file_path")&lt;/LI-CODE&gt;&lt;P&gt;More details -&amp;nbsp;&lt;A href="https://docs.databricks.com/aws/en/ingestion/file-metadata-column" target="_blank"&gt;https://docs.databricks.com/aws/en/ingestion/file-metadata-column&lt;/A&gt;&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 18:17:19 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145592#M52540</guid>
      <dc:creator>pradeep_singh</dc:creator>
      <dc:date>2026-01-28T18:17:19Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract read path from notebooks. Especially from the autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145594#M52542</link>
      <description>&lt;P&gt;If you are looking to crawl a path and read notebooks, you can do that as well: use the Databricks SDK to list all objects, export the notebooks, and use pattern matching to extract the file path for each Auto Loader stream.&lt;/P&gt;</description>
      <pubDate>Wed, 28 Jan 2026 18:23:56 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/145594#M52542</guid>
      <dc:creator>pradeep_singh</dc:creator>
      <dc:date>2026-01-28T18:23:56Z</dc:date>
    </item>
    <item>
      <title>Re: How to extract read path from notebooks. Especially from the autoloader</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/150164#M53281</link>
      <description>&lt;P&gt;Hi &lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/210036"&gt;@ajay_wavicle&lt;/a&gt;,&lt;/P&gt;
&lt;P&gt;There are several approaches you can use to extract read/source paths from notebooks, including Auto Loader (cloudFiles) paths. The right choice depends on whether you want runtime lineage data or static code analysis.&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 1: UNITY CATALOG LINEAGE SYSTEM TABLES (RECOMMENDED)&lt;/P&gt;
&lt;P&gt;If your workloads run against Unity Catalog, the lineage system tables automatically capture source and target paths at runtime, including cloud storage paths used by Auto Loader.&lt;/P&gt;
&lt;P&gt;The two key tables are:&lt;/P&gt;
&lt;P&gt;system.access.table_lineage&lt;BR /&gt;system.access.column_lineage&lt;/P&gt;
&lt;P&gt;Both tables include a source_path column that captures cloud storage URIs (s3://, abfss://, gs://, etc.) and a source_type column that can be TABLE or PATH. They also include entity_type (NOTEBOOK, JOB, PIPELINE, etc.) and entity_id so you can trace which notebook performed the read.&lt;/P&gt;
&lt;P&gt;Example query to find all source paths read by notebooks:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT
  entity_type,
  entity_id,
  source_table_full_name,
  source_path,
  source_type,
  event_time
FROM system.access.table_lineage
WHERE source_type IS NOT NULL
  AND target_type IS NULL
  AND entity_type = 'NOTEBOOK'
ORDER BY event_time DESC&lt;/LI-CODE&gt;
&lt;P&gt;To find Auto Loader reads specifically, filter for PATH source types with cloud storage prefixes:&lt;/P&gt;
&lt;LI-CODE lang="sql"&gt;SELECT
  entity_id,
  source_path,
  event_time
FROM system.access.table_lineage
WHERE source_type = 'PATH'
  AND entity_type IN ('NOTEBOOK', 'PIPELINE')
ORDER BY event_time DESC&lt;/LI-CODE&gt;
&lt;P&gt;Notes on lineage system tables:&lt;BR /&gt;- Lineage data is retained for one year on a rolling basis&lt;BR /&gt;- Lineage is captured across all workspaces attached to a Unity Catalog metastore&lt;BR /&gt;- You need to enable system tables if you have not already&lt;/P&gt;
&lt;P&gt;Documentation: &lt;A href="https://docs.databricks.com/aws/en/admin/system-tables/lineage" target="_blank"&gt;https://docs.databricks.com/aws/en/admin/system-tables/lineage&lt;/A&gt;&lt;BR /&gt;Documentation: &lt;A href="https://docs.databricks.com/aws/en/data-governance/unity-catalog/data-lineage" target="_blank"&gt;https://docs.databricks.com/aws/en/data-governance/unity-catalog/data-lineage&lt;/A&gt;&lt;/P&gt;
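&lt;P&gt;If you pull these query results into Python for reporting, the storage system behind each source_path can be inferred from the URI scheme using only the standard library. This is an illustrative helper, not a Databricks API, and the scheme-to-provider mapping here is my own assumption:&lt;/P&gt;

```python
from urllib.parse import urlparse

# Map common cloud-storage URI schemes to a human-readable provider label.
SCHEME_PROVIDERS = {
    "s3": "AWS S3",
    "s3a": "AWS S3",
    "abfss": "Azure ADLS Gen2",
    "gs": "Google Cloud Storage",
    "dbfs": "DBFS",
}

def classify_source_path(path: str) -> str:
    """Return a provider label for a cloud-storage URI, or 'unknown'."""
    scheme = urlparse(path).scheme.lower()
    return SCHEME_PROVIDERS.get(scheme, "unknown")
```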
&lt;P&gt;&lt;BR /&gt;APPROACH 2: EXPORT AND PARSE NOTEBOOK SOURCE CODE WITH THE DATABRICKS SDK&lt;/P&gt;
&lt;P&gt;If you need to extract paths from the notebook code itself (static analysis), you can use the Databricks SDK for Python to export notebooks and then parse the source code for read statements and cloudFiles paths.&lt;/P&gt;
&lt;P&gt;Step 1: Install the SDK&lt;/P&gt;
&lt;P&gt;pip install databricks-sdk&lt;/P&gt;
&lt;P&gt;Step 2: List and export notebooks&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;import base64
import re

from databricks.sdk import WorkspaceClient
from databricks.sdk.service.workspace import ExportFormat

w = WorkspaceClient()

# List notebooks in a folder (pass recursive=True to walk subfolders)
for item in w.workspace.list("/Users/your_folder/"):
    if item.object_type is None or item.object_type.name != "NOTEBOOK":
        continue

    # Export the notebook source (content comes back base64-encoded)
    export_response = w.workspace.export(path=item.path, format=ExportFormat.SOURCE)
    source_code = base64.b64decode(export_response.content).decode("utf-8")

    # Auto Loader and batch .load("...") paths
    autoloader_paths = re.findall(r'\.load\(\s*["\']([^"\']+)["\']', source_code)

    # .option("cloudFiles.path", "...") style paths
    option_paths = re.findall(
        r'\.option\(\s*["\']cloudFiles\.path["\']\s*,\s*["\']([^"\']+)["\']',
        source_code,
    )

    # General spark.read paths: an explicit format call followed by .load("...")
    read_paths = re.findall(
        r'spark\.read[^)]*\.(csv|parquet|json|orc|delta|format)\([^)]*\)\s*\.load\(\s*["\']([^"\']+)["\']',
        source_code,
        re.DOTALL,
    )

    if autoloader_paths or option_paths or read_paths:
        print(f"Notebook: {item.path}")
        for p in autoloader_paths + option_paths:
            print(f"  Path found: {p}")
        for fmt, p in read_paths:
            print(f"  Read path ({fmt}): {p}")&lt;/LI-CODE&gt;
&lt;P&gt;Step 3: For more robust parsing of Auto Loader specifically, look for the common patterns:&lt;/P&gt;
&lt;P&gt;# Pattern 1: readStream with cloudFiles format and load path&lt;BR /&gt;# spark.readStream.format("cloudFiles").option(...).load("s3://bucket/path")&lt;/P&gt;
&lt;P&gt;# Pattern 2: cloudFiles.path option&lt;BR /&gt;# .option("cloudFiles.path", "s3://bucket/path")&lt;/P&gt;
&lt;P&gt;# Pattern 3: In DLT/SDP notebooks using dlt.read_stream or @dlt.table&lt;BR /&gt;# spark.readStream.format("cloudFiles").load("/mnt/data/input")&lt;/P&gt;
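&lt;P&gt;The patterns above can be bundled into one standalone helper and sanity-checked offline against sample notebook source, with no workspace connection. Note these are heuristics for literal string paths; paths built from variables, widgets, or f-strings will not be caught:&lt;/P&gt;

```python
import re

# Heuristic patterns for literal source paths in notebook code.
PATH_PATTERNS = [
    # .load("s3://bucket/path") -- Auto Loader and batch readers alike
    r'\.load\(\s*["\']([^"\']+)["\']',
    # .option("cloudFiles.path", "s3://bucket/path")
    r'\.option\(\s*["\']cloudFiles\.path["\']\s*,\s*["\']([^"\']+)["\']',
]

def extract_read_paths(source_code: str) -> list:
    """Return unique literal paths found by the heuristic patterns, in order."""
    found = []
    for pattern in PATH_PATTERNS:
        for match in re.findall(pattern, source_code):
            if match not in found:
                found.append(match)
    return found
```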
&lt;P&gt;Documentation: &lt;A href="https://docs.databricks.com/aws/en/dev-tools/sdk-python.html" target="_blank"&gt;https://docs.databricks.com/aws/en/dev-tools/sdk-python.html&lt;/A&gt;&lt;/P&gt;
&lt;P&gt;&lt;BR /&gt;APPROACH 3: QUERY THE PIPELINES API FOR LAKEFLOW SPARK DECLARATIVE PIPELINES (SDP)&lt;/P&gt;
&lt;P&gt;If your Auto Loader code runs inside Lakeflow Spark Declarative Pipelines (SDP), the pipeline definition and settings contain configuration that may reference source paths. You can retrieve this via the SDK:&lt;/P&gt;
&lt;LI-CODE lang="python"&gt;from databricks.sdk import WorkspaceClient

w = WorkspaceClient()

# List all pipelines and inspect their definitions
for pipeline in w.pipelines.list_pipelines():
    detail = w.pipelines.get(pipeline_id=pipeline.pipeline_id)
    print(f"Pipeline: {detail.name}")
    print(f"  Libraries: {detail.spec.libraries}")
    print(f"  Configuration: {detail.spec.configuration}")&lt;/LI-CODE&gt;
&lt;P&gt;The configuration dictionary often contains cloudFiles.path or other source path settings that are passed into the pipeline notebooks.&lt;/P&gt;
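&lt;P&gt;Assuming the configuration comes back as a plain dict of strings, filtering it down to path-like settings is straightforward (illustrative helper only; key names vary from pipeline to pipeline):&lt;/P&gt;

```python
def path_like_settings(configuration: dict) -> dict:
    """Keep configuration entries whose key or value looks like a source path."""
    prefixes = ("s3://", "s3a://", "abfss://", "gs://", "dbfs:/", "/mnt/", "/Volumes/")
    result = {}
    for key, value in (configuration or {}).items():
        if "cloudFiles" in key or str(value).startswith(prefixes):
            result[key] = value
    return result
```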
&lt;P&gt;&lt;BR /&gt;APPROACH 4: COMBINE WITH THE DATABRICKS CLI&lt;/P&gt;
&lt;P&gt;You can also use the Databricks CLI to export notebooks in bulk for parsing:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;databricks workspace export-dir /Users/ ./exported_notebooks --overwrite&lt;/LI-CODE&gt;
&lt;P&gt;Then use standard text search tools (grep, ripgrep, etc.) to find read paths:&lt;/P&gt;
&lt;LI-CODE lang="bash"&gt;grep -rn "cloudFiles" ./exported_notebooks/
grep -rn "\.load(" ./exported_notebooks/
grep -rn "spark\.read" ./exported_notebooks/&lt;/LI-CODE&gt;
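&lt;P&gt;If you want structured output instead of raw grep lines, the same search can be done in Python over the exported directory (a minimal sketch; the default pattern is an assumption you should tune):&lt;/P&gt;

```python
import os
import re

def scan_directory(root: str, pattern: str = r"cloudFiles|\.load\(|spark\.read") -> dict:
    """Map each exported file to the lines that match the search pattern."""
    regex = re.compile(pattern)
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                with open(full, encoding="utf-8", errors="ignore") as f:
                    matching = [line.rstrip() for line in f if regex.search(line)]
            except OSError:
                continue
            if matching:
                hits[full] = matching
    return hits
```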
&lt;P&gt;&lt;BR /&gt;SUMMARY&lt;/P&gt;
&lt;P&gt;For runtime lineage (what paths were actually read): use system.access.table_lineage, which captures cloud storage paths including Auto Loader sources.&lt;/P&gt;
&lt;P&gt;For static code analysis (what paths are in the code): use the Databricks SDK to export notebook source and parse with regex or AST tools.&lt;/P&gt;
&lt;P&gt;For pipeline-specific paths: query the Pipelines API for configuration values.&lt;/P&gt;
&lt;P&gt;The lineage system tables approach is typically the most reliable because it captures what actually executed, rather than what the code text contains (which may use variables, widgets, or dynamic path construction).&lt;/P&gt;
&lt;P&gt;* This reply was drafted with the help of an agent system I built, based on the wide set of documentation I have available and previous memory. I personally review each draft for obvious issues and to monitor system reliability, and I update it when I detect drift, but there is still a small chance that something is inaccurate, especially if you are experimenting with brand-new features.&lt;/P&gt;
      <pubDate>Sun, 08 Mar 2026 07:19:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-extract-read-path-from-notebooks-especially-from-the/m-p/150164#M53281</guid>
      <dc:creator>SteveOstrowski</dc:creator>
      <dc:date>2026-03-08T07:19:53Z</dc:date>
    </item>
  </channel>
</rss>

