<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: How to add a column to a new table containing the original source filenames in DataBricks. in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/125466#M47454</link>
    <description>&lt;P&gt;How can I do the same for a bunch of files at once (i.e., with a wildcard or recursively) without an iteration loop? Thanks.&lt;/P&gt;</description>
    <pubDate>Wed, 16 Jul 2025 15:41:48 GMT</pubDate>
    <dc:creator>just-Vlad</dc:creator>
    <dc:date>2025-07-16T15:41:48Z</dc:date>
    <item>
      <title>How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/57639#M31097</link>
      <description>&lt;P&gt;If this isn't the right spot to post this, please move it or refer me to the right area.&lt;/P&gt;&lt;P&gt;I recently learned about "_metadata.file_name", but it's not quite what I need.&lt;/P&gt;&lt;P&gt;I'm creating a new table in Databricks and want to add a USR_File_Name column containing the filenames of the uploaded/imported files. For example, I'm loading a table from a bunch of files with a naming scheme like "HT00114_DXLS_PROD_20240102_00001.txt". I need to keep the filename as a field value in my table.&lt;/P&gt;&lt;P&gt;I also want to create a USR_File_Create_Date column populated with the '20240102' portion of the filename.&lt;/P&gt;&lt;P&gt;Can anyone direct me to the right info for this?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2024 04:27:50 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/57639#M31097</guid>
      <dc:creator>joeyslaptop</dc:creator>
      <dc:date>2024-01-18T04:27:50Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/57820#M31098</link>
      <description>&lt;P&gt;Hi, could you please elaborate on what you're expecting here?&lt;/P&gt;</description>
      <pubDate>Thu, 18 Jan 2024 19:15:35 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/57820#M31098</guid>
      <dc:creator>Debayan</dc:creator>
      <dc:date>2024-01-18T19:15:35Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58199#M31099</link>
      <description>&lt;P&gt;I have files with names like "HT00114_DXLS_PROD_20240102_00001.txt", where the 2024xxxx is the date the file was created. I want to load the files from my hard drive into a SQL table in my sandbox-type space on our Databricks data lake. I have a "Data Ingestion" option available, but no choices within it (that I can see) to add the file name as a column (field value). I'm wondering how I can add other details as well. In an SSIS package, I'd be able to specify formulas and values for additional fields to accompany my dataset on upload. I'm hoping I can accomplish the same thing here.&lt;/P&gt;</description>
      <pubDate>Mon, 22 Jan 2024 18:28:38 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58199#M31099</guid>
      <dc:creator>joeyslaptop</dc:creator>
      <dc:date>2024-01-22T18:28:38Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58205#M31100</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98378"&gt;@joeyslaptop&lt;/a&gt;&amp;nbsp;- While loading the file, and before creating the Delta table, you can use the input_file_name function to add the file name as a column. Below is some sample code.&amp;nbsp; Please try it and let us know.&lt;/P&gt;
&lt;LI-CODE lang="markup"&gt;from pyspark.sql.functions import input_file_name

# Read every CSV file under the directory into a single DataFrame.
df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("&amp;lt;file-directory path&amp;gt;")
# input_file_name() returns the full path of the file each row was read from.
df_with_filename = df.withColumn("filename", input_file_name())
display(df_with_filename)&lt;/LI-CODE&gt;
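&lt;P&gt;The original question also asks for the bare file name and a USR_File_Create_Date column. Here is a minimal sketch of one way to get both. It assumes the DataFrame comes from a file-based reader that exposes the hidden _metadata column (the "_metadata.file_name" mentioned in the question), and that the date is always the eight digits between underscores, which is inferred from the single example file name. The names FILE_DATE_REGEX, parse_file_date and add_file_columns are illustrative only, not part of any Databricks API.&lt;/P&gt;

```python
import re
from datetime import datetime

# Eight digits between underscores, e.g. the 20240102 in
# HT00114_DXLS_PROD_20240102_00001.txt (assumption: inferred from that one example).
FILE_DATE_REGEX = r"_(\d{8})_"


def parse_file_date(file_name):
    """Plain-Python check of the pattern; returns a date, or None if it is absent."""
    match = re.search(FILE_DATE_REGEX, file_name)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y%m%d").date()


def add_file_columns(df):
    """Add USR_File_Name and USR_File_Create_Date to a DataFrame read from files."""
    from pyspark.sql import functions as F  # lazy import: only needed on a cluster

    return (
        # _metadata.file_name holds the file name without the directory part.
        df.withColumn("USR_File_Name", F.col("_metadata.file_name"))
        .withColumn(
            "USR_File_Create_Date",
            # Same pattern as parse_file_date, applied per row by Spark.
            F.to_date(F.regexp_extract("USR_File_Name", FILE_DATE_REGEX, 1), "yyyyMMdd"),
        )
    )
```

&lt;P&gt;In a notebook this could be called as add_file_columns(df) on the DataFrame loaded above. parse_file_date is only there to try the pattern on a few file names locally before running it on a cluster. File names that lack the pattern make regexp_extract return an empty string, so depending on the ANSI setting the date may come back as null or raise an error.&lt;/P&gt;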
</description>
      <pubDate>Mon, 22 Jan 2024 23:09:05 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58205#M31100</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-01-22T23:09:05Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58447#M31144</link>
      <description>&lt;P&gt;Hi, Shan_Chandra.&lt;/P&gt;&lt;P&gt;Thanks for the suggestion. The method I'm using to import my file is Data Ingestion &amp;gt; Data Sources &amp;gt; From Local Files &amp;gt; Create or Modify Table.&lt;/P&gt;&lt;P&gt;It doesn't provide a place for code or a way to specify column values beyond what's in my CSV file. What method should I be using to import data? Maybe my permissions are limited? I don't know yet.&lt;/P&gt;&lt;P&gt;I'm coming from MS SQL, where I could create the SSIS package via the import wizard and then modify the SSIS code in the wizard or in the saved SSIS file. I'm still new to Databricks, and so far I only have experience querying and importing data via the automated tool.&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 19:12:47 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58447#M31144</guid>
      <dc:creator>joeyslaptop</dc:creator>
      <dc:date>2024-01-25T19:12:47Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58452#M31145</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/98378"&gt;@joeyslaptop&lt;/a&gt;&amp;nbsp;- could you please try the following?&lt;/P&gt;
&lt;P&gt;1. Click on upload to DBFS&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="click on upload to DBFS" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5981i9577FBD8F6495E47/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-01-25 at 1.41.03 PM.png" alt="click on upload to DBFS" /&gt;&lt;span class="lia-inline-image-caption" onclick="event.preventDefault();"&gt;click on upload to DBFS&lt;/span&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;2. Upload the file from local to a DBFS location&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-01-25 at 1.50.14 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5983iBEF6CF6CAAA8CEF1/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-01-25 at 1.50.14 PM.png" alt="Screenshot 2024-01-25 at 1.50.14 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;3. Copy the location where the file is uploaded. Now, click on the DBFS tab. Choosing to create a table using a notebook will open a pre-built notebook with sample code that allows you to load the file directly into a DataFrame.&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-01-25 at 1.41.17 PM.png" style="width: 349px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5982iA907E58B07BB8486/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-01-25 at 1.41.17 PM.png" alt="Screenshot 2024-01-25 at 1.41.17 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
&lt;P&gt;4. Edit the notebook below to add the input_file_name() function, using the sample code snippet shared earlier.&lt;/P&gt;
&lt;P&gt;&lt;span class="lia-inline-image-display-wrapper lia-image-align-inline" image-alt="Screenshot 2024-01-25 at 1.52.21 PM.png" style="width: 400px;"&gt;&lt;img src="https://community.databricks.com/t5/image/serverpage/image-id/5984i450D44A76D716DF9/image-size/medium?v=v2&amp;amp;px=400" role="button" title="Screenshot 2024-01-25 at 1.52.21 PM.png" alt="Screenshot 2024-01-25 at 1.52.21 PM.png" /&gt;&lt;/span&gt;&lt;/P&gt;
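&lt;P&gt;As a rough sketch of what the edited notebook cell might look like once the steps above are done: the function below reads the uploaded files, adds the file name as a column, and saves the result as a table. The function names, the USR_File_Name column name, the path and the table name are all hypothetical placeholders. Because Spark expands a directory or glob path itself, the same call covers many files at once without a Python loop. On some newer runtimes input_file_name() may be unavailable, in which case the _metadata.file_name column is the usual substitute.&lt;/P&gt;

```python
def file_name_from_path(path):
    """Return the last path segment, e.g. 'dbfs:/dir/a.txt' gives 'a.txt'."""
    return path.rsplit("/", 1)[-1]


def load_files_with_name(spark, source_path, table_name):
    """Read every file matching source_path in one pass and save it as a new table."""
    from pyspark.sql import functions as F  # lazy import: only needed on a cluster

    df = (
        spark.read.format("csv")
        .option("header", "true")
        .option("inferSchema", "true")
        .load(source_path)  # a directory or glob; Spark expands it, no loop needed
    )
    # input_file_name() yields the full path per row; keep only the last segment,
    # the Spark counterpart of file_name_from_path above.
    df = df.withColumn(
        "USR_File_Name",
        F.element_at(F.split(F.input_file_name(), "/"), -1),
    )
    # Default save mode: fails if the table already exists.
    df.write.saveAsTable(table_name)
    return df
```

&lt;P&gt;A call could look like load_files_with_name(spark, "/FileStore/tables/HT00114_DXLS_PROD_*.txt", "my_schema.my_table"), where both arguments are made-up examples to be replaced with the location copied in step 3 and your own table name.&lt;/P&gt;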
&lt;P&gt;Hope this helps.&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Thu, 25 Jan 2024 19:55:06 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/58452#M31145</guid>
      <dc:creator>shan_chandra</dc:creator>
      <dc:date>2024-01-25T19:55:06Z</dc:date>
    </item>
    <item>
      <title>Re: How to add a column to a new table containing the original source filenames in DataBricks.</title>
      <link>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/125466#M47454</link>
      <description>&lt;P&gt;How can I do the same for a bunch of files at once (i.e., with a wildcard or recursively) without an iteration loop? Thanks.&lt;/P&gt;</description>
      <pubDate>Wed, 16 Jul 2025 15:41:48 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/how-to-add-a-column-to-a-new-table-containing-the-original/m-p/125466#M47454</guid>
      <dc:creator>just-Vlad</dc:creator>
      <dc:date>2025-07-16T15:41:48Z</dc:date>
    </item>
  </channel>
</rss>

