01-17-2024 08:27 PM
If this isn't the right spot to post this, please move it or refer me to the right area.
I recently learned about the "_metadata.file_name". It's not quite what I need.
I'm creating a new table in Databricks and want to add a USR_File_Name column containing the filenames of the uploaded/imported files. For example, I'm loading a table from a bunch of files with a naming scheme like "HT00114_DXLS_PROD_20240102_00001.txt". I need to keep the filename as a field value in my table.
I also want to create a USR_File_Create_Date column where I add the '20240102' from the filename as a value.
Can anyone direct me to the right info for this?
01-18-2024 11:15 AM
Hi, Could you please elaborate more on the expectation here?
01-22-2024 10:28 AM
I have files with names like "HT00114_DXLS_PROD_20240102_00001.txt", where the 2024xxxx is the date the file was created. I want to load the file from my hard drive into a table in my sandbox-type space on our Databricks data lake, as a SQL DB table. I have a "Data Ingestion" option available, but no choices within it (that I can see) to add the file name as a column (field value). I'm wondering how I can accomplish adding other details. In an SSIS package, I'd be able to specify formulas and values for additional fields to accompany my dataset on the upload. I'm hoping I can accomplish the same thing here.
01-22-2024 03:09 PM
@joeyslaptop - While loading the file, and before creating the Delta table, you can use the input_file_name() function to add the file name as a column. Below is a sample code. Please try it and let us know.
from pyspark.sql.functions import input_file_name
df = spark.read.format("csv").option("header","true").option("inferSchema","true").load("<file-directory path>")
df_with_filename = df.withColumn("filename", input_file_name())
display(df_with_filename)
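The snippet above covers the USR_File_Name column but not the USR_File_Create_Date column also asked about. The date portion can be pulled out of the filename with a regular expression; below is a sketch that assumes the filenames always follow the "HT00114_DXLS_PROD_20240102_00001.txt" pattern (the regex itself and the column names in the comments are illustrative assumptions, not a confirmed Databricks feature):

```python
import re

# Assumed naming scheme: <prefix>_<YYYYMMDD>_<sequence>.txt
# e.g. "HT00114_DXLS_PROD_20240102_00001.txt" -> "20240102"
FILE_DATE_PATTERN = r"_(\d{8})_\d+\.txt$"

def extract_create_date(filename: str) -> str:
    """Return the YYYYMMDD portion of the filename, or "" if it doesn't match."""
    match = re.search(FILE_DATE_PATTERN, filename)
    return match.group(1) if match else ""

# The same pattern can be applied on the Spark side with regexp_extract,
# e.g. (sketch, building on the df from the earlier snippet):
#
#   from pyspark.sql.functions import input_file_name, regexp_extract, to_date
#   df_with_cols = (
#       df.withColumn("USR_File_Name", input_file_name())
#         .withColumn("USR_File_Create_Date",
#                     to_date(regexp_extract(input_file_name(),
#                                            r"_(\d{8})_\d+\.txt$", 1),
#                             "yyyyMMdd"))
#   )

print(extract_create_date("HT00114_DXLS_PROD_20240102_00001.txt"))
```

Because input_file_name() returns the full path, anchoring the pattern at the end of the string keeps it working whether or not a directory prefix is present.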
01-25-2024 11:08 AM - edited 01-25-2024 11:12 AM
Hi, Shan_Chandra.
Thanks for the suggestion. The method I'm using to import my file is Data Ingestion > Data Sources > From Local Files > Create or Modify Table.
It doesn't provide a place for code or to specify column values outside of what's found in my csv file. What method should I be using to import data? Maybe my permissions are limited? I don't know yet.
I'm coming from MS SQL, where I could create an SSIS package via the import wizard and then modify the SSIS code in the wizard or in the saved SSIS file. I'm still new to Databricks and so far only have experience querying and importing data via the automated tool.
01-25-2024 11:55 AM
@joeyslaptop - could you please try the following?
1. Click on "Upload to DBFS".
2. Upload the file from local to a DBFS location.
3. Copy the location where the file is uploaded. Then, on the DBFS tab, choose the option to create a table using a notebook; this opens a pre-built notebook with sample code that allows you to load the file directly into a DataFrame.
4. Edit that notebook to introduce the input_file_name() function, using the sample code snippet shared earlier.
Hope this helps.