input_file_name() not supported in Unity Catalog
12-18-2022 11:54 PM
Hey, our notebooks that read a bunch of JSON files from storage typically use input_file_name() when moving from raw to bronze, but after upgrading to Unity Catalog we get an error message:
AnalysisException: [UC_COMMAND_NOT_SUPPORTED] input_file_name is not supported in Unity Catalog.;
Why is this? Are we following a bad practice by wanting to have filenames for tracing data through the different storage layers? Does Unity Catalog perhaps do this automatically in some way? Just barely started testing Unity Catalog, so struggling a bit to grasp what the differences are. I thought it was merely a tool that did some stuff automatically (lineage etc) and gave us a simple metastore to interact with.
Labels: Data Engineering, Unity Catalog
12-19-2022 04:01 AM
Hi @Espen Solvang,
Thanks for reaching out to us. Python UDFs/UDAFs and Pandas UDFs are currently not supported on Shared Unity Catalog clusters. Please change the access mode to "Single User" instead; that should support input_file_name.
12-19-2022 11:04 PM
Thanks a lot for your response. I'll give it a try and get back to you.
Not sure I understand how input_file_name() is a UDF - I didn't write it myself, it's imported from pyspark. I guess what you are saying is that it is still a UDF; please correct me if I'm wrong.
01-29-2023 10:22 PM
I can't answer the question of why input_file_name() doesn't work with Unity Catalog, but I did manage to find a workaround using the file metadata.
You can query the _metadata column, which gives you a struct with the file path, name, size and modification time. Something like this should work:
select _metadata['file_name'], *
from my_catalog.my_schema.my_table
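For the raw-to-bronze case in the original question, the same metadata column can also be selected while reading the files directly. A minimal PySpark sketch, assuming the raw data is JSON; the function name `read_raw_with_source`, the `raw_path` argument, and the output column names `source_file_name` and `source_file_path` are hypothetical:

```python
def read_raw_with_source(spark, raw_path):
    """Read JSON files and keep the originating file for each row.

    `spark` is an existing SparkSession; `raw_path` is a placeholder
    for your own storage location.
    """
    return (
        spark.read.format("json")
        .load(raw_path)
        # _metadata is a hidden column: it only appears when selected explicitly.
        .selectExpr(
            "*",
            "_metadata.file_name AS source_file_name",
            "_metadata.file_path AS source_file_path",
        )
    )
```

Using `selectExpr` with plain strings keeps the sketch free of extra imports; `col("_metadata.file_path")` from `pyspark.sql.functions` should behave the same way.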
07-18-2023 04:43 PM - edited 07-18-2023 04:44 PM
This won't work if you are creating a table for the first time from the stream, for example the code below when running for the first time. I need a way to capture the file name going into the stream.
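For a stream like this, one option is to select `_metadata` on the streaming read itself, before the target table exists, rather than querying it from the table afterwards. The following is an illustrative sketch, not the code from this post, assuming Auto Loader (`cloudFiles`) with JSON input; `stream_raw_with_source`, its arguments, and the `source_file_path` column name are hypothetical:

```python
def stream_raw_with_source(spark, raw_path, schema):
    """Build a streaming DataFrame that carries the source file path.

    `schema` is the expected JSON schema (a StructType or a DDL string).
    It is passed explicitly here so Auto Loader does not have to infer one.
    """
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .schema(schema)
        .load(raw_path)
        # Capture the file path per row as the data enters the stream.
        .selectExpr("*", "_metadata.file_path AS source_file_path")
    )
```

The returned DataFrame can be written with `writeStream` as usual, so the path column should be present from the first run onward.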
05-12-2023 03:50 AM
Hi @Cedric Law Hing Ping,
Are there any plans to support input_file_name in Unity Catalog? I'm using Unity Catalog in Delta Live Tables (DLT), which is in preview, and would like to let DLT handle what cluster is used and still be able to use input_file_name for traceability.
07-18-2023 04:20 PM
I have to say that I ran into these undocumented restrictions multiple times with the shared instance and it's annoying.
11-16-2023 08:21 PM
I had a similar issue after the Unity Catalog upgrade and found the following solution working, based on the documentation:
https://learn.microsoft.com/en-us/azure/databricks/sql/language-manual/functions/input_file_name
.selectExpr("*", "_metadata as source_metadata")
# .withColumn('file_path', input_file_name())
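Keeping the whole struct, as in the snippet above, leaves the individual fields available downstream. A minimal sketch of pulling them out later, assuming a DataFrame that already has the `source_metadata` column; the function name and the output column names are hypothetical:

```python
def flatten_source_metadata(df):
    """Expose selected fields of the saved _metadata struct as top-level columns."""
    return df.selectExpr(
        "*",
        # file_path is the closest match to what input_file_name() returned.
        "source_metadata.file_path AS file_path",
        "source_metadata.file_name AS file_name",
        "source_metadata.file_modification_time AS file_modified_at",
    )
```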
12-22-2023 12:03 PM
Will work for spark.read to get the file name, or:
To get the whole file path
03-03-2024 12:13 AM
This worked perfectly for me. The error message mentioned _metadata.file_path as an alternative to input_file_name, but it wasn't clear how to reference it. Thanks for making it clear that it's technically a column that's available. I'm going to explore what else is available in _metadata. This should be marked as the solution in my opinion. Thanks again.

