CREATE view USING json and *include* _metadata, _rescued_data
11-18-2024 09:06 AM
Title may be self-explanatory. Basically, I'm curious whether it's possible (and if so, how) to add `_metadata` and `_rescued_data` fields to a view "using json".
e.g.

```sql
%sql
CREATE OR REPLACE VIEW entity_view
USING json
OPTIONS (path = "/.../.*json", multiline = true)
```
Below I show it's possible to do this with `read_files` in Spark SQL:
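Something like this works (the path is a placeholder):

```sql
%sql
SELECT
  *,        -- includes _rescued_data when the option below is set
  _metadata -- hidden column, must be selected explicitly
FROM read_files(
  '/.../*.json',
  format => 'json',
  multiLine => true,
  rescuedDataColumn => '_rescued_data'
)
```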
Below I show an error when trying to query these fields... not sure if I'm doing something wrong:
11-18-2024 09:08 AM
Forgot to add this as a reference: the JSON Files documentation doesn't seem to show any OPTIONS that would enable the functionality I'm looking for... but please feel free to correct me if there's a way to achieve this that I'm overlooking!
11-21-2024 03:03 AM
Hi @ChristianRRL, I'll be looking into this and will get back to you with an answer.
11-22-2024 02:02 AM
Hi @ChristianRRL, as a first quick look, could you please try to create a PySpark DataFrame with the `_metadata` and `_rescued_data` columns, query the DataFrame to make sure you can see those columns, and then create a view from that DataFrame?
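For example, a minimal sketch of that suggestion (the path and view name are placeholders):

```python
# Read JSON with a rescued-data column, and pull in the hidden _metadata column
df = (
    spark.read.format("json")
    .option("multiLine", "true")
    .option("rescuedDataColumn", "_rescued_data")
    .load("/.../*.json")
    .select("*", "_metadata")
)
display(df)  # confirm _metadata and _rescued_data are present

# Expose the DataFrame to SQL as a temporary view
df.createOrReplaceTempView("entity_view_tmp")
```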
11-22-2024 07:52 AM - edited 11-22-2024 07:59 AM
Hi @ChristianRRL, you can still use the same `read_files` method when creating the view. I see that you are using the classic Hive-style reader instead of `read_files` in the actual SQL view definition, and you don't need to use `spark.sql`. Please see below.
Example code:
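A sketch of what that view definition could look like (the path is a placeholder):

```sql
CREATE OR REPLACE VIEW entity_view AS
SELECT
  *,        -- includes _rescued_data when the option below is set
  _metadata -- hidden file-metadata column, selected explicitly
FROM read_files(
  '/.../*.json',
  format => 'json',
  multiLine => true,
  rescuedDataColumn => '_rescued_data'
);
```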
01-09-2025 11:26 PM
`SELECT *, _metadata.file_name FROM anytable WHERE condition`
https://docs.databricks.com/en/ingestion/file-metadata-column.html
You can use something like:

```python
df = spark.read \
    .format("json") \
    .schema(schema) \
    .load("dbfs:/tmp/*") \
    .select("*", "_metadata")
display(df)
```

You can enable the rescued data column by setting the option `rescuedDataColumn` to a column name, such as `_rescued_data`, with `spark.read.option("rescuedDataColumn", "_rescued_data").format("json").load(<path>)`.
https://docs.databricks.com/en/query/formats/json.html#rescued-data-column
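Putting both together, a sketch (the path is a placeholder):

```python
# Enable the rescued-data column and select the hidden _metadata column in one read
df = spark.read \
    .option("rescuedDataColumn", "_rescued_data") \
    .format("json") \
    .load("dbfs:/tmp/*") \
    .select("*", "_metadata")
display(df)
```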

