topic Re: CREATE view USING json and include _metadata, _rescued_data in Data Engineering

CREATE view USING json and include _metadata, _rescued_data

ChristianRRL — Mon, 18 Nov 2024 17:06:32 GMT

The title may be self-explanatory. Basically, is it possible (and if so, how) to add the `_metadata` and `_rescued_data` fields to a view created with `USING json`?

e.g.

%sql CREATE OR REPLACE VIEW entity_view USING json OPTIONS (path="/.../.*json",multiline=true)

Below I show it's possible to do this with `read_files` in Spark SQL:

Below I show the error I get when trying to query these fields on the view. Not sure if I'm doing something wrong:

Re: CREATE view USING json and include _metadata, _rescued_data

ChristianRRL — Mon, 18 Nov 2024 17:08:37 GMT

Forgot to add this as reference:

JSON Files - Spark 3.5.3 Documentation

The JSON Files documentation doesn't seem to list any OPTIONS that would enable the functionality I'm looking for. Please feel free to correct me if there's a way to achieve this that I'm overlooking!

Re: CREATE view USING json and include _metadata, _rescued_data

Nam_Nguyen — Thu, 21 Nov 2024 11:03:56 GMT

Hi @ChristianRRL , I'll be looking into this, and I'll get back to you with an answer

Re: CREATE view USING json and include _metadata, _rescued_data

Nam_Nguyen — Fri, 22 Nov 2024 10:02:50 GMT

Hi @ChristianRRL, as a first quick check, could you please try creating a PySpark DataFrame that includes the _metadata and _rescued_data columns, query the DataFrame to make sure you can see those columns, and then create a view from that DataFrame?
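A minimal sketch of that suggestion, assuming a Databricks notebook where `spark` is already defined. The function name `register_json_view` and the example path are hypothetical. `rescuedDataColumn` is the Databricks reader option mentioned later in this thread, and `multiLine` mirrors the `multiline=true` option from the question. Note that a DataFrame can only be registered as a temporary view, not a persistent one.

```python
def register_json_view(spark, path, view_name, rescued_col="_rescued_data"):
    """Read JSON files and expose them, plus file metadata, as a temp view.

    `spark` is the active SparkSession (predefined in Databricks notebooks).
    """
    df = (
        spark.read
        .format("json")
        .option("multiLine", "true")               # same intent as multiline=true in the question
        .option("rescuedDataColumn", rescued_col)  # Databricks-specific reader option
        .load(path)
        .select("*", "_metadata")                  # _metadata is hidden unless selected explicitly
    )
    # Temp views live only for the current Spark session.
    df.createOrReplaceTempView(view_name)
    return df


# Hypothetical usage inside a notebook:
# df = register_json_view(spark, "/path/to/json/", "entity_view")
# df.select("_metadata.file_name", "_rescued_data").show()
```

If both columns show up in the DataFrame, `SELECT _metadata, _rescued_data FROM entity_view` should then work from a %sql cell in the same session.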

Re: CREATE view USING json and include _metadata, _rescued_data

akhil393 — Fri, 22 Nov 2024 15:59:11 GMT

Hi @ChristianRRL, you can still use read_files when creating the view. I see that you are using the classic Hive-style reader (USING json) rather than read_files in the view definition itself. You can put read_files directly in the SQL view definition, and you don't need spark.sql for it. Please see below.

Example code:

CREATE OR REPLACE VIEW json_view AS
SELECT _metadata, *
FROM read_files(
  's3://********/_delta_log/*.json',
  format => 'json'
);
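If the path or view name needs to be parameterized, the same kind of statement can be generated from Python. This is only a sketch: `read_files_view_sql` and the example path are hypothetical, and it assumes `read_files` accepts the JSON reader options `multiLine` and `rescuedDataColumn` as named arguments (passing `rescuedDataColumn` explicitly may be redundant if your runtime already adds `_rescued_data` when the schema is inferred). It does no escaping, so only use it with trusted inputs.

```python
def read_files_view_sql(view_name, path, multiline=True, rescued_col="_rescued_data"):
    """Build a CREATE OR REPLACE VIEW statement over read_files for JSON input."""
    options = [
        "format => 'json'",
        f"multiLine => {'true' if multiline else 'false'}",
        f"rescuedDataColumn => '{rescued_col}'",
    ]
    # First positional argument is the path, followed by the named options.
    args = ",\n  ".join([f"'{path}'"] + options)
    return (
        f"CREATE OR REPLACE VIEW {view_name} AS\n"
        f"SELECT *, _metadata\n"
        f"FROM read_files(\n  {args}\n)"
    )


# Print the statement so it can be pasted into a %sql cell.
print(read_files_view_sql("entity_view", "/path/to/json/"))
```

The printed statement can be run as-is in a %sql cell, or passed to `spark.sql(...)` if you prefer to stay in Python.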

Re: CREATE view USING json and include _metadata, _rescued_data

NandiniN — Fri, 10 Jan 2025 07:26:36 GMT

I am able to run a query like the one below against a Delta table (anytable and condition are placeholders):

SELECT *, _metadata.file_name FROM anytable WHERE condition;

https://docs.databricks.com/en/ingestion/file-metadata-column.html

You can use something like:

# `schema` must be defined beforehand (a StructType or a DDL string)
df = spark.read \
  .format("json") \
  .schema(schema) \
  .load("dbfs:/tmp/*") \
  .select("*", "_metadata")

display(df)

You can also enable the rescued data column by setting the option rescuedDataColumn to a column name, such as _rescued_data: spark.read.option("rescuedDataColumn", "_rescued_data").format("json").load(<path>).

https://docs.databricks.com/en/query/formats/json.html#rescued-data-column
