I am experimenting with DLT (Delta Live Tables) and Auto Loader. I have a simple, flat JSON file that I am attempting to load into a DLT (following this guide) like so:
CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files("/mnt/raw/statistics/", "json", map("cloudFiles.inferColumnTypes", "true"));
The error message I am getting is: com.databricks.sql.cloudfiles.errors.CloudFilesAnalysisException: Failed to infer schema for format json from existing files in input path /mnt/raw/statistics/. Please ensure you configured the options properly or explicitly specify the schema.
The JSON file looks like this:
[
  {
    "pass": 26,
    "rush": 5,
    "total_return": 1,
    "total": 32,
    "fumble_return": 0,
    "int_return": 1,
    "kick_return": 0,
    "punt_return": 0,
    "other": 0
  }
]
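For what it's worth, a common cause of this error is that Spark's JSON source (which Auto Loader uses under the hood) expects newline-delimited records by default, so a pretty-printed top-level array like the one above can fail schema inference. A sketch of a possible fix, assuming the pass-through `multiLine` JSON option applies to this file layout:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files(
  "/mnt/raw/statistics/",
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    "multiLine", "true"  -- allow records to span multiple lines / a top-level array
  )
);
```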
I've seen a lot of "answers" out there saying to just specify the schema, but if I expect my schema to change over time, that is not an option.
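If the concern is schema drift rather than typing out a full schema, Auto Loader also supports `cloudFiles.schemaHints`, which pins the types of a few known columns while still inferring (and evolving) the rest. A sketch, with hypothetical column hints:

```sql
CREATE OR REFRESH STREAMING LIVE TABLE statistics_live
COMMENT "The raw statistics data"
TBLPROPERTIES ("quality" = "bronze")
AS SELECT * FROM cloud_files(
  "/mnt/raw/statistics/",
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    -- pin only the columns you care about; everything else is still inferred
    "cloudFiles.schemaHints", "pass INT, total INT"
  )
);
```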
EDIT: Interestingly enough, I moved on to generating the full JSON file and storing it in our cloud storage rather than working with a partial file. The fully generated file was inferred correctly when I triggered the Auto Loader pipeline, complex nested JSON properties and all. I'll leave the question up, though, because I still have no idea why the partial file was throwing exceptions.