I'm trying to create a Delta Live Table on top of JSON files placed in Azure Blob Storage. The JSON files contain white spaces in their column names; instead of renaming the columns, I tried the `columnMapping` table property, which did let me create the table with the spaces. However, partitioning on the `date` column did not behave as expected: it should create partition directories like `dates=2022-01-01`, but instead it created a bunch of directories like `AA`, `AB`, `CL`, etc., with Parquet files inside them.
```sql
SET spark.sql.legacy.timeParserPolicy = LEGACY;

CREATE OR REFRESH STREAMING LIVE TABLE abcdata
USING DELTA
TBLPROPERTIES (
  'quality' = 'bronze',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
LOCATION "dbfs:/mnt/abcdata"
AS SELECT
  *,
  input_file_name() AS source_file_name
FROM cloud_files(
  "dbfs:/json/data",
  "json",
  map(
    "cloudFiles.inferColumnTypes", "true",
    "cloudFiles.rescuedDataColumn", "_rescued_data",
    "cloudFiles.schemaEvolutionMode", "rescue",
    "multiLine", "true"
  )
);
```
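For context, this is the kind of declaration I had in mind for getting directory-style partitions on the streaming table — a minimal sketch, assuming `PARTITIONED BY` is the right mechanism in DLT SQL (the `ingest_date` column here is hypothetical, added only so there is something to partition on at the bronze layer; my real date is only derived in the silver table below):

```sql
-- Illustrative sketch only: `ingest_date` does not exist in my real schema.
CREATE OR REFRESH STREAMING LIVE TABLE abcdata_by_date
PARTITIONED BY (ingest_date)
TBLPROPERTIES ('quality' = 'bronze')
AS SELECT
  *,
  input_file_name() AS source_file_name,
  current_date() AS ingest_date  -- hypothetical partition column
FROM cloud_files("dbfs:/json/data", "json",
  map("cloudFiles.inferColumnTypes", "true"));
```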
```sql
CREATE OR REFRESH LIVE TABLE abcdatafinal
USING DELTA
TBLPROPERTIES (
  'quality' = 'silver',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
LOCATION "dbfs:/mnt/abcdatafinal"
AS SELECT
  *,
  to_date(to_timestamp(from_unixtime(unix_timestamp(TimeStamp, 'MM/dd/yyyy hh:mm:ss.SSS aa'), 'yyyy-MM-dd HH:mm:ss'))) AS dates
FROM live.abcdata;
```
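On the silver table, what I expected to end up with is something like the following — a minimal sketch, assuming an explicit `PARTITIONED BY (dates)` clause is what produces Hive-style `dates=2022-01-01` directories:

```sql
-- Sketch: the same silver table, with the partition declared explicitly.
CREATE OR REFRESH LIVE TABLE abcdatafinal
PARTITIONED BY (dates)
TBLPROPERTIES (
  'quality' = 'silver',
  'delta.minReaderVersion' = '2',
  'delta.minWriterVersion' = '5',
  'delta.columnMapping.mode' = 'name'
)
LOCATION "dbfs:/mnt/abcdatafinal"
AS SELECT
  *,
  to_date(to_timestamp(from_unixtime(unix_timestamp(TimeStamp, 'MM/dd/yyyy hh:mm:ss.SSS aa'), 'yyyy-MM-dd HH:mm:ss'))) AS dates
FROM live.abcdata;
```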
What am I doing wrong here?
Also, the first streaming live table was not partitioned at all, yet its table path contains directories like the ones described above.
Thanks in Advance!