I'm trying to use the badRecordsPath option to catch improperly formed records in a CSV file and continue loading the remainder of the file. I can get the option to work in Python like this:
df = spark.read\
    .format("csv")\
    .option("header", "true")\
    .option("badRecordsPath", "wasbs://user@user.blob.core.windows.net/badRecords/")\
    .schema("col1 integer,col2 integer")\
    .load("wasbs://user@user.blob.core.windows.net/*.csv")
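(For context, here's what I mean by "improperly formed" — a minimal pure-Python sketch, no Spark involved and with made-up sample data, of which rows fail the `col1 integer,col2 integer` schema:)

```python
import csv
import io

# Hypothetical sample CSV matching the schema "col1 integer,col2 integer";
# the third data row is malformed (non-integer value in col2).
sample = """col1,col2
1,2
3,4
5,oops
"""

good, bad = [], []
for row in csv.DictReader(io.StringIO(sample)):
    try:
        # Mimic the two integer casts from the schema.
        good.append({"col1": int(row["col1"]), "col2": int(row["col2"])})
    except (ValueError, TypeError):
        # This is the kind of record I'd expect to land under badRecordsPath.
        bad.append(row)

print(good)  # [{'col1': 1, 'col2': 2}, {'col1': 3, 'col2': 4}]
print(bad)   # [{'col1': '5', 'col2': 'oops'}]
```

With the Python reader above, rows like that last one get written out under the badRecordsPath location instead of failing the load.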
But when I try to use it in the COPY INTO SQL command, I can't get it to work. The bad records come in as if PERMISSIVE mode were in effect, with nulls in place of the bad data. I've tried putting the badRecordsPath clause in both FORMAT_OPTIONS and COPY_OPTIONS and get the same result:
COPY INTO my_db.t2
FROM (SELECT cast(col1 AS INTEGER) AS `col1`, cast(col2 AS INTEGER) AS `col2` FROM 'wasbs://user@user.blob.core.windows.net/*.csv')
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'sep' = ',', 'badRecordsPath' = 'wasbs://user@user.blob.core.windows.net/badRecords')
COPY INTO my_db.t2
FROM (SELECT cast(col1 AS INTEGER) AS `col1`, cast(col2 AS INTEGER) AS `col2` FROM 'wasbs://user@user.blob.core.windows.net/*.csv')
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'sep' = ',')
COPY_OPTIONS ('badRecordsPath' = 'user@user.blob.core.windows.net/badRecords')
Here's the Databricks version I'm using:
{"dbr_version": "11.2.x-scala2.12", "dbsql_version": null, "u_build_hash": "38130494fa061680c589459fb2a35b344ea1a814", "r_build_hash": "c45f53067ddc71ffe1ee06997bad04b3c7143650"}
Appreciate any guidance. Thanks.