Use of badRecordsPath in COPY INTO SQL command

bradm0
New Contributor III

I'm trying to use badRecordsPath to catch improperly formed records in a CSV file and continue loading the remainder of the file. I can get the option to work using Python like this:

df = spark.read \
    .format("csv") \
    .option("header", "true") \
    .option("badRecordsPath", "wasbs://user@user.blob.core.windows.net/badRecords/") \
    .schema("col1 integer, col2 integer") \
    .load("wasbs://user@user.blob.core.windows.net/*.csv")

But when I try to use it in the COPY INTO SQL command I can't get it to work. The records come in as if PERMISSIVE mode were in effect, with nulls in place of the bad data. I've tried putting the badRecordsPath option in both FORMAT_OPTIONS and COPY_OPTIONS and get the same result.

copy into my_db.t2
from (SELECT cast(col1 AS INTEGER) AS `col1`,cast(col2 AS INTEGER) AS `col2` from 'wasbs://user@user.blob.core.windows.net/*.csv')
fileformat=csv
format_options ('header' = 'true','sep' = ',','badRecordsPath' = 'wasbs://user@user.blob.core.windows.net/badRecords')
 
copy into my_db.t2
from (SELECT cast(col1 AS INTEGER) AS `col1`,cast(col2 AS INTEGER) AS `col2` from 'wasbs://user@user.blob.core.windows.net/*.csv')
fileformat=csv
format_options ('header' = 'true','sep' = ',')
copy_options ('badRecordsPath' = 'user@user.blob.core.windows.net/badRecords')

Here's the version of Databricks I'm using:

{"dbr_version": "11.2.x-scala2.12", "dbsql_version": null, "u_build_hash": "38130494fa061680c589459fb2a35b344ea1a814", "r_build_hash": "c45f53067ddc71ffe1ee06997bad04b3c7143650"}

I'd appreciate any guidance. Thanks.

ACCEPTED SOLUTION

Hubert-Dudek
Esteemed Contributor III

I think the problem is that the source is wrapped in a SELECT, so the badRecordsPath functionality is lost because COPY INTO generates a SELECT statement first.

Please try removing the SELECT; you can use inferSchema to avoid the CASTs.

copy into my_db.t2
from ('wasbs://user@user.blob.core.windows.net/*.csv')
fileformat=csv
FORMAT_OPTIONS ('header' = 'true',
                'sep' = ',',
                'inferSchema' = 'true',
                'badRecordsPath' = 'wasbs://user@user.blob.core.windows.net/badRecords')


REPLIES

bradm0
New Contributor III

Thanks. It was the inferSchema setting. I tried it with and without the SELECT, and it worked both ways once I added inferSchema.

Both of these worked:

drop table my_db.t2;
create table my_db.t2 (col1 int,col2 int);
copy into my_db.t2
from (SELECT cast(col1 AS INTEGER) AS `col1`,cast(col2 AS INTEGER) AS `col2` from 'wasbs://user@user.blob.core.windows.net/*.csv')
fileformat=csv
format_options ('header' = 'true', 'sep' = ',', 'inferSchema' = 'true', 'badRecordsPath' = 'wasbs://user@user.blob.core.windows.net/badRecords')
 
 
drop table my_db.t2;
create table my_db.t2 (col1 int,col2 int);
copy into my_db.t2
from 'wasbs://user@user.blob.core.windows.net/*.csv'
fileformat=csv
format_options ('header' = 'true', 'sep' = ',', 'inferSchema' = 'true', 'badRecordsPath' = 'wasbs://user@user.blob.core.windows.net/badRecords')
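
For anyone who finds this later, here's a quick sanity check I run after the COPY INTO finishes (a sketch; the table name and paths match the examples above, and the bad-records subfolder layout is assumed):

# Sketch: compare how many rows landed in the table versus how many
# records were redirected to badRecordsPath. Table name and paths match
# the examples above; the timestamped subfolder layout is assumed.
loaded = spark.table("my_db.t2").count()
bad = spark.read.json("wasbs://user@user.blob.core.windows.net/badRecords/*/bad_records/*")
print(f"rows loaded: {loaded}, bad records captured: {bad.count()}")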

Meghala
Valued Contributor II

It's very informative.
