cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

COPY INTO skipRows FORMAT_OPTIONS does not work

yubin-apollo
New Contributor II

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip a few first rows in the file. I have tried various combinations, e.g. setting header parameter on or off, mergeSchema on or off. I think I tried most cases I can think of, and any of these does not skip the first `n` rows. Has anybody experience this issue?

COPY INTO Documentation: https://docs.databricks.com/sql/language-manual/delta-copy-into.html

4 REPLIES 4

UmaMahesh1
Honored Contributor III

Hi @Yubin Park​ 

Can you write down the statement you are using to copy the data.

Also, after copying, did you check the record count between source and target and find that records are not skipped ?

Thanks for the reply, @UmaMahesh. Here is the code I tried:

COPY INTO public_data.nppes_other_names_brnz
FROM (SELECT _c0::STRING NPI,
  _c1::STRING Provider_Other_Organization_Name
  FROM 's3://...')
FILEFORMAT = CSV
FORMAT_OPTIONS = ('skipRows' = '1');

I checked the number of records, and yes, it did not skip. Thanks for your help in advance!

karthik-kobai
New Contributor II

@yubin-apollo: Hello, there. I was testing the copy into command with the skipRows option and it doesn't seem to work. Were you able to figure out the right way or is it a bug? Thanks.

Here is the query for reference

COPY INTO dummynamespace.dummytable
  FROM 
  'azure blob xxx'
  WITH ( CREDENTIAL  (AZURE_SAS_TOKEN = 'zzz')  )
  FILEFORMAT = CSV
  PATTERN = 'Employees_*.csv' 
  FORMAT_OPTIONS ('mergeSchema' = 'true',
                  'delimiter' = ',',
                  'header'='true'
                  )
  COPY_OPTIONS ('mergeSchema' = 'true',
  'skipRows' = '3')



 

karthik-kobai
New Contributor II

@yubin-apollo: My bad - I had the skipRows in the COPY_OPTIONS and not in the FORMAT_OPTIONS. It works, please ignore my previous comment. Thanks

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.