Databricks Community

yubin-apollo · ‎11-30-2022

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip a few first rows in the file. I have tried various combinations, e.g. setting header parameter on or off, mergeSchema on or off. I think I tried most cases I can think of, and any of these does not skip the first `n` rows. Has anybody experience this issue?

COPY INTO Documentation: https://docs.databricks.com/sql/language-manual/delta-copy-into.html

UmaMahesh1 · ‎11-30-2022

Hi @Yubin Park

Can you write down the statement you are using to copy the data.

Also, after copying, did you check the record count between source and target and find that records are not skipped ?

Uma Mahesh D

yubin-apollo · ‎12-04-2022

Thanks for the reply, @UmaMahesh. Here is the code I tried:

COPY INTO public_data.nppes_other_names_brnz
FROM (SELECT _c0::STRING NPI,
  _c1::STRING Provider_Other_Organization_Name
  FROM 's3://...')
FILEFORMAT = CSV
FORMAT_OPTIONS = ('skipRows' = '1');

I checked the number of records, and yes, it did not skip. Thanks for your help in advance!

karthik-kobai · ‎03-25-2024

@yubin-apollo: Hello, there. I was testing the copy into command with the skipRows option and it doesn't seem to work. Were you able to figure out the right way or is it a bug? Thanks.

Here is the query for reference

COPY INTO dummynamespace.dummytable
  FROM 
  'azure blob xxx'
  WITH ( CREDENTIAL  (AZURE_SAS_TOKEN = 'zzz')  )
  FILEFORMAT = CSV
  PATTERN = 'Employees_*.csv' 
  FORMAT_OPTIONS ('mergeSchema' = 'true',
                  'delimiter' = ',',
                  'header'='true'
                  )
  COPY_OPTIONS ('mergeSchema' = 'true',
  'skipRows' = '3')