cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results for 
Search instead for 
Did you mean: 

COPY INTO skipRows FORMAT_OPTIONS does not work

yubin-apollo
New Contributor II

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip a few first rows in the file. I have tried various combinations, e.g. setting header parameter on or off, mergeSchema on or off. I think I tried most cases I can think of, and any of these does not skip the first `n` rows. Has anybody experience this issue?

COPY INTO Documentation: https://docs.databricks.com/sql/language-manual/delta-copy-into.html

4 REPLIES 4

UmaMahesh1
Honored Contributor III

Hi @Yubin Park​ 

Can you write down the statement you are using to copy the data.

Also, after copying, did you check the record count between source and target and find that records are not skipped ?

Uma Mahesh D

Thanks for the reply, @UmaMahesh. Here is the code I tried:

COPY INTO public_data.nppes_other_names_brnz
FROM (SELECT _c0::STRING NPI,
  _c1::STRING Provider_Other_Organization_Name
  FROM 's3://...')
FILEFORMAT = CSV
FORMAT_OPTIONS = ('skipRows' = '1');

I checked the number of records, and yes, it did not skip. Thanks for your help in advance!

karthik-kobai
New Contributor II

@yubin-apollo: Hello, there. I was testing the copy into command with the skipRows option and it doesn't seem to work. Were you able to figure out the right way or is it a bug? Thanks.

Here is the query for reference

COPY INTO dummynamespace.dummytable
  FROM 
  'azure blob xxx'
  WITH ( CREDENTIAL  (AZURE_SAS_TOKEN = 'zzz')  )
  FILEFORMAT = CSV
  PATTERN = 'Employees_*.csv' 
  FORMAT_OPTIONS ('mergeSchema' = 'true',
                  'delimiter' = ',',
                  'header'='true'
                  )
  COPY_OPTIONS ('mergeSchema' = 'true',
  'skipRows' = '3')



 

karthik-kobai
New Contributor II

@yubin-apollo: My bad - I had the skipRows in the COPY_OPTIONS and not in the FORMAT_OPTIONS. It works, please ignore my previous comment. Thanks

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you won’t want to miss the chance to attend and share knowledge.

If there isn’t a group near you, start one and help create a community that brings people together.

Request a New Group