Databricks

yubin-apollo · 11-30-2022

Based on the COPY INTO documentation, it seems I can use `skipRows` to skip the first `n` rows. I am trying to load a CSV file where I need to skip the first few rows. I have tried various combinations, e.g. setting the `header` option on or off and `mergeSchema` on or off. I think I have tried most of the cases I can think of, and none of them skips the first `n` rows. Has anybody experienced this issue?

COPY INTO Documentation: https://docs.databricks.com/sql/language-manual/delta-copy-into.html

UmaMahesh1 · 11-30-2022

Hi @Yubin Park

Can you post the statement you are using to copy the data?

Also, after copying, did you compare the record counts between source and target and confirm that the records are not skipped?

yubin-apollo · 12-04-2022

Thanks for the reply, @UmaMahesh. Here is the code I tried:

COPY INTO public_data.nppes_other_names_brnz
FROM (SELECT _c0::STRING NPI,
  _c1::STRING Provider_Other_Organization_Name
  FROM 's3://...')
FILEFORMAT = CSV
FORMAT_OPTIONS = ('skipRows' = '1');

I checked the number of records, and yes, the rows were not skipped. Thanks in advance for your help!

karthik-kobai · a month ago

@yubin-apollo: Hello there. I was testing the COPY INTO command with the `skipRows` option and it doesn't seem to work. Were you able to figure out the right way to use it, or is it a bug? Thanks.

Here is the query for reference:

COPY INTO dummynamespace.dummytable
  FROM 
  'azure blob xxx'
  WITH ( CREDENTIAL  (AZURE_SAS_TOKEN = 'zzz')  )
  FILEFORMAT = CSV
  PATTERN = 'Employees_*.csv' 
  FORMAT_OPTIONS ('mergeSchema' = 'true',
                  'delimiter' = ',',
                  'header'='true'
                  )
  COPY_OPTIONS ('mergeSchema' = 'true',
  'skipRows' = '3')

karthik-kobai · a month ago

@yubin-apollo: My bad. I had `skipRows` in COPY_OPTIONS instead of FORMAT_OPTIONS. With `skipRows` in FORMAT_OPTIONS it works, so please ignore my previous comment. Thanks.
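
For anyone landing here later, this is a sketch of the query above with `skipRows` moved from COPY_OPTIONS into FORMAT_OPTIONS, which is the change described in this reply. The table name, storage path, and SAS token are the same placeholders as in the original query, and nothing else has been changed. It has not been re-run here, so treat it as an illustration rather than a verified statement.

```sql
-- Same statement as before; the only change is where 'skipRows' lives.
COPY INTO dummynamespace.dummytable
  FROM
  'azure blob xxx'                                 -- placeholder path, as in the original post
  WITH ( CREDENTIAL (AZURE_SAS_TOKEN = 'zzz') )    -- placeholder token
  FILEFORMAT = CSV
  PATTERN = 'Employees_*.csv'
  FORMAT_OPTIONS ('mergeSchema' = 'true',
                  'delimiter' = ',',
                  'header' = 'true',
                  'skipRows' = '3'                 -- CSV reader option, so it goes here
                  )
  COPY_OPTIONS ('mergeSchema' = 'true')            -- 'skipRows' removed from COPY_OPTIONS
```

The reason this placement matters, as far as I can tell, is that `skipRows` is an option of the CSV reader and not of the copy operation itself. Note that the query in the original question already has `skipRows` under FORMAT_OPTIONS, so this change may not explain that case.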

COPY INTO skipRows FORMAT_OPTIONS does not work
