Marcin_U
New Contributor II

Thanks for the reply, @Retired_mod. I have some questions related to your answer.

  1. Checkpoint Location:
    • Does deleting the checkpoint folder (or only the files in it?) mean that the next Auto Loader run will load all files from the provided source locations again? If so, it would duplicate data that was already loaded into the target Delta table.
  2. Configure Auto Loader:
    • Do I understand correctly that InMemoryFileIndex is used to list files and directories more efficiently, but that it cannot be used with Auto Loader (cloudFiles)?
    • What about implementing a process that moves (for backup purposes) or deletes files already processed by Auto Loader? That could resolve the problem of slow file listing. Is there any feature like this in Auto Loader? I did find an "archive_timestamp" column in "cloud_file_state", but it contains only nulls.
  3. Consider Using Wildcards:
    • It looks like wildcards could resolve my problem. Can you confirm that using wildcards creates only one source in the "checkpoint/sources" directory?
      [Screenshot: Marcin_U_0-1708616776547.png]
    • I wonder why my implementation creates a new source in the "checkpoint/sources" folder after I add new source locations to Auto Loader. Is it because readStream is started once per source location in the _al_readStream_from_paths method?