databricks autoloader source files

seefoods
Valued Contributor

Hello, 
How can we handle this error when using Auto Loader with spark.readStream?
(com.databricks.sql.cloudfiles.errors.CloudFilesException) [CF_EMPTY_DIR_FOR_SCHEMA_INFERENCE] Cannot infer schema when the input path `/Volumes/default/landing/source/bundle/bundle_test/` is empty. Please try to start the stream when there are files in the input path, or specify the schema. SQLSTATE: 42000  

Ashwin_DSA
Databricks Employee

Hi @seefoods,

The error message indicates that there are no files in the source path when the stream starts.

You have two options. You can define the schema yourself and pass it to schema(...), so Auto Loader doesn't need to infer anything; the stream can then start against an empty path and will begin processing as soon as files arrive. Alternatively, if you really want Auto Loader to infer the schema, make sure there is at least one file in the source path at start-up, even if it is just a sample file. Otherwise, you will continue getting this error.
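
As a rough sketch of the explicit-schema option, something like the following should work. The column names in the schema, the "json" file format, and the helper name read_landing_stream are placeholders I made up for illustration, since I don't know what your files look like; swap in your real columns and format. The schema is passed as a DDL string, which schema(...) accepts in place of a StructType.

```python
# Hypothetical schema: replace these column names and types with the ones in your files.
SOURCE_SCHEMA_DDL = "id BIGINT, name STRING, updated_at TIMESTAMP"


def read_landing_stream(spark, source_path, file_format="json",
                        schema_ddl=SOURCE_SCHEMA_DDL):
    """Build an Auto Loader stream with an explicit schema.

    Because the schema is supplied up front, Auto Loader has nothing to infer,
    so the stream can be started while the source path is still empty.
    The "json" default is an assumption; use the format of your files.
    """
    return (
        spark.readStream
        .format("cloudFiles")                       # Auto Loader source
        .option("cloudFiles.format", file_format)   # format of the incoming files
        .schema(schema_ddl)                         # explicit schema, no inference
        .load(source_path)
    )


# In a Databricks notebook, where `spark` is already defined:
# df = read_landing_stream(spark, "/Volumes/default/landing/source/bundle/bundle_test/")
```

The returned DataFrame is a streaming DataFrame, so you would still attach your usual writeStream with a checkpoint location to actually start it.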

If this answer resolves your question, could you mark it as “Accept as Solution”? That helps other users quickly find the correct fix.

Regards,
Ashwin | Delivery Solution Architect @ Databricks
Helping you build and scale the Data Intelligence Platform.
***Opinions are my own***
