Data Engineering

Error when reading delta lake files with Auto Loader

Vladif1
New Contributor II

Hi,

When reading a Delta Lake table (created by Auto Loader) with this code:

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "delta")
    .option("cloudFiles.schemaLocation", f"{silver_path}/_checkpoint")
    .load(bronze_path)
)

I receive this error:

AnalysisException: Incompatible format detected. A transaction log for Delta was found at `/mnt/f1/f2/_delta_log`, but you are trying to read from `/mnt/f1/f2/` using format("cloudFiles"). You must use 'format("delta")' when reading and writing to a delta table. To disable this check, SET spark.databricks.delta.formatCheck.enabled=false To learn more about Delta...

What's the right way to read Delta Lake files with Auto Loader for further processing (e.g., from the Bronze layer to Silver)?

Thank you!

4 REPLIES

-werners-
Esteemed Contributor III

As the error mentions, Auto Loader and Delta do not mix.

But there is Change Data Feed on Delta Lake (as a source):

https://learn.microsoft.com/en-us/azure/databricks/delta/delta-change-data-feed

That way you do not have to read the whole Delta table but only ingest the changes.
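For reference, a minimal sketch of reading a Delta table as a stream with Change Data Feed, assuming CDF has been enabled on the source table (the paths reuse the variable names from the original post; everything else is illustrative, not from this thread):

```python
# Sketch: stream only row-level changes from a Bronze Delta table into Silver.
# Assumes CDF was enabled on the source, e.g.:
#   ALTER TABLE bronze SET TBLPROPERTIES (delta.enableChangeDataFeed = true)

df = (
    spark.readStream
    .format("delta")                    # read the Delta table directly, not cloudFiles
    .option("readChangeFeed", "true")   # emit inserts/updates/deletes as change rows
    .load(bronze_path)
)

# Change rows carry _change_type, _commit_version, and _commit_timestamp
# columns in addition to the table's own schema.
(df.writeStream
   .format("delta")
   .option("checkpointLocation", f"{silver_path}/_checkpoint")
   .start(silver_path))
```

This runs only on a cluster with a live `spark` session; the point is that the streaming source format is `delta`, with `readChangeFeed` doing the incremental part that Auto Loader would otherwise do for raw files.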

Vladif1
New Contributor II

Auto Loader doesn't support reading from Delta Lake tables? Is every other format supported except Delta?

Thank you!

-werners-
Esteemed Contributor III

you can check for yourself:

https://learn.microsoft.com/en-us/azure/databricks/ingestion/auto-loader/

"Auto Loader can ingest JSON, CSV, PARQUET, AVRO, ORC, TEXT, and BINARYFILE file formats"

And it makes sense. Auto Loader is a tool that tracks which files you have already processed.

Delta Lake is more than just some files; it has a transaction log.
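For comparison, a sketch of what Auto Loader is designed for: incrementally ingesting raw files in one of the supported formats (JSON here; `landing_path` and the schema/checkpoint locations are placeholder names, not from this thread):

```python
# Sketch: Auto Loader ingesting raw JSON files from a landing zone into Bronze.
# cloudFiles.format must be a supported file format
# (JSON, CSV, Parquet, Avro, ORC, text, binaryFile) -- not "delta".

df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "json")                  # raw files, not a Delta table
    .option("cloudFiles.schemaLocation", f"{bronze_path}/_schema")
    .load(landing_path)                                   # hypothetical landing folder
)

# The output of this ingest is what becomes a Delta table:
(df.writeStream
   .format("delta")
   .option("checkpointLocation", f"{bronze_path}/_checkpoint")
   .start(bronze_path))
```

So Auto Loader reads plain files and writes Delta; once the data is a Delta table, downstream stages read it with `format("delta")` (optionally with Change Data Feed), not with `cloudFiles`.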

Anonymous
Not applicable

Hi @Vlad Feigin

Hope everything is going great.

Just wanted to check in if you were able to resolve your issue. If yes, would you be happy to mark an answer as best so that other members can find the solution more quickly? If not, please tell us so we can help you. 

Cheers!
