@stevenayers-bge Auto Loader is designed to work best with immutable files. If files are mutable (i.e., they can be updated in place), it is recommended to set cloudFiles.allowOverwrites = true so that a file is processed again when it is overwritten.
Please refe...
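A minimal sketch of how that option could be set on an Auto Loader stream in PySpark. Only cloudFiles.allowOverwrites comes from the reply above; the json source format, the paths, and both helper names are hypothetical placeholders.

```python
def auto_loader_options(file_format: str, schema_location: str) -> dict:
    """Build Auto Loader options that let overwritten files be picked up again."""
    return {
        "cloudFiles.format": file_format,
        "cloudFiles.schemaLocation": schema_location,
        # Option named in the reply above; values are passed as strings.
        "cloudFiles.allowOverwrites": "true",
    }


def read_mutable_files(spark, source_path: str, schema_location: str):
    """Return a streaming DataFrame; `spark` is an existing SparkSession.

    The "json" format is an assumption for illustration only.
    """
    return (
        spark.readStream.format("cloudFiles")
        .options(**auto_loader_options("json", schema_location))
        .load(source_path)
    )
```

With overwrites allowed, the same file may be ingested more than once, so downstream logic may need to deduplicate or merge on a key.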
@Marcin_U Please use the option below in the readStream to load only Parquet files:
.option("pathGlobFilter", "*.parquet")
Please refer to the below documentation:
https://docs.databricks.com/en/ingestion/cloud-object-storage/auto-loader/options.html...
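For context, here is a hedged sketch of where that option might sit in a full Auto Loader read. The function names, paths, and the fnmatch-based preview helper are hypothetical; the preview only approximates the glob matching Spark applies to file names and is there to sanity-check the pattern locally.

```python
from fnmatch import fnmatchcase

PARQUET_GLOB = "*.parquet"


def matches_glob(file_name: str, pattern: str = PARQUET_GLOB) -> bool:
    """Local approximation of the file-name filter (not Spark's own matcher)."""
    return fnmatchcase(file_name, pattern)


def read_parquet_only(spark, source_path: str, schema_location: str):
    """Return a streaming DataFrame; `spark` is an existing SparkSession."""
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "parquet")
        .option("cloudFiles.schemaLocation", schema_location)
        # Only files whose names match the glob are loaded.
        .option("pathGlobFilter", PARQUET_GLOB)
        .load(source_path)
    )
```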
@GregTyndall Yes, the current default limit is 2, but it can be increased up to 5 by adding the flag below to the pipeline settings:
pipelines.enzyme.numberOfJoinsThreshold 5
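As a rough illustration, the flag above could be placed in the configuration map of the pipeline settings JSON. This sketch assumes the settings are edited as a dictionary; the helper name and the example_pipeline name are hypothetical, and the upper bound of 5 is taken from the reply above.

```python
import json

JOIN_THRESHOLD_KEY = "pipelines.enzyme.numberOfJoinsThreshold"


def with_join_threshold(settings: dict, threshold: int = 5) -> dict:
    """Return a copy of the pipeline settings with the join threshold set."""
    if threshold > 5:
        raise ValueError("the reply above states a maximum of 5")
    updated = dict(settings)
    configuration = dict(settings.get("configuration", {}))
    # Pipeline configuration values are stored as strings.
    configuration[JOIN_THRESHOLD_KEY] = str(threshold)
    updated["configuration"] = configuration
    return updated


# Print the resulting settings fragment as JSON.
print(json.dumps(with_join_threshold({"name": "example_pipeline"}), indent=2))
```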
Clustering keys can only be columns that have statistics collected. By default, statistics are collected for the first 32 columns of a Delta table. See Specify Delta statistics columns.
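To check a candidate set of clustering keys against that default, something like the following could be used. It is a sketch under simplifying assumptions: the table has only flat top-level columns, the default of 32 has not been changed through table properties such as delta.dataSkippingNumIndexedCols or delta.dataSkippingStatsColumns, and the helper name is hypothetical.

```python
DEFAULT_STATS_COLUMN_LIMIT = 32


def keys_missing_default_stats(table_columns, clustering_keys,
                               limit=DEFAULT_STATS_COLUMN_LIMIT):
    """Return clustering keys that fall outside the first `limit` columns."""
    covered = set(table_columns[:limit])
    return [key for key in clustering_keys if key not in covered]


# Example: a 40-column table where one key sits past the 32nd column.
columns = [f"c{i}" for i in range(40)]
print(keys_missing_default_stats(columns, ["c3", "c35"]))  # ['c35']
```

Any key the helper returns would need statistics collected for it before it can serve as a clustering key.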
We can use the below workaround for your use case:
1. Use th...