Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Amazon S3 with Auto Loader consumes "too many" requests - or maybe not!

Tico23
Contributor

After successfully loading three small files (2 KB each) from AWS S3 using Auto Loader for learning purposes, I received an "AWS Free Tier limit alert" a few hours later, although I hadn't used the AWS account for a while.

Does this streaming service on Databricks, which runs all the time, consume requests even if no files/data are uploaded?

(screenshot: AWS budget alert)

Is this normal, or did I overlook some hidden configuration?

1 ACCEPTED SOLUTION

Accepted Solutions

daniel_sahal
Esteemed Contributor

@Alexander Mora Araya

It needs to check whether there are new files on the storage, so yes - if it runs continuously, it will consume requests.
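To see why even an idle stream can trip the free tier, here is a rough back-of-envelope estimate. The poll interval below is an assumption, not a value from the thread; the point is that in directory-listing mode Auto Loader issues at least one S3 LIST call per micro-batch, even when no new files have arrived, and the S3 free tier only includes roughly 2,000 LIST/PUT-class requests per month:

```python
# Back-of-envelope sketch (the poll interval is an assumption): a
# continuously running stream lists the input directory on every
# micro-batch, whether or not new files exist.
poll_interval_s = 10                            # assumed polling cadence
polls_per_day = 24 * 3600 // poll_interval_s    # 8,640 polls per day
list_requests_per_day = polls_per_day * 1       # >= 1 LIST call per poll
print(list_requests_per_day)                    # prints 8640
```

At that rate, a free-tier LIST allowance of ~2,000 requests/month would be exhausted within a few hours of runtime, which matches the alert described above.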


3 REPLIES


Debayan
Esteemed Contributor III

Hi, Auto Loader incrementally and efficiently processes new data files as they arrive in cloud storage. Auto Loader can load data files from AWS S3 (s3://), Azure Data Lake Storage Gen2 (ADLS Gen2, abfss://), Google Cloud Storage (GCS, gs://), Azure Blob Storage (wasbs://), ADLS Gen1 (adl://), and Databricks File System (DBFS, dbfs:/). Auto Loader can ingest JSON, CSV, Parquet, Avro, ORC, text, and binary file formats.

Auto Loader provides a Structured Streaming source called cloudFiles. Given an input directory path on the cloud file storage, the cloudFiles source automatically processes new files as they arrive, with the option of also processing existing files in that directory. Auto Loader has support for both Python and SQL in Delta Live Tables.
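As an illustration of the cloudFiles source described above, here is a minimal sketch of how its options fit together. The bucket paths are hypothetical, and `spark` is the SparkSession a Databricks runtime provides, so the commented part is not runnable outside such an environment; `cloudFiles.useNotifications` is the documented switch between directory-listing mode (repeated LIST calls) and file-notification mode (S3 event notifications):

```python
# Minimal sketch of Auto Loader's cloudFiles options (paths hypothetical).
# Directory-listing mode (the default) discovers files by listing the input
# path; notification mode relies on S3 event notifications instead.
def autoloader_options(fmt, use_notifications=False):
    return {
        "cloudFiles.format": fmt,
        "cloudFiles.useNotifications": str(use_notifications).lower(),
    }

# On Databricks this would be wired up roughly as:
# df = (spark.readStream.format("cloudFiles")
#         .options(**autoloader_options("csv"))
#         .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas")
#         .load("s3://my-bucket/landing/"))
```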

You can use Auto Loader to process billions of files to migrate or backfill a table. Auto Loader scales to support near real-time ingestion of millions of files per hour.

Could you please verify whether the cloud storage is receiving any files?

Please refer: https://docs.databricks.com/ingestion/auto-loader/index.html

Please let us know if this helps. 

Also, please tag @Debayan in your next response so I get notified. Thank you!

Tico23
Contributor

@Debayan Mukherjee

Thanks for this explanation. Everything worked fine when I tested it, as I mentioned above. The only thing is that it continuously makes requests to S3 to check whether new data needs to be pulled. Am I wrong here?
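No, that observation is correct for a continuously running stream. One common way to cap the cost (a sketch under assumptions, not from the thread - names and paths are illustrative) is Structured Streaming's one-shot `availableNow` trigger, so the stream drains pending files and stops, and S3 is only listed while a run is active:

```python
# Sketch (illustrative names/paths): run the stream as a one-shot job
# instead of leaving it on, so S3 is only listed during a scheduled run.
trigger_kwargs = {"availableNow": True}   # one-shot trigger: drain, then stop

# On Databricks this would look roughly like (not runnable outside Spark):
# (df.writeStream
#    .trigger(**trigger_kwargs)                            # process pending files, stop
#    .option("checkpointLocation", "s3://my-bucket/_chk")  # hypothetical path
#    .start("s3://my-bucket/bronze/"))
print(trigger_kwargs["availableNow"])     # prints True
```

Scheduling such a one-shot run (e.g. hourly) trades latency for a bounded number of S3 requests per day.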
