Autoloader Functionality Question: Pull API data directly? (Data Engineering)

ChristianRRL — Fri, 08 Aug 2025 03:51:57 GMT

Hi there, while reading "Common data loading patterns > Enable flexible semi-structured data pipelines" in the docs, I noticed this interesting code snippet:

    (spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        # will ensure that the headers column gets processed as a map
        .option("cloudFiles.schemaHints", "headers map<string,string>, statusCode SHORT")
        .load("/api/requests")
        .writeStream
        .option("mergeSchema", "true")
        .option("checkpointLocation", "<path-to-checkpoint>")
        .start("<path_to_target>"))

This may be a bit of a leap, but does anyone know whether Autoloader supports pulling data directly from an API (as opposed to incrementally loading files from a designated "landing" path)? I may be reading too much into it, but the `schemaHints` and the `/api/requests` path look awfully close to literal API calls, and it would be an interesting use case if we could store both the raw JSON payload and the API status code in the same target table.
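To make the schema hint in that snippet more concrete, here is a small plain-Python illustration (no Spark involved). It assumes each landed record carries a `headers` object and a numeric `statusCode`, as the hint implies; the sample records and helper names are hypothetical. It shows why a map suits headers: when header names vary between responses, a struct-typed column would need one nested field per distinct name, whereas a map keeps everything in a single `headers` column. It also checks that status codes fit Spark's SHORT type, a 16-bit signed integer.

```python
# Hypothetical landed API records; the header names differ between responses.
records = [
    {"headers": {"Content-Type": "application/json", "X-Request-Id": "a1"}, "statusCode": 200},
    {"headers": {"Content-Type": "application/json", "Retry-After": "30"}, "statusCode": 429},
]

SHORT_MIN, SHORT_MAX = -32768, 32767  # value range of Spark's ShortType (16-bit signed)


def struct_fields_needed(recs):
    """Field names a struct-typed `headers` column would need: the union of all keys seen."""
    names = set()
    for rec in recs:
        names.update(rec["headers"].keys())
    return sorted(names)


def fits_hinted_schema(rec):
    """Check one record against `headers map<string,string>, statusCode SHORT`."""
    headers_ok = isinstance(rec["headers"], dict) and all(
        isinstance(k, str) and isinstance(v, str) for k, v in rec["headers"].items()
    )
    code = rec["statusCode"]
    code_ok = isinstance(code, int) and SHORT_MIN <= code <= SHORT_MAX
    return headers_ok and code_ok


print(struct_fields_needed(records))
# ['Content-Type', 'Retry-After', 'X-Request-Id']
print(all(fits_hinted_schema(r) for r in records))
# True
```

With the map hint, a later response that carries yet another header name just adds a key inside `headers` instead of a new nested field.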

Re: Autoloader Functionality Question: Pull API data directly?

szymon_dybczak — Fri, 08 Aug 2025 08:11:55 GMT

Hi @ChristianRRL,

Unfortunately, they chose quite a confusing name. Auto Loader supports only one type of source: cloudFiles.
And cloudFiles is nothing but your cloud object storage. So in this example, they have a data lake directory, /api/requests, where the payloads from the API are saved.
So, to sum it up: you can't use Auto Loader to read data directly from APIs.
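Since Auto Loader only reads files, the usual workaround is a separate step that calls the API and writes each response into the directory Auto Loader watches. Below is a minimal, standard-library-only sketch of that landing step, not a Databricks API. The helper name `land_response`, the file-naming scheme and the record layout (`headers`, `statusCode`, `body`, `ingestedAt`) are assumptions, chosen to line up with the schema hints in the docs snippet. In a real job the directory would be a cloud storage location such as the `/api/requests` path, and the values would come from an actual HTTP call.

```python
import json
import os
import tempfile
import uuid
from datetime import datetime, timezone


def land_response(landing_dir, status_code, headers, body):
    """Write one API response to `landing_dir` as a single JSON file.

    Hypothetical helper: the record layout mirrors the schema hints
    `headers map<string,string>, statusCode SHORT` from the docs snippet,
    plus a `body` field that keeps the raw payload.
    """
    os.makedirs(landing_dir, exist_ok=True)
    now = datetime.now(timezone.utc)
    record = {
        "headers": {str(k): str(v) for k, v in headers.items()},  # string-to-string map
        "statusCode": int(status_code),
        "body": body,  # raw payload, stored as-is
        "ingestedAt": now.isoformat(),
    }
    # Timestamp plus random suffix: every response lands as its own new file,
    # which is what an incremental file-based reader expects to discover.
    name = f"{now:%Y%m%dT%H%M%S}_{uuid.uuid4().hex}.json"
    path = os.path.join(landing_dir, name)
    with open(path, "w", encoding="utf-8") as f:
        json.dump(record, f)
    return path


if __name__ == "__main__":
    # Demo with made-up values; a real job would take them from an HTTP response.
    demo_dir = os.path.join(tempfile.mkdtemp(), "requests")
    print(land_response(demo_dir, 200, {"Content-Type": "application/json"}, {"id": 1}))
```

Auto Loader would then be pointed at the landing directory, as in the docs snippet, so the raw payload and the status code end up in the same target table.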

Re: Autoloader Functionality Question: Pull API data directly?

ChristianRRL — Fri, 08 Aug 2025 17:49:03 GMT

This makes sense, thank you for clarifying!