Databricks Community

ChristianRRL · ‎08-07-2025

Hi there, when referencing Common data loading patterns > Enable flexible semi-structured data pipelines, I noticed this interesting code snippet:

(spark.readStream.format("cloudFiles")
  .option("cloudFiles.format", "json")
  # will ensure that the headers column gets processed as a map
  .option("cloudFiles.schemaHints",
          "headers map<string,string>, statusCode SHORT")
  .load("/api/requests")
  .writeStream
  .option("mergeSchema", "true")
  .option("checkpointLocation", "<path-to-checkpoint>")
  .start("<path_to_target>"))

This may be a bit of a leap, but does anyone know whether Auto Loader supports pulling data directly from an API, as opposed to incrementally loading data from a designated "landing" path? I may be reading too much into it, but the `schemaHints` option and the `/api/requests` path look awfully close to literal API calls. This would be an interesting use case if we could store both the raw JSON data and the API status code in the same target table.

szymon_dybczak · ‎08-08-2025

Hi @ChristianRRL ,

Unfortunately, they chose quite a confusing name. Auto Loader supports only one type of source: cloudFiles.
And cloudFiles is nothing but your cloud object storage. So in this example they have a data lake directory, /api/requests, where the payload from the API is saved.
So, to sum it up: you can't use Auto Loader to read data directly from APIs.
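To make the landing step concrete, here is a minimal sketch of how a separate job might save each API response as a JSON file that Auto Loader can later pick up. The function name `land_api_response`, the `body` field, and the file naming scheme are hypothetical and only for illustration; the `statusCode` and `headers` fields are chosen to match the schema hints in the snippet above. It assumes the landing directory is reachable as a normal file path and that the HTTP call itself happens elsewhere.

```python
import json
import uuid
from pathlib import Path


def land_api_response(landing_dir, status_code, headers, body):
    """Write one API response as a single-line JSON file and return its path.

    Hypothetical helper: the record layout is an assumption, picked so that
    `statusCode` and `headers` line up with the schema hints in the question.
    """
    record = {
        "statusCode": status_code,
        # Stringify keys and values so the column fits map<string,string>.
        "headers": {str(k): str(v) for k, v in headers.items()},
        "body": body,
    }
    target_dir = Path(landing_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    # A random file name avoids collisions between successive calls.
    target_path = target_dir / f"response-{uuid.uuid4().hex}.json"
    target_path.write_text(json.dumps(record) + "\n", encoding="utf-8")
    return target_path
```

A job that calls the API would pass the response's status code, headers, and parsed payload to this helper, and the Auto Loader stream from the question would then read the same directory. That way the raw JSON and the status code end up in the same target table, without Auto Loader ever calling the API itself.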

