Autoloader Functionality Question: Pull API data directly?

ChristianRRL
Valued Contributor III

Hi there, while reading Common data loading patterns > Enable flexible semi-structured data pipelines, I noticed this interesting code snippet:

# schemaHints ensures that the headers column gets processed as a map
spark.readStream.format("cloudFiles") \
  .option("cloudFiles.format", "json") \
  .option("cloudFiles.schemaHints",
          "headers map<string,string>, statusCode SHORT") \
  .load("/api/requests") \
  .writeStream \
  .option("mergeSchema", "true") \
  .option("checkpointLocation", "<path-to-checkpoint>") \
  .start("<path_to_target>")

This may be a bit of a leap, but does anyone know whether Autoloader supports pulling data directly from an API (as opposed to incrementally loading data from a designated "landing" path)? I may be reading too much into it, but the `schemaHints` option and the `/api/requests` path look awfully close to literal API calls. This would be an interesting use case if we could store both the raw JSON data and the API status code in the same target table.

1 ACCEPTED SOLUTION

Accepted Solutions

szymon_dybczak
Esteemed Contributor III

Hi @ChristianRRL ,

Unfortunately, they chose a rather confusing name. Autoloader supports only one type of source: cloudFiles.
And cloudFiles is nothing more than your cloud object storage. So in this example they have a data lake directory, /api/requests, where the payload from the API has already been saved.
So, to sum up: you can't use Autoloader to read data directly from APIs.
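
To illustrate the pattern described above: a separate job calls the API and saves each response as a JSON file in a landing directory, and Autoloader then ingests that directory. Here is a minimal sketch of the landing step in Python. The helper name `land_api_response` and the record layout (`statusCode`, `headers`, `body`) are hypothetical, chosen only to line up with the schema hints in the question; they are not taken from the Databricks docs, and the example response is made up.

```python
import json
import tempfile
import uuid
from pathlib import Path


def land_api_response(landing_dir, status_code, headers, body):
    """Write one API response as a JSON file in the landing directory."""
    record = {
        "statusCode": status_code,  # lines up with the statusCode hint
        "headers": dict(headers),   # lines up with the headers map<string,string> hint
        "body": body,               # raw payload, stored unchanged
    }
    # One file per response, with a random name to avoid collisions.
    target = Path(landing_dir) / f"{uuid.uuid4().hex}.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(record))
    return target


# Made-up response for illustration; a real job would pass in the values
# returned by whatever HTTP client it uses. A temp directory stands in
# for the landing path here.
demo_dir = tempfile.mkdtemp()
demo_file = land_api_response(
    demo_dir, 200, {"Content-Type": "application/json"}, {"id": 1}
)
```

Pointing `.load(...)` from the snippet in the question at the same directory should then pick these files up incrementally, provided the directory lives on storage that Autoloader can read.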


2 REPLIES

This makes sense, thank you for clarifying!
