
Auto-Update API Data

ChristianRRL
Contributor II

Not sure if this has come up before, but I'm wondering if Databricks has any kind of functionality to "watch" an API call for changes?

For example, I currently have a frequently running job that pulls data via an API call and overwrites the old data. This seems pretty inefficient, since the data from the API call may or may not have changed between overwrites. In fact, the data changes on an ad hoc basis, so it would be nice to pull data from the API *only* when it has changed.

Accepted Solutions

Kaniz_Fatma
Community Manager

Hi @ChristianRRL, Databricks provides a REST API that allows you to interact with various aspects of your Databricks workspace programmatically. While there isn't a direct built-in feature to "watch" an API call for changes, you can design a solution using the available APIs to achieve your goal.

Here are some approaches you can consider:

  1. Scheduled Polling with Conditional Fetch:

    • Instead of running your job frequently and overwriting data every time, schedule it to run at specific intervals (e.g., hourly or daily).
    • Before making the full API call, check if the data has changed since the last fetch. You can do this by comparing timestamps or other relevant metadata, for example a last-modified value or version marker, if the API exposes one.
    • If there are no changes, skip the API call. If changes are detected, proceed with fetching the updated data.
  2. Change Data Capture (CDC):

    • If your data source supports change data capture (CDC), you can use it to track changes efficiently.
    • Databricks supports this for certain data sources; for example, Delta Lake tables offer a change data feed that records row-level changes. With CDC, only the changed data is processed, which avoids re-reading unchanged data. This helps only if the source itself exposes its changes.
    • Set up a streaming job that monitors the change feed and processes only the relevant changes.
  3. Monitor API Calls:

    • While not specifically designed for watching changes, you can create a custom monitoring solution using Databricks' REST API.
    • Set up a monitoring job that periodically checks the data source (e.g., the API endpoint) for changes.
    • If changes are detected (based on your criteria), trigger the actual data extraction job.
  4. Alerts and Webhooks:

    • Configure alerts within Databricks to notify you when specific conditions are met (e.g., a query over your ingested data shows that something has changed).
    • Use webhooks to trigger subsequent actions (such as fetching data) when an alert fires.
    • For example, create an alert on a condition that indicates a change, and point its webhook at a service that invokes your API call.
  5. Custom Logic in Your Job:

    • Modify your existing job to include custom logic that checks for changes before making the API call.
    • For example, store the last fetched timestamp and compare it with the current data's timestamp. If they differ, fetch the updated data.
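
To make the "check before you overwrite" idea from approaches 1 and 5 a bit more concrete, here is a minimal sketch in Python. It assumes the API returns JSON and that you keep the previous fingerprint somewhere durable (a small Delta table or a file, for instance). The helper names `fingerprint`, `fetch_if_changed`, and `conditional_headers` are hypothetical and not part of any Databricks API. If the API you call happens to support standard HTTP conditional requests (`ETag` / `Last-Modified`), the last helper shows the headers you could send so the server can answer `304 Not Modified`; whether your API honors them is something you would need to verify.

```python
import hashlib
import json
from typing import Any, Callable, Dict, Optional, Tuple


def fingerprint(payload: Any) -> str:
    """Return a stable SHA-256 digest of a JSON-serializable payload."""
    # sort_keys makes the digest independent of dictionary key order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def fetch_if_changed(
    fetch: Callable[[], Any], last_fingerprint: Optional[str]
) -> Tuple[Optional[Any], str]:
    """Call `fetch` and return (payload, fingerprint).

    The payload is None when the content matches `last_fingerprint`,
    which signals the caller to skip the overwrite.
    """
    payload = fetch()
    current = fingerprint(payload)
    if current == last_fingerprint:
        return None, current
    return payload, current


def conditional_headers(
    etag: Optional[str], last_modified: Optional[str]
) -> Dict[str, str]:
    """Build HTTP conditional-request headers from stored validators.

    Only useful if the API actually returns ETag / Last-Modified values.
    """
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers
```

Note that the hash comparison still makes the API call each time; what it saves is the rewrite of unchanged data. Skipping the call itself is only possible when the API offers a cheap way to ask "has anything changed?", such as the conditional headers above or a metadata endpoint.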

Explore the Databricks REST API documentation for details on how to make API calls and retrieve relevant information.
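
As a rough illustration of approach 3 (a lightweight monitor that triggers the real extraction job), the sketch below builds a request for the Jobs API `run-now` endpoint using only the Python standard library. The workspace URL, token, and job ID are placeholders you would supply, and the function names `build_run_now_request` and `trigger_job` are hypothetical. Treat it as a starting point under those assumptions, not a finished implementation.

```python
import json
import urllib.request


def build_run_now_request(
    host: str, token: str, job_id: int
) -> urllib.request.Request:
    """Build a POST request for the Jobs API 2.1 run-now endpoint.

    `host` is the workspace URL and `token` a personal access token;
    both are placeholders supplied by the caller.
    """
    url = host.rstrip("/") + "/api/2.1/jobs/run-now"
    body = json.dumps({"job_id": job_id}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


def trigger_job(host: str, token: str, job_id: int) -> dict:
    """Send the run-now request and return the parsed JSON response."""
    request = build_run_now_request(host, token, job_id)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

The monitoring job would call `trigger_job(...)` only when its change check reports a difference, so the heavier extraction job runs on demand instead of on every schedule tick.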

 

