Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Databricks file based trigger to sharepoint

abhishek0306
New Contributor

Hi,

Can we create a file-based trigger in Databricks for Excel files in a SharePoint location? My need is to copy Excel files from SharePoint to external volumes in Databricks, so can this be done with a trigger that copies the files whenever a file drops in the SharePoint location? If a trigger cannot be used, can we use Python code to do it instead? If this cannot currently be done, please let me know whether the copy can only run on a schedule.

4 REPLIES

pradeep_singh
Contributor III

I don't think that option is available with SharePoint at the moment. You will have to use a scheduled job or a streaming job with Auto Loader.

Thank You
Pradeep Singh - https://www.linkedin.com/in/dbxdev

balajij8
Contributor

@abhishek0306 

SharePoint does not natively support the event notifications required for Databricks file arrival triggers. You can use one of the approaches below:

  • Azure Logic Apps - Create a workflow with the "When a file is created in a folder" SharePoint trigger. The workflow copies the file to an ADLS Gen2 path. Once the file lands in ADLS, a Databricks file arrival trigger can kick off automatically.
  • Python flow - Use a Python script to list the files in SharePoint, compare them against a record of already-processed files, and download any new files to ADLS Gen2 or Volumes. You can then load the files using Auto Loader or a file arrival trigger.
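The "Python flow" above can be sketched roughly as follows, using the Microsoft Graph API to list and download files from a SharePoint document library. This is a minimal, stdlib-only sketch, not a production implementation: the token acquisition, `site_id`, `drive_id`, and destination directory are placeholders you would fill in for your own tenant, and error handling and pagination are omitted.

```python
import json
import urllib.request

GRAPH = "https://graph.microsoft.com/v1.0"

def find_new_files(listing, processed):
    """Return entries from a Graph drive listing whose names are not in `processed`."""
    return [item for item in listing if item["name"] not in processed]

def _get(url, token):
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return resp.read()

def poll_and_copy(token, site_id, drive_id, processed, dest_dir):
    """List the drive root, download unseen files into dest_dir, return their names."""
    listing = json.loads(
        _get(f"{GRAPH}/sites/{site_id}/drives/{drive_id}/root/children", token)
    ).get("value", [])
    copied = []
    for item in find_new_files(listing, processed):
        data = _get(f"{GRAPH}/drives/{drive_id}/items/{item['id']}/content", token)
        # On Databricks, a Volume path looks like /Volumes/<catalog>/<schema>/<volume>
        with open(f"{dest_dir}/{item['name']}", "wb") as f:
            f.write(data)
        processed.add(item["name"])
        copied.append(item["name"])
    return copied
```

Run this on a job schedule; the `processed` set (persisted, e.g., in a Delta table) is what makes the poll incremental.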

emma_s
Databricks Employee

Hi,

You could get close to this using the Lakeflow Connect SharePoint connector. It's currently in Beta, so it would need to be enabled in your workspace. It isn't triggered on file updates, but because it ingests incrementally (only new or changed files), you could schedule it to run hourly; if nothing has changed, it uses minimal consumption. The docs for this are here: https://docs.databricks.com/aws/en/ingestion/lakeflow-connect/sharepoint

Thanks,

Emma

rohan22sri
New Contributor II

File-based triggers in Databricks are designed to work with data that already resides in cloud storage (such as ADLS, S3, or GCS). In this case, since the source system is SharePoint, expecting a native file-based trigger from Databricks is not feasible.

SharePoint does not natively emit events that Databricks can directly subscribe to for real-time file ingestion. Because of this, you cannot implement a true event-driven (file-drop) trigger directly between SharePoint and Databricks.

Also note:
A native SharePoint connector/integration with Databricks is now available via Lakeflow Connect, but it is currently in Beta. Because of its limited maturity, it may not yet fully support production-grade event-driven or trigger-based ingestion scenarios.


Alternative approaches:

  1. Scheduled ingestion (recommended baseline)
    You can implement a scheduled job in Databricks using Python to periodically check SharePoint (via Microsoft Graph API or SharePoint REST API), download new or updated Excel files, and copy them into external volumes. This is the most reliable and widely used approach.
  2. Using third-party connectors like Fivetran
    Databricks has partnered with Fivetran, which provides a managed connector for SharePoint.
    • Automatically detects updates and ingests data incrementally
    • Supports near real-time/streaming-style ingestion
    • Drawback: The pipeline typically needs to remain running, which may increase cost
    • Link: https://www.fivetran.com/connectors/sharepoint
  3. Event-driven workaround (advanced option)
    If near real-time behavior is required, you can use Microsoft tools like Power Automate or Azure Logic Apps to detect file uploads in SharePoint and trigger downstream processes (e.g., call an API or trigger a Databricks job). This introduces additional components but enables near event-driven behavior.
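For option 3, the handoff from Power Automate or Logic Apps to Databricks is just an HTTP call to the Jobs API `run-now` endpoint. Below is a minimal sketch of that call; the workspace host, token, job ID, and the `source_file` parameter name are placeholders for your environment, and whether you pass parameters via `job_parameters` depends on how the target job is defined.

```python
import json
import urllib.request

def run_now_payload(job_id, file_path=None):
    """Build the run-now request body; job parameters are optional."""
    payload = {"job_id": job_id}
    if file_path is not None:
        # Hypothetical parameter name the target job would read
        payload["job_parameters"] = {"source_file": file_path}
    return payload

def trigger_job(host, token, job_id, file_path=None):
    """POST to the Jobs API run-now endpoint; the response contains a run_id."""
    body = json.dumps(run_now_payload(job_id, file_path)).encode()
    req = urllib.request.Request(
        f"{host}/api/2.1/jobs/run-now",
        data=body,
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```

In Power Automate or Logic Apps, the same call is made with the built-in HTTP action, so no code needs to be hosted; the snippet above is the equivalent if you route through an Azure Function instead.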
Rohan