Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Loading data from spark dataframe directly to Sharepoint

sshynkary
New Contributor

Hi guys!
I am trying to load data directly from a PySpark DataFrame to a SharePoint folder, and I cannot find a solution for it.
I wanted to implement a workaround using volumes and Logic Apps, but there are a few issues. I need to split the DataFrame into several files, because SharePoint's upload limit is 250 GB per file. After every commit a hash is generated for every file, while in Logic Apps the filename has to be fixed. So I need to do it directly.
The only possible solution I have found is here
Maybe you have had a similar task and can suggest a better solution? I would really appreciate your answer.

1 REPLY

ChKing
New Contributor II

One approach could involve using Azure Data Lake as an intermediary. You can partition your PySpark DataFrames and load them into Azure Data Lake, which is optimized for large-scale data storage and integrates well with PySpark. Once the data is in Azure Data Lake, you can then use either Azure Logic Apps or Power Automate to automate the transfer of partitioned files from Azure Data Lake into SharePoint. This way, you avoid directly dealing with SharePoint's file size limits during the initial load.
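As a rough illustration of the staging step, here is a minimal Python sketch that writes the DataFrame as several CSV files and then renames Spark's hash-suffixed part files to fixed names, which is what a Logic Apps flow would need. The helper names (`write_partitioned`, `rename_part_files`), the `export` prefix, and the CSV format are assumptions made for the example, and the rename assumes the output directory is reachable as a local path (for instance a Unity Catalog volume); for a plain ADLS path you would move the files with your storage client instead.

```python
import os
import re

# Spark names its output files like part-00000-<uuid>-c000.csv.
PART_PATTERN = re.compile(r"^part-(\d+)-.*\.csv$")


def write_partitioned(df, output_dir, num_files):
    """Write a PySpark DataFrame as `num_files` CSV part files (hypothetical helper)."""
    (df.repartition(num_files)
       .write.mode("overwrite")
       .option("header", True)
       .csv(output_dir))


def rename_part_files(output_dir, prefix="export"):
    """Rename Spark part files to fixed names: export_0000.csv, export_0001.csv, ...

    Marker files such as _SUCCESS and hidden checksum files are left untouched.
    Returns the list of new file names in order.
    """
    parts = sorted(name for name in os.listdir(output_dir)
                   if PART_PATTERN.match(name))
    renamed = []
    for index, name in enumerate(parts):
        fixed = f"{prefix}_{index:04d}.csv"
        os.rename(os.path.join(output_dir, name),
                  os.path.join(output_dir, fixed))
        renamed.append(fixed)
    return renamed
```

You would call `write_partitioned(df, path, n)` first and then `rename_part_files(path)`, so the downstream flow always sees the same predictable file names regardless of the hash Spark generates on each run.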

Alternatively, you might consider leveraging the SharePoint REST API, which supports chunked uploads. This method allows you to partition your PySpark DataFrame into smaller files and upload them sequentially. The REST API would give you more control over the upload process compared to Logic Apps, and you wouldn’t have to worry about filename restrictions since you’d be handling file uploads programmatically.
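To make the chunked-upload idea a bit more concrete, here is a small Python sketch that only plans the sequence of REST calls for one file; it does not send anything. It assumes the `StartUpload`, `ContinueUpload`, and `FinishUpload` methods on a SharePoint file resource as I understand them from the Microsoft documentation, so please verify the exact URL shapes against the current docs. The function name `plan_chunked_upload`, the 10 MB default chunk size, and the site URL in the usage note are illustrative assumptions. Authentication and creating the empty target file beforehand are left out.

```python
import uuid


def plan_chunked_upload(site_url, server_relative_url, total_size,
                        chunk_size=10 * 1024 * 1024, upload_id=None):
    """Return a list of (url, offset, length) steps for a chunked upload.

    Assumption: the target file already exists (for example, created empty)
    and each step is sent as an authenticated POST whose body is the bytes
    file[offset:offset + length].
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if total_size <= chunk_size:
        # Small files do not need a chunked session; upload them in one call.
        raise ValueError("file fits in a single chunk; use a plain upload")

    upload_id = upload_id or str(uuid.uuid4())
    file_api = (f"{site_url}/_api/web/"
                f"GetFileByServerRelativeUrl('{server_relative_url}')")

    steps = []
    offset = 0
    while offset < total_size:
        length = min(chunk_size, total_size - offset)
        if offset == 0:
            method = f"StartUpload(uploadId=guid'{upload_id}')"
        elif offset + length >= total_size:
            method = (f"FinishUpload(uploadId=guid'{upload_id}',"
                      f"fileOffset={offset})")
        else:
            method = (f"ContinueUpload(uploadId=guid'{upload_id}',"
                      f"fileOffset={offset})")
        steps.append((f"{file_api}/{method}", offset, length))
        offset += length
    return steps
```

For example, `plan_chunked_upload("https://contoso.sharepoint.com/sites/data", "/sites/data/Shared Documents/export_0000.csv", total_size=os.path.getsize(local_path))` gives you the ordered calls for one partition file (with `import os` and a hypothetical `local_path`); you would loop over the steps, read the matching byte range, and POST it with your own auth headers. Because you choose the target name yourself, the hash in Spark's part-file names stops being a problem.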
