Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Loading data from a Spark DataFrame directly to SharePoint

sshynkary
New Contributor

Hi guys!
I am trying to load data directly from a PySpark DataFrame into a SharePoint folder, and I cannot find a solution for it.
I wanted to implement a workaround using Volumes and Logic Apps, but there are a few issues. I need to partition the DataFrame into a few files, because SharePoint's upload limit is 250 GB per file; also, Spark generates a hash-suffixed name for every output file on each write, while Logic Apps needs the filename to be fixed. So I need to load the data directly.
The only possible solution I have found is here.
Maybe you have had a similar task and can suggest a better solution? I would really appreciate your answer.

1 REPLY

ChKing
New Contributor II

One approach could involve using Azure Data Lake as an intermediary. You can partition your PySpark DataFrames and load them into Azure Data Lake, which is optimized for large-scale data storage and integrates well with PySpark. Once the data is in Azure Data Lake, you can then use either Azure Logic Apps or Power Automate to automate the transfer of partitioned files from Azure Data Lake into SharePoint. This way, you avoid directly dealing with SharePoint's file size limits during the initial load.
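A minimal sketch of the staging step, assuming a Databricks workspace with access to an ADLS Gen2 container (the storage account, container, and output names below are placeholders, and `df` is your existing DataFrame):

```python
# Hypothetical ADLS Gen2 staging location -- replace with your own.
target = "abfss://exports@mystorageacct.dfs.core.windows.net/sharepoint_staging"

# Split the DataFrame into a handful of files so each one stays
# well under SharePoint's 250 GB per-file upload limit.
df.repartition(8).write.mode("overwrite").parquet(target)

# Spark names its output files part-00000-<hash>.parquet. If the Logic App
# expects fixed filenames, rename the parts once after the write.
parts = [f for f in dbutils.fs.ls(target) if f.name.endswith(".parquet")]
for i, part in enumerate(parts):
    dbutils.fs.mv(part.path, f"{target}/export_{i:04d}.parquet")
```

The rename loop also sidesteps the hash-suffix problem you mentioned: the Logic App can then poll for predictable names like `export_0000.parquet`.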

Alternatively, you might consider leveraging the SharePoint REST API, which supports chunked uploads. This method allows you to partition your PySpark DataFrame into smaller files and upload them sequentially. The REST API would give you more control over the upload process compared to Logic Apps, and you wouldn’t have to worry about filename restrictions since you’d be handling file uploads programmatically.
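A minimal sketch of the chunked-upload route, assuming you have already acquired an OAuth bearer token for the site (e.g. via an Azure AD app registration) and copied a partition file to local disk first; the site URL, folder, and token below are placeholders:

```python
import os
import uuid
import requests

SITE = "https://contoso.sharepoint.com/sites/mysite"   # placeholder site
FOLDER = "/sites/mysite/Shared Documents/exports"      # server-relative folder
TOKEN = "<bearer-token>"                               # assumed pre-acquired
HEADERS = {"Authorization": f"Bearer {TOKEN}",
           "Accept": "application/json;odata=verbose"}
CHUNK = 10 * 1024 * 1024  # 10 MB per chunk

def upload(local_path: str, file_name: str) -> None:
    add_url = (f"{SITE}/_api/web/GetFolderByServerRelativeUrl('{FOLDER}')"
               f"/Files/add(url='{file_name}',overwrite=true)")
    file_url = f"{SITE}/_api/web/GetFileByServerRelativeUrl('{FOLDER}/{file_name}')"

    # Small files don't need chunking: one Files/add call is enough.
    if os.path.getsize(local_path) <= CHUNK:
        with open(local_path, "rb") as f:
            requests.post(add_url, headers=HEADERS, data=f.read()).raise_for_status()
        return

    # Create an empty target file, then stream chunks into it using the
    # StartUpload / ContinueUpload / FinishUpload REST endpoints.
    requests.post(add_url, headers=HEADERS, data=b"").raise_for_status()
    uid, offset = uuid.uuid4(), 0
    with open(local_path, "rb") as f:
        chunk, first = f.read(CHUNK), True
        while chunk:
            nxt = f.read(CHUNK)
            if first:
                action = f"StartUpload(uploadId=guid'{uid}')"
            elif nxt:
                action = f"ContinueUpload(uploadId=guid'{uid}',fileOffset={offset})"
            else:
                action = f"FinishUpload(uploadId=guid'{uid}',fileOffset={offset})"
            requests.post(f"{file_url}/{action}",
                          headers=HEADERS, data=chunk).raise_for_status()
            offset += len(chunk)
            chunk, first = nxt, False
```

Authentication is the fiddly part here: with a bearer token from Azure AD you don't need an X-RequestDigest, but your app registration must be granted SharePoint write permissions.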