Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Loading data from a Spark DataFrame directly to SharePoint

sshynkary
New Contributor

Hi guys!
I am trying to load data directly from a PySpark DataFrame into a SharePoint folder, and I cannot find a solution for it.
I wanted to implement a workaround using Volumes and Logic Apps, but there are a few issues. I need to partition the DataFrame into a few files, because SharePoint's upload limit is 250 GB per file; also, Spark generates a hash-suffixed name for every output file on each write, while Logic Apps needs the filename to be fixed. So I need to load the data directly.
The only possible solution I have found is here.
Maybe you have had a similar task and can suggest a better solution? I would really appreciate your answer.

1 REPLY

ChKing
New Contributor II

One approach could involve using Azure Data Lake as an intermediary. You can partition your PySpark DataFrames and load them into Azure Data Lake, which is optimized for large-scale data storage and integrates well with PySpark. Once the data is in Azure Data Lake, you can then use either Azure Logic Apps or Power Automate to automate the transfer of partitioned files from Azure Data Lake into SharePoint. This way, you avoid directly dealing with SharePoint's file size limits during the initial load.
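A minimal sketch of the staging step, assuming a Databricks workspace with access to an ADLS Gen2 container (the storage account, container, and output names below are placeholders, and `df` is your existing DataFrame):

```python
# Hypothetical ADLS Gen2 staging location -- replace with your own.
target = "abfss://exports@mystorageacct.dfs.core.windows.net/sharepoint_staging"

# Split the DataFrame into a handful of files so each one stays
# well under SharePoint's 250 GB per-file upload limit.
df.repartition(8).write.mode("overwrite").parquet(target)

# Spark names its output files part-00000-<hash>.parquet. If the Logic App
# expects fixed filenames, rename the parts once after the write.
parts = [f for f in dbutils.fs.ls(target) if f.name.endswith(".parquet")]
for i, part in enumerate(parts):
    dbutils.fs.mv(part.path, f"{target}/export_{i:04d}.parquet")
```

The rename loop also sidesteps the hash-suffix problem you mentioned: the Logic App can then poll for predictable names like `export_0000.parquet`.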

Alternatively, you might consider leveraging the SharePoint REST API, which supports chunked uploads. This method allows you to partition your PySpark DataFrame into smaller files and upload them sequentially. The REST API would give you more control over the upload process compared to Logic Apps, and you wouldn’t have to worry about filename restrictions since you’d be handling file uploads programmatically.
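A minimal sketch of the chunked-upload route, assuming you have already acquired an OAuth bearer token for the site (e.g. via an Azure AD app registration) and copied a partition file to local disk first; the site URL, folder, and token below are placeholders:

```python
import os
import uuid
import requests

SITE = "https://contoso.sharepoint.com/sites/mysite"   # placeholder site
FOLDER = "/sites/mysite/Shared Documents/exports"      # server-relative folder
TOKEN = "<bearer-token>"                               # assumed pre-acquired
HEADERS = {"Authorization": f"Bearer {TOKEN}",
           "Accept": "application/json;odata=verbose"}
CHUNK = 10 * 1024 * 1024  # 10 MB per chunk

def upload(local_path: str, file_name: str) -> None:
    add_url = (f"{SITE}/_api/web/GetFolderByServerRelativeUrl('{FOLDER}')"
               f"/Files/add(url='{file_name}',overwrite=true)")
    file_url = f"{SITE}/_api/web/GetFileByServerRelativeUrl('{FOLDER}/{file_name}')"

    # Small files don't need chunking: one Files/add call is enough.
    if os.path.getsize(local_path) <= CHUNK:
        with open(local_path, "rb") as f:
            requests.post(add_url, headers=HEADERS, data=f.read()).raise_for_status()
        return

    # Create an empty target file, then stream chunks into it using the
    # StartUpload / ContinueUpload / FinishUpload REST endpoints.
    requests.post(add_url, headers=HEADERS, data=b"").raise_for_status()
    uid, offset = uuid.uuid4(), 0
    with open(local_path, "rb") as f:
        chunk, first = f.read(CHUNK), True
        while chunk:
            nxt = f.read(CHUNK)
            if first:
                action = f"StartUpload(uploadId=guid'{uid}')"
            elif nxt:
                action = f"ContinueUpload(uploadId=guid'{uid}',fileOffset={offset})"
            else:
                action = f"FinishUpload(uploadId=guid'{uid}',fileOffset={offset})"
            requests.post(f"{file_url}/{action}",
                          headers=HEADERS, data=chunk).raise_for_status()
            offset += len(chunk)
            chunk, first = nxt, False
```

Authentication is the fiddly part here: with a bearer token from Azure AD you don't need an X-RequestDigest, but your app registration must be granted SharePoint write permissions.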