Salesforce Bulk API 2.0 not getting all rows from large table
07-08-2025 07:46 AM
Has anyone run into incomplete data extraction with the Salesforce Bulk API 2.0 on very large source objects? Any object with more than about 260k rows (ours should be approx. 13M) results in only approx. 250k rows being extracted on each attempt.
Labels: Workflows
10-11-2025 10:30 PM
One likely reason you are seeing fewer records is that the user running the extraction doesn't have access to all the rows in that table.
Can you confirm that at your end?
10-12-2025 02:06 PM
@AlanDanque I am working on a similar use case and will share screenshots shortly.
To get to the root cause, can you share the details below?
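| Checks at Salesforce | Description |
| --- | --- |
| Header used? | Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included in the job request header? Note that, as far as I know, this header belongs to the original Bulk API; Bulk API 2.0 query jobs are chunked automatically, so please also confirm which API version the job actually used. |
| Header honored? | Do the Salesforce logs show a chunked job with multiple batch IDs, or was only one batch returned? |
| Logs? | Does the job show status Completed while the result set is only 1 file? |
| Object supported? | Not all standard or custom objects support PK chunking; confirm against the Salesforce docs. |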

| Checks at Databricks | Description |
| --- | --- |
| File Count Check | Check whether the number of result files (CSV chunks) is greater than 1. If there’s only 1 file, chunking likely didn't happen, or the job was not split correctly. Use: dbutils.fs.ls("/mnt/tmp/salesforce_chunks/") |
| Row Count Validation | After ingestion, check that the row count in the Delta table is close to the expected value (~13M). A record count of ~250K indicates silent truncation. Use: df.count() |
| Chunk Metadata Logging | Log the number of records per chunk/file during ingestion. This helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available) |
| Failed Chunk Detection | Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: logging after each download attempt. |
| Job Status Check | Before downloading, check the job status from Salesforce via the API. If the job state is not JobComplete, or a batch is in Failed, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook. |
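
The Databricks-side checks above (job status, per-chunk logging, row count validation) can be sketched roughly as follows. This is only a minimal sketch under assumptions, not a tested connector: `fetch_page` is a hypothetical callable that you would write yourself to wrap the HTTP call to the job's results endpoint, and the code assumes the job info carries `state` and `numberRecordsProcessed` fields and that results come back in pages linked by a locator value, with the literal string "null" on the last page. Please verify those details against the Salesforce docs for your API version.

```python
import csv
import io
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical callable: takes a locator (None for the first page) and returns
# (csv_text, next_locator). In a notebook it would wrap the HTTP GET on the
# job's results endpoint and read the next locator from the response headers.
FetchPage = Callable[[Optional[str]], Tuple[str, Optional[str]]]


def count_csv_records(csv_text: str) -> int:
    """Count data rows in one CSV result page, excluding the header row."""
    reader = csv.reader(io.StringIO(csv_text))
    next(reader, None)  # skip the header line
    return sum(1 for row in reader if row)


def download_with_checks(job_info: Dict, fetch_page: FetchPage) -> List[Dict]:
    """Fetch every result page and log the record count of each one."""
    # Job Status Check: do not ingest unless the job finished successfully.
    state = job_info.get("state")
    if state != "JobComplete":
        raise RuntimeError(f"Job not ready for download (state={state!r})")

    # Chunk Metadata Logging: one log entry per downloaded page.
    page_log: List[Dict] = []
    locator: Optional[str] = None
    while True:
        csv_text, next_locator = fetch_page(locator)
        page_log.append({
            "page": len(page_log) + 1,
            "locator": locator,
            "records": count_csv_records(csv_text),
        })
        # Assumption: a missing locator or the string "null" marks the last page.
        if next_locator in (None, "", "null"):
            break
        locator = next_locator

    # Row Count Validation: compare the download against what the job reports.
    total = sum(entry["records"] for entry in page_log)
    expected = job_info.get("numberRecordsProcessed")
    if expected is not None and total != expected:
        raise RuntimeError(
            f"Downloaded {total} records in {len(page_log)} page(s), "
            f"expected {expected}"
        )
    return page_log
```

If the returned log only ever contains a single page of roughly 250K records while the job reports about 13M processed, that would suggest the ingestion is reading just the first result page and never following up on the rest, rather than Salesforce dropping rows.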