Re: Salesforce Bulk API 2.0 not getting all rows from large table

AlanDanque · ‎07-08-2025

Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where a very large source object (more than 260k rows; ours should be approx. 13M) yields only approx. 250k rows per attempt?

Krishna_S · ‎10-11-2025

@AlanDanque

The only reason you are seeing fewer records is that you don't have access to all the rows for that table.

Can you confirm that at your end?

ManojkMohan · ‎10-12-2025

@AlanDanque I am working on a similar use case and will share screenshots shortly.

To get to the root cause, can you share the details below?

Checks at Salesforce	Description
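Header used?	Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included in the job request header? Note that this header is documented for the original Bulk API; Bulk API 2.0 query jobs handle chunking automatically and return results in pages via the Sforce-Locator response header.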
Header honored?	Do the Salesforce logs show a chunked job with multiple batch IDs, or was only one batch returned?
Logs?	Does the job show status Completed while the result set is only 1 file?
Object supported?	Not all standard or custom objects support PK chunking; confirm against the Salesforce docs.
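
To make the "only one batch / only 1 file" check concrete, here is a minimal sketch of walking every page of a query job's results instead of stopping after the first one. It assumes the Bulk API 2.0 results endpoint, which reports the next page in the Sforce-Locator response header and the page size in Sforce-NumberOfRecords. The fetch_page callable and both function names are hypothetical; fetch_page stands in for whatever HTTP call your notebook makes and is passed in so the loop can be checked without a live org.

```python
from typing import Callable, Dict, Iterator, Optional, Tuple

# Hypothetical callable: takes the locator of the page to fetch (None for the
# first page) and returns (response_headers, csv_body) for that page.
FetchPage = Callable[[Optional[str]], Tuple[Dict[str, str], str]]


def iter_result_pages(fetch_page: FetchPage) -> Iterator[Tuple[int, str]]:
    """Yield (record_count, csv_body) for every page of the job's results."""
    locator: Optional[str] = None
    while True:
        headers, body = fetch_page(locator)
        # Assumption: the record count of this page is in Sforce-NumberOfRecords.
        count = int(headers.get("Sforce-NumberOfRecords", "0"))
        yield count, body
        # Assumption: the literal string "null" (or a missing header)
        # marks the last page.
        locator = headers.get("Sforce-Locator")
        if locator is None or locator == "null":
            break


def summarize(fetch_page: FetchPage) -> Tuple[int, int]:
    """Return (page_count, total_records) across all result pages."""
    pages = 0
    total = 0
    for count, _body in iter_result_pages(fetch_page):
        pages += 1
        total += count
    return pages, total
```

If summarize reports a single page with roughly 250k records while the object holds far more, the extraction is probably reading only the first page rather than hitting a permissions limit. Header lookups here are case-sensitive because a plain dict is assumed.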

Checks at Databricks	Description
File Count Check	Check whether the number of result files (CSV chunks) is greater than 1. If there’s only 1 file, chunking likely didn't happen, or the job was not split correctly. Use: dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")
Row Count Validation	After ingestion, check that the row count in the Delta table is close to expected (~13M). A record count of ~250K indicates silent truncation. Use: df.count()
Chunk Metadata Logging	Log the number of records per chunk/file during ingestion. This helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available)
Failed Chunk Detection	Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: Logging after each download attempt.
Job Status Check	Before downloading, check the job status from Salesforce via the API. If the job state is not JobComplete, or a batch is in Failed, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook
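
The Databricks-side checks above could be sketched roughly as follows, in plain Python so it runs anywhere. It assumes the job info is the JSON returned by the Bulk API 2.0 job status call, with a state field, and that each chunk is available as CSV text. The function names, the (filename, csv_text) input shape, and the 1% tolerance are all hypothetical choices for illustration.

```python
import csv
import io
from typing import Dict, Iterable, List, Tuple


def ready_to_ingest(job_info: Dict[str, str]) -> bool:
    """True only when the job info reports the JobComplete state."""
    return job_info.get("state") == "JobComplete"


def count_csv_records(csv_text: str) -> int:
    """Count data records in one CSV chunk, excluding the header row."""
    rows = csv.reader(io.StringIO(csv_text))
    next(rows, None)  # skip the header row
    return sum(1 for row in rows if row)  # ignore blank lines


def validate_chunks(
    chunks: Iterable[Tuple[str, str]],
    expected_total: int,
    tolerance: float = 0.01,  # assumed threshold, tune to your data
) -> Tuple[List[Tuple[str, int]], bool]:
    """Return a per-chunk log of (filename, record_count) and whether the
    overall total is within tolerance of the expected row count."""
    log = [(name, count_csv_records(text)) for name, text in chunks]
    total = sum(count for _name, count in log)
    ok = abs(total - expected_total) <= tolerance * expected_total
    return log, ok
```

In a notebook you would call ready_to_ingest before any download, then print or persist the log so a missing or short chunk shows up immediately instead of only as a low df.count() at the end.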
