Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

Salesforce Bulk API 2.0 not getting all rows from large table

AlanDanque
New Contributor

Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where a very large source object table (should be approx. 13M rows) results in only approx. 250K rows being extracted per attempt?

2 REPLIES

Krishna_S
Databricks Employee

@AlanDanque 

The most likely reason you are seeing fewer records is that you don't have access to all the rows in that table.

Can you confirm that at your end?

 

ManojkMohan
Honored Contributor

@AlanDanque I am working on a similar use case and will share screenshots shortly.

But to reach the root cause, can you share the details below?

| Check at Salesforce | Description |
| --- | --- |
| Header used? | Was `Sforce-Enable-PKChunking: chunkSize=250000` explicitly included in the job request header? |
| Header honored? | Do the Salesforce logs show a chunked job with multiple batch IDs, or was only one batch returned? |
| Logs? | Does the job show status `Completed` but the result set is only one file? |
| Object supported? | Not all standard or custom objects support PK chunking; confirm in the Salesforce docs. |
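The header checks above can be sketched as plain helper functions. This is a hedged sketch, not the connector's actual code: the `Sforce-Enable-PKChunking` header belongs to Bulk API 1.0 (Bulk API 2.0 chunks results automatically), and the session ID and chunk size here are placeholders you would supply yourself.

```python
# Sketch: build the headers for a PK-chunked Bulk API 1.0 job request, and count
# how many batches Salesforce actually created from the batch-list XML response.
# Multiple <batchInfo> entries mean the PK chunking header was honored; a single
# batch suggests it was ignored (e.g. unsupported object).
import xml.etree.ElementTree as ET

def pk_chunking_headers(session_id: str, chunk_size: int = 250_000) -> dict:
    """Request headers for a Bulk API 1.0 job with PK chunking enabled."""
    return {
        "X-SFDC-Session": session_id,  # placeholder: your Salesforce session ID
        "Content-Type": "application/xml; charset=UTF-8",
        "Sforce-Enable-PKChunking": f"chunkSize={chunk_size}",
    }

def count_batches(batch_info_xml: str) -> int:
    """Count <batchInfo> elements in a Bulk API batch-list response."""
    ns = "{http://www.force.com/2009/06/asyncapi/dataload}"
    root = ET.fromstring(batch_info_xml)
    return len(root.findall(f"{ns}batchInfo"))
```

In practice you would POST the job request with these headers to `/services/async/<version>/job` on your instance, then GET the job's batch list and run it through `count_batches` to confirm chunking happened.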
| Check at Databricks | Description |
| --- | --- |
| File count check | Check whether the number of result files (CSV chunks) is greater than 1. If there's only one file, chunking likely didn't happen or the job was not split correctly. Use: `dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")` |
| Row count validation | After ingestion, check that the row count in the Delta table is close to the expected ~13M. A record count of ~250K indicates silent truncation. Use: `df.count()` |
| Chunk metadata logging | Log the number of records per chunk/file during ingestion; this helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available) |
| Failed chunk detection | Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: logging after each download attempt |
| Job status check | Before downloading, check the job status from Salesforce via the API. If `JobComplete` is false or a batch is in `Failed`, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook |
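The file-count, row-count, and chunk-logging checks above can be combined into one small validation step. A minimal sketch, assuming you have already built a mapping of result file name to record count (in a notebook you would derive it from `dbutils.fs.ls(...)` plus a per-file count; the 1% tolerance is an arbitrary illustrative threshold):

```python
# Sketch: validate downloaded result chunks before trusting the ingested table.
# `chunk_row_counts` maps each result file to its row count -- built upstream
# from dbutils.fs.ls(...) and per-file counts in a real Databricks notebook.

def validate_chunks(chunk_row_counts: dict, expected_total: int,
                    tolerance: float = 0.01):
    """Return (ok, issues) after checking chunk count and total row count."""
    issues = []
    # File count check: a single file suggests chunking never happened.
    if len(chunk_row_counts) <= 1:
        issues.append("only one result file: PK chunking likely did not happen")
    # Chunk metadata logging / failed chunk detection: flag empty chunks.
    for name, rows in chunk_row_counts.items():
        if rows == 0:
            issues.append(f"empty or failed chunk download: {name}")
    # Row count validation: totals far below expectation mean silent truncation.
    total = sum(chunk_row_counts.values())
    if total < expected_total * (1 - tolerance):
        issues.append(
            f"possible silent truncation: got {total:,} rows, "
            f"expected ~{expected_total:,}")
    return (not issues, issues)
```

Running this right after the download step (and only after polling the Salesforce job status shows it is complete) turns the "~250K instead of ~13M" symptom into an explicit failure instead of a silently short Delta table.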