Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where a very large source object table with more than 260K rows (should be approx. 13M) results in only approx. 250K rows being extracted per attempt?
Result File Count Check
Check whether the number of result files (CSV chunks) is greater than 1. If there's only one file, chunking likely didn't happen, or the job was not split correctly. Use: dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")
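A minimal sketch of that check (the mount path and dbutils call are from the post; the helper name is mine), counting the CSV result files in a directory listing:

```python
def count_result_files(file_names):
    """Count CSV result chunks in a directory listing.

    Pass the names from dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")
    (each FileInfo has a .name attribute) or any list of file names.
    """
    return sum(1 for name in file_names if name.lower().endswith(".csv"))

# In a Databricks notebook (mount path taken from the post):
# files = dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")
# n_chunks = count_result_files([f.name for f in files])
# if n_chunks <= 1:
#     print("Only one result file - the job may not have been split into chunks")
```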
Row Count Validation
After ingestion, check that the row count in the Delta table is close to expected (~13M). A record count of ~250K indicates silent truncation. Use: df.count()
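One way to sketch this validation (the 5% tolerance and table name are my assumptions, not from the post):

```python
def validate_row_count(actual, expected, tolerance=0.05):
    """Return True if actual is within `tolerance` (as a fraction) of expected.

    A large shortfall (e.g. ~250K actual vs ~13M expected) signals
    silent truncation during extraction.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    return abs(actual - expected) / expected <= tolerance

# In a notebook you would compare the Delta table count to the expected
# Salesforce-side count (table name is illustrative):
# actual = spark.read.table("salesforce_raw").count()
# if not validate_row_count(actual, expected=13_000_000):
#     raise RuntimeError(f"Row count {actual} far below expected - possible truncation")
```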
Chunk Metadata Logging
Log the number of records per chunk/file during ingestion. This helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available)
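A possible shape for that logging (function name and logger name are mine; it assumes each chunk is a CSV with a header row):

```python
import csv
import logging

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger("sf_ingest")

def log_chunk_metadata(path, chunk_id=None):
    """Count data rows in one CSV chunk and log filename, record count, chunk id.

    Returns the record count so callers can sum counts across chunks
    and cross-check the total against the source object.
    """
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        count = sum(1 for _ in reader)
    logger.info("chunk=%s file=%s records=%d", chunk_id, path, count)
    return count
```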
Failed Chunk Detection
Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: Logging after each download attempt.
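The detection itself can be a simple set difference between the chunks Salesforce reported and the files that actually landed (identifiers below are illustrative):

```python
def find_missing_chunks(expected_ids, downloaded_ids):
    """Return the chunk identifiers that were expected but never downloaded.

    expected_ids: ids/locators Salesforce reported for the job's result set
    downloaded_ids: ids of the files that actually landed in storage
    """
    return sorted(set(expected_ids) - set(downloaded_ids))

# Example from the post: Salesforce returns 4 result files, only 3 arrive.
# find_missing_chunks(["c1", "c2", "c3", "c4"], ["c1", "c2", "c4"])
# -> ["c3"], so that download failed silently and should be retried
```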
Job Status Check
Before downloading, check the job status from Salesforce via the API. If the job state is not JobComplete, or a batch is in Failed, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook
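A polling sketch using only the standard library. The endpoint path and API version below are assumptions about the Bulk API 2.0 query-job status resource; verify them against your org's Salesforce REST API docs before relying on this:

```python
import json
import time
import urllib.request

SAFE_STATE = "JobComplete"            # only this state is safe to ingest from
FAILED_STATES = {"Failed", "Aborted"}  # terminal failure states

def is_safe_to_ingest(state):
    """Proceed with ingestion only once the job reports JobComplete."""
    return state == SAFE_STATE

def poll_job_state(instance_url, job_id, token, api_version="v58.0",
                   interval=10, max_polls=30):
    """Poll a Bulk API 2.0 query job until it reaches a terminal state.

    Raises on Failed/Aborted so a downstream notebook cell never starts
    downloading results from a job that silently failed.
    """
    url = f"{instance_url}/services/data/{api_version}/jobs/query/{job_id}"
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    for _ in range(max_polls):
        with urllib.request.urlopen(req) as resp:
            state = json.load(resp)["state"]
        if is_safe_to_ingest(state):
            return state
        if state in FAILED_STATES:
            raise RuntimeError(f"Bulk job {job_id} ended in state {state}")
        time.sleep(interval)
    raise TimeoutError(f"Bulk job {job_id} did not complete after {max_polls} polls")
```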