<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:taxo="http://purl.org/rss/1.0/modules/taxonomy/" version="2.0">
  <channel>
    <title>topic Re: Salesforce Bulk API 2.0 not getting all rows from large table in Data Engineering</title>
    <link>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/134674#M50168</link>
<description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/174238"&gt;@AlanDanque&lt;/a&gt;&amp;nbsp;I am working on a similar use case and will share screenshots shortly.&lt;/P&gt;&lt;P&gt;But to get to the root cause, could you share the details below?&lt;/P&gt;&lt;TABLE width="503px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Checks at Salesforce&lt;/TD&gt;&lt;TD width="309.391px"&gt;Description&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Header used?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included in the job request header?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Header honored?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Do Salesforce logs show a chunked job with multiple batch IDs, or was only one batch returned?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Logs?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Does the job show status Completed while the result set is only one file?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Object supported?&lt;/TD&gt;&lt;TD width="309.391px"&gt;&lt;A href="https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm" target="_blank"&gt;Not all standard or custom objects support PK chunking; confirm in the Salesforce docs.&lt;/A&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;TABLE width="552px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Checks at Databricks&lt;/TD&gt;&lt;TD width="316px"&gt;Description&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;File Count Check&lt;/TD&gt;&lt;TD width="316px"&gt;Check whether the number of result files (CSV chunks) is greater than 1. If there is only one file, chunking likely didn't happen or the job was not split correctly. Use: dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Row Count Validation&lt;/TD&gt;&lt;TD width="316px"&gt;After ingestion, check that the row count in the Delta table is close to the expected ~13M. A record count of ~250K indicates silent truncation. Use: df.count()&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Chunk Metadata Logging&lt;/TD&gt;&lt;TD width="316px"&gt;Log the number of records per chunk/file during ingestion. This helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available)&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Failed Chunk Detection&lt;/TD&gt;&lt;TD width="316px"&gt;Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: logging after each download attempt.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Job Status Check&lt;/TD&gt;&lt;TD width="316px"&gt;Before downloading, check the job status from Salesforce via the API. If JobComplete is false or any batch is in Failed, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
    <pubDate>Sun, 12 Oct 2025 21:06:53 GMT</pubDate>
    <dc:creator>ManojkMohan</dc:creator>
    <dc:date>2025-10-12T21:06:53Z</dc:date>
    <item>
      <title>Salesforce Bulk API 2.0 not getting all rows from large table</title>
      <link>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/124472#M47203</link>
      <description>&lt;P&gt;Has anyone run into an incomplete data extraction issue with the Salesforce Bulk API 2.0, where a very large source object table with more than 260K rows (should be approx. 13M) results in only approx. 250K rows being extracted per attempt?&lt;/P&gt;</description>
      <pubDate>Tue, 08 Jul 2025 14:46:16 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/124472#M47203</guid>
      <dc:creator>AlanDanque</dc:creator>
      <dc:date>2025-07-08T14:46:16Z</dc:date>
    </item>
    <item>
      <title>Re: Salesforce Bulk API 2.0 not getting all rows from large table</title>
      <link>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/134650#M50163</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/174238"&gt;@AlanDanque&lt;/a&gt;&amp;nbsp;&lt;/P&gt;
&lt;P&gt;The most likely reason you are seeing fewer records is that your user doesn't have access to all the rows of that table.&amp;nbsp;&lt;/P&gt;
&lt;P&gt;Can you confirm that on your end?&lt;/P&gt;
&lt;P&gt;&amp;nbsp;&lt;/P&gt;</description>
      <pubDate>Sun, 12 Oct 2025 05:30:57 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/134650#M50163</guid>
      <dc:creator>Krishna_S</dc:creator>
      <dc:date>2025-10-12T05:30:57Z</dc:date>
    </item>
    <item>
      <title>Re: Salesforce Bulk API 2.0 not getting all rows from large table</title>
      <link>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/134674#M50168</link>
      <description>&lt;P&gt;&lt;a href="https://community.databricks.com/t5/user/viewprofilepage/user-id/174238"&gt;@AlanDanque&lt;/a&gt;&amp;nbsp;I am working on a similar use case and will share screenshots shortly.&lt;/P&gt;&lt;P&gt;But to get to the root cause, could you share the details below?&lt;/P&gt;&lt;TABLE width="503px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Checks at Salesforce&lt;/TD&gt;&lt;TD width="309.391px"&gt;Description&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Header used?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Was Sforce-Enable-PKChunking: chunkSize=250000 explicitly included in the job request header?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Header honored?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Do Salesforce logs show a chunked job with multiple batch IDs, or was only one batch returned?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Logs?&lt;/TD&gt;&lt;TD width="309.391px"&gt;Does the job show status Completed while the result set is only one file?&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="192.609px"&gt;Object supported?&lt;/TD&gt;&lt;TD width="309.391px"&gt;&lt;A href="https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm" target="_blank"&gt;Not all standard or custom objects support PK chunking; confirm in the Salesforce docs.&lt;/A&gt;&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;&lt;TABLE width="552px"&gt;&lt;TBODY&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Checks at Databricks&lt;/TD&gt;&lt;TD width="316px"&gt;Description&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;File Count Check&lt;/TD&gt;&lt;TD width="316px"&gt;Check whether the number of result files (CSV chunks) is greater than 1. If there is only one file, chunking likely didn't happen or the job was not split correctly. Use: dbutils.fs.ls("/mnt/tmp/salesforce_chunks/")&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Row Count Validation&lt;/TD&gt;&lt;TD width="316px"&gt;After ingestion, check that the row count in the Delta table is close to the expected ~13M. A record count of ~250K indicates silent truncation. Use: df.count()&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Chunk Metadata Logging&lt;/TD&gt;&lt;TD width="316px"&gt;Log the number of records per chunk/file during ingestion. This helps detect dropped or corrupted chunks. Log: filename, record count, chunk ID (if available)&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Failed Chunk Detection&lt;/TD&gt;&lt;TD width="316px"&gt;Look for missing or partial chunk downloads. If Salesforce returns 4 result files and only 3 are downloaded, something failed silently. Implement: logging after each download attempt.&lt;/TD&gt;&lt;/TR&gt;&lt;TR&gt;&lt;TD width="235px"&gt;Job Status Check&lt;/TD&gt;&lt;TD width="316px"&gt;Before downloading, check the job status from Salesforce via the API. If JobComplete is false or any batch is in Failed, Databricks shouldn't proceed with ingestion. Use: API polling in the notebook&lt;/TD&gt;&lt;/TR&gt;&lt;/TBODY&gt;&lt;/TABLE&gt;</description>
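The Databricks-side checks in the table above (job status gate, missing-chunk detection, row count validation) can be sketched as plain Python. This is a minimal, hypothetical sketch: the function names, chunk IDs, and expected total are illustrative, and the dbutils.fs.ls() / df.count() calls from the table are replaced by pure logic so it runs outside a notebook. The JobComplete / Failed / Aborted values mirror the "state" field returned by a Bulk API 2.0 query job status request.

```python
def should_ingest(job_info):
    """Job Status Check: only proceed once the Bulk API 2.0 query job
    reports state JobComplete; fail loudly on terminal error states."""
    state = job_info.get("state")
    if state == "JobComplete":
        return True
    if state in ("Failed", "Aborted"):
        raise RuntimeError("Bulk job ended in state %s; refusing to ingest partial results" % state)
    return False  # still InProgress or UploadComplete: keep polling


def missing_chunks(announced_ids, downloaded_ids):
    """Failed Chunk Detection: chunk IDs Salesforce announced but that were
    never downloaded indicate a silent download failure."""
    return sorted(set(announced_ids) - set(downloaded_ids))


def validate_chunks(chunk_counts, expected_total, tolerance=0.01):
    """Row Count Validation: sum per-chunk record counts (from chunk
    metadata logging) and flag totals that deviate from the expected
    row count by more than the given fraction."""
    observed = sum(chunk_counts.values())
    deviation = abs(observed - expected_total)
    ok = not deviation > expected_total * tolerance
    return ok, observed


# Example: Salesforce announced 4 result files, only 3 were downloaded,
# and the observed total is far below the ~13M rows expected.
counts = {"chunk-1": 250_000, "chunk-2": 250_000, "chunk-3": 250_000}
print(missing_chunks(["chunk-1", "chunk-2", "chunk-3", "chunk-4"], counts))  # ['chunk-4']
print(validate_chunks(counts, expected_total=13_000_000))  # (False, 750000)
```

A real notebook would call should_ingest() in a polling loop against the job status endpoint before listing and downloading result files; the point here is only that each check from the table reduces to a small, testable predicate.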
      <pubDate>Sun, 12 Oct 2025 21:06:53 GMT</pubDate>
      <guid>https://community.databricks.com/t5/data-engineering/salesforce-bulk-api-2-0-not-getting-all-rows-from-large-table/m-p/134674#M50168</guid>
      <dc:creator>ManojkMohan</dc:creator>
      <dc:date>2025-10-12T21:06:53Z</dc:date>
    </item>
  </channel>
</rss>

