Best practices : Silver Layer to Salesforce

ManojkMohan — Wed, 27 Aug 2025 19:32:22 GMT

I'd like the community's view on whether my solution follows best practice. The problem I am solving: I read match data from a CSV that was uploaded into a volume, clean and transform it in Databricks, and then upload it in batches to a custom Salesforce object called Match__c. I track success/failure for each upload and optionally save any failed records to a CSV.

Step #	Step Name	Description
1	Read CSV into DataFrame	Reads IPL match data from a CSV file using Spark, then converts it to a Pandas DataFrame.
2	Drop Auto-Generated Field	Removes the match_id column which is not required in Salesforce.
3	Rename Columns to Salesforce API Names	Renames columns in the DataFrame to match Salesforce custom field API names (e.g., Team1__c).
4	Parse and Format Date	Converts date strings to proper date format (YYYY-MM-DD) and removes rows with invalid dates.
5	Map Team Names to Salesforce IDs	Replaces team names with their corresponding Salesforce Team__c record IDs (for lookups).
6	Clean Picklist Values	Formats picklist fields (like Stage, WonBy) by standardizing case and trimming spaces.
7	Standardize Venue Names	Maps long venue names to shorter, standardized names and truncates to 40 characters if needed.
8	Filter Required Fields	Keeps only the fields needed for Salesforce and removes rows missing required fields.
9	Connect to Salesforce	Authenticates to Salesforce using credentials from Databricks secrets.
10	Prepare for Upload	Converts all object-type columns to strings to ensure compatibility with the Salesforce API.
11	Upload in Batches	Splits data into batches (max 200 records) and inserts each batch using the Salesforce Bulk API.
12	Summarize Upload Results	Counts and prints the number of successfully and unsuccessfully inserted records.
13	Capture and Save Failed Records	Collects failed records, displays them in Databricks, and saves them to a CSV for investigation.
14	Display Sample Results	Shows a few sample API responses to verify the structure and contents of the insert results.
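
Steps 11 to 13 (batch, upload, summarize, capture failures) could be sketched roughly as below. This is a minimal sketch, not the code from the post: `insert_batch` is a hypothetical callable standing in for whatever Salesforce client call performs the insert, and the sketch assumes it returns one result dict per record, in input order, with `success` and `errors` keys. The `_error` column name is also just an illustration.

```python
import csv
from typing import Callable, Iterator


def chunked(records: list, size: int = 200) -> Iterator[list]:
    """Yield consecutive slices of at most `size` records."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(records), size):
        yield records[start:start + size]


def upload_in_batches(records: list,
                      insert_batch: Callable[[list], list],
                      batch_size: int = 200) -> tuple:
    """Upload records batch by batch; return (success_count, failed_records).

    ASSUMPTION: insert_batch(batch) returns one dict per record, in the same
    order as the input, shaped like {"success": bool, "errors": [...]}.
    """
    succeeded = 0
    failed = []
    for batch in chunked(records, batch_size):
        try:
            results = insert_batch(batch)
        except Exception as exc:
            # The whole request failed (network, auth, ...): keep every
            # record of this batch so it can be retried later.
            failed.extend({**rec, "_error": str(exc)} for rec in batch)
            continue
        for rec, res in zip(batch, results):
            if res.get("success"):
                succeeded += 1
            else:
                failed.append({**rec, "_error": str(res.get("errors"))})
    return succeeded, failed


def save_failed(failed: list, path: str) -> None:
    """Write failed records (plus their error text) to a CSV for review."""
    if not failed:
        return
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(failed[0].keys()))
        writer.writeheader()
        writer.writerows(failed)
```

Pairing each result with its input record is what makes the failed-records CSV useful, since the error text ends up next to the row that caused it. The 200-record cap in step 11 matches the per-request limit of the sObject Collections REST endpoint; Bulk API batches can be much larger, so `batch_size` is left as a parameter here.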

Re: Best practices : Silver Layer to Salesforce

-werners- — Thu, 28 Aug 2025 06:36:52 GMT

- Skip the pandas conversion.

- Persist the transformed data in a Databricks table, and then write to Salesforce.
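
One way to read these two suggestions, as a rough sketch under stated assumptions: keep the drop/rename/filter steps on the Spark DataFrame itself, save the result as a table, and read the upload payload back from that table. The source column names, `Team2__c`, `Match_Date__c`, and the table name are hypothetical placeholders (only `match_id`, `Team1__c`, and `Match__c` appear in the original post). The functions use only standard Spark DataFrame methods, so they need no imports of their own.

```python
# Hypothetical source-column -> Salesforce API name mapping. Only Team1__c
# is named in the original post; the other entries are placeholders.
RENAME_MAP = {"team1": "Team1__c", "team2": "Team2__c", "date": "Match_Date__c"}
REQUIRED = ["Team1__c", "Team2__c", "Match_Date__c"]


def to_silver(df, rename_map=RENAME_MAP, required=REQUIRED, drop_cols=("match_id",)):
    """Drop, rename, and filter on the Spark DataFrame (no pandas round trip)."""
    df = df.drop(*drop_cols)
    for old, new in rename_map.items():
        df = df.withColumnRenamed(old, new)
    # Remove rows that are missing any field Salesforce requires.
    return df.dropna(subset=list(required))


def persist(df, table_name):
    """Overwrite a table so the upload step reads from a stable snapshot."""
    df.write.mode("overwrite").saveAsTable(table_name)


def iter_records(spark, table_name):
    """Stream rows back as plain dicts without collecting the whole table."""
    for row in spark.table(table_name).toLocalIterator():
        yield row.asDict()


# In a Databricks notebook this might be wired up as follows
# (the volume path and table name are placeholders, not from the post):
#   raw = spark.read.option("header", True).csv("/Volumes/<catalog>/<schema>/<volume>/<file>.csv")
#   persist(to_silver(raw), "<catalog>.<schema>.matches_silver")
#   records = list(iter_records(spark, "<catalog>.<schema>.matches_silver"))
```

With the cleaned data in a table, a failed upload can be retried from that table without re-running the transformation, and the same rows remain available for auditing what was sent to Salesforce.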
