I'd like the community's view on whether my solution follows best practice. The problem I am solving: match data is read from a CSV that was uploaded to a volume, cleaned and transformed in Databricks, and then uploaded in batches to a custom Salesforce object called Match__c. I track success/failure for each upload and optionally save any failed records to a CSV.
| Step # | Step Name | Description |
|---|---|---|
| 1 | Read CSV into DataFrame | Reads IPL match data from a CSV file using Spark, then converts it to a Pandas DataFrame. |
| 2 | Drop Auto-Generated Field | Removes the match_id column which is not required in Salesforce. |
| 3 | Rename Columns to Salesforce API Names | Renames columns in the DataFrame to match Salesforce custom field API names (e.g., Team1__c). |
| 4 | Parse and Format Date | Converts date strings to proper date format (YYYY-MM-DD) and removes rows with invalid dates. |
| 5 | Map Team Names to Salesforce IDs | Replaces team names with their corresponding Salesforce Team__c record IDs (for lookups). |
| 6 | Clean Picklist Values | Formats picklist fields (like Stage, WonBy) by standardizing case and trimming spaces. |
| 7 | Standardize Venue Names | Maps long venue names to shorter, standardized names and truncates to 40 characters if needed. |
| 8 | Filter Required Fields | Keeps only the fields needed for Salesforce and removes rows missing required fields. |
| 9 | Connect to Salesforce | Authenticates to Salesforce using credentials from Databricks secrets. |
| 10 | Prepare for Upload | Converts all object-type columns to strings to ensure compatibility with the Salesforce API. |
| 11 | Upload in Batches | Splits data into batches (max 200 records) and inserts each batch using the Salesforce Bulk API. |
| 12 | Summarize Upload Results | Counts and prints the number of successfully and unsuccessfully inserted records. |
| 13 | Capture and Save Failed Records | Collects failed records, displays them in Databricks, and saves them to a CSV for investigation. |
| 14 | Display Sample Results | Shows a few sample API responses to verify the structure and contents of the insert results. |
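
For anyone reviewing steps 11 to 13, here is a minimal sketch of the batching, result counting, and failed-record capture, written so it runs without a Salesforce connection. It is not my exact notebook code. The names `chunk`, `upload_in_batches`, `save_failed`, and `fake_insert` are hypothetical, as is the `Date__c` field name. It also assumes each API result is a dict with `success` and `errors` keys, returned in the same order as the submitted records.

```python
import csv
from typing import Callable, Dict, Iterator, List, Tuple

Record = Dict[str, object]
Result = Dict[str, object]

BATCH_SIZE = 200  # batch size from step 11


def chunk(records: List[Record], size: int = BATCH_SIZE) -> Iterator[List[Record]]:
    """Yield consecutive slices of at most `size` records."""
    for start in range(0, len(records), size):
        yield records[start:start + size]


def upload_in_batches(
    records: List[Record],
    insert_batch: Callable[[List[Record]], List[Result]],
    size: int = BATCH_SIZE,
) -> Tuple[int, List[Record]]:
    """Insert batch by batch; return (success_count, failed_rows)."""
    success_count = 0
    failed: List[Record] = []
    for batch in chunk(records, size):
        try:
            results = insert_batch(batch)
        except Exception as exc:
            # The whole call failed (e.g. network error): mark every row in the batch.
            failed.extend({**rec, "error": str(exc)} for rec in batch)
            continue
        # Assumption: results line up one-to-one with the input records.
        for rec, res in zip(batch, results):
            if res.get("success"):
                success_count += 1
            else:
                failed.append({**rec, "error": str(res.get("errors"))})
    return success_count, failed


def save_failed(failed: List[Record], path: str) -> None:
    """Write failed rows (with their error text) to a CSV; skip if there are none."""
    if not failed:
        return
    fieldnames = sorted({key for row in failed for key in row})
    with open(path, "w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(failed)


def fake_insert(batch: List[Record]) -> List[Result]:
    """Stand-in for the real Salesforce call: rejects records with an empty Date__c."""
    return [
        {"success": True, "errors": []}
        if rec.get("Date__c")
        else {"success": False, "errors": ["missing Date__c"]}
        for rec in batch
    ]


# 450 sample records; every 100th one has an empty date and should fail.
records = [
    {"Team1__c": f"T{i}", "Date__c": "2024-04-01" if i % 100 else ""}
    for i in range(450)
]
ok, failed = upload_in_batches(records, fake_insert)
print(f"inserted={ok} failed={len(failed)}")  # inserted=445 failed=5
```

In the real job, `insert_batch` would wrap the actual client call. For example, if the connection `sf` comes from the simple-salesforce library (an assumption here), that could be `lambda batch: sf.bulk.Match__c.insert(batch)`. Passing the uploader in as a function keeps the counting and error-capture logic testable on its own.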