
Best practices: Silver Layer to Salesforce

ManojkMohan
Contributor III

Need the community's view on whether my solution follows best practice. The problem I am solving: read match data from a CSV that was uploaded into a volume, clean and transform it in Databricks, then upload it in batches to a custom Salesforce object called Match__c. I track success/failure for each upload and optionally save any failed records to a CSV.

The pipeline, step by step:

Step 1: Read CSV into DataFrame. Reads IPL match data from a CSV file using Spark, then converts it to a pandas DataFrame.
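A minimal sketch of step 1, assuming the file sits in a Unity Catalog volume (the path is a placeholder) and that this runs in a Databricks notebook where spark is predefined:

```python
# Read the raw CSV with Spark, then convert to pandas for row-level cleanup.
# The volume path below is a placeholder.
df_spark = spark.read.csv(
    "/Volumes/main/default/ipl/matches.csv",
    header=True,
    inferSchema=True,
)
pdf = df_spark.toPandas()
```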

Step 2: Drop Auto-Generated Field. Removes the match_id column, which is not required in Salesforce (covered together with step 3 in the sketch after step 3).

Step 3: Rename Columns to Salesforce API Names. Renames columns in the DataFrame to match the Salesforce custom-field API names (e.g., Team1__c).
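Steps 2 and 3 reduce to a drop plus a rename. A sketch, where the source-column-to-API-name mapping is an assumption about the CSV schema:

```python
# Drop the auto-generated key, then rename source columns to the
# Match__c custom-field API names (this mapping is illustrative).
pdf = pdf.drop(columns=["match_id"]).rename(columns={
    "team1": "Team1__c",
    "team2": "Team2__c",
    "match_date": "MatchDate__c",
    "venue": "Venue__c",
    "stage": "Stage__c",
    "won_by": "WonBy__c",
})
```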

Step 4: Parse and Format Date. Converts date strings to a proper date format (YYYY-MM-DD) and removes rows with invalid dates.
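For step 4, pandas can coerce bad dates to NaT so they are easy to drop; MatchDate__c is an assumed field name:

```python
import pandas as pd

# Unparseable dates become NaT, those rows are dropped, and the rest
# are formatted as YYYY-MM-DD strings for the Salesforce date field.
pdf["MatchDate__c"] = pd.to_datetime(pdf["MatchDate__c"], errors="coerce")
pdf = pdf.dropna(subset=["MatchDate__c"])
pdf["MatchDate__c"] = pdf["MatchDate__c"].dt.strftime("%Y-%m-%d")
```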

Step 5: Map Team Names to Salesforce IDs. Replaces team names with their corresponding Salesforce Team__c record IDs (for lookup fields).
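For step 5, something along these lines, assuming the sf connection from step 9 is already available (so in practice authentication happens before this step) and that Team__c record names match the CSV values exactly:

```python
# Build a name-to-Id map from the Team__c object, then swap names for
# record Ids so the lookup fields resolve on insert.
team_ids = {
    r["Name"]: r["Id"]
    for r in sf.query_all("SELECT Id, Name FROM Team__c")["records"]
}
pdf["Team1__c"] = pdf["Team1__c"].map(team_ids)
pdf["Team2__c"] = pdf["Team2__c"].map(team_ids)
```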

Step 6: Clean Picklist Values. Formats picklist fields (such as Stage and WonBy) by standardizing case and trimming spaces.
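A sketch of step 6; title case is an assumption about how the picklist entries are defined on Match__c:

```python
# Trim whitespace and normalize case so values line up with the
# picklist entries defined on the object.
for col in ["Stage__c", "WonBy__c"]:
    pdf[col] = pdf[col].str.strip().str.title()
```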

Step 7: Standardize Venue Names. Maps long venue names to shorter, standardized names and truncates to 40 characters where needed.
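For step 7, roughly this, with an illustrative mapping:

```python
# Map verbose venue names to short standard ones, then enforce the
# 40-character field limit.
venue_map = {
    "M. A. Chidambaram Stadium, Chepauk, Chennai": "Chepauk Stadium",
    "Wankhede Stadium, Mumbai": "Wankhede Stadium",
}
pdf["Venue__c"] = pdf["Venue__c"].replace(venue_map).str.slice(0, 40)
```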

Step 8: Filter Required Fields. Keeps only the fields needed for Salesforce and removes rows missing required fields.
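Step 8 is a projection plus a dropna; the required list repeats the assumed field names from above:

```python
# Keep only the Match__c fields and drop rows missing any required value.
required = ["Team1__c", "Team2__c", "MatchDate__c",
            "Venue__c", "Stage__c", "WonBy__c"]
pdf = pdf[required].dropna(subset=required)
```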

Step 9: Connect to Salesforce. Authenticates to Salesforce using credentials stored in Databricks secrets.
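For step 9, a sketch using simple_salesforce; the secret scope and key names are placeholders:

```python
from simple_salesforce import Salesforce

# Pull credentials from a Databricks secret scope rather than hardcoding
# them. dbutils is predefined in Databricks notebooks.
sf = Salesforce(
    username=dbutils.secrets.get(scope="sfdc", key="username"),
    password=dbutils.secrets.get(scope="sfdc", key="password"),
    security_token=dbutils.secrets.get(scope="sfdc", key="security-token"),
)
```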

Step 10: Prepare for Upload. Converts all object-type columns to strings to ensure compatibility with the Salesforce API (covered together with step 11 in the sketch after step 11).

Step 11: Upload in Batches. Splits the data into batches of at most 200 records and inserts each batch using the Salesforce Bulk API.
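Steps 10 and 11 together might look like this; note that simple_salesforce's bulk insert also accepts a batch_size argument, so the manual chunking shown here is one option among several:

```python
# Step 10: stringify remaining object-type columns so the payload
# serializes cleanly into the API request.
for col in pdf.select_dtypes(include="object").columns:
    pdf[col] = pdf[col].astype(str)

# Step 11: insert in chunks of 200 records via the Bulk API and
# collect the per-record results for steps 12-14.
records = pdf.to_dict(orient="records")
results = []
for i in range(0, len(records), 200):
    results.extend(sf.bulk.Match__c.insert(records[i : i + 200]))
```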

Step 12: Summarize Upload Results. Counts and prints the number of successfully and unsuccessfully inserted records (steps 12 through 14 are combined in the sketch after step 14).

Step 13: Capture and Save Failed Records. Collects failed records, displays them in Databricks, and saves them to a CSV for investigation.

Step 14: Display Sample Results. Shows a few sample API responses to verify the structure and contents of the insert results.
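Steps 12 through 14 all work off the per-record results returned by the bulk insert; the output path is a placeholder:

```python
# Step 12: each bulk result is a dict with 'success', 'id', and 'errors'.
failed_idx = [i for i, r in enumerate(results) if not r.get("success")]
print(f"Inserted: {len(results) - len(failed_idx)}, failed: {len(failed_idx)}")

# Step 13: save the failed source rows for investigation
# (display() is a Databricks notebook helper).
if failed_idx:
    failed_df = pdf.iloc[failed_idx]
    display(failed_df)
    failed_df.to_csv("/Volumes/main/default/ipl/failed_matches.csv", index=False)

# Step 14: spot-check a few raw API responses.
print(results[:3])
```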

 

1 REPLY

-werners-
Esteemed Contributor III

- Skip the pandas conversion.

- Persist the transformed data in a Databricks table, then write to Salesforce.
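A sketch of that suggestion, with placeholder volume, table, and column names: do the transforms in Spark, persist the result as a Delta table, and only materialize Python dicts at the API boundary (the Bulk API payload has to be plain dicts either way):

```python
from pyspark.sql import functions as F

# Transform in Spark instead of pandas and persist the silver table.
cleaned = (
    spark.read.csv("/Volumes/main/default/ipl/matches.csv", header=True)
    .drop("match_id")
    .withColumnRenamed("team1", "Team1__c")
    .withColumnRenamed("team2", "Team2__c")
    .withColumn("MatchDate__c", F.date_format(F.to_date("match_date"), "yyyy-MM-dd"))
    .filter(F.col("MatchDate__c").isNotNull())
    .drop("match_date")
)
cleaned.write.mode("overwrite").saveAsTable("main.default.match_silver")

# Read the persisted table back and build the upload payload directly,
# skipping the pandas round-trip entirely.
records = [row.asDict() for row in spark.table("main.default.match_silver").collect()]
```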