Salesforce connection with Databricks

NDK_1
New Contributor II

How can we connect to Salesforce from Databricks without using any third-party JAR files?

-werners-
Esteemed Contributor III

You could connect to the Salesforce database directly using ODBC/JDBC, if that is permitted.
But why not use an ETL tool to extract the data, and use Databricks for the processing?

NDK_1
New Contributor II

I was not able to connect over ODBC/JDBC. I am now using Azure Logic Apps to extract data from Salesforce.

rnavi
New Contributor II

@NDK_1: Are you looking to keep it in sync with Salesforce periodically, or do you need a two-way sync? I would love to understand the use case better.

NDK_1
New Contributor II

I am planning to use it as a two-way sync: reading data from Salesforce as well as writing data into Salesforce.

-werners-
Esteemed Contributor III

If you want to write to Salesforce, it is probably a good idea to check the requirements first.
Many of those huge software packages do not allow direct writes into the database, so you might need a licensed connector or something similar.
If you want to keep it in sync, I would lean towards using an ETL tool.

emillion25
New Contributor III

You can connect to Salesforce from Databricks without using third-party JAR files by leveraging Python and the Salesforce REST API using the simple-salesforce library. Since simple-salesforce is a Python package, you can install it within your Databricks notebook and interact with Salesforce directly.

Steps to Connect Databricks to Salesforce via REST API

  1. Install the simple-salesforce library in Databricks:

     
    python
     
    %pip install simple-salesforce
  2. Authenticate and Connect to Salesforce:

     
    python
    from simple_salesforce import Salesforce
     
    # Salesforce Credentials
    username = 'your_username'
    password = 'your_password'
    security_token = 'your_security_token'
     
    # Authenticate
    sf = Salesforce(username=username, password=password, security_token=security_token)
    print(sf)  # If successful, this will print a Salesforce object
  3. Read Data from Salesforce:

     
    python
    query_result = sf.query("SELECT Id, Name FROM Account LIMIT 10")
    print(query_result['records'])
  4. Write Data to Salesforce:

     
    python
    new_account = sf.Account.create({'Name': 'New Databricks Account'})
    print(new_account)
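Each record returned by sf.query() carries an extra 'attributes' entry with REST metadata (object type and record URL), which usually gets in the way when you want a table. Here is a minimal sketch of stripping it before building a DataFrame. The helper name records_to_rows and the sample records are made up for illustration; the sample only mimics the shape of query_result['records'].

```python
def records_to_rows(records):
    """Return copies of the records without the per-record 'attributes' metadata."""
    return [
        {key: value for key, value in record.items() if key != "attributes"}
        for record in records
    ]


# Made-up sample shaped like query_result['records'] (IDs and URLs are placeholders).
sample_records = [
    {
        "attributes": {"type": "Account", "url": "/services/data/v59.0/sobjects/Account/001A"},
        "Id": "001A",
        "Name": "Acme",
    },
    {
        "attributes": {"type": "Account", "url": "/services/data/v59.0/sobjects/Account/001B"},
        "Id": "001B",
        "Name": "Globex",
    },
]

rows = records_to_rows(sample_records)
print(rows)
```

In a Databricks notebook you would pass query_result['records'] instead of the sample, and then something like spark.createDataFrame(rows) should give you a DataFrame to work with.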

 

Handling Two-Way Sync

Since you're looking for a two-way sync (reading and writing data), here is how you can approach it:

  • Read from Salesforce: Use sf.query() or sf.query_all() to fetch records.
  • Write to Salesforce: Use the create(), update(), or delete() methods on the object (for example sf.Account), depending on the operation.
  • Automate Syncing: Use Databricks Jobs to run these scripts at scheduled intervals.
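As a rough sketch of the write side of such a sync, one option is to split the outgoing rows into creates (no Id yet) and updates (Id present), then send them through create() and update(). The names plan_writes, apply_writes, and FakeSObject are hypothetical helpers, not part of simple-salesforce; FakeSObject only stands in for sf.Account so the example runs without a Salesforce org.

```python
def plan_writes(rows):
    """Split rows into creates (no Id yet) and updates (Id present)."""
    creates, updates = [], []
    for row in rows:
        # The Id identifies the record; it is not sent as a field value.
        fields = {key: value for key, value in row.items() if key != "Id"}
        if row.get("Id"):
            updates.append((row["Id"], fields))
        else:
            creates.append(fields)
    return creates, updates


def apply_writes(sobject, creates, updates):
    """Send the planned writes through an object exposing create() and update()."""
    for fields in creates:
        sobject.create(fields)
    for record_id, fields in updates:
        sobject.update(record_id, fields)
    return len(creates), len(updates)


class FakeSObject:
    """Hypothetical stand-in for sf.Account that just records the calls."""

    def __init__(self):
        self.calls = []

    def create(self, fields):
        self.calls.append(("create", fields))

    def update(self, record_id, fields):
        self.calls.append(("update", record_id, fields))


# Made-up rows, e.g. collected from a Databricks table.
pending = [
    {"Name": "New Databricks Account"},
    {"Id": "001A", "Name": "Acme (renamed)"},
]

creates, updates = plan_writes(pending)
fake = FakeSObject()
counts = apply_writes(fake, creates, updates)
print(counts, fake.calls)
```

Against a real org you would pass sf.Account in place of FakeSObject(). This sketch does not handle conflicts (the same record changed on both sides) or failed writes, so a real sync would need a rule for those.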

 

Alternatives

  • If you’re using OAuth for authentication, you can obtain an access token and make REST API calls with the requests library instead.
  • If you want to ingest bulk data, you can use Salesforce's Bulk API (also accessible via simple-salesforce, through sf.bulk).
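For the OAuth route, here is a small sketch of how the query request could be put together once you already have an access token. The helper name build_query_request, the instance URL, the token, and the API version 59.0 are all placeholders I picked for illustration; check which API version your org supports.

```python
from urllib.parse import urlencode


def build_query_request(instance_url, access_token, soql, api_version="59.0"):
    """Build the URL and headers for a SOQL query against the REST API."""
    base = instance_url.rstrip("/")
    query_string = urlencode({"q": soql})  # URL-encodes the SOQL text
    url = f"{base}/services/data/v{api_version}/query?{query_string}"
    headers = {"Authorization": f"Bearer {access_token}"}
    return url, headers


# Placeholder values; the real ones come from your OAuth flow.
url, headers = build_query_request(
    "https://example.my.salesforce.com",
    "PLACEHOLDER_TOKEN",
    "SELECT Id, Name FROM Account LIMIT 10",
)
print(url)
```

From there, requests.get(url, headers=headers) should return JSON with the same 'records' list you get from sf.query(). Obtaining the token itself depends on which OAuth flow your connected app allows, so that part is left out here.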