cancel
Showing results forĀ 
Search instead forĀ 
Did you mean:Ā 
Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.
cancel
Showing results forĀ 
Search instead forĀ 
Did you mean:Ā 

Perform database operations like INSERT, and MERGE using the backend service (golang)

harikrishnang33
New Contributor II
  • I would like to have an example or reference documentation on how we can create a live (capable of real-time data persistence) pipeline from an in-house golang (backend) service to the databricks tables.
  • I would give a bit more detail on what the golang service intends to do. This golang service would take some input from other services and pushes the data into the databricks tables. The data can be pushed by any of these operations based on the use case: INSERT, UPDATE, UPSERT/MERGE, and DELETE. These operations would be real-time. 
  • Ideally what I would like to do is UPSERT/MERGE operation based on the uniqueness of a row (primary key is a combination of two or more columns) so that it would INSERT when no matching rows are found, and UPDATE when matching rows are found. So it would be good if I can get an example or reference documentation.
  • Also, I think there is no implementation of databricks driver for gorm. If there is any, please share that also. I am asking about sharing any gorm driver because we would have some 100+ columns to INSERT or UPDATE or UPSERT/MERGE. So if there is any ORM, there we can provide the interface and data and ORM would take care of the mapping into the query statement.

For example, consider this table. This is just an example of the understanding of what we would like to do.

user_id	phone_number	status	  is_deleted
123456	9123474387	       ACTIVE	         0
123456	9123474388	     VERIFYING	         0
123456	9123474388	     VERIFYING	         0

I would like to UPSERT rows based on the primary key (user_id + phone_number) using the data given below

user_id	phone_number	 status	         is_deleted
123456	9123474387	        ACTIVE	                 0
123456	9123474388	        ACTIVE	                 0
123456	9123474388	   VERIFY_FAILED	         0

2 REPLIES 2

Anonymous
Not applicable

@Harikrishnan Gā€‹ :

To create a live pipeline from an in-house Golang backend service to Databricks tables, you can use the Databricks API to read and write data in real-time. The API provides endpoints for various operations such as creating tables, inserting data, updating data, and deleting data.

Here's an example of how you can use the Databricks API to insert data into a table in real-time:

  1. Create a Databricks cluster and set up the required libraries and configurations to interact with the API.
  2. Create a table in Databricks that matches the schema of the data you want to insert.
  3. In your Golang service, use a REST client library such as "net/http" to send an HTTP POST request to the Databricks API endpoint for inserting data into the table.
  4. Construct the request body to include the data you want to insert in the table, in JSON format.
  5. Send the request to the Databricks API endpoint, along with the authentication token and other required headers.
  6. Check the response from the API to ensure that the data was inserted successfully.

To perform UPSERT/MERGE operations, you can use the MERGE command in SQL. Here's an example of how you can use the MERGE command to UPSERT data based on the primary key:

MERGE INTO table_name t
USING (
    VALUES 
    (123456, 9123474387, 'ACTIVE', 0),
    (123456, 9123474388, 'ACTIVE', 0),
    (123456, 9123474388, 'VERIFY_FAILED', 0)
) s (user_id, phone_number, status, is_deleted)
ON t.user_id = s.user_id AND t.phone_number = s.phone_number
WHEN MATCHED THEN
    UPDATE SET t.status = s.status, t.is_deleted = s.is_deleted
WHEN NOT MATCHED THEN
    INSERT (user_id, phone_number, status, is_deleted) VALUES (s.user_id, s.phone_number, s.status, s.is_deleted)

This SQL statement will merge the data from the "s" table into the "t" table based on the primary key (user_id + phone_number), performing an UPSERT operation.

Regarding Gorm, looks like there is no implementation of a Databricks driver for Gorm. However, you can use the Databricks API to interact with the tables directly from your Golang code. Alternatively, you can use a Golang SQL library such as "database/sql" to execute SQL statements directly against the Databricks tables.

Anonymous
Not applicable

Hi @Harikrishnan Gā€‹ 

Thank you for posting your question in our community! We are happy to assist you.

To help us provide you with the most accurate information, could you please take a moment to review the responses and select the one that best answers your question?

This will also help other community members who may have similar questions in the future. Thank you for your participation and let us know if you need any further assistance! 

Connect with Databricks Users in Your Area

Join a Regional User Group to connect with local Databricks users. Events will be happening in your city, and you wonā€™t want to miss the chance to attend and share knowledge.

If there isnā€™t a group near you, start one and help create a community that brings people together.

Request a New Group