Optimal Batch Size for Batch Insert Queries using JDBC for Delta Tables

ridrasura
New Contributor III

Hi,

I am currently experimenting with databricks-jdbc 2.6.29 and trying to execute batch INSERT queries.

What is the optimal batch size recommended by Databricks for performing batch INSERT queries?

Currently it seems that values are inserted row by row even when a batch insert is performed.

Is there a recommended way to perform a multi-row insert?

Note: I am using commons-dbutils as a wrapper over JDBC.

// Assumes the Databricks JDBC driver (databricks-jdbc 2.6.29) and
// Apache Commons DbUtils are on the classpath.
import com.databricks.client.jdbc.DataSource;
import org.apache.commons.dbutils.QueryRunner;

import java.sql.Connection;
import java.sql.SQLException;

public static void insertRecordsBatchDbUtils() {
    String dbUrl = "jdbc:databricks://#####################";
    DataSource d = new DataSource();
    d.setURL(dbUrl);
    d.setUserID("token");                        // personal-access-token auth
    d.setPassword("##############################");
    QueryRunner queryRunner = new QueryRunner();
    try (Connection con = d.getConnection()) {
        String sql = "INSERT INTO events.segment_debuggability_test VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?)";

        // QueryRunner.batch() binds each Object[] as one parameter set and
        // runs the whole array as a single JDBC batch.
        int[] result = queryRunner.batch(con, sql, new Object[][]{
                {"130", "bolt_regular_customer", "advertiser public", "DELETE", "3455642", "bolt", "org3", "org", "file_34", "azure_blob_path_1", "M-Particle", "success", "10/10/2022 20:00", "compliance success", "1"},
                {"132", "swiggy_customer", "advertiser private", "ADD", "3455634", "swiggy", "org5", "org", "file_209", "s3_path_134", "Private Segment", "success", "25/10/2022 20:00", "compliance success", "1"},
                {"122", "zomato_customer", "advertiser public", "ADD", "34556445", "zomato", "org45", "org", "file_090", "s3_path_161", "S3 Segment Upload", "failure", "13/10/2022 20:00", "compliance failure : advertiser-non-compliant", "1"}});

        System.out.println("Inserted Records :: " + result.length);
    } catch (SQLException e) {
        e.printStackTrace();
    }
}
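When the total row count is large, a common pattern is to chunk the rows client-side so each batch call sends a bounded number of parameter sets, then tune that bound against your workload (there is no single batch size published by Databricks). A minimal sketch of such a chunking helper, assuming a plain `Object[][]` of rows as in the code above; `BatchChunker` is a hypothetical name, not part of DbUtils or the Databricks driver:

```java
import java.util.ArrayList;
import java.util.List;

public class BatchChunker {
    // Split rows into fixed-size batches so each QueryRunner.batch()
    // (or PreparedStatement.executeBatch()) call sends at most
    // batchSize parameter sets. The last batch may be smaller.
    public static List<Object[][]> chunk(Object[][] rows, int batchSize) {
        List<Object[][]> batches = new ArrayList<>();
        for (int start = 0; start < rows.length; start += batchSize) {
            int end = Math.min(start + batchSize, rows.length);
            Object[][] batch = new Object[end - start][];
            System.arraycopy(rows, start, batch, 0, end - start);
            batches.add(batch);
        }
        return batches;
    }
}
```

Each chunk can then be passed to `queryRunner.batch(con, sql, chunk)` in turn, keeping memory use and per-call latency bounded regardless of total row count.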

1 REPLY

ridrasura
New Contributor III

Just an observation: by enabling the auto optimize table-level properties, I was able to see batch inserts writing records into a single file.

https://docs.databricks.com/optimizations/auto-optimize.html
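For reference, auto optimize is enabled through Delta table properties. A sketch of the DDL, using this thread's example table; verify the property names and behavior against the linked docs for your Databricks Runtime version:

```sql
-- Enable auto optimize on an existing Delta table
-- (property names per the Databricks auto-optimize docs linked above).
ALTER TABLE events.segment_debuggability_test SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'  = 'true'
);
```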
