JDBC driver CPU consumption

ivni
New Contributor III

Hi,

I am using the JDBC driver to execute an INSERT statement with several thousand rows (~4MB). It takes several seconds to complete and, for some reason, saturates one full CPU core while it runs.

It seems like a lot of the time is spent in this method:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

Sample stack trace:

void java.util.regex.Pattern.compile()
void java.util.regex.Pattern.<init>(String, int)
Pattern java.util.regex.Pattern.compile(String, int)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.RemoveCatalogFromQueryStringInternal(String, String, ILogger)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName(String, ILogger, HiveJDBCSettings, IWarningListener)
void com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.<init>(ILogger, IHiveClient, HiveJDBCStatement, String, HiveJDBCCommonConnection, boolean, ConnSettingRequestMap, boolean, boolean)
IQueryExecutor com.databricks.client.hivecommon.dataengine.HiveJDBCDataEngine.prepare(String)
void com.databricks.client.jdbc.common.SPreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc41.S41PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc42.S42PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.hivecommon.jdbc42.Hive42PreparedStatement.<init>(String, HiveJDBCStatement, SConnection, int)
SPreparedStatement com.databricks.client.spark.jdbc.SparkJDBCObjectFactory.createPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.JDBCObjectFactory.newPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$StatementCreator.create()
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)
PreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)

How can this be fixed so it would not be CPU bound?

Driver version:

com.databricks:databricks-jdbc:2.6.40

szymon_dybczak
Esteemed Contributor III

Hi @ivni ,

Yes, that method can be CPU-intensive. According to the driver's documentation, it removes the catalog name from the query statement. It does this with regex patterns, which is a heavy operation from a CPU perspective, especially if you run many complex queries.
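To illustrate why this kind of rewriting can be CPU-bound, here is a minimal, hypothetical sketch of stripping a catalog prefix such as `data.` with a regex. It is not the driver's code; the class, method names, and the catalog name `data` are assumptions for illustration. The stack trace above shows time in `Pattern.compile` called from `RemoveCatalogFromQueryStringInternal`, so one plausible reading is that patterns are compiled repeatedly; the sketch contrasts compiling on every call with compiling once and reusing the pattern.

```java
import java.util.regex.Pattern;

// Hypothetical illustration of regex-based catalog stripping.
// This is NOT the Databricks driver's implementation.
public class CatalogStripSketch {

    // Matches "<catalog>." at a word boundary when followed by an
    // identifier character, e.g. "data." in "data.schema.t".
    static Pattern catalogPattern(String catalog) {
        return Pattern.compile("\\b" + Pattern.quote(catalog) + "\\.(?=\\w)");
    }

    // Variant 1: compiles a fresh Pattern on every call. Doing this
    // repeatedly would show up as Pattern.compile time in a profiler.
    public static String stripCompilingEachTime(String sql, String catalog) {
        return catalogPattern(catalog).matcher(sql).replaceAll("");
    }

    // Variant 2: compile once, reuse many times. Pattern is immutable and
    // thread-safe, so a single instance can be shared.
    public static final class CatalogStripper {
        private final Pattern pattern;

        public CatalogStripper(String catalog) {
            this.pattern = catalogPattern(catalog);
        }

        // Still scans the whole SQL text, so a ~4MB statement costs time
        // roughly proportional to its length even with a precompiled pattern.
        public String strip(String sql) {
            return pattern.matcher(sql).replaceAll("");
        }
    }

    public static void main(String[] args) {
        String sql = "INSERT INTO data.schema.t VALUES (1, 'a'), (2, 'b')";
        System.out.println(stripCompilingEachTime(sql, "data"));
        System.out.println(new CatalogStripper("data").strip(sql));
    }
}
```

A regex this simple would also rewrite matching text inside string literals (for example a value `'data.x'`), so a real implementation would need to track quoting as well, which adds work for every character of a large INSERT.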

You can try adding useNativeQuery=1 to your connection string. With that setting, the driver passes SQL queries to Databricks verbatim.
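If you assemble the connection string in code, a small helper keeps driver options such as useNativeQuery=1 in one place. This is a minimal sketch: the class and the `buildUrl` helper are hypothetical, the host and HTTP path are placeholders, and it simply joins `key=value` pairs with semicolons, the format used by the connection strings in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; option values are not escaped or validated.
public class DatabricksUrlSketch {

    // Appends ";key=value" for each option to a jdbc:databricks:// base URL.
    public static String buildUrl(String hostAndPort, Map<String, String> options) {
        StringBuilder sb = new StringBuilder("jdbc:databricks://").append(hostAndPort);
        for (Map.Entry<String, String> e : options.entrySet()) {
            sb.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> opts = new LinkedHashMap<>(); // keeps insertion order
        opts.put("transportMode", "http");
        opts.put("ssl", "1");
        opts.put("AuthMech", "3");
        opts.put("httpPath", "/path");
        opts.put("useNativeQuery", "1"); // pass SQL to Databricks verbatim
        System.out.println(buildUrl("host.cloud.databricks.com:443", opts));
    }
}
```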

ivni
New Contributor III

Thank you for the suggestion, but useNativeQuery=1 doesn't seem to reduce CPU usage. Usage example:

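    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.sql.Connection;
    import java.sql.Driver;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.Properties;

    // insert.sql holds one multi-row INSERT (several thousand rows, ~4MB)
    String sql = Files.readString(Path.of("insert.sql"));
    String url = "jdbc:databricks://host.cloud.databricks.com:443/data;connschema=schema;transportMode=http;ssl=1;AuthMech=3;httpPath=/path;useNativeQuery=1";

    Properties props = new Properties();
    props.setProperty("user", "token");
    props.setProperty("password", "<token>");
    props.setProperty("useNativeQuery", "1"); // also set in the URL above
    Driver driver = DriverManager.getDriver(url);
    try (Connection conn = driver.connect(url, props);
         Statement st = conn.createStatement()) {
        st.execute(sql);
    }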

Any other suggestions?

szymon_dybczak
Esteemed Contributor III

Hi, 

You can also try disabling catalog-name stripping by setting StripCatalogName=0 in your JDBC connection string.
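The same setting can also be passed through the `Properties` object given to `Driver.connect`, the way useNativeQuery is passed in the snippet earlier in this thread. A minimal sketch, assuming the driver reads StripCatalogName from `Properties` as well as from the URL; the class and the `connectionProps` helper are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper building the Properties passed to Driver.connect(url, props).
public class DriverPropsSketch {

    public static Properties connectionProps(String token) {
        Properties props = new Properties();
        props.setProperty("user", "token");          // personal access token auth
        props.setProperty("password", token);
        props.setProperty("useNativeQuery", "1");    // pass SQL through verbatim
        props.setProperty("StripCatalogName", "0");  // disable catalog-name stripping
        return props;
    }

    public static void main(String[] args) {
        Properties p = connectionProps("<token>");
        System.out.println(p.getProperty("StripCatalogName"));
    }
}
```

Usage would be `driver.connect(url, DriverPropsSketch.connectionProps(token))`; if the same key also appears in the URL, keep both values consistent to avoid ambiguity about which one the driver honors.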

ivni
New Contributor III

StripCatalogName=0 doesn't seem to have an effect either.

szymon_dybczak
Esteemed Contributor III

OK, one last thing: try explicitly adding the catalog and schema to the JDBC connection string:

ConnCatalog=your_catalog;ConnSchema=your_schema;

ivni
New Contributor III

So I guess something like this?

jdbc:databricks://host.cloud.databricks.com:443;httpPath=/path;ConnCatalog=data;ConnSchema=schema;transportMode=http;ssl=1;AuthMech=3;useNativeQuery=1;StripCatalogName=0

These settings don't seem to affect CPU consumption.

szymon_dybczak
Esteemed Contributor III

Could you check the stack trace again, then? In your previous message you wrote that most of the time is spent in this method:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

What does it look like now?
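If attaching a profiler is inconvenient, a rough check is to sample the executing thread's stack from another thread with the standard `Thread.getStackTrace()` and count how often a frame name matches. This is a crude sketch, not a replacement for a real profiler such as JDK Flight Recorder; the class and method names are hypothetical.

```java
// Crude stack sampler: a quick stand-in for a profiler to see whether a
// given method keeps showing up while another thread is busy.
public class StackSamplerSketch {

    // Takes up to `samples` snapshots of `target`'s stack, `intervalMs` apart,
    // and returns how many snapshots contain a frame whose
    // "ClassName.methodName" contains `needle`.
    public static int countSamplesContaining(Thread target, String needle,
                                             int samples, long intervalMs) {
        int hits = 0;
        for (int i = 0; i < samples && target.isAlive(); i++) {
            for (StackTraceElement frame : target.getStackTrace()) {
                String name = frame.getClassName() + "." + frame.getMethodName();
                if (name.contains(needle)) {
                    hits++;
                    break; // count each snapshot at most once
                }
            }
            try {
                Thread.sleep(intervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
                break;
            }
        }
        return hits;
    }
}
```

For example, run `st.execute(sql)` in a worker thread and, from the main thread, call `countSamplesContaining(worker, "stripCatalogName", 200, 10)`. If most samples taken while the statement runs hit, the catalog-stripping path is still where the time goes.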

ivni
New Contributor III

It is still there.