Data Engineering
Join discussions on data engineering best practices, architectures, and optimization strategies within the Databricks Community. Exchange insights and solutions with fellow data engineers.

JDBC driver CPU consumption

ivni
New Contributor

Hi,

I am using the JDBC driver to execute an insert statement with several thousand rows (~4 MB). It takes several seconds to complete and, for some reason, consumes one full CPU core while doing so.

It seems like a lot of the time is spent in this method:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

Sample stack trace:

void java.util.regex.Pattern.compile()
void java.util.regex.Pattern.<init>(String, int)
Pattern java.util.regex.Pattern.compile(String, int)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.RemoveCatalogFromQueryStringInternal(String, String, ILogger)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName(String, ILogger, HiveJDBCSettings, IWarningListener)
void com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.<init>(ILogger, IHiveClient, HiveJDBCStatement, String, HiveJDBCCommonConnection, boolean, ConnSettingRequestMap, boolean, boolean)
IQueryExecutor com.databricks.client.hivecommon.dataengine.HiveJDBCDataEngine.prepare(String)
void com.databricks.client.jdbc.common.SPreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc41.S41PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc42.S42PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.hivecommon.jdbc42.Hive42PreparedStatement.<init>(String, HiveJDBCStatement, SConnection, int)
SPreparedStatement com.databricks.client.spark.jdbc.SparkJDBCObjectFactory.createPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.JDBCObjectFactory.newPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$StatementCreator.create()
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)
PreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)

How can this be fixed so that it is not CPU-bound?

Driver version:

com.databricks:databricks-jdbc:2.6.40

szymon_dybczak
Esteemed Contributor III

Hi @ivni ,

Yes, that method can be CPU-intensive. According to the driver's docs, it removes the catalog name from the query statement. However, it does this via regex patterns, which is a heavy operation from a CPU perspective, especially if you have a lot of complex queries.

szymon_dybczak_0-1758181838729.png

What you can try to do is to add useNativeQuery=1 to your connection string. With that setting, the driver passes the SQL queries verbatim to Databricks.

 

Thank you for the suggestion, but useNativeQuery=1 doesn't seem to reduce CPU usage. Usage example:

 

        String sql = Files.readString(Path.of("insert.sql"));
        String url = "jdbc:databricks://host.cloud.databricks.com:443/data;connschema=schema;transportMode=http;ssl=1;AuthMech=3;httpPath=/path;useNativeQuery=1";

        Properties props = new Properties();
        props.setProperty("user", "token");
        props.setProperty("password", "<token>");
        props.setProperty("useNativeQuery", "1");
        Driver driver = DriverManager.getDriver(url);
        try (Connection conn = driver.connect(url, props);
             Statement st = conn.createStatement()) {
            st.execute(sql);
        }

 Any other suggestions?

szymon_dybczak
Esteemed Contributor III

Hi, 

You can also try disabling this behavior by setting StripCatalogName=0 in your JDBC connection string.

StripCatalogName=0 doesn't seem to have any effect either.

szymon_dybczak
Esteemed Contributor III

OK, one last thing. Try explicitly adding the catalog and schema to the JDBC connection string:

ConnCatalog=your_catalog;ConnSchema=your_schema;

So I guess something like this?

jdbc:databricks://host.cloud.databricks.com:443;httpPath=/path;ConnCatalog=data;ConnSchema=schema;transportMode=http;ssl=1;AuthMech=3;useNativeQuery=1;StripCatalogName=0
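With this many settings in play, it can help to assemble the URL from an ordered map so each option is easy to toggle between test runs. The following is only a minimal sketch: the class and method names (`DatabricksUrlSketch`, `buildUrl`) are made up for illustration, and the setting names and values are simply the ones quoted in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of the Databricks JDBC driver.
final class DatabricksUrlSketch {

    // Joins host, port and key=value settings into one semicolon-separated JDBC URL.
    static String buildUrl(String host, int port, Map<String, String> settings) {
        StringJoiner url = new StringJoiner(";", "jdbc:databricks://" + host + ":" + port + ";", "");
        settings.forEach((key, value) -> url.add(key + "=" + value));
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the URL is reproducible.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("httpPath", "/path");
        settings.put("ConnCatalog", "data");
        settings.put("ConnSchema", "schema");
        settings.put("transportMode", "http");
        settings.put("ssl", "1");
        settings.put("AuthMech", "3");
        settings.put("useNativeQuery", "1");
        settings.put("StripCatalogName", "0");
        System.out.println(buildUrl("host.cloud.databricks.com", 443, settings));
    }
}
```

Removing or changing a single `settings.put(...)` line then gives a clean A/B comparison of one option at a time.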

These measures don't seem to influence CPU consumption. 
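To compare settings with numbers rather than by eye, one option is to measure the CPU time of the calling thread around the statement using the standard `ThreadMXBean` API. This is a rough sketch under some assumptions: `CpuTimeProbe` and `cpuNanos` are illustrative names, it only counts CPU time on the thread that runs the task (work the driver does on other threads is not included), and the loop in `main` is just a stand-in for the real `st.execute(sql)` call.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper for comparing CPU cost between runs.
final class CpuTimeProbe {

    // Returns the CPU nanoseconds used by the current thread while running the task,
    // or -1 if the JVM does not support or has disabled thread CPU time measurement.
    static long cpuNanos(Runnable task) {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        if (!threads.isCurrentThreadCpuTimeSupported()) {
            return -1L;
        }
        long before = threads.getCurrentThreadCpuTime();
        if (before < 0) {
            return -1L; // measurement is disabled
        }
        task.run();
        return threads.getCurrentThreadCpuTime() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload; replace the body with the JDBC call being investigated.
        long nanos = cpuNanos(() -> {
            long sum = 0;
            for (int i = 0; i < 50_000_000; i++) {
                sum += i % 7;
            }
            if (sum < 0) {
                System.out.println("unreachable");
            }
        });
        System.out.println("CPU time (ms): " + nanos / 1_000_000);
    }
}
```

Running the same statement once per connection string and comparing the reported values shows whether a given setting changes the CPU cost at all.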

szymon_dybczak
Esteemed Contributor III

Could you check the stack trace once again, then? In your previous message you wrote that most of the time is spent in the method below:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

 

What does it look like now?

It is still there:

ivni_0-1758291599928.png

 
