topic JDBC driver CPU consumption in Data Engineering

JDBC driver CPU consumption

ivni — Thu, 18 Sep 2025 07:38:18 GMT

Hi,

I am using the JDBC driver to execute an insert statement with several thousand rows (~4MB). It takes several seconds to complete and, for some reason, consumes a full CPU core while it runs.

It seems like a lot of the time is spent in this method:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

Sample stack trace:

void java.util.regex.Pattern.compile()
void java.util.regex.Pattern.<init>(String, int)
Pattern java.util.regex.Pattern.compile(String, int)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.RemoveCatalogFromQueryStringInternal(String, String, ILogger)
String com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName(String, ILogger, HiveJDBCSettings, IWarningListener)
void com.databricks.client.hivecommon.dataengine.HiveJDBCNativeQueryExecutor.<init>(ILogger, IHiveClient, HiveJDBCStatement, String, HiveJDBCCommonConnection, boolean, ConnSettingRequestMap, boolean, boolean)
IQueryExecutor com.databricks.client.hivecommon.dataengine.HiveJDBCDataEngine.prepare(String)
void com.databricks.client.jdbc.common.SPreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc41.S41PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.jdbc.jdbc42.S42PreparedStatement.<init>(String, IStatement, SConnection, int)
void com.databricks.client.hivecommon.jdbc42.Hive42PreparedStatement.<init>(String, HiveJDBCStatement, SConnection, int)
SPreparedStatement com.databricks.client.spark.jdbc.SparkJDBCObjectFactory.createPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.JDBCObjectFactory.newPreparedStatement(String, IStatement, SConnection, int)
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$6.create(IStatement)
IJDBCStatement com.databricks.client.jdbc.common.SConnection$StatementCreator.create()
IJDBCPreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)
PreparedStatement com.databricks.client.jdbc.common.SConnection.prepareStatement(String, int, int)

How can this be fixed so that it is not CPU-bound?

Driver version:

com.databricks:databricks-jdbc:2.6.40

Re: JDBC driver CPU consumption

szymon_dybczak — Thu, 18 Sep 2025 07:52:50 GMT

Hi @ivni,

Yes, that method can be CPU-intensive. According to the driver's documentation, it removes the catalog name from the query statement. It does this with regular expressions, which is a heavy operation from a CPU perspective, especially if you have many complex queries.
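To illustrate why regex-based rewriting gets expensive on a ~4MB statement, here is a minimal sketch. The driver's real pattern is not shown in this thread; `CatalogStripSketch`, `stripCatalog`, and the pattern below are hypothetical stand-ins. The point is that the regex has to scan every character of the query, so one large INSERT means millions of characters of matching work per statement, and the Pattern.compile frames in the stack trace suggest a pattern is also compiled on that path.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CatalogStripSketch {

    // Hypothetical stand-in for the driver's catalog stripping: removes a
    // "catalog." qualifier (optionally back-quoted) in front of identifiers.
    // The pattern is compiled on every call, as the Pattern.compile frames
    // in the reported stack trace suggest may happen inside the driver.
    public static String stripCatalog(String sql, String catalog) {
        Pattern pattern = Pattern.compile(
                "(?i)`?\\b" + Pattern.quote(catalog) + "\\b`?\\.");
        Matcher matcher = pattern.matcher(sql);
        // replaceAll walks the entire input, so the cost grows with query length.
        return matcher.replaceAll("");
    }

    public static void main(String[] args) {
        // Build a large multi-row INSERT similar in shape to the one in the question.
        StringBuilder sb = new StringBuilder("INSERT INTO data.schema.events VALUES ");
        for (int i = 0; i < 100_000; i++) {
            if (i > 0) {
                sb.append(", ");
            }
            sb.append('(').append(i).append(", 'value-").append(i).append("')");
        }
        String sql = sb.toString();

        long start = System.nanoTime();
        String stripped = stripCatalog(sql, "data");
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        System.out.println(stripped.substring(0, 33));
        System.out.println("query length: " + sql.length()
                + " chars, stripping took " + elapsedMs + " ms");
    }
}
```

Timings will vary by machine; the sketch only shows that the work scales with the size of the SQL text, not with the number of statements.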

You can try adding useNativeQuery=1 to your connection string. With that setting, the driver passes SQL queries to Databricks verbatim.

Re: JDBC driver CPU consumption

ivni — Fri, 19 Sep 2025 11:15:05 GMT

Thank you for the suggestion, but useNativeQuery=1 doesn't seem to reduce CPU usage. Usage example:

import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.Driver;
import java.sql.DriverManager;
import java.sql.Statement;
import java.util.Properties;

// Read the multi-row INSERT statement (~4MB) from disk
String sql = Files.readString(Path.of("insert.sql"));
String url = "jdbc:databricks://host.cloud.databricks.com:443/data;connschema=schema;transportMode=http;ssl=1;AuthMech=3;httpPath=/path;useNativeQuery=1";
Properties props = new Properties();
props.setProperty("user", "token");
props.setProperty("password", "<token>");
props.setProperty("useNativeQuery", "1");
Driver driver = DriverManager.getDriver(url);
try (Connection conn = driver.connect(url, props);
     Statement st = conn.createStatement()) {
    st.execute(sql);
}

Any other suggestions?

Re: JDBC driver CPU consumption

szymon_dybczak — Fri, 19 Sep 2025 11:49:31 GMT

Hi,

You can also try disabling catalog-name stripping by adding StripCatalogName=0 to your JDBC connection string.

Re: JDBC driver CPU consumption

ivni — Fri, 19 Sep 2025 12:01:37 GMT

StripCatalogName=0 doesn't seem to have any effect either.

Re: JDBC driver CPU consumption

szymon_dybczak — Fri, 19 Sep 2025 12:04:02 GMT

OK, one last thing: try explicitly adding the catalog and schema to the JDBC connection string:

ConnCatalog=your_catalog;ConnSchema=your_schema;
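All of the settings suggested so far (useNativeQuery=1, StripCatalogName=0, ConnCatalog, ConnSchema) go into the same semicolon-separated URL. A minimal sketch of assembling them in one place; `DatabricksUrlSketch` and `withSettings` are hypothetical helpers, not part of the driver, and the host and httpPath values are the placeholders used in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DatabricksUrlSketch {

    // Hypothetical helper (not part of the driver): appends ";key=value"
    // pairs to a base JDBC URL in insertion order.
    public static String withSettings(String baseUrl, Map<String, String> settings) {
        StringBuilder url = new StringBuilder(baseUrl);
        for (Map.Entry<String, String> e : settings.entrySet()) {
            url.append(';').append(e.getKey()).append('=').append(e.getValue());
        }
        return url.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the settings in the order they were added.
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("httpPath", "/path");
        settings.put("ConnCatalog", "your_catalog");
        settings.put("ConnSchema", "your_schema");
        settings.put("transportMode", "http");
        settings.put("ssl", "1");
        settings.put("AuthMech", "3");
        settings.put("useNativeQuery", "1");
        settings.put("StripCatalogName", "0");
        System.out.println(withSettings("jdbc:databricks://host.cloud.databricks.com:443", settings));
    }
}
```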

Re: JDBC driver CPU consumption

ivni — Fri, 19 Sep 2025 12:51:03 GMT

So I guess something like this?

jdbc:databricks://host.cloud.databricks.com:443;httpPath=/path;ConnCatalog=data;ConnSchema=schema;transportMode=http;ssl=1;AuthMech=3;useNativeQuery=1;StripCatalogName=0

These measures don't seem to influence CPU consumption.

Re: JDBC driver CPU consumption

szymon_dybczak — Fri, 19 Sep 2025 13:04:42 GMT

Could you check the stack trace again, then? In your previous message you wrote that most of the time is spent in the method below:

com.databricks.client.hivecommon.utils.HiveCommonQueryTranslationUtils.stripCatalogName

What does it look like now?
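One rough way to re-check where the time goes is to run the insert on a worker thread and periodically sample that thread's stack with Thread.getStackTrace(), counting how often a given method appears. A minimal sketch, assuming a HotSpot-style JVM; `StackSamplerSketch`, `sampleFraction`, and the `busyWork` placeholder are hypothetical. Another thread's stack is only captured at safepoints, so the percentages are approximate, and a real profiler will give more reliable numbers.

```java
public class StackSamplerSketch {

    private static volatile long sink;

    // Returns true if any frame in the trace belongs to a method with this name.
    public static boolean hasFrame(StackTraceElement[] trace, String methodName) {
        for (StackTraceElement frame : trace) {
            if (frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }

    // Polls the target thread's stack every intervalMs until the thread exits and
    // returns the fraction of non-empty samples that contained methodName.
    public static double sampleFraction(Thread target, String methodName, long intervalMs)
            throws InterruptedException {
        int total = 0;
        int hits = 0;
        while (target.isAlive()) {
            StackTraceElement[] trace = target.getStackTrace();
            if (trace.length > 0) {
                total++;
                if (hasFrame(trace, methodName)) {
                    hits++;
                }
            }
            Thread.sleep(intervalMs);
        }
        return total == 0 ? 0.0 : (double) hits / total;
    }

    // Placeholder CPU-bound workload standing in for st.execute(sql).
    static void busyWork(long millis) {
        long end = System.nanoTime() + millis * 1_000_000L;
        long x = 1;
        while (System.nanoTime() < end) {
            x = x * 31 + 7;
        }
        sink = x;
    }

    public static void main(String[] args) throws InterruptedException {
        Thread worker = new Thread(() -> busyWork(500), "insert-worker");
        worker.start();
        double fraction = sampleFraction(worker, "busyWork", 10);
        System.out.printf("busyWork appeared in %.0f%% of samples%n", fraction * 100);
    }
}
```

To check the driver, replace busyWork(500) with the st.execute(sql) call from the usage example earlier in the thread and pass "stripCatalogName" as the method name.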

Re: JDBC driver CPU consumption

ivni — Fri, 19 Sep 2025 14:22:56 GMT

It is still there: