Re: Databricks JDBC Error while connecting from Da...

Fuzail · ‎05-01-2025

I am reading data from Databricks in datatstage 11.7 on prem using datastage JDBC connector and getting the below error. I tried to limit the select queries to one row , it was able to read data form the source,

JDBC_Connector_0: The connector encountered a Java exception:
java.sql.SQLException: [Databricks][JDBCDriver](500051) ERROR processing query/statement. Error Code: 0, SQL state: null, Query: select cola,colb from table1, Error message from Server: Configuration AutoCommit is not available..

I am using the latest JDBC driver available on databricks and done the required configuration. any assistance on this would be of great help

Louis_Frolio · ‎05-01-2025

Greetings Fuzail, here are some suggestions you might want to consider:

The error you're encountering, "Configuration AutoCommit is not available," when using the Databricks JDBC connector in DataStage 11.7 suggests a misalignment with the auto-commit settings.

Analysis and Recommendations:

Auto-Commit Behavior in Databricks JDBC Driver:
- The Databricks JDBC driver operates with auto-commit mode enabled by default. This means that manual commit operations are generally not supported, as indicated by related error messages like "Cannot use commit while connection is in auto-commit mode" from similar issues.
Resolution Steps:
- Ensure that the DataStage JDBC properties explicitly set the AutoCommit parameter to true in its connection settings to align with the Databricks JDBC driver's behavior. This adjustment should prevent the connector from attempting manual commits, which are not supported.
- Refer to the DataStage and Databricks configuration documentation to locate where these connection properties can be explicitly defined, such as in the configuration wizard or JDBC connection string.
Additional Debugging:
- If the error persists, verify that the latest Databricks JDBC driver version is being used, as updates might include fixes for such compatibility issues. The latest versions are recommended for addressing known problems.
- Cross-check the logs to identify whether any test queries fired by DataStage during connection initiation might conflict with the driver's auto-commit behavior.
Documentation:
- See the Databricks JDBC Driver Installation and Configuration Guide for further insights on how the driver handles transaction-related operations and auto-commit settings.

Making these adjustments should help resolve the auto-commit configuration error.

Cheers, Big Roux.

View solution in original post

Fuzail · ‎05-01-2025

Thank you so much for the quick help. setting auto commit to true resolved the issue. I just have one more followup question, The update to databricks using JDBC is taking very longer time and looks like its processing row by row, I tried to adjust the setting of the connector but does not help. From the datastage log i can see "The driver does not support batch updates. The connector will enforce the batch size value of 1." Is there any possible workaround for this issue.

View solution in original post

Fuzail · ‎05-02-2025

I just have one more followup question, The update to databricks using JDBC is taking very longer time and looks like its processing row by row, I tried to adjust the setting of the connector but does not help. From the datastage log i can see "The driver does not support batch updates. The connector will enforce the batch size value of 1." Is there any possible workaround for this issue @Louis_Frolio , can you provide your suggestion for this.

Louis_Frolio · ‎05-02-2025

Here are some suggestions, not sure if it fits with what you are doing but they are worth mentioning.

The Databricks JDBC driver currently does not support batch updates, which is why your updates appear to process row by row with a batch size of 1.

Here are the details and possible workarounds:

Driver Limitation:
- The Databricks JDBC driver enforces a batch size of 1 for updates because it does not currently support batch operations in auto-commit mode.
Workarounds:
- Use COPY INTO: Databricks supports the COPY INTO command, which can handle bulk data ingestion efficiently. This approach sidesteps the limitations of JDBC for batch updates.
- Batch Inserts Using Spark SQL: You can implement a workaround by inserting multiple rows in a single SQL statement via Spark SQL's VALUES clause. For instance, you can construct an INSERT INTO statement that batches hundreds of rows within a single operation. Note that you may need additional logic to handle splitting large jobs into manageable chunks.
- Programmatic Ingestion: If COPY INTO or Spark SQL is not feasible, consider using Databricks' supported ingestion methods like DataFrames or Delta Lake APIs for optimized data writes.

Databricks JDBC Error while connecting from Datastage JDBC connector

Analysis and Recommendations: