cancel
Showing results for 
Search instead for 
Did you mean: 
Data Engineering
cancel
Showing results for 
Search instead for 
Did you mean: 

Cannot use prepared statements with date functions

Rex
New Contributor III

We are using PHP and the Databricks SQL ODBC driver and cannot run a query that users DATE functions with prepared statements. Sample script/docker setup here:

https://github.com/rlorenzo/databricks_php/blob/odbc_prepare_error/test_connection.php

For example, the following query works in with the Python SQL driver:

SELECT count(1), companyid, company_name as name, series, investment_date
    FROM refactored_db.view_afd
    WHERE companyid not in (
        SELECT companyid
            FROM refactored_db.view_afd
            WHERE investment_date < DATE(NOW()-INTERVAL 64 DAY))
    AND investment_date BETWEEN DATE(NOW()-INTERVAL 64 DAY) AND DATE(NOW())
    GROUP BY companyid, company_name, series, investment_date
    HAVING count(1) <= ?

But using PHP and the ODBC driver we get the error:

Warning: odbc_prepare(): SQL error: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '?' expecting {'(', '{', 'APPLY', 'CALLED', 'CHANGES', 'CLONE', 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIO, SQL state 37000 in SQLPrepare in /test_connection.php on line 43
odbc_prepare Error: [Simba][Hardy] (80) Syntax or semantic analysis error thrown in server while executing query. Error message from server: org.apache.hive.service.cli.HiveSQLException: Error running query: org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input '?' expecting {'(', '{', 'APPLY', 'CALLED', 'CHANGES', 'CLONE', 'COLLECT', 'CONTAINS', 'CONVERT', 'COPY', 'COPY_OPTIO

Is this a known issue? We would like to use prepared statements to protect against possible SQL injection. However, it seems we need to forgo prepared statements if the driver doesn't support it fully.

1 ACCEPTED SOLUTION

Accepted Solutions

BilalAslamDbrx
Honored Contributor II
Honored Contributor II

Darn it. In cases where rewriting a query is too tedious, another option is defining the query as a logical view and simply selecting from it and applying parameters as needed. With this approach the query in the definition can’t contain any parameters; they must be defined outside the view. Did you give that a shot?

View solution in original post

6 REPLIES 6

BilalAslamDbrx
Honored Contributor II
Honored Contributor II

@Rex Lorenzo​ I'll dig into this and find out.

Kaniz
Community Manager
Community Manager

Hi Community, Just FYI.

A prepared statement is a feature used to execute the same (or similar) SQL statements repeatedly with high efficiency.

Prepared statements basically work like this:

  1. Prepare: An SQL statement template is created and sent to the database. Certain values are left unspecified, called parameters (labeled "?"). Example: INSERT INTO MyGuests VALUES(?,?, ?)
  2. The database parses, compiles, and performs query optimization on the SQL statement template, and stores the result without executing it
  3. Execute: At a later time, the application binds the values to the parameters, and the database executes the statement. The application may execute the statement as many times as it wants with different values

Compared to executing SQL statements directly, prepared statements have three main advantages:

  • Prepared statements reduce parsing time as the preparation on the query is done only once (although the statement is executed multiple times)
  • Bound parameters minimize bandwidth to the server as you need to send only the parameters each time, and not the whole query
  • Prepared statements are very useful against SQL injections, because parameter values, which are transmitted later using a different protocol, need not be correctly escaped. If the original statement template is not derived from external input, SQL injection cannot occur.

Source

BilalAslamDbrx
Honored Contributor II
Honored Contributor II

@Rex Lorenzo​ apologies it took me so long to get back to you. This is definitely a bug, and it is in the ODBC driver. We are aware of it and are working on it. The root cause is that the SQL parser on driver isn't fully aware of Spark SQL i.e. it's best to stick to SQL-92 e.g. you are using DATE(), which is not standard SQL. Instead, try using CAST(as date) or  CONVERT( as date).

Can you please try it and let me know what happens?

Rex
New Contributor III

@Bilal Aslam​ We tried CAST and CONVERT and still getting the same error.

BilalAslamDbrx
Honored Contributor II
Honored Contributor II

Darn it. In cases where rewriting a query is too tedious, another option is defining the query as a logical view and simply selecting from it and applying parameters as needed. With this approach the query in the definition can’t contain any parameters; they must be defined outside the view. Did you give that a shot?

Kaniz
Community Manager
Community Manager

Hi @Rex Lorenzo​ , We haven't heard from you on the last response from @Bilal Aslam​ , and I was checking back to see if his suggestions helped you. Or else, If you have any solution, please share it with the community as it can be helpful to others.

Also, please don't forget to click on the "Select As Best" button whenever the information provided helps resolve your question.

Welcome to Databricks Community: Lets learn, network and celebrate together

Join our fast-growing data practitioner and expert community of 80K+ members, ready to discover, help and collaborate together while making meaningful connections. 

Click here to register and join today! 

Engage in exciting technical discussions, join a group with your peers and meet our Featured Members.