03-04-2024 12:01 AM
Hello. Please forgive me if this is not the right place to ask, but I'm having issues with Databricks' Statement Execution API. I'm developing a Haskell client for this API.
I managed to implement most of it, but I'm running into issues with using named parameters to interpolate values into queries.
The documentation says that I can use named parameter markers in combination with IDENTIFIER to interpolate table names into queries.
This is, however, not working in all cases. Specifically, I'm getting a very misleading error when interpolating a table identifier in an INSERT statement. I'm running the following request (using the Linux `curl` command to illustrate the issue):
curl --request POST \
https://WORKSPACE_ID_REDACTED.cloud.databricks.com/api/2.0/sql/statements/ \
--header "Authorization: Bearer DATABRICKS_TOKEN_REDACTED" \
--header "Content-Type: application/json" \
--data '{
"warehouse_id": "WAREHOUSE_ID_REDACTED",
"statement": "INSERT INTO IDENTIFIER(:mytable) VALUES (:name, :age)",
"parameters": [{"name": "name", "value": "jan"},{"name": "age", "value": "10"},{"name": "mytable", "value": "z_jan_hrcek.test.person"}]
}'
Actual behavior:
I get an error response like this:
{"statement_id":"01eed9f9-13c6-1abf-ad15-22647b4a619c","status":{"state":"FAILED","error":{"error_code":"BAD_REQUEST","message":"[UNBOUND_SQL_PARAMETER] Found the unbound parameter: name. Please, fix `args` and provide a mapping of the parameter to a SQL literal.; line 1 pos 41"}}}
This is misleading, because it states I'm not providing the `name` parameter, even though you can see I am actually providing a value for this parameter in the request.
Expected behavior:
- all supplied parameters should be interpolated into the query
OR
- a better error message to make it clear what the issue is (is it that interpolating a table name into INSERT queries is not supported?)
Interestingly, if I hard-code the table name into the query, everything works as expected:
curl --request POST \
https://WORKSPACE_ID_REDACTED.cloud.databricks.com/api/2.0/sql/statements/ \
--header "Authorization: Bearer DATABRICKS_TOKEN_REDACTED" \
--header "Content-Type: application/json" \
--data '{
"warehouse_id": "77d7c1940bb93e75",
"statement": "INSERT INTO z_jan_hrcek.test.person VALUES (:name, :age)",
"parameters": [{"name": "name", "value": "jan"},{"name": "age", "value": "10"}]
}'
# ^ This succeeds, and 1 row is inserted into the table
But for my use case I need the ability to specify the table name dynamically, and ideally use parameter interpolation to help me prevent SQL injection.
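Until IDENTIFIER with a named parameter works in INSERT statements, one possible client-side workaround (my own sketch, not an official API feature) is to validate the dynamic table name against a strict allow-list and splice it into the statement backtick-quoted, while still passing the row values as named parameters. The client in question is in Haskell, but the idea is language-agnostic; here is a Python illustration (the helper names are mine, and the JSON shape follows the requests above):

```python
import json
import re

# Each dot-separated part of the identifier may contain only word characters.
# This is a conservative allow-list, not the full Databricks identifier grammar.
_PART = re.compile(r"^\w+$")


def safe_table_name(name: str) -> str:
    """Validate a dot-separated table name and return it backtick-quoted.

    Raises ValueError for anything that doesn't look like a plain
    catalog.schema.table identifier, preventing SQL injection.
    """
    parts = name.split(".")
    if not parts or any(_PART.fullmatch(p) is None for p in parts):
        raise ValueError(f"unsafe table name: {name!r}")
    return ".".join(f"`{p}`" for p in parts)


def build_insert_request(warehouse_id: str, table: str, name: str, age: str) -> str:
    """Build the JSON body for /api/2.0/sql/statements/ with the table name
    validated and inlined, and the row values still passed as named parameters."""
    body = {
        "warehouse_id": warehouse_id,
        "statement": f"INSERT INTO {safe_table_name(table)} VALUES (:name, :age)",
        "parameters": [
            {"name": "name", "value": name},
            {"name": "age", "value": age},
        ],
    }
    return json.dumps(body)
```

This keeps the untrusted row values out of the SQL text entirely, and rejects any table name that could smuggle in extra SQL; the trade-off is that it only accepts plain unquoted identifiers.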
03-07-2024 02:22 AM
Hi @jhrcek, Let’s dive into the issue you’re facing with Databricks’ Statement Execution API and named parameters.
Named Parameters and Table Identifiers:
- You’re using named parameter markers together with IDENTIFIER to interpolate a table name into an INSERT statement. This is a common use case, especially when dealing with dynamic table names.
The Error Message:
- UNBOUND_SQL_PARAMETER: This indicates that a parameter in your SQL statement is not bound (i.e., its value is not provided).
- Found the unbound parameter: name: It suggests that the name parameter is not being properly bound.
- The error points to line 1 pos 41, which is where the issue supposedly occurs.
- However, you are clearly providing the name parameter in your request, so something else might be going wrong.
Troubleshooting Steps:
- Check Parameter Names: Ensure that the parameter markers in your statement (:name, :age, and :mytable) match the names you’re providing in the parameters section of your request.
- Quotes Around IDENTIFIER: When using IDENTIFIER, make sure you’re enclosing the table name in double quotes (if needed). For example: INSERT INTO IDENTIFIER("z_jan_hrcek.test.person").
- API Version Compatibility: Verify that you’re using the correct API version in your request URL.
- Permissions and Warehouse Configuration: Confirm that your token has the required permissions and that the warehouse is configured correctly.
- Try Unnamed Parameters: As a test, try using unnamed parameter markers (?) instead of named parameters. If this works, it might help narrow down the issue.
Hard-Coded Table Name:
- As you’ve observed, hard-coding the table name works as expected, which suggests the problem lies in combining IDENTIFIER with a named parameter.
Conclusion:
- Double-check the parameter names, verify IDENTIFIER usage, and explore unnamed parameters. Feel free to share more details or ask for additional help if needed! 🚀
03-07-2024 03:35 AM
Thanks for the reply, but I don't find this (LLM generated?) reply very helpful 😢
None of the troubleshooting tips helps:
- Check Parameter Names: don't you see the example code? You can check for yourself that the names of the params in the query correspond to the names of the args in the parameters list
- Quotes Around IDENTIFIER: The IDENTIFIER has to contain the table name placeholder. It can't be quoted, otherwise parameter interpolation doesn't work.
- API Version Compatibility: the API version IS in the URL (/2.0/), as you can see in my examples
- Permissions and Warehouse Configuration: This can't be the problem, because everything works when I hardcode the table name
- Try Unnamed Parameters: unnamed parameters are not supported in the Statement Execution API, based on the documentation (https://docs.databricks.com/api/workspace/statementexecution/executestatement#parameters):
"Currently, positional parameters denoted by a ? marker are not supported by the Databricks SQL Statement Execution API."
What would be more helpful is if you could point me to a GitHub repo/issue tracker where I can report this issue, as it seems to be a problem with how the API backend is implemented.
03-07-2024 04:10 AM
Hi @jhrcek, I understand your frustration, and I apologize if my previous reply didn't address your concerns adequately. It seems you've already explored several troubleshooting tips without success.
To address the issue you're facing with the Databricks SQL Statement Execution API, I'll look into finding a suitable GitHub repository or issue tracker where you can report this problem. I'll get back to you as soon as I have more information.
Thank you for bringing this to our attention, and I appreciate your patience.
03-07-2024 06:20 AM
Ok, I think I found what's behind this.
This seems to be a bug in Spark itself.
Here's how I was able to reproduce it in spark-shell running in a Docker container:
https://gist.github.com/jhrcek/49386004a9a47172649158af288106f4
I think it's best if I just report it in the Spark issue tracker.