AI_Query Prompt Token and Completion Token
08-26-2025 09:27 PM
Hi
I would like to know how I can get the completion token and prompt token counts when using AI_Query.
Thanks
08-27-2025 01:32 AM
Hello @Andreyai
good day!!
For ai_query, Databricks has documentation here:
https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query — I am sure you will get better insights from the documentation.
But here is a workaround for estimating token counts:
- Install tiktoken in a Databricks notebook (via %pip install tiktoken).
- Example Python code to estimate:

```python
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

# Example usage
prompt = "Your prompt text here"  # Replace with your actual prompt
estimated_prompt_tokens = count_tokens(prompt)
print(f"Estimated prompt tokens: {estimated_prompt_tokens}")

# For completion, estimate based on expected output length (e.g., max_tokens param)
example_completion = "Sample generated response"  # Simulate or use a sample
estimated_completion_tokens = count_tokens(example_completion)
print(f"Estimated completion tokens: {estimated_completion_tokens}")
```
08-27-2025 02:17 AM - edited 08-27-2025 02:41 AM
Hi
Thank you for your response.
But I was expecting ai_query to return the usage information along with the response, like a completions.create call on OpenAI does. Is that possible? That way each call would return both the response and the usage.
In my case I have a set of images; for each image I call AI_Query with a prompt consisting of text instructions plus the image, and it returns a description of the image. I would like to get the token counts so I can estimate the cost of the operation. I am using Llama 4 Maverick and Claude 3.7 Sonnet.
link OpenAI: https://platform.openai.com/docs/api-reference/chat/list
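To illustrate what I mean, this is the kind of `usage` block the OpenAI chat API returns alongside the response (the token values below are made up for illustration):

```python
import json

# Hypothetical example of an OpenAI-style chat completion response;
# the token counts in `usage` are made-up numbers, not real output.
response = json.loads("""
{
  "choices": [{"message": {"role": "assistant", "content": "A photo of a cat."}}],
  "usage": {"prompt_tokens": 250, "completion_tokens": 12, "total_tokens": 262}
}
""")

usage = response["usage"]
print(f"prompt: {usage['prompt_tokens']}, completion: {usage['completion_tokens']}")
```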
Thanks
10-03-2025 09:05 PM
Hi @Andreyai
Batch inference requests hit a model serving endpoint; as long as inference tables and usage tracking are enabled on that endpoint, the requests are logged regardless of how they were submitted to the endpoint.
See the inference table schema documentation for the endpoint — the logged payload includes both input token and output token counts.
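Once you have those per-request token counts, you can aggregate them to estimate cost. A minimal sketch in Python, assuming the logged responses carry an OpenAI-style `usage` object as JSON strings; the responses and per-million-token prices below are made up for illustration:

```python
import json

# Hypothetical prices, for illustration only — check your model's actual rates.
PRICE_PER_M_INPUT = 5.0    # $/1M input tokens (made-up value)
PRICE_PER_M_OUTPUT = 15.0  # $/1M output tokens (made-up value)

# Made-up examples of logged response payloads with a `usage` object.
logged_responses = [
    '{"usage": {"prompt_tokens": 1342, "completion_tokens": 28}}',
    '{"usage": {"prompt_tokens": 1510, "completion_tokens": 35}}',
]

total_in = total_out = 0
for raw in logged_responses:
    usage = json.loads(raw).get("usage", {})
    total_in += usage.get("prompt_tokens", 0)
    total_out += usage.get("completion_tokens", 0)

cost = total_in / 1e6 * PRICE_PER_M_INPUT + total_out / 1e6 * PRICE_PER_M_OUTPUT
print(f"input tokens: {total_in}, output tokens: {total_out}, est. cost: ${cost:.4f}")
```

The same aggregation can of course be done in SQL directly against the inference table once you know where the usage fields live in your endpoint's payload schema.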
Hope this helps.