AI_Query Prompt Token and Completion Token
08-26-2025 09:27 PM
Hi
I would like to know how I can get the completion token and prompt token counts when using AI_Query.
Thanks
08-27-2025 01:32 AM
Hello @Andreyai
good day!!
For ai_query, Databricks has documentation here:
https://docs.databricks.com/aws/en/sql/language-manual/functions/ai_query — I am sure you will get better insights from the documentation.
But here is a workaround for estimating token counts:
- Install tiktoken in a Databricks notebook (via %pip install tiktoken).
- Example Python code to estimate:

```python
import tiktoken

def count_tokens(text: str, encoding_name: str = "cl100k_base") -> int:
    encoding = tiktoken.get_encoding(encoding_name)
    return len(encoding.encode(text))

# Example usage
prompt = "Your prompt text here"  # Replace with your actual prompt
estimated_prompt_tokens = count_tokens(prompt)
print(f"Estimated prompt tokens: {estimated_prompt_tokens}")

# For completion, estimate based on expected output length (e.g., max_tokens param)
example_completion = "Sample generated response"  # Simulate or use a sample
estimated_completion_tokens = count_tokens(example_completion)
print(f"Estimated completion tokens: {estimated_completion_tokens}")
```
08-27-2025 02:17 AM - edited 08-27-2025 02:41 AM
Hi
Thank you for your response.
But I was expecting ai_query to return the usage information along with the response, like a completions.create call on OpenAI does. Is that possible? That way each call would return both the response and the usage.
In my case I have a set of images; for each image I call AI_Query with a prompt consisting of text instructions plus the image, and it returns a description of the image. I would like to get the token counts so I can estimate the cost of the operation. I am using Llama 4 Maverick and Claude 3.7 Sonnet.
link OpenAI: https://platform.openai.com/docs/api-reference/chat/list
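To illustrate what I mean, this is the kind of `usage` block the OpenAI chat API returns alongside the response (the token values below are made up for illustration):

```python
import json

# Hypothetical example of an OpenAI-style chat completion response;
# the token counts in `usage` are made-up numbers, not real output.
response = json.loads("""
{
  "choices": [{"message": {"role": "assistant", "content": "A photo of a cat."}}],
  "usage": {"prompt_tokens": 250, "completion_tokens": 12, "total_tokens": 262}
}
""")

usage = response["usage"]
print(f"prompt: {usage['prompt_tokens']}, completion: {usage['completion_tokens']}")
```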
Thanks
10-03-2025 09:05 PM
Hi @Andreyai
Batch inference requests hit a model serving endpoint; as long as inference tables and usage tracking are enabled on that endpoint, the requests are logged regardless of how they were submitted to the endpoint.
See the inference table schema documentation for the endpoint — the logged payload includes both input token and output token counts.
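Once you have those per-request token counts, you can aggregate them to estimate cost. A minimal sketch in Python, assuming the logged responses carry an OpenAI-style `usage` object as JSON strings; the responses and per-million-token prices below are made up for illustration:

```python
import json

# Hypothetical prices, for illustration only — check your model's actual rates.
PRICE_PER_M_INPUT = 5.0    # $/1M input tokens (made-up value)
PRICE_PER_M_OUTPUT = 15.0  # $/1M output tokens (made-up value)

# Made-up examples of logged response payloads with a `usage` object.
logged_responses = [
    '{"usage": {"prompt_tokens": 1342, "completion_tokens": 28}}',
    '{"usage": {"prompt_tokens": 1510, "completion_tokens": 35}}',
]

total_in = total_out = 0
for raw in logged_responses:
    usage = json.loads(raw).get("usage", {})
    total_in += usage.get("prompt_tokens", 0)
    total_out += usage.get("completion_tokens", 0)

cost = total_in / 1e6 * PRICE_PER_M_INPUT + total_out / 1e6 * PRICE_PER_M_OUTPUT
print(f"input tokens: {total_in}, output tokens: {total_out}, est. cost: ${cost:.4f}")
```

The same aggregation can of course be done in SQL directly against the inference table once you know where the usage fields live in your endpoint's payload schema.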
Hope this helps.