This is the first part of a guide on writing prompts for models accessible via the Databricks Foundation Model API, such as DBRX and Llama 3. Prompting, whether you are typing into a chat-based AI application or integrating prompts deep in the codebase of an AI-powered application, is central to how we get useful responses from large language models (LLMs). In this series, we will share a collection of prompting tools and techniques, all grounded in intuition about how LLMs work.
There are already plenty of very good LLM prompting guides available online. Why are we writing another one? We think that, in addition to cataloging useful techniques, there is room for a prompting guide that connects specific techniques to a broader understanding of the structure and behavior of LLMs. You should come away from this series with a strong intuition of how models work and how to write prompts to solve your specific problems, not with a memorized list of prompt types.
This guide will also focus on using prompting to solve specific problems. There isn’t always an obvious, correct prompt for each situation. Working with LLMs is an empirical exercise: learning how to compare and test prompts is at least as important as being able to pick the right kind of prompt.
This introductory post will expand on the concept of a “prompt” and explore the journey of a prompt through a model. The post will help to build some intuition about how LLMs use prompts to generate output. In particular, the post will cover:

- What a prompt is and the many forms it can take
- How prompts are formatted for chat- and instruction-tuned models
- How a model turns a prompt into a response, one token at a time
Future posts in this series will go into more detail about specific techniques and why they work.
A prompt is an input that an LLM uses as a starting point for generating a response. The model continues from wherever the prompt leaves off, with the goal of generating new text that makes sense in the context of the prompt. Prompts are the primary mechanism by which users and developers interact with LLMs and steer their behaviors. Well-designed prompts can direct models to complete a wide range of tasks, such as classification, summarization, open-ended chat, and text revision. Prompts can also guide models’ communication styles, enabling them to adopt different personas or tones. You can even prompt models to return structured output, such as JSON or XML, which is essential when LLM outputs are used programmatically and integrated into other applications and systems.
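For instance, here is a minimal sketch of a prompt that requests JSON output. The wording and schema are illustrative; any clear instruction, ideally with a description of the desired structure, tends to work, though models are not guaranteed to comply.

```python
# Illustrative prompt requesting structured output; the schema and wording are
# examples only, and models may still deviate from the requested format.
prompt = (
    "Extract the product name and price from the review below. "
    'Respond with JSON only, using the keys "product" and "price".\n\n'
    "Review: I picked up this blender for $49.99 and love it."
)

# A compliant response might look like:
# {"product": "blender", "price": 49.99}
```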
Prompts can take many different forms depending on the user’s goal and on the tasks the model was trained to complete. While all LLM outputs are generated through an underlying text-completion mechanism, the specific form and purpose of the prompt can make the interaction with the model resemble different types of tasks.
Some examples include:

- Open-ended completion: a passage of text that the model simply continues
- Question answering: a question that the model answers directly
- Instruction following: a directive such as “Summarize this article in three bullet points”
- Multi-turn chat: an ongoing conversation between a user and an assistant
Anything that the model uses as the seed for completion is a prompt: it is a very general concept. However, different models can and usually do respond to the same prompt with different outputs, largely as a result of differences in training data.
LLMs are initially trained (pretrained) on text completion, which equips them with a strong aptitude for generating and manipulating text. Pretraining does not, however, typically enable models to consistently provide well-structured responses to questions and instructions. Enabling models to perform this task requires further fine-tuning on datasets with chat exchanges or question/answer pairs. Fine-tuning helps the model to generate outputs that are aligned with the specific requirements of the task.
Many LLMs are released in two variants: a "base" model, which is the pretrained model without any task-specific fine-tuning, and an "instruction" model, which has been fine-tuned to better handle specific types of prompts, such as questions or instructions.
LLMs that are fine-tuned for conversations or instruction following require specially-formatted prompts. This is the subject of the next section.
You have probably used AI chat interfaces such as ChatGPT, Claude, or the Databricks AI Playground before. The user enters some text in the box at the bottom, and the model responds. You might think that the user’s input in the text box is the prompt. In fact, at each step of the conversation, the prompt includes not just the user’s input but also the entire conversation history leading up to it.
The above image shows the Databricks AI Playground interface. Here’s what the corresponding prompt looks like:
<|im_start|>system
You are a helpful assistant. Follow the user's instructions to the best of your ability.<|im_end|>
<|im_start|>user
Tell me a riddle.<|im_end|>
<|im_start|>assistant
Sure, I'd be happy to! Here's a riddle for you:
I am taken from a mine, and shut up in a wooden case, from which I am never released, and yet I am used by almost every person. What am I?
Do you know the answer?<|im_end|>
<|im_start|>user
I don't know, tell me!<|im_end|>
<|im_start|>assistant
The entire conversation, including some special tokens structuring the exchange (<|im_start|> and <|im_end|>) and a system message establishing a persona and instructions for the model, is one continuous prompt. The prompt will grow with each new exchange in the conversation as new user and AI responses are appended to the prompt.
Different models will expect different prompt formats depending on how they were fine-tuned. That said, it is seldom necessary to apply this formatting yourself when interacting with LLMs via chat interfaces, API calls, or client libraries such as the databricks-genai-inference SDK. These interfaces generally abstract away the differences between expected prompt formats. Python SDKs for LLM APIs, for example, generally expect a list of dictionaries with “role” and “content” keys, with roles alternating between “user” and “assistant.” These are then mapped to the model’s expected format.
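Here is a rough sketch of that pattern using an OpenAI-compatible Python client. The workspace URL, environment variable, and model name are placeholders; consult your workspace and client library documentation for the exact values.

```python
import os
from openai import OpenAI

# Placeholder workspace URL and token variable; adjust for your environment.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>/serving-endpoints",
)

# The conversation is a list of role/content dictionaries. The serving layer
# maps this structure onto the prompt format the model was fine-tuned with.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a riddle."},
    {"role": "assistant", "content": "I am taken from a mine, and shut up in a wooden case... What am I?"},
    {"role": "user", "content": "I don't know, tell me!"},
]

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # placeholder model/endpoint name
    messages=messages,
)
print(response.choices[0].message.content)
```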
Often hidden from the end user, the system message is an important component of chat/instruction prompts. It provides useful context, framing, and constraints that focus the model’s responses on the desired format, persona, domain, or task at hand.
Chat interfaces demonstrate that what the user types into a chat box and what the model ultimately processes are often not identical. User inputs are typically placed in a prompt template that includes additional contextual and structural information needed by the model to produce useful responses.
The examples in the rest of this document will focus on chat/instruction-formatted prompts. Most of the models available via the Foundation Models API, including DBRX, Llama 3, and Mixtral-8x7B, have been specially trained to recognize structured chats and respond appropriately. The structure of the final prompt may change from model to model: different models, for example, use different special tokens to indicate the roles and the message boundaries. In each case, the end result is a single string that contains the entire conversation history and that the model uses to generate its next response.
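To make that mapping concrete, here is a simplified sketch of how a client library might render a role/content message list into the ChatML-style string shown earlier. Real chat templates vary by model and are normally applied for you; this is for illustration only.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages as a ChatML-style prompt string.

    Illustrative only: each model family uses its own template and special
    tokens, and client libraries normally handle this formatting for you.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    if add_generation_prompt:
        # End with an open assistant turn so the model generates the reply.
        parts.append("<|im_start|>assistant")
    return "\n".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a riddle."},
]
print(render_chatml(messages))
```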
In summary, in chat- and instruction-based models, a prompt includes the entire conversation history, a system message, and various structural cues that guide the model toward producing coherent and contextually-relevant responses.
Once the model receives the complete prompt, it begins the process of generating a response. The model:

- Converts the prompt text into tokens, the basic units of text it operates on
- Computes a probability distribution over possible next tokens, given the prompt and any tokens it has generated so far
- Selects a token from that distribution and appends it to the sequence
- Repeats this process, one token at a time, until it emits a stop token or reaches a length limit
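The following sketch shows this loop in simplified Python. The model and tokenizer interfaces here are hypothetical stand-ins for a real model's forward pass and vocabulary handling, and the greedy token choice is just one of several possible sampling strategies.

```python
def generate(model, tokenizer, prompt, max_new_tokens=256):
    """Token-by-token generation loop (illustrative; not a real library API)."""
    tokens = tokenizer.encode(prompt)  # 1. tokenize the prompt
    for _ in range(max_new_tokens):
        # 2. hypothetical call returning {token_id: probability} for the next token
        probs = model.next_token_distribution(tokens)
        # 3. pick a token; here, greedily take the most likely one
        next_token = max(probs, key=probs.get)
        if next_token == tokenizer.eos_token_id:
            break  # 4. stop when the model emits an end-of-sequence token
        tokens.append(next_token)  # 5. extend the context and repeat
    return tokenizer.decode(tokens)
```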
How are LLMs able to generate useful and coherent responses if they only come up with one token at a time, without some overarching idea or narrative they want to communicate? The answer lies in the model training process.
LLMs are trained on enormous datasets of diverse text data. During training, the models learn complex relationships within and between words and sequences of words. They learn what tokens, words, and sequences of words are most likely to follow in the context of other texts. This training is extensive: DBRX, for example, was trained on 12 trillion tokens of text data. Grounded in this training and in the context of the prompt and prior tokens, LLMs can generate new tokens that maintain coherence and make sense in context.
Future posts will get into the details of specific prompting methods, but now that you’ve learned a little bit about the journey of a prompt through an LLM, there are some lessons you can start to apply to your prompt writing.
Different models will respond differently to the same prompt. LLMs won’t always respond in the same way to the same prompts. If you are switching between models, you should expect to need to tweak the prompts to preserve or enhance output quality.
In this first part of a series on writing effective prompts, you have learned about the journey of a prompt through a model and gained some insights into how this journey relates to the practice of prompting. A point that bears repeating as we close out this first article is that, while there are plenty of guidelines for writing good prompts, there is no single “best prompt” for each type of task. Prompting requires trial and error. Understanding how models work will give you better intuition about what types of prompts are best suited to your particular goals, but there’s no substitute for experimentation. The best way to learn to prompt is to write prompts.
The Databricks AI Playground is a great place to get started: you can compare different prompts, models, system messages, generation parameters, and more to start developing a more deliberate and discerning approach to prompting.