This is the first part of a guide on writing prompts for models accessible via the Databricks Foundation Model API, such as DBRX and Llama 3. Prompting, whether you are typing into a chat-based AI application or integrating prompts deep in the codebase of an AI-powered application, is central to how we get useful responses from large language models (LLMs). In this series, we will share a collection of prompting tools and techniques, all grounded in intuition about how LLMs work.
There are already plenty of very good LLM prompting guides available online. Why are we writing another one? We think that, in addition to cataloging useful techniques, there is room for a prompting guide that connects specific techniques to a broader understanding of the structure and behavior of LLMs. You should come away from this series with a strong intuition of how models work and how to write prompts to solve your specific problems, not with a memorized list of prompt types.
This guide will also focus on using prompting to solve specific problems. There isn’t always an obvious, correct prompt for each situation. Working with LLMs is an empirical exercise: learning how to compare and test prompts is at least as important as being able to pick the right kind of prompt.
This introductory post will expand on the concept of a “prompt” and explore the journey of a prompt through a model. The post will help to build some intuition about how LLMs use prompts to generate output. In particular, the post will cover:

- What a prompt is and the many forms it can take
- How prompts are formatted for chat- and instruction-tuned models
- How a model turns a prompt into a response, one token at a time
Future posts in this series will go into more detail about specific techniques and why they work.
A prompt is an input that an LLM uses as a starting point for generating a response. The model continues from wherever the prompt leaves off, with the goal of generating new text that makes sense in the context of the prompt. Prompts are the primary mechanism by which users and developers interact with LLMs and steer their behaviors. Well-designed prompts can direct models to complete a wide range of tasks, such as classification, summarization, open-ended chat, and text revision. Prompts can also guide models’ communication styles, enabling them to adopt different personas or tones. You can even prompt models to return structured output, such as JSON or XML, which is essential when LLM outputs are used programmatically and integrated into other applications and systems.
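For instance, here is a minimal sketch of a prompt that requests JSON output. The wording and schema are illustrative; any clear instruction, ideally with a description of the desired structure, tends to work, though models are not guaranteed to comply.

```python
# Illustrative prompt requesting structured output; the schema and wording are
# examples only, and models may still deviate from the requested format.
prompt = (
    "Extract the product name and price from the review below. "
    'Respond with JSON only, using the keys "product" and "price".\n\n'
    "Review: I picked up this blender for $49.99 and love it."
)

# A compliant response might look like:
# {"product": "blender", "price": 49.99}
```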
Prompts can take many different forms depending on the user’s goal and on the tasks the model was trained to complete. While all LLM outputs are generated through an underlying text-completion mechanism, the specific form and purpose of the prompt can make the interaction with the model resemble different types of tasks.
Some examples include:

- Open-ended completion: a passage of text that the model simply continues
- Question answering: a question that the model answers directly
- Instruction following: a directive such as “Summarize this article in three bullet points”
- Multi-turn chat: an ongoing conversation between a user and an assistant
Anything that the model uses as the seed for completion is a prompt: it is a very general concept. However, different models can and usually do respond to the same prompt with different outputs, largely as a result of differences in training data.
LLMs are initially trained (pretrained) on text completion, which equips them with a strong aptitude for generating and manipulating text. Pretraining does not, however, typically enable models to consistently provide well-structured responses to questions and instructions. Enabling models to perform this task requires further fine-tuning on datasets with chat exchanges or question/answer pairs. Fine-tuning helps the model to generate outputs that are aligned with the specific requirements of the task.
Many LLMs are released in two variants: a "base" model, which is the pretrained model without any task-specific fine-tuning, and an "instruction" model, which has been fine-tuned to better handle specific types of prompts, such as questions or instructions.
LLMs that are fine-tuned for conversations or instruction following require specially-formatted prompts. This is the subject of the next section.
You have probably used AI chat interfaces such as ChatGPT, Claude, or the Databricks AI Playground before. The user enters some text in the box at the bottom, and the model responds. You might think that the user’s input in the text box is the prompt. In fact, at each step of the conversation, the prompt includes not just the user’s input but also the entire conversation history leading up to it.
The above image shows the Databricks AI Playground interface. Here’s what the corresponding prompt looks like:
<|im_start|>system
You are a helpful assistant. Follow the user's instructions to the best of your ability.<|im_end|>
<|im_start|>user
Tell me a riddle.<|im_end|>
<|im_start|>assistant
Sure, I'd be happy to! Here's a riddle for you:
I am taken from a mine, and shut up in a wooden case, from which I am never released, and yet I am used by almost every person. What am I?
Do you know the answer?<|im_end|>
<|im_start|>user
I don't know, tell me!<|im_end|>
<|im_start|>assistant
The entire conversation, including some special tokens structuring the exchange (<|im_start|> and <|im_end|>) and a system message establishing a persona and instructions for the model, is one continuous prompt. The prompt will grow with each new exchange in the conversation as new user and AI responses are appended to the prompt.
Different models will expect different prompt formats depending on how they were fine-tuned. That said, it is seldom necessary to apply this formatting yourself when interacting with LLMs via chat interfaces, API calls, or client libraries such as the databricks-genai-inference SDK. These interfaces generally abstract away the differences between expected prompt formats. Python SDKs for LLM APIs, for example, generally expect a list of dictionaries with “role” and “content” keys, with roles alternating between “user” and “assistant.” These are then mapped to the model’s expected format.
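Here is a rough sketch of that pattern using an OpenAI-compatible Python client. The workspace URL, environment variable, and model name are placeholders; consult your workspace and client library documentation for the exact values.

```python
import os
from openai import OpenAI

# Placeholder workspace URL and token variable; adjust for your environment.
client = OpenAI(
    api_key=os.environ["DATABRICKS_TOKEN"],
    base_url="https://<your-workspace>/serving-endpoints",
)

# The conversation is a list of role/content dictionaries. The serving layer
# maps this structure onto the prompt format the model was fine-tuned with.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a riddle."},
    {"role": "assistant", "content": "I am taken from a mine, and shut up in a wooden case... What am I?"},
    {"role": "user", "content": "I don't know, tell me!"},
]

response = client.chat.completions.create(
    model="databricks-dbrx-instruct",  # placeholder model/endpoint name
    messages=messages,
)
print(response.choices[0].message.content)
```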
Often hidden from the end user, the system message is an important component of chat/instruction prompts. It provides useful context, framing, and constraints that focus the model’s responses on the desired format, persona, domain, or task at hand.
Chat interfaces demonstrate that what the user types into a chat box and what the model ultimately processes are often not identical. User inputs are typically placed in a prompt template that includes additional contextual and structural information needed by the model to produce useful responses.
The examples in the rest of this document will focus on chat/instruction-formatted prompts. Most of the models available via the Foundation Models API, including DBRX, Llama 3, and Mixtral-8x7B, have been specially trained to recognize structured chats and respond appropriately. The structure of the final prompt may change from model to model: different models, for example, use different special tokens to indicate the roles and the message boundaries. In each case, the end result is a single string that contains the entire conversation history and that the model uses to generate its next response.
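To make that mapping concrete, here is a simplified sketch of how a client library might render a role/content message list into the ChatML-style string shown earlier. Real chat templates vary by model and are normally applied for you; this is for illustration only.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render role/content messages as a ChatML-style prompt string.

    Illustrative only: each model family uses its own template and special
    tokens, and client libraries normally handle this formatting for you.
    """
    parts = [
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>" for m in messages
    ]
    if add_generation_prompt:
        # End with an open assistant turn so the model generates the reply.
        parts.append("<|im_start|>assistant")
    return "\n".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Tell me a riddle."},
]
print(render_chatml(messages))
```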
In summary, in chat- and instruction-based models, a prompt includes the entire conversation history, a system message, and various structural cues that guide the model toward producing coherent and contextually-relevant responses.
Once the model receives the complete prompt, it begins the process of generating a response. The model:

- Converts the prompt text into tokens, the basic units of text it operates on
- Computes a probability distribution over possible next tokens, given the prompt and any tokens it has generated so far
- Selects a token from that distribution and appends it to the sequence
- Repeats this process, one token at a time, until it emits a stop token or reaches a length limit
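The following sketch shows this loop in simplified Python. The model and tokenizer interfaces here are hypothetical stand-ins for a real model's forward pass and vocabulary handling, and the greedy token choice is just one of several possible sampling strategies.

```python
def generate(model, tokenizer, prompt, max_new_tokens=256):
    """Token-by-token generation loop (illustrative; not a real library API)."""
    tokens = tokenizer.encode(prompt)  # 1. tokenize the prompt
    for _ in range(max_new_tokens):
        # 2. hypothetical call returning {token_id: probability} for the next token
        probs = model.next_token_distribution(tokens)
        # 3. pick a token; here, greedily take the most likely one
        next_token = max(probs, key=probs.get)
        if next_token == tokenizer.eos_token_id:
            break  # 4. stop when the model emits an end-of-sequence token
        tokens.append(next_token)  # 5. extend the context and repeat
    return tokenizer.decode(tokens)
```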
How are LLMs able to generate useful and coherent responses if they only come up with one token at a time, without some overarching idea or narrative they want to communicate? The answer lies in the model training process.
LLMs are trained on enormous datasets of diverse text data. During training, the models learn complex relationships within and between words and sequences of words. They learn what tokens, words, and sequences of words are most likely to follow in the context of other texts. This training is extensive: DBRX, for example, was trained on 12 trillion tokens of text data. Grounded in this training and in the context of the prompt and prior tokens, LLMs can generate new tokens that maintain coherence and make sense in context.
Future posts will get into the details of specific prompting methods, but now that you’ve learned a little bit about the journey of a prompt through an LLM, there are some lessons you can start to apply to your prompt writing.
Different models will respond differently to the same prompt. LLMs won’t always respond in the same way to the same prompts. If you are switching between models, you should expect to need to tweak the prompts to preserve or enhance output quality.
In this first part of a series on writing effective prompts, you have learned about the journey of a prompt through a model and gained some insights into how this journey relates to the practice of prompting. A point that bears repeating as we close out this first article is that, while there are plenty of guidelines for writing good prompts, there is no single “best prompt” for each type of task. Prompting requires trial and error. Understanding how models work will give you better intuition about what types of prompts are best suited to your particular goals, but there’s no substitute for experimentation. The best way to learn to prompt is to write prompts.
The Databricks AI Playground is a great place to get started: you can compare different prompts, models, system messages, generation parameters, and more to start developing a more deliberate and discerning approach to prompting.