Prompt Engineering – developing, designing, and optimizing prompts to enhance the output of FMs for your needs
An improved prompting technique consists of four parts (combined in the sketch after this list):
Instructions – the task for the model to perform (a description of what to do and how to do it)
Context – external information to guide the model
Input data – the input for which you want a response
Output Indicator – the output type or format
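A minimal sketch of how the four parts combine into a single prompt; the summarization task, review text, and JSON format below are made up for illustration:

```python
# Assemble the four prompt components into one prompt string.
instructions = "Summarize the customer review below in one sentence."
context = "The review was left on an e-commerce site that sells kitchen appliances."
input_data = 'Review: "The blender arrived quickly, but the lid cracked after two uses."'
output_indicator = 'Respond with a single JSON object: {"summary": "..."}'

prompt = f"{instructions}\n\n{context}\n\n{input_data}\n\n{output_indicator}"
print(prompt)
```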
Negative Prompting
A technique where you explicitly instruct the model on what not to include or do in its response (see the sketch after the list below)
Negative Prompting helps to:
Avoid Unwanted Content – explicitly rules out irrelevant, off-topic, or inappropriate material
Maintain Focus – keeps the response on topic and aligned with the task
Enhance Clarity – prevents the use of complex terminology or detailed data, making the output clearer and more accessible
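A small sketch of negative prompting in practice; the explanation task and the specific "do not" rules are illustrative:

```python
# Negative prompting: explicitly state what the model must NOT do.
base_prompt = "Explain how a neural network learns."
negative_instructions = (
    "Do not use mathematical notation. "
    "Do not mention specific frameworks or libraries. "
    "Do not exceed three paragraphs."
)
prompt = f"{base_prompt}\n\n{negative_instructions}"
print(prompt)
```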
Prompt Performance Optimization
System Prompts – how the model should behave and reply
Temperature (0 to 1) – controls the creativity of the model’s output (sketched after the examples below)
Low (ex: 0.2) – outputs are more conservative, repetitive, focused on the most likely response
High (ex: 1.0) – outputs are more diverse, creative, and unpredictable, maybe less coherent
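Conceptually, temperature divides the model’s raw next-token scores (logits) before the softmax, so low values sharpen the distribution and high values flatten it. A minimal sketch with made-up logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    # Scale logits by 1/temperature, then apply a numerically stable softmax.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [round(e / total, 3) for e in exps]

logits = [2.0, 1.0, 0.5, 0.1]  # hypothetical scores for four candidate tokens
print(softmax_with_temperature(logits, 0.2))  # ~[0.993, 0.007, ...] very focused
print(softmax_with_temperature(logits, 1.0))  # ~[0.57, 0.21, 0.13, 0.09] more diverse
```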
Top P (0 to 1)
You set a value for p (e.g., 0.9). The model considers the most probable tokens, in descending order, until their cumulative probability reaches or exceeds p.
Low P (ex: 0.25) – only tokens within the top 25% of cumulative probability are considered, giving a more coherent response
High P (ex: 0.99) – a broad range of possible tokens is considered, giving a more creative and diverse output
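A minimal sketch of Top P (nucleus) sampling over a made-up probability distribution:

```python
import random

def sample_top_p(probs, p):
    # Keep the most probable tokens until their cumulative probability
    # reaches or exceeds p, then renormalize and sample among them.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)
    kept, cumulative = [], 0.0
    for token_id, prob in ranked:
        kept.append((token_id, prob))
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(prob for _, prob in kept)
    tokens = [token_id for token_id, _ in kept]
    weights = [prob / total for _, prob in kept]
    return random.choices(tokens, weights=weights)[0]

probs = [0.5, 0.3, 0.1, 0.05, 0.05]  # hypothetical next-token probabilities
print(sample_top_p(probs, 0.9))  # only tokens 0-2 compete (0.5 + 0.3 + 0.1 = 0.9)
```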
Top K – limits the number of probable words the model can choose from (sketched below)
The model selects the next word from a fixed number of the most probable options
During generation, the model predicts a probability distribution over the vocabulary for the next token. Instead of sampling from the entire vocabulary, Top K sampling only considers the top K most probable tokens. For instance, if K = 50, only the 50 most probable tokens are considered, and the rest are discarded. The model then randomly selects one of these tokens based on their probabilities.
Low K (ex: 10) – fewer candidate words, more coherent response
High K (ex: 500) – more candidate words, more diverse and creative output
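Top K sampling is simpler: keep the K most probable tokens and discard the rest. A sketch in the same style as the Top P one above:

```python
import random

def sample_top_k(probs, k):
    # Keep only the k most probable tokens, renormalize, and sample.
    ranked = sorted(enumerate(probs), key=lambda pair: pair[1], reverse=True)[:k]
    total = sum(prob for _, prob in ranked)
    tokens = [token_id for token_id, _ in ranked]
    weights = [prob / total for _, prob in ranked]
    return random.choices(tokens, weights=weights)[0]

probs = [0.4, 0.25, 0.15, 0.1, 0.05, 0.05]  # hypothetical distribution
print(sample_top_k(probs, 2))  # only the two most probable tokens compete
```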
Length – maximum length of the answer
Stop Sequences – tokens that signal the model to stop generating output
Raising Temperature increases variety, while lowering Top P and Top K reduces variety and focuses sampling on the model’s top predictions.
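A hedged sketch of how these parameters map onto a real API call, here the Amazon Bedrock Converse API via boto3 (the model ID, region, and parameter values are illustrative and require Bedrock access; Top K is not part of inferenceConfig and, for models that support it, is passed as a model-specific additional field):

```python
import boto3

client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model ID
    messages=[{"role": "user", "content": [{"text": "Write a haiku about clouds."}]}],
    inferenceConfig={
        "maxTokens": 200,           # Length
        "temperature": 0.7,         # Temperature
        "topP": 0.9,                # Top P
        "stopSequences": ["\n\n"],  # Stop Sequences
    },
)
print(response["output"]["message"]["content"][0]["text"])
```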
Latency
How fast the model responds
Impacted by a few parameters:
The model size
The model type itself (Llama performs differently than Claude)
The number of tokens in the input (the more tokens, the slower)
The number of tokens in the output (the more tokens, the slower)
Latency is not impacted by Top P, Top K, or Temperature
Zero-Shot Prompting
Present a task to the model without providing examples or explicit training for that specific task
Fully rely on the model’s general knowledge
The larger and more capable the FM, the more likely you’ll get good results
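A zero-shot prompt is just the task statement; the sentiment-classification example below is made up:

```python
# Zero-shot: state the task directly, with no worked examples.
prompt = (
    "Classify the sentiment of the following tweet as positive, negative, or neutral.\n\n"
    "Tweet: Just got my order and it exceeded every expectation!"
)
print(prompt)
```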
Few-Shot Prompting
Provide examples of a task to the model to guide its output
We provide a “few shots” to the model to perform the task
If you provide one example only, this is also called “one-shot” or “single-shot”
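The same made-up sentiment task as above, now with two labeled examples prepended (keeping only the first example would make it one-shot):

```python
# Few-shot: labeled examples guide the model before the real input.
prompt = (
    "Classify the sentiment of each tweet as positive, negative, or neutral.\n\n"
    "Tweet: I love the new design!\n"
    "Sentiment: positive\n\n"
    "Tweet: The app keeps crashing on startup.\n"
    "Sentiment: negative\n\n"
    "Tweet: Just got my order and it exceeded every expectation!\n"
    "Sentiment:"
)
print(prompt)
```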
Chain of Thought Prompting
Divide the task into a sequence of reasoning steps, leading to more structure and coherence
Using a sentence like “Think step by step” helps
Helpful when solving a problem as a human usually requires several steps
Can be combined with Zero-Shot or Few-Shot Prompting
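A sketch of zero-shot chain-of-thought prompting on a made-up multi-step arithmetic problem:

```python
# Chain of thought: ask the model to reason through intermediate steps.
# Expected reasoning: 25 * 12 = 300 pens, 40 * 7 = 280 handed out, 300 - 280 = 20 left.
prompt = (
    "A store sells pens in packs of 12. A school orders 25 packs and hands out "
    "7 pens per classroom across 40 classrooms. How many pens are left over?\n\n"
    "Think step by step, then give the final answer on its own line."
)
print(prompt)
```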