Prompt Engineering
- developing, designing, and optimizing prompts to enhance the output of FMs for your needs
- An improved prompting technique consists of (see the sketch after this list):
- Instructions – the task for the model to perform (a description of what to do and how it should be done)
- Context – external information to guide the model
- Input data – the input for which you want a response
- Output Indicator – the output type or format
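As a concrete illustration, here is a minimal sketch showing how the four components can be assembled into a single prompt. The task, review text, and wording of each component are invented for illustration only:

```python
# Minimal sketch: assembling the four prompt components into one prompt.
# All wording below is hypothetical, for illustration only.

instructions = "Summarize the customer review below in one sentence."  # Instructions
context = "The review was posted on an electronics store website."     # Context
input_data = ("Review: The headphones arrived quickly and sound great, "
              "but the ear cushions feel cheap.")                      # Input data
output_indicator = "Respond with a single plain-English sentence."     # Output Indicator

prompt = "\n\n".join([instructions, context, input_data, output_indicator])
print(prompt)
```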
- Negative Prompting
- A technique where you explicitly instruct the model on what not to include or do in its response (see the example after this list)
- Negative Prompting helps to:
- Avoid Unwanted Content
- Maintain Focus
- Enhance Clarity – prevents the use of complex terminology or detailed data, making the output clearer and more accessible
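A hedged example of what a negative prompt can look like; the topic and constraints are invented:

```python
# Negative prompting: the second half of the prompt explicitly states
# what the model must NOT do. Wording is illustrative only.
prompt = (
    "Explain how solar panels work to a general audience. "
    "Do not use technical jargon, do not include equations, "
    "and do not exceed 100 words."
)
```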

Prompt Performance Optimization
- System Prompts – how the model should behave and reply
- Temperature (0 to 1) – creativity of the model’s output
- Low (ex: 0.2) – outputs are more conservative, repetitive, focused on the most likely response
- High (ex: 1.0) – outputs are more diverse, creative, and unpredictable, maybe less coherent
- Top P (0 to 1)
- You set a value for “p” (e.g., 0.9). The model considers the most probable tokens in order until their probabilities add up to or exceed “p”.
- Low P (ex: 0.25) – only the tokens covering the top 25% of probability mass are considered, giving a more coherent response
- High P (ex: 0.99) – a broad range of possible words is considered, giving more creative and diverse output
- Top K – limits the number of probable words
- The model selects the next word from a fixed number of the most probable options
- During generation, the model predicts a probability distribution over the vocabulary for the next token. Instead of sampling from the entire vocabulary, Top K sampling only considers the top K most probable tokens. For instance, if K = 50, only the 50 most probable tokens are considered, and the rest are discarded. The model then randomly selects one of these tokens based on their probabilities.
- Low K (ex: 10) – fewer candidate words are considered, giving a more coherent response
- High K (ex: 500) – more candidate words are considered, giving more diverse and creative output
- Length – maximum length of the answer
- Stop Sequences – tokens that signal the model to stop generating output
- Higher Temperature increases variety, while lower Top P and Top K reduce variety by focusing sampling on the model’s top predictions (see the sampling sketch and the Bedrock example below)
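To make Temperature, Top K, and Top P concrete, below is a small self-contained sketch of how they shape next-token sampling. The five-token vocabulary and logits are invented; real models apply the same steps over their full vocabulary:

```python
import math
import random

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    """Pick the next token from a {token: logit} dict using temperature
    scaling, Top K, and Top P (nucleus) filtering. Sketch only;
    temperature must be > 0 here."""
    # 1. Temperature: divide logits before softmax. Low values sharpen
    #    the distribution (conservative); high values flatten it (diverse).
    scaled = {t: l / temperature for t, l in logits.items()}
    max_l = max(scaled.values())
    exps = {t: math.exp(l - max_l) for t, l in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((t, e / total) for t, e in exps.items()),
                   key=lambda kv: kv[1], reverse=True)

    # 2. Top K: keep only the K most probable tokens.
    if top_k is not None:
        probs = probs[:top_k]

    # 3. Top P: keep the smallest set of tokens whose cumulative
    #    probability reaches or exceeds p.
    if top_p is not None:
        kept, cum = [], 0.0
        for token, p in probs:
            kept.append((token, p))
            cum += p
            if cum >= top_p:
                break
        probs = kept

    # 4. Renormalize over the survivors and sample.
    total = sum(p for _, p in probs)
    return random.choices([t for t, _ in probs],
                          weights=[p / total for _, p in probs])[0]

# Invented logits for a tiny vocabulary:
logits = {"the": 2.0, "a": 1.5, "cat": 0.5, "dog": 0.3, "xylophone": -2.0}
print(sample_next_token(logits, temperature=0.2, top_k=3, top_p=0.9))
```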
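In practice you pass these values as inference parameters when calling the model. A hedged sketch using the Amazon Bedrock Converse API via boto3 (the model ID, region, and values are illustrative; Top K is not part of the standard inferenceConfig and, for models that support it, is passed through additionalModelRequestFields):

```python
import boto3

# Sketch only: assumes Bedrock access in your account and a model that
# accepts these parameters. Model ID and values are illustrative.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.converse(
    modelId="anthropic.claude-3-haiku-20240307-v1:0",
    messages=[{"role": "user",
               "content": [{"text": "Write a haiku about the ocean."}]}],
    inferenceConfig={
        "temperature": 0.7,        # creativity of the output
        "topP": 0.9,               # nucleus sampling threshold
        "maxTokens": 200,          # Length: maximum length of the answer
        "stopSequences": ["END"],  # tokens that stop generation
    },
    # Top K is model-specific, so it goes in the passthrough field:
    additionalModelRequestFields={"top_k": 250},
)
print(response["output"]["message"]["content"][0]["text"])
```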
Latency
- how fast the model responds
- impacted by a few parameters:
- The model size
- The model type itself (Llama has a different performance than Claude)
- The number of tokens in the input (the more tokens, the slower)
- The number of tokens in the output (the more tokens, the slower)
- Latency is not impacted by Top P, Top K, Temperature
Zero-Shot Prompting
- Present a task to the model without providing examples or explicit training for that specific task
- relies fully on the model’s general knowledge
- The larger and more capable the FM, the more likely you’ll get good results
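A hypothetical zero-shot prompt, stating the task with no examples:

```python
# Zero-shot: the task is stated directly; no examples are provided.
prompt = ("Classify the sentiment of this review as positive, negative, "
          "or neutral: 'The battery died after two days.'")
```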
Few-Shot Prompting
- Provide examples of a task to the model to guide its output
- We provide a “few shots” to the model to perform the task
- If you provide one example only, this is also called “one-shot” or “single-shot”
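A hypothetical few-shot version of the same sentiment task, with two invented examples guiding the output format (keeping only the first example would make it one-shot):

```python
# Few-shot: invented examples show the expected pattern before the
# real input.
prompt = (
    "Classify the sentiment of each review.\n"
    "Review: 'Fast shipping and great quality.' -> positive\n"
    "Review: 'The battery died after two days.' -> negative\n"
    "Review: 'Works fine, nothing special.' ->"
)
```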
Chain of Thought Prompting
- Divide the task into a sequence of reasoning steps, leading to more structure and coherence
- Using a sentence like “Think step by step” helps
- Helpful when solving a problem as a human usually requires several steps
- Can be combined with Zero-Shot or Few-Shot Prompting
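A hypothetical zero-shot Chain of Thought prompt; the arithmetic problem is invented:

```python
# Chain of Thought: the prompt asks for intermediate reasoning steps
# before the final answer.
prompt = (
    "A store sells pens at $2 each and notebooks at $5 each. "
    "Alice buys 3 pens and 2 notebooks. How much does she spend in total?\n"
    "Think step by step, then give the final answer."
)
```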