23. AI – Prompt Engineering & Agentic AI

Prompt Engineering

  • Developing, designing, and optimizing prompts to enhance the output of FMs for your needs
  • An improved prompting technique consists of (see the sketch after this list):
    • Instructions – a task for the model to do (description, how the model should perform)
    • Context – external information to guide the model
    • Input data – the input for which you want a response
    • Output Indicator – the output type or format
  • Negative Prompting
    • A technique where you explicitly instruct the model on what not to include or do in its response
    • Negative Prompting helps to:
      • Avoid Unwanted Content
      • Maintain Focus
      • Enhance Clarity – prevents the use of complex terminology or unnecessary detail,
        making the output clearer and more accessible
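
A minimal sketch, assuming boto3 and the Bedrock Converse API, that combines the four prompt components above with a negative instruction in the system prompt (the model ID and all prompt text are illustrative assumptions, not prescriptive):

    import boto3

    # Bedrock Runtime client (assumes credentials and region are already configured)
    client = boto3.client("bedrock-runtime", region_name="us-east-1")

    # System prompt with a negative instruction (what NOT to do)
    system = [{"text": "You are a concise writing assistant. "
                       "Do not use technical jargon or bullet points."}]

    # One user message combining Instructions, Context, Input data, Output Indicator
    prompt = (
        "Instructions: Summarize the text below for a general audience.\n"
        "Context: The summary will appear in a customer newsletter.\n"
        "Input data: <article text goes here>\n"
        "Output indicator: Respond with a single paragraph of 2-3 sentences."
    )

    response = client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",  # illustrative model choice
        system=system,
        messages=[{"role": "user", "content": [{"text": prompt}]}],
    )
    print(response["output"]["message"]["content"][0]["text"])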

Prompt Performance Optimization

  • System Prompts – how the model should behave and reply
  • Temperature (0 to 1) – creativity of the model’s output
    • Low (ex: 0.2) – outputs are more conservative, repetitive, focused on most likely response
    • High (ex: 1.0) – outputs are more diverse, creative, and unpredictable, maybe less coherent
  • Top P (0 to 1)
    • You set a value for “p” (e.g., 0.9). The model considers the most probable tokens in order until their cumulative probability reaches or exceeds “p”.
      • Low P (ex: 0.25) – only tokens within the top 25% of cumulative probability are considered, making a more coherent response
      • High P (ex: 0.99) – a broad range of possible tokens is considered, possibly giving a more creative and diverse output
  • Top K – limits the number of probable words
    • The model selects the next word from a fixed number of the most probable options
    • During generation, the model predicts a probability distribution over the vocabulary for the next token. Instead of sampling from the entire vocabulary, Top K sampling only considers the top K most probable tokens. For instance, if K = 50, only the 50 most probable tokens are considered, and the rest are discarded. The model then randomly selects one of these tokens based on their probabilities.
      • Low K (ex: 10) – fewer candidate tokens, more coherent response
      • High K (ex: 500) – more candidate tokens, more diverse and creative output
  • Length – maximum length of the answer
  • Stop Sequences – tokens that signal the model to stop generating output
  • Raising Temperature increases variety, while lowering Top P or Top K reduces variety by focusing sampling on the model’s most likely predictions (see the sampling sketch below)
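
To make the sampling mechanics concrete, here is a small self-contained sketch (toy logits, not from any real model) that applies Temperature scaling, then Top K, then Top P filtering before picking the next token:

    import numpy as np

    rng = np.random.default_rng(0)

    def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
        """Toy sampler: temperature scaling, then Top K, then Top P filtering."""
        logits = np.asarray(logits, dtype=float) / temperature  # <1 sharpens, >1 flattens
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()

        order = np.argsort(probs)[::-1]              # token indices, most probable first
        if top_k is not None:
            order = order[:top_k]                    # keep only the K most probable tokens
        if top_p is not None:
            cumulative = np.cumsum(probs[order])
            order = order[: np.searchsorted(cumulative, top_p) + 1]  # smallest set reaching p

        kept = probs[order] / probs[order].sum()     # renormalize the surviving tokens
        return rng.choice(order, p=kept)

    # Toy vocabulary and next-token logits (illustrative values only)
    vocab = ["the", "a", "cat", "dog", "quantum"]
    logits = [4.0, 3.5, 2.0, 1.5, 0.1]

    print(vocab[sample_next_token(logits, temperature=0.2, top_k=2)])    # conservative
    print(vocab[sample_next_token(logits, temperature=1.0, top_p=0.99)]) # more diverse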

Latency

  • How fast the model responds
  • Impacted by several factors:
    • The model size
    • The model type itself (Llama has a different performance than Claude)
    • The number of tokens in the input (the bigger the slower)
    • The number of tokens in the output (the bigger the slower)
  • Latency is not impacted by Top P, Top K, or Temperature (see the measurement sketch below)
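
To observe latency directly, the Bedrock Converse API reports server-side latency in the metrics block of its response. A minimal sketch reusing the client from the earlier example (model ID again illustrative):

    # Reuses the `client` from the Converse sketch above
    response = client.converse(
        modelId="anthropic.claude-3-haiku-20240307-v1:0",
        messages=[{"role": "user", "content": [{"text": "Name three AWS regions."}]}],
        inferenceConfig={"maxTokens": 100},  # capping output tokens keeps responses faster
    )
    print("Latency (ms):", response["metrics"]["latencyMs"])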

Zero-Shot Prompting

  • Present a task to the model without providing examples or explicit training for that specific task (see the example below)
  • Fully rely on the model’s general knowledge
  • The larger and more capable the FM, the more likely you’ll get good results
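
An illustrative zero-shot prompt – the task is stated directly, with no examples:

    Classify the sentiment of this review as Positive, Negative, or Neutral:
    "The delivery was late, but the product itself works great."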

Few-Shot Prompting

  • Provide examples of a task to the model to guide its output (see the example below)
  • We provide a “few shots” to the model to perform the task
  • If you provide one example only, this is also called “one-shot” or “single-shot” prompting
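
An illustrative few-shot prompt – two worked examples guide the format of the answer:

    Convert product names into category tags.
    "Wireless Mouse" -> electronics
    "Cotton T-Shirt" -> apparel
    "Espresso Machine" ->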

Chain of Thought Prompting

  • Divide the task into a sequence of reasoning steps, leading to more structure and coherence
  • Using a sentence like “Think step by step” helps (see the example below)
  • Helpful for problems that would take a human several steps to solve
  • Can be combined with Zero-Shot or Few-Shot Prompting
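
An illustrative chain-of-thought prompt (here zero-shot, using the trigger sentence):

    A store had 120 apples. It sold 45 in the morning and 30 in the afternoon.
    How many apples are left? Think step by step.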


Agentic AI

Agents in Bedrock

Multi-Agent Workflows

Strands Agents

AgentCore

Human in the Loop

Amazon Q