← Back to ml

Instruction Hierarchy in Frontier LLMs

Learn about how AI systems prioritize instructions from different sources

IntermediateAI SafetyLLMsPrompt Injection

What is Instruction Hierarchy?

AI Systems like ChatGPT receive instructions from multiple sources. They have to prioritize information from the more verified sources. For example, asking ChatGPT to find the name of a certain food in a PDF makes ChatGPT:

  1. Check its system instructions
  2. Look at the user's prompt
  3. Scan the PDF
  4. Look at the Web

It needs to have an "instruction hierarchy", where it prioritizes data streams in a specific order. For OpenAI, they use System > Developer > User > Tool, where higher priority instructions are more trusted.

Why is Large Scale Instruction Training Hard?

The IH-Challenge

This is why the IH-Challenge was designed by the OpenAI team. They created a reinforcement learning training dataset to address those problems, and made sure that the dataset was:

  1. Simple(instruction-wise)
  2. Objectively gradable
  3. No trivial shortcuts

A model was then trained on this dataset, and the results were:

Why Does This Matter?

This is what makes this approach especially compelling for safety, because it directly improves safety without impacting any other metrics.

Research

  1. This is the original Instruction Hierarchy paper by OpenAI! Link