Chapter 4: Advanced Prompting Techniques#
Why it matters#
Standard prompt engineering (clear instructions and a few examples) handles most everyday tasks. But difficult questions require an LLM to follow complex instructions and reason over several steps, and there standard techniques fall short. Few-shot examples eat up the limited context window; a single chain of reasoning may simply be wrong, with nothing to check it against; and models can be nudged into unsafe outputs. This chapter introduces three techniques that push reasoning further: chain-of-thought, self-consistency, and tree-of-thought. Each builds on the last, and together they form a ladder from a single guess to a deliberate search over possibilities.
A quick review#
Before adding new tools, recall the good practices from Chapter 3: write clear, specific, positive instructions; highlight what matters; separate instruction, content, and output directions; use examples (in-context learning); and iterate. The advanced techniques here do not replace those habits; they sit on top of them.
Chain-of-thought prompting#
Chain-of-thought (CoT) prompting breaks a complex task into intermediate reasoning steps, encouraging the model to explain its reasoning rather than jump to an answer [Wei et al., 2022]. Decomposing the problem into a series of steps tends to produce more accurate final answers, and it is the foundation the other two techniques build on.
There are two common ways to trigger it.
Zero-shot CoT#
Simply append an instruction like “Let’s think step by step” to the prompt. This alone elicits a sequential reasoning chain [Kojima et al., 2022].
Worked example: the juggler problem
Standard (zero-shot):
Q: A juggler can juggle 16 balls. Half of the balls are golf balls and half
of the golf balls are blue. How many blue golf balls are there?
A: The answer (in numerals) is
-> Output: 8 (incorrect)
Zero-shot CoT:
Q: A juggler can juggle 16 balls. Half of the balls are golf balls and half
of the golf balls are blue. How many blue golf balls are there?
A: Let's think step by step.
-> Output: There are 16 balls in total. Half are golf balls, so there are 8
golf balls. Half of the golf balls are blue, so there are 4 blue golf
balls. (correct)
The only change is the five-word cue, yet it turns a wrong answer into a right one by forcing the intermediate arithmetic into the open.
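The two prompts in the worked example differ only in their answer prefix, which a small helper can make explicit. This is a minimal sketch: the function name `build_prompt` and the constant `COT_CUE` are illustrative, not part of any library, and the model call itself is left out.

```python
COT_CUE = "Let's think step by step."


def build_prompt(question: str, use_cot: bool = False) -> str:
    """Return a zero-shot prompt, optionally ending with the CoT cue."""
    answer_prefix = COT_CUE if use_cot else "The answer (in numerals) is"
    return f"Q: {question}\nA: {answer_prefix}"


question = (
    "A juggler can juggle 16 balls. Half of the balls are golf balls and half "
    "of the golf balls are blue. How many blue golf balls are there?"
)

standard_prompt = build_prompt(question)
cot_prompt = build_prompt(question, use_cot=True)

# The arithmetic the reasoning chain is expected to spell out.
golf_balls = 16 // 2
blue_golf_balls = golf_balls // 2

print(cot_prompt)
print(blue_golf_balls)  # 4
```

Either string can then be sent to whichever model endpoint you are using; only the final line of the prompt changes.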
Few-shot CoT#
Show the model worked examples that include the reasoning, not just the answer. Seeing the explanation of how each example was solved leads the model to produce its own reasoned answer, often improving accuracy further.
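A few-shot CoT prompt can be assembled from demonstrations that each pair a question with its reasoning and answer. The sketch below is one possible layout, not a prescribed format: the `Demo` type, the helper name, and the bookshelf demonstration are all made up for illustration.

```python
from typing import NamedTuple


class Demo(NamedTuple):
    """One worked example: the question, the reasoning, and the answer."""
    question: str
    reasoning: str
    answer: str


def build_few_shot_cot_prompt(demos: list[Demo], question: str) -> str:
    """Concatenate worked examples, then leave the new answer open."""
    blocks = [
        f"Q: {d.question}\nA: {d.reasoning} The answer is {d.answer}."
        for d in demos
    ]
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical demonstration, written for this sketch.
demos = [
    Demo(
        question="A shelf holds 12 books. A third are novels. How many novels are there?",
        reasoning="There are 12 books. A third of 12 is 12 / 3 = 4.",
        answer="4",
    ),
]

prompt = build_few_shot_cot_prompt(
    demos,
    "A juggler can juggle 16 balls. Half of the balls are golf balls and half "
    "of the golf balls are blue. How many blue golf balls are there?",
)
print(prompt)
```

Because each demonstration carries its reasoning, the prompt grows quickly, which is why few-shot CoT competes with the content for context-window space.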
Benefits and limitations#
CoT helps because LLMs benefit from detailed, logical steps. You can facilitate it by adding small logical steps to the prompt, by giving demonstrations that pair a question with a reasoning chain and answer, or with cues like “Let’s think step by step.” It particularly improves arithmetic, common-sense, and symbolic reasoning, with the gains appearing mainly in large models, on the order of 100 billion parameters [Wei et al., 2022].
It is not free, though:
CoT prompts, especially few-shot ones, are specific to a problem type.
Smaller models may produce fluent but illogical chains, hurting performance.
Generating the extra reasoning increases cost.
Choosing a technique
A rough guide:
Zero-shot when the model can do the task from pre-training alone.
One-shot when the output must follow a structure shown by one example.
Few-shot when a structure is best demonstrated by several examples.
Chain-of-thought when the task requires reasoning before answering.
How decoding produces text#
To understand the next two techniques, it helps to know how a model turns probabilities into text. Decoder-only models generate text one token at a time. At each step the model computes logits, raw scores for every token in its vocabulary, which a softmax converts into a probability distribution. Decoding is the process of turning that distribution into actual text. There are two broad styles:
| Greedy decoding | Stochastic decoding |
|---|---|
| Deterministic: always pick the highest-probability token. | Non-deterministic: sample the next token from the probability distribution. |
| Fast; no need to track multiple sequences. | The sampled token is not guaranteed to be the single most likely one. |
| Can get stuck in repetitive loops; output is not “creative.” | Allows greater diversity in output. |
| Equivalent to sampling with | Controlled by temperature, top-p, and top-k. |
The fact that stochastic decoding produces different reasoning paths on each run is exactly what the next technique exploits.
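The two decoding styles can be sketched for a single step in plain Python. This is a toy illustration under stated assumptions: the four-token vocabulary and its logits are invented, the helper names are hypothetical, and real inference stacks apply the same ideas over vocabularies of many thousands of tokens. Temperature must be greater than zero here because the logits are divided by it.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens the peak."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits):
    """Greedy decoding: index of the highest-scoring token."""
    return max(range(len(logits)), key=lambda i: logits[i])


def sample_pick(logits, temperature=1.0, top_k=None, top_p=None, rng=random):
    """Stochastic decoding with optional top-k and top-p (nucleus) filtering."""
    probs = softmax(logits, temperature)
    # Rank token ids from most to least likely.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    if top_k is not None:
        ranked = ranked[:top_k]
    if top_p is not None:
        kept, cumulative = [], 0.0
        for i in ranked:
            kept.append(i)
            cumulative += probs[i]
            if cumulative >= top_p:
                break
        ranked = kept
    # random.choices renormalizes the surviving weights for us.
    weights = [probs[i] for i in ranked]
    return rng.choices(ranked, weights=weights, k=1)[0]


# Invented vocabulary and logits for one decoding step.
vocab = ["4", "8", "16", "two"]
logits = [2.0, 1.5, 0.2, -1.0]

rng = random.Random(0)
greedy_token = vocab[greedy_pick(logits)]
samples = [vocab[sample_pick(logits, rng=rng)] for _ in range(10)]
top1 = [vocab[sample_pick(logits, top_k=1, rng=rng)] for _ in range(10)]

print(greedy_token)  # 4
print(samples)       # a mix of tokens; varies with the seed
```

Restricting the sampler to `top_k=1` leaves only the most likely token, so it behaves like greedy decoding; widening `top_k`, raising `top_p`, or raising the temperature lets less likely tokens through.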
Self-consistency#
Self-consistency builds directly on chain-of-thought [Wang et al., 2023]. The idea: instead of trusting one chain of reasoning, generate multiple, diverse reasoning paths through few-shot CoT, then take a majority vote over their final answers. By exploring several possibilities and keeping the most common answer, the model becomes more reliable on arithmetic and common-sense reasoning.
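The voting step can be sketched without a model by substituting canned reasoning paths for sampled ones. Everything here is an assumption for illustration: `fake_sample_path` stands in for a stochastic LLM call, and taking the last integer in the text as the final answer is a simplification that a real pipeline would replace with a stricter output format.

```python
import itertools
import re
from collections import Counter


def extract_answer(reasoning: str):
    """Return the last integer mentioned in a reasoning path, or None."""
    numbers = re.findall(r"-?\d+", reasoning)
    return numbers[-1] if numbers else None


def self_consistency(sample_path, n_paths: int = 5):
    """Sample several reasoning paths and majority-vote on their answers."""
    paths = [sample_path() for _ in range(n_paths)]
    answers = [a for a in map(extract_answer, paths) if a is not None]
    if not answers:
        return None, Counter()
    votes = Counter(answers)
    winner, _ = votes.most_common(1)[0]
    return winner, votes


# Stand-in for sampling from a model with stochastic decoding:
# two correct paths and one flawed path, served in rotation.
_canned = itertools.cycle([
    "There are 16 balls, so 8 golf balls, so 4 blue golf balls. Answer: 4",
    "Half of 16 is 8, so the answer is 8",
    "16 / 2 = 8 golf balls; 8 / 2 = 4 blue ones. Answer: 4",
])


def fake_sample_path() -> str:
    return next(_canned)


winner, votes = self_consistency(fake_sample_path, n_paths=5)
print(winner, dict(votes))  # 4 {'4': 3, '8': 2}
```

The flawed path is outvoted three to two. If the flawed path were the common one, the same code would confidently return the wrong answer.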
Intuition: model ensembling
Self-consistency is reminiscent of ensembling in classical machine learning, where you combine the predictions of many models to reduce error. Here you “ensemble” many reasoning paths from a single model and let them vote. It works best on tasks that have a single correct answer, such as quantitative business questions.
Its limitations mirror its strengths:
It costs more compute than plain CoT. In practice you generate a small number of paths (say 5-10); performance usually saturates quickly.
If most reasoning paths are wrong, the majority vote is wrong too, so it cannot rescue a model that fundamentally misunderstands the task.
A practical refinement: use self-consistency to generate higher-quality training data, fine-tune on it, then get improved accuracy from a single inference run.
Tree-of-thought#
Tree-of-thought (ToT) generalizes chain-of-thought one step further [Yao et al., 2023]. Rather than following a single line of reasoning (CoT) or several independent lines (self-consistency), ToT guides the model to generate, evaluate, expand on, and choose among multiple candidate solutions, much as a person weighs options before committing.
ToT builds a tree in which each “thought” is a coherent chunk of language representing an intermediate step toward a solution. The model:
generates thoughts (via CoT prompting),
adds tree-branching with breadth-first or depth-first search,
and thereby explores systematically, with look-ahead and back-tracking.
This makes ToT well suited to problems with important early decisions and a need to explore alternatives: creative writing such as ad-copy generation, mathematical reasoning, and puzzles like crosswords.
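The generate, evaluate, and search loop can be sketched on a toy puzzle with no model involved. This is a simplified breadth-first variant under stated assumptions: the `generate` and `evaluate` functions below are hand-written stand-ins for what would be LLM calls, the number puzzle is invented, and `beam_width` (how many thoughts survive each level) is an illustrative parameter name.

```python
from typing import Callable, Optional

Thought = tuple[str, ...]  # a partial solution: the steps chosen so far


def tree_of_thought_bfs(
    root: Thought,
    generate: Callable[[Thought], list[Thought]],
    evaluate: Callable[[Thought], float],
    is_solution: Callable[[Thought], bool],
    max_depth: int = 3,
    beam_width: int = 2,
) -> Optional[Thought]:
    """Expand level by level, keeping only the most promising thoughts."""
    frontier = [root]
    for _ in range(max_depth):
        # Branch: every surviving thought proposes its next steps.
        candidates = [child for t in frontier for child in generate(t)]
        for candidate in candidates:
            if is_solution(candidate):
                return candidate
        # Prune: unpromising branches are dropped before going deeper.
        frontier = sorted(candidates, key=evaluate, reverse=True)[:beam_width]
        if not frontier:
            return None
    return None


# Toy puzzle: starting from 1, reach 10 in at most three operations.
START, TARGET = 1, 10
OPS = {"+1": lambda x: x + 1, "*2": lambda x: x * 2, "+3": lambda x: x + 3}


def value_of(thought: Thought) -> int:
    x = START
    for step in thought:
        x = OPS[step](x)
    return x


def generate(thought: Thought) -> list[Thought]:
    return [thought + (op,) for op in OPS]


def evaluate(thought: Thought) -> float:
    return -abs(TARGET - value_of(thought))  # closer to the target is better


def is_solution(thought: Thought) -> bool:
    return value_of(thought) == TARGET


solution = tree_of_thought_bfs((), generate, evaluate, is_solution)
print(solution, value_of(solution))
```

Raising `beam_width` or `max_depth` explores more of the tree at higher cost, which is the cost-performance dial discussed later in this chapter. A depth-first variant would instead follow one branch to the end and back-track when its score drops too low.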
The progression at a glance
Input-Output: ask, get one answer.
Chain-of-thought: ask, get one reasoned answer.
Self-consistency: generate several reasoned answers, take the majority vote.
Tree-of-thought: branch into many thoughts, evaluate and search among them, back-tracking when a branch looks unpromising.
Each step trades more computation for more reliable reasoning.
Benefits and limitations of ToT#
ToT shines on tasks that hinge on initial decisions, planning for the future, and exploring multiple solutions. But deliberate search is not always necessary: for tasks that strong models already handle easily, it adds cost without benefit. ToT also requires more resources than sampling methods, though the flexibility of how you build the tree lets you tune the cost-performance trade-off.
A note on safety#
Advanced reasoning prompts make models more capable, but capability cuts both ways. The same flexibility that lets you elicit step-by-step reasoning can be abused to coax models past their safeguards if protections are not in place. This is the bridge to Module 2, where evaluating, securing, and responsibly deploying LLMs takes center stage.
In the news#
Reasoning has become the frontier of model development. A wave of reasoning models now performs extended, structured “thinking” internally before answering, effectively building chain-of-thought and search into the model rather than relying solely on the prompt. The techniques in this chapter are the conceptual ancestors of that work: chain-of-thought, self-consistency voting, and tree-style search reappear, now partly automated inside the models themselves. Understanding them by hand is the best way to understand what those systems do.
Hands-on labs#
Implement these techniques on Amazon Bedrock in Lab 4a: Self-Consistency, which generates multiple reasoning paths and votes on the answer, and Lab 4b: Tree-of-Thought, which builds and searches a tree of thoughts.
Key takeaways#
Chain-of-thought elicits step-by-step reasoning (“Let’s think step by step”) and improves accuracy on reasoning tasks, especially in large models.
Self-consistency samples several diverse CoT paths and takes a majority vote, like ensembling, at the cost of extra compute.
Tree-of-thought branches into many thoughts and searches among them with look-ahead and back-tracking, best for problems needing exploration.
Greater reasoning capability raises safety stakes, motivating responsible AI in Module 2.
The final chapter of this module extends prompting beyond text into images and other modalities.