Principle 3: Context and Scaffolding Determine Outcomes
Most of the time, when AI falls short, the problem is that the task wasn’t framed well. The very same model that gives you bland or useless output in one setup can produce work that’s genuinely helpful in another, simply because the way you described and structured the task changed.
What this principle governs and why it matters
When AI systems underperform, the instinct is usually to blame the model. The technology wasn't ready. The system isn't smart enough. We need a more powerful version. This explanation is comforting because it places the problem outside our control and promises that progress will eventually solve it.
But in practice, most AI failures aren't failures of capability. They're failures of framing. The same model that produces useless output in one configuration produces genuinely valuable work in another—not because anything changed in the system, but because something changed in how the task was presented, bounded, and structured.
This principle governs the relationship between input quality and output quality, and it matters because that relationship is far tighter than most users assume. AI systems are not autonomous reasoners that can compensate for vague instructions or missing context. They are pattern-completion engines that operate on what they're given. The quality of what they produce is a direct function of the scaffolding you provide. Get the scaffolding wrong, and no amount of model capability will save you.
Understanding this principle shifts responsibility in uncomfortable ways. It means that when AI output disappoints, the first question should not be "is this model good enough?" but rather "did I give this model what it needed to succeed?"
The same model, two outcomes
A legal team at a technology company is exploring whether AI can help with contract review. They have a backlog of vendor agreements that need to be checked for specific risk provisions—liability caps, indemnification clauses, data handling requirements. The work is important but tedious, and the team is stretched thin.
The first attempt is straightforward. They take a contract, paste it into an AI system, and ask: "What are the key risks in this agreement?" The response is underwhelming. The system produces a generic summary that could apply to almost any commercial contract. It mentions that liability provisions exist without analyzing whether they're favorable or problematic. It flags data handling as a topic without noting that the specific terms fall short of the company's policy requirements. The output is technically accurate but practically useless—it tells the team nothing they couldn't see from skimming the document themselves.
The conclusion seems obvious: the AI isn't sophisticated enough for legal work. The team sets the project aside.
Months later, a new associate joins the team. She's worked with AI systems extensively and asks to revisit the experiment. But she approaches it differently. Before submitting any contract, she builds a structured prompt that includes the company's specific risk thresholds, the particular clauses that matter most, examples of language that's acceptable versus problematic, and a format for how the analysis should be presented. She breaks the task into stages: first extract the relevant provisions, then compare them against the standards, then flag deviations with specific references to the contract language.
Using the same model that had failed months earlier, she produces analyses that the senior attorneys find genuinely useful. The system now identifies specific gaps, quotes the problematic language, and explains why particular provisions fall outside policy. It still requires human review—the attorneys catch nuances the system misses—but the output provides a solid starting point that saves hours of initial review time.
The model didn't improve. The framing did.
What changed was not the AI's capability but the structure surrounding it. The first attempt asked the system to do something impossibly broad: assess risk without knowing what counts as risk for this company, in this context, against these standards. The second attempt gave the system everything it needed to produce a useful answer—the criteria, the format, the examples, the constraints. The scaffolding made the difference.
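To make that concrete, here is a minimal sketch of what the associate's scaffolding might look like if you wired it up in code. Everything in it is illustrative: complete() stands in for whatever model interface you happen to use, and the thresholds and clause names are placeholders rather than any company's actual standards.

```python
# Illustrative only: staged contract review with explicit criteria.
# `complete()` is a stand-in for your model client; the standards and
# clause list below are invented placeholders, not real policy.

RISK_STANDARDS = (
    "Liability cap: at least 12 months of fees paid.\n"
    "Indemnification: vendor indemnifies for third-party IP claims.\n"
    "Data handling: personal data stored only in approved regions.\n"
)

CLAUSES_OF_INTEREST = ["liability cap", "indemnification", "data handling"]


def complete(prompt: str) -> str:
    """Stand-in for a call to the model. Replace with your own client."""
    raise NotImplementedError


def review_contract(contract_text: str) -> str:
    # Stage 1: extract only the provisions that matter, quoting the contract.
    provisions = complete(
        "From the contract below, extract the clauses covering "
        + ", ".join(CLAUSES_OF_INTEREST)
        + ". Quote the exact language for each.\n\n"
        + contract_text
    )

    # Stage 2: compare the extracted language against explicit standards.
    comparison = complete(
        "Compare each provision below against these standards:\n"
        + RISK_STANDARDS
        + "\nFor each, say whether it meets the standard, falls short, "
        "or is silent.\n\n"
        + provisions
    )

    # Stage 3: present the analysis in the format the reviewers asked for.
    return complete(
        "Turn this comparison into a review memo with three sections: "
        "Deviations (quote the problematic language and explain why it "
        "falls outside policy), Compliant provisions, and Open questions "
        "for human review.\n\n"
        + comparison
    )
```

The point of the sketch is not the code. It is that the criteria, the examples of acceptable language, and the output format all live somewhere explicit, and the task is split into steps the model can handle one at a time.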
The principle, unpacked
AI performance is a direct function of the structure, constraints, and context provided. Poor framing produces unreliable output regardless of model capability.
This principle runs counter to a persistent fantasy about AI: that these systems understand what we want and can figure out how to deliver it, much as a capable human colleague might take a vague request and return something thoughtful. We're encouraged in this fantasy by the conversational interface, by the fluent responses, by the occasional moments when the system seems to grasp intent from minimal input. But the fantasy breaks down under pressure. When the task is complex, when the stakes are real, when precision matters, the gap between what we asked for and what we meant becomes the determining factor in whether the output is useful.
AI systems do not read minds. They do not infer context that wasn't provided. They do not know your standards unless you articulate them, your preferences unless you demonstrate them, your constraints unless you name them. They work with what's in the prompt and what's in their training data, and they blend these together in ways shaped by probability rather than understanding. When the prompt is thin, the output draws more heavily on generic patterns from training—which is why vague requests produce generic responses.
This is not a flaw to be engineered away. It's a fundamental characteristic of how these systems work. They are, at their core, sophisticated autocomplete—extraordinarily sophisticated, trained on vast amounts of human writing, capable of remarkable feats of synthesis and generation, but still operating by predicting what should come next given what came before. The quality of that prediction depends critically on the quality of what came before.
Scaffolding, in this context, means all the structure you provide to shape the AI's output: the framing of the task, the constraints you impose, the examples you offer, the format you request, the criteria you specify, the context you include. Good scaffolding doesn't just tell the system what to do; it tells the system how to do it, what success looks like, what to prioritize, what to avoid. It reduces the space of possible outputs from "anything plausible" to "something useful for this specific purpose."
The difference between good and poor scaffolding can be dramatic. A prompt that says "summarize this document" invites a generic summary that may or may not capture what you care about. A prompt that says "summarize this document focusing on the three main arguments, their supporting evidence, and any acknowledged limitations, in no more than 300 words" produces something you can actually use. The information required to produce both summaries was always in the document. The difference is whether you gave the system enough structure to know which information mattered.
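As a small, hedged illustration of that difference, the two requests above are literally just two strings of different specificity. The constants below are hypothetical and simply restate the wording from the paragraph:

```python
# The same summarization request, stated two ways. The second bounds the
# output: which points to cover, how much space to use, what shape to take.

VAGUE_PROMPT = "Summarize this document."

SCAFFOLDED_PROMPT = (
    "Summarize this document in no more than 300 words, covering: "
    "(1) the three main arguments, (2) the evidence supporting each, "
    "and (3) any limitations the authors acknowledge. "
    "Use one short paragraph per argument."
)
```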
This principle has implications for how we think about AI capability. Much of the public discourse treats capability as a property of the model—this model is smarter, that model is more powerful, the next version will be better. And there's truth to this; models do differ in their underlying capacity. But in practice, for most tasks, the constraint is not model capability. The constraint is whether the human using the model has provided adequate scaffolding. A well-framed prompt to a less powerful model often outperforms a poorly framed prompt to a more powerful one.
This means that skill in working with AI is not primarily about finding the best model. It's about learning to construct effective scaffolding—learning what context the system needs, how to break complex tasks into manageable pieces, how to provide examples that guide without constraining too narrowly, how to specify success criteria clearly enough that you can evaluate whether they've been met.
This skill is not intuitive. Most people, when they first use AI systems, prompt them the way they would ask a question of a knowledgeable colleague: briefly, conversationally, with the implicit expectation that the recipient will fill in the gaps. This works with humans because humans share context, can ask clarifying questions, and carry a mental model of what the asker probably wants. It fails with AI because none of those conditions hold. The system takes your prompt literally and fills gaps with generic patterns rather than your specific needs.
Learning to scaffold well requires a kind of explicitness that feels unnatural at first. You have to articulate things you would normally leave unsaid. You have to specify criteria you would normally assume were obvious. You have to provide context that a human colleague would already know. This feels tedious until you see the difference it makes in output quality—and then it starts to feel like the only way to work.
There's also a structural dimension to scaffolding that goes beyond individual prompts. Complex tasks often require breaking the work into stages, where the output of one step becomes input to the next. A single prompt asking an AI to "analyze this dataset and write a report on the findings" is likely to produce something superficial. A sequence of prompts—first explore the data and identify patterns, then evaluate which patterns are significant, then draft findings on the three most important patterns, then synthesize into a coherent narrative—produces something substantially better. The scaffolding here is not just in the prompts but in the workflow, the decomposition of a complex task into pieces the system can handle well.
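A rough sketch of that decomposition is below, with the same caveat as before: complete() is a placeholder for your model interface, and the stage wording is illustrative rather than a prescribed recipe.

```python
# Illustrative staged workflow: each stage's output becomes the next
# stage's input. `complete()` stands in for your model client.

def complete(prompt: str) -> str:
    raise NotImplementedError  # replace with a real model call


def analyze_and_report(data_description: str) -> str:
    # Stage 1: explore the data and surface candidate patterns.
    patterns = complete(
        "Explore the dataset described below and list the patterns you find, "
        "each with the evidence that supports it.\n\n" + data_description
    )

    # Stage 2: evaluate which patterns are actually significant.
    significant = complete(
        "From the patterns below, select the three most important and explain "
        "why the others matter less.\n\n" + patterns
    )

    # Stage 3: draft findings on only those three patterns.
    findings = complete(
        "Draft a findings section covering only these three patterns, one "
        "paragraph each, citing the supporting evidence.\n\n" + significant
    )

    # Stage 4: synthesize into a coherent narrative.
    return complete(
        "Combine these findings into a short report with an introduction, "
        "the three findings, and a conclusion.\n\n" + findings
    )
```

The value here comes from the workflow itself: each stage has a narrow job, and the human can inspect and correct the intermediate output before it feeds the next step.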
The question that remains
This principle places significant responsibility on the person using the AI. If output quality depends on scaffolding quality, then disappointing results are at least partly a reflection of how the task was framed. This is uncomfortable. It's easier to blame the tool than to examine your own practice.
But the discomfort points toward something generative. If scaffolding determines outcomes, then improving outcomes is within your control. You don't have to wait for a better model. You don't have to hope the technology catches up to your needs. You can experiment with framing, iterate on structure, learn what kinds of context make the biggest difference for the tasks you care about.
The organizations that extract the most value from AI are not necessarily those with access to the most powerful models. They're the ones that have developed institutional knowledge about how to frame tasks effectively—that have built libraries of prompts and templates, trained their people in scaffolding techniques, created feedback loops that help them learn what works. They've recognized that AI capability is not just a property of the system but an emergent result of how the system is used.
There's a question of investment here that most organizations haven't fully grappled with. Building good scaffolding takes time. Learning what context matters for which tasks requires experimentation. Developing templates and workflows that consistently produce good results is real work. If you're not willing to invest in that work, you'll be limited to the shallow uses of AI—the tasks simple enough that minimal scaffolding suffices. The deeper uses, the ones that produce genuine leverage, require treating scaffolding as a skill to be developed rather than an afterthought.
The question that remains is whether you're willing to take that responsibility seriously. Whether you'll invest the effort to learn what these systems actually need, or whether you'll continue to throw thin prompts at powerful models and wonder why the results disappoint.
The model responds to what you give it. The harder question is whether you’ve taken the time to decide what that should be.