Mike Olsen

Answers to Critics

A serious adversarial review identified vulnerabilities in this framework. Some require refinement. Some are semantic disputes. Here's the direct response.

February 8, 2026

A serious adversarial review of this framework identified several vulnerabilities—places where a careful critic can press and “you just didn’t read carefully” won’t work as a rebuttal. Some of those vulnerabilities are valid and require refinement. Some are semantic disputes where the critic has a point about precision but not substance. Some attack tone rather than argument.

This essay addresses each directly.


The AI-First / EKA Tension

The objection: AI-First Software says the system can “dynamically select algorithms and approaches” and that “the user describes outcomes; the system determines approach.” Executable Knowledge Architecture insists that expert judgment stays in the method, and the AI translates that method into execution. A hostile reader calls this a contradiction.

The response: This is a valid tension that requires reconciliation.

AI-First Software was written before agentic frameworks made the routing decision automatic. The essay describes interface ergonomics—what becomes possible when systems understand natural language well enough to act on it. EKA describes governance requirements—what must be true for professional work to be defensible.

These are compatible, but only under a condition I should have made explicit: the human must have already decided to route through executable artifacts.

“System determines approach” is acceptable when it means: the AI selects implementation details within a scope the user has authorized. For consequential work, that scope must be: produce inspectable methodology that I can verify, not direct answers I must trust.

The Update for February 2026 develops this further. The key insight from the research is that agentic frameworks have automated the routing decision itself—which erodes the governance value that made assistant pattern useful. The reconciliation is that AI-First interface design is compatible with EKA governance when the human retains control over whether to route through code at all.

This is a refinement, not a retraction. But it’s a refinement that should have been explicit from the beginning.


The Verification Protocol

The objection: The framework claims there’s an “understanding asymmetry”—that reviewing code is fundamentally different from verifying a direct answer. But the essays also say experts “don’t even need to read code” and can rely on AI to explain what the code does. This smuggles oracle-pattern trust back in through the side door. If the expert relies on the AI’s explanation of the artifact, the object being verified is no longer the code—it’s the AI’s story about the code.

The response: This critique has force.

The research shows that reasoning traces can be unfaithful. Anthropic’s paper “Reasoning Models Don’t Always Say What They Think” directly challenges whether exposed chain-of-thought represents actual model cognition. Models achieve only 81-83% factual accuracy in intermediate reasoning steps. If this unfaithfulness applies to AI explanations of generated code—and there’s no reason to think it doesn’t—then verification through AI explanation alone is insufficient.

I should not have written “they don’t even need to read code.” That formulation is too strong.

What I should have written: experts don’t need to author code, but verification requires more than AI explanation. The understanding asymmetry is real—you can verify a method without being able to invent it. But that verification needs independence from the model that generated the artifact.

What does adequate verification look like? This depends on domain, stakes, and available tooling. The framework shouldn’t prescribe a universal protocol. But it should acknowledge that the verification step requires genuine independence: some check that doesn’t depend on the generating model’s account of what it produced. How organizations implement that—through human code review, automated testing, secondary models, formal verification, or domain-specific validation—is a design decision. That such independence is necessary is a framework requirement.

The original formulation was too casual about what verification demands. This is a genuine strengthening in response to valid critique.


Determinism and Reproducibility

The objection: The framework repeatedly claims that code is deterministic—same input yields same output—and that this creates reproducibility. But in professional reality, “same input” is the whole problem. Environment drift, data drift, hidden nondeterminism from concurrency and approximate solvers—all make “deterministic” a toy-case claim that doesn’t survive contact with production systems.

The response: Correct. The claim needs qualification.

Deterministic execution means deterministic given a captured environment, versioned dependencies, and a fixed input snapshot. Production reproducibility requires these controls. Without them, “same code, same output” is aspirational rather than actual.

This qualifier doesn’t weaken the main point. The distinction between oracle pattern (stochastic outputs with no path to reproducibility) and assistant pattern (deterministic artifacts with a clear path to reproducibility, given appropriate controls) remains valid. But the claim as originally stated was imprecise enough to invite cheap shots.

The single-sentence qualification blocks a whole class of objections: “Deterministic here means deterministic given captured inputs and environment—production reproducibility requires version control, dependency management, and input snapshots.”


The Strawberry Experiment

The objection: The strawberry experiment is a cherry-picked task that flatters the conclusion. Counting letters is exactly the kind of thing models fail at directly and code trivially solves. The “100%” claim triggers “show me the receipts.” And the example can be attacked from both sides: skeptics can say better prompting would improve direct accuracy, and that the code prompt “bakes in the answer pathway.”

The response: The experiment is a didactic demonstration, not a general benchmark.

The purpose was to illustrate the mechanism—that LLMs are stochastic in form but deterministic in function, and that routing through code exploits this property. The letter-counting task was chosen because it’s simple enough to be unambiguous and the failure mode is clear enough to be memorable.

A didactic example doesn’t need to prove universal applicability. It needs to make a mechanism visible.

That said, the broader evidence is now available. The Update for February 2026 cites systematic research: the LLM reproducibility crisis (79.9% to 45.6% accuracy drops in reproduction attempts), calibration failures (99% confidence intervals covering truth only 65% of the time), and pervasive documentation gaps (only 34% of studies report temperature settings). This evidence supports the distinction far more robustly than 700 strawberry runs.

The experiment stays as illustration. The research provides the receipts.


Absolute Language

The objection: “Ungovernable” and “solves” are absolute verbs. Absolute verbs attract counterexamples. A critic can point to governance strategies short of full method-bearing artifacts—sampling audits, confidence calibration, source-grounding—and say “ungovernable” is overstated.

The response: This is a semantic dispute, but the critics have a point about precision.

“Ungovernable” means specifically: cannot satisfy the three requirements of reproducibility, verifiability, and accountability that professional work under risk demands. It doesn’t mean “no mitigation is possible.” It means “no mitigation achieves a defensibility standard adequate for professional accountability.”

“Solves” means: creates a viable path to defensibility. It doesn’t mean “eliminates all risk” or “requires no further effort.”

The substance is unchanged. The scoping should be explicit:

More precise phrasing closes an avenue for pedants without altering the framework’s claims.


“Solved Language” and “Hallucinations Are Manageable”

The objection: AI-First Software endorses strong claims—“solved language,” “formal competence,” and “interpret and act on natural language reliably, at scale.” It also calls hallucinations “a manageable risk.” These invite pile-ons from linguists, engineers, and safety reviewers. Worse, they’re unnecessary for the governance thesis.

The response: The claims were about interface design possibility, not metaphysical truth. But the phrasing invited misreading.

“Solved language” was shorthand for: these systems are now good enough at mapping natural language to structured actions that interface design can change. The intended claim was operational, not philosophical. If the phrasing reads as a maximal claim about understanding, that’s a writing failure.

“Hallucinations are manageable” was meant to say: this is an engineering problem with known mitigations, not a fundamental barrier. But that sentence sits awkwardly next to warnings elsewhere that oracle-pattern outputs are unreliable and unverifiable. The reconciliation is that hallucinations are manageable when the system is engineered so that high-stakes outputs are method-bearing, testable, and subject to explicit review gates.

The governance thesis doesn’t require strong claims about language understanding. It works even if language understanding is imperfect, as long as the artifacts are inspectable and testable. The rhetorical overreach was unnecessary.


Security and the Universal Interface

The objection: Agentic AI as Universal Interface describes agent systems that explore interfaces, access services, read documentation, and generate connections. From a security reviewer’s perspective, that’s a nightmare unless constrained. “Exploring systems” sounds like automated trial-and-error against production assets. The essay needs at least a nod to the threat model.

The response: Valid. The essay was written from a capability perspective without adequate attention to the threat surface.

For agentic AI to function as universal interface in professional contexts, constraints must be architectural:

The universal interface thesis depends on these controls being built into the deployment, not treated as optional best practices. The same persistence that makes agents valuable—they keep trying until they figure out how systems work—makes them dangerous without constraint.

The essay was scoped around capability, not threat modeling. But that scope limitation left an obvious gap that critics would rightly identify.


“Obfuscation Buys Time”

The objection: The argument that obfuscation isn’t durable and that “the effort to circumvent is worth it when value is high” can be read as endorsing bypassing controls. That’s legally and ethically fraught.

The response: The intended meaning was about market pressure, not circumvention.

Organizations that rely on obfuscation to protect value create incentives for customers to choose more open alternatives and for competitors to build replacements. Obfuscation creates pressure toward openness, not durable protection.

“Effort to circumvent” was poor word choice. Obfuscation doesn’t prevent competition—it invites it. Reframing from “circumvent” to “market pressure” makes the same point without sounding like a threat.


“Most” Tacit Knowledge

The objection: The claim that “most” tacit knowledge is representationally blocked (could be expressed, hasn’t been) rather than structurally inexpressible is a quantified-sounding claim without data. Critics from organizational learning, skilled trades, or ethnography will push back hard.

The response: Fair. “Most” is doing more work than the evidence supports.

The claim can be softer without losing the point: more than organizations assume is representationally blocked. A substantial fraction of what’s labeled tacit is actually blocked by missing vocabulary, missing interlocutors, or missing incentives to articulate—not by structural inexpressibility.

The distinction between structurally tacit and representationally tacit remains useful. The quantifier should be defensible.


The Measurement Gap

The objection: The framework diagnoses that benchmarks measure answer quality rather than defensibility and process, then says there’s no solution beyond naming the problem. If this is the central governance problem, why no measurement proposal?

The response: Valid gap. Here’s a starting point.

A measurement agenda aligned with the framework would track:

This isn’t a complete measurement program. But it points toward what measuring the assistant pattern would require, rather than stopping at diagnosis.


The objection: The discussion of liability, underwriting, and professional stamps takes a normative stance but needs a disclaimer. “Professional responsibility” isn’t just “I accept responsibility, not ’this is correct’"—the legal standard involves standard of care, and clients often interpret stamps as correctness.

The response: Fair. The framework is a governance framing, not legal advice.

Professional liability, regulatory compliance, and insurance requirements vary by jurisdiction and domain. The framework provides a way to think about artifact defensibility. It doesn’t replace qualified legal and regulatory advice for specific situations.

Where factual claims matter—like the assertions about regulatory trajectory—the Update for February 2026 provides citations. Where the framework makes normative claims about what should happen, those are arguments, not legal conclusions.


The 33% Penetration Claim

The objection: EKA in Practice reports “33% penetration across North America” without defining the metric. Critics will ask: 33% of what?

The response: Fair. Definition: 33% of eligible professional staff used the system to produce at least one client-facing artifact during the three-week period.

That’s adoption, not usage intensity. It says nothing about how central the tool became to daily work—only that a third of the eligible population produced deliverable output through the system within three weeks of launch. The compounding effects mentioned in the essay are about what happened next, not what that initial number proves.


Rhetorical Choices

The objection: Lines like “Software is dead. Long live software,” “If it can do the job reliably, it has the knowledge. That’s the whole claim,” and the “Amish hand-crafted furniture” aside are punchy but risky. They invite critics to focus on tone rather than substance.

The response: These are style choices. I’m keeping them.

“Software is dead. Long live software” is deliberately provocative. It’s meant to be memorable, not measured. The argument is in the paragraphs around it; the slogan is the hook.

“That’s the whole claim” is a direct statement of what I’m arguing. If someone wants to dispute it, they should dispute the argument, not the confidence of the phrasing.

The Amish furniture aside makes a point about competitive dynamics. If someone finds it dismissive, they’re welcome to argue that handcraft will remain economically viable against AI-augmented competitors. I’m skeptical.

Critics who focus on tone rather than substance have revealed their approach. The arguments stand or fall on their merits, not on whether I hedged enough.


The Consciousness Tangent

The objection: Knowledge as Capability goes into consciousness and solipsism territory, suggesting that denying AI mental states requires a leap comparable to affirming human ones. This is unnecessary for the operational definition and introduces a controversial topic that doesn’t advance the governance framework.

The response: The tangent serves a purpose, but it can be marked as optional.

The purpose: it blocks the easy dismissal that AI “doesn’t really understand.” People make confident pronouncements about AI lacking knowledge based on assumptions about mental states that no discipline has validated. The section makes explicit that we don’t have the theoretical foundations to confidently affirm or deny AI understanding—which is why the operational definition brackets the question entirely.

That said, a reader primarily interested in practical application doesn’t need the philosophical detour. The operational definition—knowledge as capability to reliably produce outcomes—doesn’t depend on resolving metaphysical questions about consciousness.

Marking it as an optional aside lets readers who want the philosophical grounding find it, while letting others skip to practical application.


Summary

The framework has been stress-tested. Here’s where things stand:

Genuine refinements:

Precision improvements:

Acknowledged gaps now addressed:

Defended as written:

The core distinction holds. The refinements strengthen rather than undermine the framework. The regulatory trajectory is moving in the direction the framework predicted.


→ For the research findings, see Update for February 2026 → For the original framework, see AI as Oracle vs. Assistant