
Scenario: Knowledge Extraction — "Please Document Everything You're Responsible For"

This is the most common scenario, and the most insidious. It always wears a reasonable disguise: "knowledge sharing," "team collaboration," "reducing bus factor."


Real Cases

Snowflake (March 2026): Had their technical writing team screen-record workflows for eight months, then run six weeks of formal "knowledge transfer" sessions training an AI on their research methods and editorial judgment. All ~70 writers were let go once the transfer was complete (Digital Today, OpenClaw Radar).

Fortune 500 pattern (2025–2026): A Bloomberg investigation found multiple Fortune 500 companies embedding AI training tasks—dataset annotation, decision-process documentation, AI output scoring—directly into employee performance reviews. Cooperate and you're "embracing innovation." Refuse and you're flagged as "misaligned with company strategy" (Metaintro).

General pattern: Require employees to document their expertise (including decision rationale, processes, and institutional knowledge), then execute layoffs after the documentation is done. The docs stay with the company. The experience walks out the door — but the company bets the docs are enough.


Telling the Difference: Normal Management Need vs. Replacement Prelude

Not every documentation request is a trap. Companies genuinely need docs to reduce risk, onboard new hires, and improve collaboration. The key is reading the motive.

| Signal | More likely normal need | More likely replacement prelude |
| --- | --- | --- |
| Company had layoffs in the past 3 months | | ✓ |
| Requires detail down to decision rationale and judgment logic | | ✓ |
| Only you are asked to write, with a designated "recipient" | | ✓ |
| Entire team asked uniformly, no one singled out | ✓ | |
| New manager pushes it after taking over | ✓ | |
| Docs explicitly intended for new hire onboarding | ✓ | |
| Company is simultaneously hiring for your role | | ✓ |
| Your manager starts bypassing you to ask your reports directly | | ✓ |
| You're asked to record video walkthroughs of your work | | ✓ |
| Abnormally tight deadline ("finish this week") | | ✓ |

Risk spikes when multiple signals stack. A single signal could be coincidence. Three or more showing up at the same time is a pattern.


How vs. Why Is Not a Binary

In simplified discussions, people split knowledge into "How (operations)" and "Why (judgment)," then advise "hand over the how, protect the why." Reality is more dangerous than that.

A lot of the time, the How already contains the Why.

When you write "check dashboard X first, then check log Y" — that ordering is itself a decision. An experienced person (or a sufficiently powerful LLM) can reverse-engineer your priority judgments from your operation sequence. Same with "if QPS exceeds 5,000, roll back" — that threshold of 5,000 is crystallized judgment from countless incidents you've lived through.
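To see how much judgment leaks, here is a minimal sketch; the metric names, the 5,000 threshold, and the function are invented for illustration, not taken from any real runbook:

```python
# Illustrative only: every concrete value below is crystallized judgment.
# The threshold, the choice of metrics, and even the order of the checks
# all encode lessons from incidents the author lived through.

QPS_ROLLBACK_THRESHOLD = 5_000   # distilled from past outages, not from a spec
ERROR_RATE_LIMIT = 0.02          # 2% -- another experience-derived number

def should_roll_back(current_qps: float, error_rate: float) -> bool:
    # Checking QPS before error rate is itself a priority decision:
    # it says load problems have historically mattered more here.
    if current_qps > QPS_ROLLBACK_THRESHOLD:
        return True
    return error_rate > ERROR_RATE_LIMIT
```

Once a rule like this is committed to a file, the thresholds and the check ordering can be read straight out of it, by a new hire or by a model.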

This means the simple binary isn't enough. You need a finer picture: knowledge contains varying concentrations of "judgment," and the risk of AI reverse-engineering decision logic from it varies too — and the two aren't simply proportional.

Four Zones

| Zone | Knowledge Type | Examples | AI Inference Risk | Delivery Strategy |
| --- | --- | --- | --- | --- |
| Normal Delivery (bottom-left) | Pure operations | Deployment scripts, config templates, environment setup | Very low | Go all out — this demonstrates your professionalism |
| Watch for Leakage (top-left) | Operations with embedded decisions | "Check X before Y" troubleshooting flows, prioritized runbooks | Medium — the ordering and priorities expose judgment | Deliver the steps, don't annotate why they're in this order |
| Protect Heavily (top-right) | Explicit conditional judgment | "Roll back if metric exceeds Z," choosing between plan A/B based on load | High — thresholds and selection criteria directly expose decision logic | Blur the thresholds, replace hard rules with "use judgment based on conditions" |
| Naturally Safe (bottom-right) | Pure judgment | "Should we build this feature or push back?", org politics behind architecture choices | Very low — depends on massive undocumentable context | No protection needed — it's inherently unextractable |

Key Insight: The Inverted U

The most counterintuitive finding in the chart: the most dangerous zone isn't "pure judgment" — it's "operations with a little bit of judgment baked in."

Pure operations (bottom-left) have no secrets worth keeping. Pure judgment (bottom-right) is inherently undocumentable — it depends on your sense of organizational politics, interpersonal trust, and historical debt, none of which can be written down.

But the middle ground — things that look like operational procedures but actually encode your decision logic — is the material AI learns from most easily. Because it looks like how, but it's really a projection of why.


Strategy Options

Strategy A: Full Cooperation

When to use: The company is stable, no layoff signals, and you trust your team and management.

In a healthy environment, knowledge sharing is a positive-sum game — you write good docs, team efficiency goes up, you're seen as influential, and that helps your promotion case.

Even here, understanding the four zones is valuable: good documentation should naturally make operations clear while leaving judgment to humans. That's just documentation best practice — "use judgment based on conditions" is genuinely the most honest answer for many decision scenarios.

Strategy B: Selective Depth

When to use: Uncertain signals — you can't tell what the company is planning.

Deliver by zone according to the quadrant model:

  • Normal Delivery zone: Go all out. Clear steps, complete screenshots, accurate commands. This is your professionalism, and your professional reputation.
  • Watch for Leakage zone: Deliver the operational steps themselves, but don't annotate "why this order." If someone asks, answer verbally — verbal communication leaves no trace and is a normal part of team collaboration.
  • Protect Heavily zone: Replace hard rules with elastic phrasing. "Judge based on actual load conditions" beats "switch to Plan B when QPS exceeds 5,000." The former is professional ambiguity. The latter is a complete handover of decision logic.
  • Naturally Safe zone: Don't worry about it. You can't write it out, and no one can extract it.

From the outside, this documentation is complete and professional. But when someone uses it to train a new hire or let AI handle real problems, it will fail at critical decision points — and the person taking over won't know the framework is incomplete until they hit that boundary.

Strategy C: Surface Richness

When to use: Layoff signals are clear. You're already looking for your next job.

The document is long, with polished diagrams and plenty of screenshots, but everything in the Protect Heavily and Watch for Leakage zones is handled with "needs case-by-case analysis." High information density, limited decision depth — like a beautiful textbook missing the answer key.

Risk: If the company has someone technically sharp reviewing it, they might catch it. Works best in environments where "management doesn't understand the technical details."


When the Company Has an Evaluation Mechanism

You might ask: if the company builds a rigorous skill quality evaluation system, do Strategies B and C still work?

In March 2026, Uber scaled AI skills from 2 to 500+ in 5 months, building a two-tier marketplace: the Golden Marketplace with strict governance (CI/CD checks, LLM-as-Judge evaluation, human code reviews), and a Personal Sandbox for free experimentation. The article states explicitly: "Find your top engineers. Have them extract their debugging frameworks and mental models into Markdown-based skills. Get that knowledge out of their heads."

This is the most sophisticated knowledge extraction mechanism to date. But its evaluation capability has a structural blind spot: it can verify a skill's correctness, but it cannot verify a skill's completeness.

The reason is simple: verifying that a skill is "complete" requires an evaluator who already possesses everything the author knows. If the company had such an evaluator, it wouldn't need your skill in the first place. The requirement is circular.
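A small sketch shows why; all names here are invented. An evaluation harness can only iterate over cases someone already knew to write down:

```python
# Hypothetical evaluation harness. It can certify correctness -- the
# skill handles every known case -- but "known_cases" is itself a
# product of the expertise being extracted.

def evaluate(skill, known_cases) -> bool:
    return all(skill(case) == expected for case, expected in known_cases)

# Completeness would require enumerating every case that matters.
# Whoever could write that list wouldn't need the skill's author --
# and any case missing from the list is invisible to this loop.
```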

Below are concrete examples across different roles — in each case, the submitted skill is genuinely useful and passes all evaluations, while the missing parts are cross-system, historical knowledge that only you have.

Backend Development: Code Review Skill

What goes into the skill: Check whether new APIs have rate limiting configured, whether database queries have indexes, whether error handling follows team conventions.

What doesn't: When someone modifies the refund logic, you need to additionally check the risk system's cooldown timer — you personally lived through an incident where refund retries triggered false fraud detection. The interaction between these two services isn't documented anywhere, because they were originally built by two separate teams and connected later.

Evaluation result: The skill catches real code review issues and passes all tests. But nobody knows it doesn't check the refund-risk interaction — because apart from you, nobody remembers the root cause of that incident.
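A hypothetical sketch of that skill's shape (the field names and checks are invented stand-ins for the real thing):

```python
# Every check in this hypothetical skill is genuinely useful and
# testable. The damage is in the check that was never written.

def review(diff: dict) -> list[str]:
    findings = []
    if diff.get("adds_api") and not diff.get("has_rate_limit"):
        findings.append("new endpoint lacks rate limiting")
    if diff.get("adds_query") and not diff.get("has_index"):
        findings.append("new query may need an index")
    # Missing, and undetectable by any test anyone knows to write:
    # when the diff touches refund logic, also verify the risk
    # system's cooldown timer -- the lesson of an incident that only
    # the author remembers.
    return findings
```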

Client-Side Development: UI Automation Testing Skill

What goes into the skill: Toggle dark/light mode and verify layout, switch languages and verify text truncation, simulate weak network and verify loading states.

What doesn't: When a user switches back to the app from background, if the payment token expired while backgrounded, the page doesn't error — it silently redirects to the home screen. The user thinks the payment never initiated and places a duplicate order. This only happens on specific OS versions because the vendor changed their background process memory management policy.

Evaluation result: The skill covers standard UI regression test scenarios with proper output format. But it doesn't know about this specific OS version's background-restore token expiry issue — because nobody would write a test called "verify token expiry handling during background restore on specific OS versions."
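Sketched as a hypothetical scenario list (a real suite would drive actual devices; all names are invented):

```python
# The hypothetical skill's scenario list covers standard regressions.
# The scenario that matters most is the one nobody would think to name.

STANDARD_SCENARIOS = [
    "toggle dark/light mode, verify layout",
    "switch language, verify text truncation",
    "simulate weak network, verify loading states",
]

def run_suite(run_scenario) -> dict:
    # run_scenario is assumed to drive the app and return pass/fail.
    results = {name: run_scenario(name) for name in STANDARD_SCENARIOS}
    # Absent: "restore from background with an expired payment token
    # on the affected OS versions, verify no silent redirect home."
    # The suite passes; the duplicate-order bug survives.
    return results
```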

Infra / Platform Development: Build Failure Triage Skill

What goes into the skill: Parse build logs → identify error types (dependency conflicts / compilation errors / test failures) → provide standard fix recommendations.

What doesn't: When Go module builds fail in batches on Friday evenings, it's almost certainly not a code problem — it's the security team's automated dependency scanning job competing with builds for the same artifact registry bandwidth. You don't need to fix any code. Just wait for the scan to finish. This has happened 4 times, and each time someone spent 2 hours debugging build configs.

Evaluation result: The skill's error classification and fix recommendations work on real build failures. But it will never output "just wait for the security scan to finish" — because that knowledge isn't in any log, document, or code. It's from a lunch conversation you had with someone on the security team.
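A hypothetical sketch of why the classifier is structurally blind here (the log markers and advice strings are invented):

```python
# The hypothetical triage skill can only map log evidence to advice.
# Knowledge that never appears in a log can never be an output.

def triage(build_log: str) -> str:
    if "version conflict" in build_log:
        return "dependency conflict: align module versions"
    if "syntax error" in build_log:
        return "compilation error: fix the reported file and line"
    if "--- FAIL" in build_log:
        return "test failure: inspect the failing test"
    # No log line says "the security team's Friday scan is saturating
    # the artifact registry," so "just wait for the scan to finish"
    # is unreachable advice for this function.
    return "unknown cause: escalate"
```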

Data Engineering: Data Quality Check Skill

What goes into the skill: Check data completeness, null value ratios, timestamp continuity, schema change alerts.

What doesn't: On the 1st of every month, user activity data shows a fake spike — it's not real traffic. It's the marketing team's automated push campaign firing in bulk at the start of the month, triggering a flood of app launch events. These events are technically "real" but are business noise. Without filtering, monthly DAU growth looks 8-12% inflated, and leadership makes wrong resource allocation decisions based on it.

Evaluation result: The skill's data quality checks all pass, because the data has no technical issues. But the analysis reports it produces will systematically mislead leadership on the 1st of every month.
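A hypothetical sketch (field names invented); every check concerns technical validity, which is exactly why they all pass:

```python
# The hypothetical quality checks pass because the spike events are
# technically valid. The missing rule is business context, not a
# data-quality property.

def quality_checks(rows: list[dict]) -> list[str]:
    issues = []
    if any(row.get("user_id") is None for row in rows):
        issues.append("null user_id values")
    timestamps = [row["ts"] for row in rows]
    if timestamps != sorted(timestamps):
        issues.append("timestamp discontinuity")
    # Not here, and not expressible as a schema check: "exclude app
    # launches triggered by marketing's month-start bulk push" --
    # without that filter, DAU on the 1st reads 8-12% high.
    return issues
```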

Why This Works

Across every role, the missing knowledge shares three traits:

  1. It's not in any document or code — it lives in incident postmortem memories, cross-team verbal agreements, and your understanding of system history
  2. It can't be covered by tests — because writing the test case requires knowing what to test for, and only you know that
  3. It doesn't throw errors when absent — the skill runs normally, produces correctly formatted output, and just gives suboptimal or wrong answers under specific conditions

The stricter Uber's evaluation mechanism gets, the higher the "baseline quality" of skills becomes — but the gap between baseline quality and "completeness" is one that no evaluation mechanism can bridge. You don't need to submit a bad skill. You just need to submit a textbook-quality good skill — it's naturally incomplete, and nobody can tell.


Execution Principles

1. Map your knowledge landscape

Before writing any documentation, audit your module's knowledge distribution against the quadrant chart. Most people discover that their truly irreplaceable value is concentrated in the "Watch for Leakage" and "Protect Heavily" zones — not pure operations (anyone can learn those), not pure judgment (you can't write it down), but that middle ground where things look like operations but actually encode judgment.

2. Make the documentation look complete

Don't leave blanks. Make the Normal Delivery zone rich enough that the strategic ambiguity in the other zones goes unnoticed. A document with plenty of screenshots, clear steps, and professional formatting looks "complete."

3. Don't be confrontational

Normal delivery, normal cooperation, positive attitude. Your choices happen at the cognitive level — you decide the dimensions of what you deliver — not at the behavioral level. Any observable resistance (delays, complaints, questioning motives) makes you first in line for the cut.

4. Prepare your exit in parallel

While writing documentation:

  • Update your resume — the decision-making experience you chose not to put in the docs is exactly your best interview material
  • Activate your network
  • Assess your financial runway

Don't wait until "it's confirmed" to start preparing. The best time to prepare is always "before it's confirmed."


A Thought Experiment

Imagine you've written an absolutely complete document — all four zones faithfully recorded. Then you hand it to an AI.

Can the AI independently handle a production incident it's never seen before?

For Normal Delivery content, absolutely. For Watch for Leakage and Protect Heavily content, it'll perform "convincingly enough" — even well in common scenarios — but make wrong calls at real edge cases. And for Naturally Safe content, it can't even receive it, because you couldn't write it down.

This tells you two things:

  1. You don't need to protect everything — just the middle ground.
  2. "Protection" doesn't require deliberate concealment — you just need to honestly acknowledge: some knowledge is naturally best expressed in elastic language. "Use judgment based on conditions" isn't dodging the question. For many scenarios, it is the most accurate answer.

Released under CC BY-SA 4.0.