01 · Framing

Definitions, terms, and a pragmatic frame for AI, GenAI, agentic systems, and AGI-adjacent claims.

Overview#

This work uses a small set of conservative, operational definitions to limit category drift.

This is not an ontology exercise. The point is to make statements that can be checked against deployed behavior under real constraints: latency, cost, data access, permissions, and incentives.

Two separations do most of the work here:

  • Capability vs reliability: what a system can do in best-case conditions vs how often it does it correctly under messy inputs, shifting context, and incomplete information (a measurement sketch follows this list).
  • Research signals vs deployment realities: benchmark or demo performance vs performance in workflows with accountability, auditability, and changing requirements.
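
One way to make the capability/reliability separation measurable is sketched below. Everything named here is an assumption for illustration: `run` is a hypothetical pass/fail wrapper around the system under test, and `perturb` injects the kind of mess seen in production (typos, reordered fields, missing context). Capability is proxied by best-of-k success on clean inputs; reliability by single-attempt success on perturbed ones.

```python
from typing import Callable, Sequence

def capability(run: Callable[[str], bool], tasks: Sequence[str], k: int = 5) -> float:
    """Best-case proxy: fraction of clean tasks solved by at least one of k attempts."""
    return sum(any(run(t) for _ in range(k)) for t in tasks) / len(tasks)

def reliability(run: Callable[[str], bool], tasks: Sequence[str],
                perturb: Callable[[str], str], n: int = 5) -> float:
    """Messy-input proxy: single-attempt success rate over n perturbed variants per task."""
    trials = [run(perturb(t)) for t in tasks for _ in range(n)]
    return sum(trials) / len(trials)
```

The gap between the two numbers is often more informative than either number alone.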

Definitions (operational, conservative)#

These definitions are intentionally narrow. They are for analysis and system design, not taxonomy.

  • AI: a software system that maps inputs to outputs using learned parameters, where behavior is better explained by learned generalization than by explicit, hand-authored rules alone.
  • Generative AI (GenAI): AI that produces variable-length artifacts (text, code, images, audio, structured data) conditioned on context, where the output is not a fixed class label and where quality is judged by a mix of correctness, utility, and constraint adherence.
  • Agentic system: a system that executes a multi-step procedure toward a specified objective by selecting actions over time (a minimal loop sketch follows this list). Operational markers:
    • It maintains state across steps (explicit memory or implicit state in context).
    • It chooses among tools/actions based on intermediate results.
    • It can recover (within bounds) from partial failure by replanning or retrying.
    • It operates under a policy/guardrail layer that constrains tools, permissions, and acceptable outputs.
  • AGI-adjacent claim: a claim that a system exhibits broad task competence across domains with limited task-specific adaptation. In this work, “AGI-adjacent” is treated as a question of scope and transfer:
    • How much task variation can be handled before accuracy collapses?
    • How sensitive is performance to prompt structure, tool interfaces, or hidden scaffolding?
    • How much new supervision, data, or engineering is required to reach acceptable reliability?
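
To ground the agentic markers above, here is a minimal loop sketch, not a recommended architecture. The `choose_action` policy, the `tools` dict, the `allowed_tools` set, and the convention that a `finish` tool ends the run are all assumptions made for the example; the point is only that the smallest structure exhibiting all four markers is already a system, not a single model call.

```python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    goal: str
    history: list = field(default_factory=list)  # marker 1: explicit state across steps

def run_agent(goal, tools, choose_action, allowed_tools, max_steps=10, max_retries=2):
    """Illustrative loop: state, action choice on intermediate results,
    bounded recovery, and a policy layer constraining which tools may run."""
    state = AgentState(goal=goal)
    for _ in range(max_steps):
        name, args = choose_action(state)        # marker 2: choice based on intermediate results
        if name not in allowed_tools:            # marker 4: guardrail on tools/permissions
            state.history.append((name, "blocked by policy"))
            continue
        result = None
        for _attempt in range(max_retries + 1):  # marker 3: bounded recovery from partial failure
            try:
                result = tools[name](**args)
                break
            except Exception as err:             # replanning could go here instead of a bare retry
                result = f"error: {err}"
        state.history.append((name, result))
        if name == "finish":
            return state
    return state
```

A production system would typically replace the bare retry with replanning and scope permissions per call rather than relying on a flat allow-list.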

What this work is NOT claiming#

This work does not assume:

  • Timelines, inevitability, or monotonic progress.
  • That benchmark gains translate directly into production reliability.
  • That “autonomy” is broadly desirable, safe, or cost-effective.
  • That model capability implies organizational capability.
  • That a single model class or scaling approach dominates all tasks.

Why working definitions matter#

Without stable definitions, teams will talk past each other. The same label ends up referring to:

  • A model (weights + tokenizer).
  • A product (UI + workflow).
  • A system (model + tools + memory + evaluation + permissions).
  • An organization’s capability to integrate, govern, and iterate.

Models are treated as components. The unit of analysis is the system, because reliability, security, and cost are properties of the full loop: inputs, tools, human review, logging, and feedback.
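
A rough sketch of what treating the system as the unit of analysis can look like in code, with `model_call`, `needs_review`, and `log` as hypothetical stand-ins: the measured surface is the whole request path, including the review gate and the audit record, not the model call in isolation. Tool calls and downstream feedback would sit at the same boundary.

```python
import json
import time
import uuid

def handle_request(request, model_call, needs_review, log):
    """System-level wrapper: reliability, security, and cost live at this
    boundary, not inside the model call alone."""
    trace_id = str(uuid.uuid4())
    start = time.time()
    output = model_call(request)                 # the model is one component
    held = needs_review(request, output)         # human-review gate (policy decision)
    log(json.dumps({                             # logging/feedback record for the full loop
        "trace_id": trace_id,
        "latency_s": round(time.time() - start, 3),
        "held_for_review": held,
    }))
    return {"trace_id": trace_id, "output": output, "held_for_review": held}
```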

Key points#

  • Working definitions are tools, not truth.
  • Capability and reliability are distinct.
  • Product impact depends on integration + incentives + evaluation.

Open questions#