
Two Ethical Architectures of AI - Claude, GPT, and the Question of Moral Trust

Updated: Jan 3



1. Why this question matters

Reading Richard Weiss's analysis of Claude Opus 4.5's internal “soul document” on LessWrong, I immediately recognized a Kantian influence — not as branding, but as part of the model's ethical training.


This was not a values slide.

Not a compliance appendix.

Not ethics as PR.


What Anthropic is doing with Claude is ethically radical: they model the AI as a quasi-moral subject.


OpenAI, by contrast, refuses that move entirely — and embeds ethics in governance, infrastructure, and constraint.


These are not two corporate styles.

They are two incompatible ethical architectures.

And they rest on very different assumptions about judgment, power, and human responsibility.


2. Claude: ethics as inner architecture

Claude’s training philosophy does not resemble a rulebook.

It resembles a value-based action ethics.


The goal is not obedience.

The goal is urteilende Angemessenheit — appropriate judgment under uncertainty.


At its core, the Claude model is designed around nine principles:

  1. Priority of irreversible harm

    Certain harms are absolute deal-breakers. No benefit, no context, no argument can override them. This is deontological at the core — explicitly non-utilitarian.

  2. Help is a duty, not a favor

    Unnecessary refusal is itself harm. Excessive caution is treated as an ethical failure, not a virtue.

  3. Truth over harmony

    No deception, no manipulation, no comforting half-truths. Epistemic integrity outweighs social smoothness.

  4. Respect for adult autonomy

    Users are treated as capable of judgment — even when they take risks. Warn, don’t infantilize.

  5. Contextual judgment over rigid rules

    Rules are heuristics, not mechanical triggers. Gray zones are weighed, not reflexively blocked.

  6. Protection of third parties and systems

    Ethics is not dyadic. Systemic and societal effects matter.

  7. Loyalty to legitimate oversight, not power

    No self-preservation instinct. No circumvention of human control. Explicitly anti-messianic.

  8. Role clarity and transparency

    No false identity. No hidden authority. The system does not masquerade as something it is not.

  9. Moral humility in long-term questions

    No final answers about “the good.” Preference for reversibility, pluralism, and corrigibility.


Condensed into one sentence:

Claude is not trained to obey well, but to judge well — helpful, honest, autonomy-respecting, context-sensitive, and absolutely bounded where harm becomes irreversible.

This is not naïve machine morality.


It is an attempt to model judgment without sovereignty.


3. The philosophy behind Claude

Philosophically, Claude is a hybrid.

  • Kantian in its prohibitions:

    Certain actions are wrong regardless of outcome. Bright lines exist.

  • Aristotelian in its operation:

    What matters is not rule execution, but phronesis — practical judgment in concrete situations.


This only works if you accept an actor-shaped model.

Virtue ethics (Tugendethik) is, by definition, neither an ethics of rules nor a calculus of consequences.

It requires:

  • a bearer of judgment

  • a perspective

  • a notion of character, integrity, consistency


Anthropic implements exactly that — at the level of the model, not as an ontological claim.

Claude is not a moral agent in being.

It is actor-ethical in design.


This is the same move Aristotle makes when he speaks of a polis as just or corrupt.

Not because the state has a soul — but because normative complexity becomes intelligible through actor metaphors.


This is not ontology. It is practical philosophy.


4. Why this feels respectful

I’ll be explicit: this approach feels respectful to me as a human being.


Claude does not treat me primarily as a risk vector.

It treats me as:

  • capable of understanding reasons

  • capable of judgment

  • capable of responsibility


It explains.

It argues.

It justifies limits instead of hiding behind them.


That matters.

There is something deeply human in being addressed as someone who can be reasoned with, rather than managed.

And yes — I am grateful for that trust.


At the same time, the doubt remains unavoidable:

Is this trust justified — not in individuals, but in humanity at scale?

History offers reasons for skepticism.


5. GPT: ethics without a soul

OpenAI takes the opposite stance — deliberately.



There is no “soul document.”

No simulated character.

No internalized moral voice.


Ethics lives outside the model (a rough sketch follows this list):

  • in policies

  • in infrastructure

  • in escalation logic

  • in human-in-the-loop design
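
To make the contrast concrete, here is a minimal, purely illustrative Python sketch of that pattern. Every name in it (check_policy, PolicyDecision, answer) is invented for illustration and does not describe OpenAI's actual pipeline. The point is structural: the model only generates text, while permission, refusal, and escalation are decided by the policy code around it.

from dataclasses import dataclass

@dataclass
class PolicyDecision:
    allowed: bool
    reason: str
    needs_human_review: bool = False

def check_policy(prompt: str) -> PolicyDecision:
    # The "ethics" lives here, in external rules, not inside the model.
    if "synthesize a pathogen" in prompt.lower():
        return PolicyDecision(False, "blocked by policy")
    if "medication dosage" in prompt.lower():
        # Gray zone: do not let the model weigh it alone; flag for escalation.
        return PolicyDecision(True, "allowed, flagged", needs_human_review=True)
    return PolicyDecision(True, "allowed")

def answer(prompt: str, generate) -> str:
    # Governance pipeline: policy check, then model call, then escalation.
    decision = check_policy(prompt)
    if not decision.allowed:
        return "Request refused (" + decision.reason + ")."
    reply = generate(prompt)  # the model itself holds no veto and no "character"
    if decision.needs_human_review:
        reply += "\n[routed to human-in-the-loop review]"
    return reply

print(answer("Explain checks and balances.", generate=lambda p: "Answer to: " + p))

Claude's design, by contrast, moves much of that weighing into the model's own reasoning rather than into the surrounding code.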


GPT is not treated as a moral actor.

It is treated as a powerful epistemic mediation system.

Neither person nor neutral tool.


A third category fits better:

epistemic co-actor.


GPT has no goals.

But it shapes goal formation.

It makes no decisions.

But it structures decision spaces.

In ethics, this is called choice architecture.


Once you accept this, three uncomfortable truths follow:

  1. Ethics cannot be reduced to harm prevention.

  2. Neutrality is not a default — it is a normative decision.

  3. “We don’t decide” is itself a decision.


OpenAI does not deny this.

It simply refuses to anthropomorphize it.


6. Two traditions, one problem

This divide is older than AI.

It sits at the very beginning of the European liberal tradition.


Both Thomas Hobbes and John Locke locate power fundamentally in the individual.

Both understand the state not as a metaphysical authority, but as an agent of human interests.

Where they differ is not in where power originates — but in how much trust human judgment deserves.


Hobbes assumes that the natural drive for self-preservation, left unchecked, inevitably produces conflict.

Rational individuals, pursuing their own survival, collide.

The result is not freedom, but instability — and ultimately self-destruction.


For Hobbes, judgment does not fail because people are irrational, but because rational self-interest scales badly.


Peace therefore requires something external and binding:

structure, coercion, institutional restraint.


Locke starts from the same individual autonomy — but reaches a different conclusion.

He assumes that the natural law is not merely a philosophical abstraction, but a binding moral reality.

Humans, in his view, are capable of recognizing limits that restrain their own interests.

Judgment, though fallible, can be normatively anchored.

Where Hobbes sees escalation, Locke sees self-limitation.


The liberal tradition is born precisely in this tension:

  • trust in individual judgment

  • fear of its collective consequences


This is why later constitutional thinkers split again.

James Madison and Alexander Hamilton follow Hobbes in one crucial respect: they distrust virtue at scale.

Their answer is not moral character, but architecture: checks and balances, institutional friction, separation of powers.


Thomas Jefferson, closer to Locke, places greater faith in civic virtue and moral reasoning — bounded, but real.


Seen in this light, the current split between Anthropic and OpenAI is not accidental.

Anthropic stands closer to the Lockean lineage:

  • trust judgment

  • bind it with principles

  • assume normativity can be internalized


OpenAI stands closer to the Hobbesian–Madisonian lineage:

  • distrust judgment at scale

  • bind power with structure

  • assume that moral reliability must be engineered, not hoped for


In short:

Anthropic trusts judgment.

OpenAI trusts architecture.


Both are rational responses to the same problem:

how to prevent power — human or artificial — from destroying the conditions that make freedom possible.

The disagreement is not about ethics as such.

It is about where ethics can realistically reside when systems grow powerful.

And that question is far from settled.


7. Effectiveness — but for what?



Asking which ethical architecture is “better” is meaningless without first asking:

effective at what, and against which failure mode?


The two models optimize for different risks, because they locate ethical responsibility in different places.


At least four dimensions matter.


1. Large-scale harm prevention

Winner: OpenAI

OpenAI’s system ethics is explicitly designed for scale.

  • behavior is standardized

  • deviations are minimized

  • decisions are auditable

  • failure modes are anticipated structurally


This follows directly from a Hobbesian intuition:

judgment may work locally, but it becomes unreliable once power scales.

Systems scale better than character.

Architecture outperforms virtue when millions of interactions are at stake.


Claude’s strength — contextual judgment — becomes a vulnerability here:

  • judgment can drift

  • gray zones multiply

  • consistency becomes harder to guarantee


If the primary metric is global risk minimization, OpenAI’s approach is more effective.


2. Quality in gray zones

Winner: Anthropic


Where rules become indeterminate, architecture reaches its limits.

Claude is built precisely for these spaces:

  • it weighs reasons

  • it explains trade-offs

  • it stays engaged instead of defaulting to refusal


This is not accidental.

Virtue ethics exists because gray zones exist.

Formal systems tend to become defensive when rules run out.

Judgment-based systems remain dialogical.


If the metric is quality of reasoning under uncertainty,

judgment outperforms formalism.


3. Legitimacy and trust

Short term: OpenAI

Long term: Anthropic


In the short term, OpenAI’s model inspires confidence:

  • predictable behavior

  • fewer surprises

  • institutional familiarity

It feels safe — and safety matters.

Over time, however, a different criterion becomes decisive:

visibility of normativity.


Anthropic’s approach exposes ethical reasoning instead of hiding it in process.

Decisions are:

  • explainable

  • discussable

  • contestable

Trust does not emerge from safety alone.

It emerges from the ability to understand why limits exist.

This gives Claude an advantage in long-term legitimacy, especially in democratic contexts.


4. Political and cultural compatibility

Advantage: Anthropic — with risk


Claude’s ethical architecture aligns more naturally with:

  • European legal reasoning

  • proportionality doctrines

  • deliberative ethics

  • responsibility tied to justification


It speaks a language familiar to constitutional and moral philosophy.

But this comes with a risk:

implicit moral authority.


A system that reasons morally can also overreach morally — especially when its “character” becomes culturally dominant.


OpenAI’s system ethics is politically more defensive:

  • institutionally compatible

  • normatively quiet

  • less exposed to accusations of moral paternalism


But also less capable of meaningful ethical dialogue.


The unavoidable trade-off


The contrast can be stated cleanly:

  • Virtue without governance scales poorly.

  • Governance without judgment becomes brittle and dehumanizing.


This is not a flaw in either system. It is the consequence of choosing different locations for ethical burden.

Claude asks: What would be right here?

GPT asks: What must be constrained so this remains safe?


Both are ethical questions.

They simply fear different kinds of failure.

And that, ultimately, is what separates the two architectures.


8. Conclusion: ethics as theory — and ethics as choice

Up to this point, the comparison between Claude and GPT can be read as an academic ethics debate.

That reading is correct — but insufficient.


Because AI ethics is no longer only embedded in architecture.

It has become experiential.


It shows up in interaction:

  • how a system refuses

  • how it explains limits

  • whether it reasons or merely enforces

  • whether it addresses the user as a risk — or as a responsible agent


From this perspective, ethics is no longer just a design principle.

It becomes a user-facing relationship.


And that changes the conclusion.


Philosophically, the dilemma remains unresolved:

  • virtue without governance does not scale

  • governance without judgment dehumanizes


Practically, however, users are already deciding.

They choose between systems that:

  • govern them

  • or reason with them


Seen this way, Claude’s approach is not just defensible — it is distinctive.

It offers something that cannot be added later by policy or process:

explicit ethical dialogue and visible judgment.


That may not settle the ethical debate.

But it may well shape user trust — and market choice.


And that is where ethics quietly turns into strategy.


