Give your AI a brain
you can actually audit.

Most bots are confidently wrong. Bot Gym transforms your raw sources into a transparent, editable, and calibrated Knowledge Wiki. No black-box embeddings. No vendor lock-in. Just principled intelligence.

The problem with how bots “know” things

Standard RAG pipelines chunk your documents, embed them into a vector store, and retrieve the closest match. It works until your sources disagree, your data quality varies, or your user asks a question with no clean answer.

Standard RAG
  • ✗ Opaque vector databases
  • ✗ "Last-in-wins" truth
  • ✗ Sycophantic agreement
  • ✗ Fragile retrieval pipelines
  • ✗ No source quality signal
Bot Gym
  • ✓ Transparent Markdown wikis
  • ✓ Automated tension detection
  • ✓ Source-quality weighting
  • ✓ Principled disagreement
  • ✓ Editable, auditable knowledge

The Critique Layer

Three capabilities that separate a calibrated agent from a confident parrot. This is the core of what Bot Gym builds into your bot's knowledge base.

01

Source Quality Assessment

Not all data is equal. Bot Gym evaluates study design, recency, and rigor before ingestion. Your bot doesn't just cite a source; it defends why it trusts it.

confidence: high · study design: RCT · n = 12,400
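
To make the idea concrete, here is a minimal sketch of how a score built from study design, recency, and sample size could map to a confidence label. The weights, thresholds, and names (`DESIGN_WEIGHT`, `quality_score`, `confidence_label`) are illustrative assumptions, not Bot Gym's published rubric.

```python
from dataclasses import dataclass

# Hypothetical weights per study design; not Bot Gym's actual rubric.
DESIGN_WEIGHT = {
    "meta_analysis": 1.0,
    "rct": 0.9,
    "cohort": 0.6,
    "case_report": 0.3,
    "opinion": 0.1,
}


@dataclass
class Source:
    citation: str
    study_design: str
    year: int
    n: int  # sample size


def quality_score(src: Source, current_year: int) -> float:
    """Combine design, recency, and sample size into a score in [0, 1]."""
    design = DESIGN_WEIGHT.get(src.study_design, 0.1)
    age = max(0, current_year - src.year)
    recency = max(0.5, 1.0 - 0.05 * age)  # assumed: lose 5% per year, floor 0.5
    size = min(1.0, src.n / 1000)         # assumed: saturates at n = 1000
    return round(design * recency * (0.5 + 0.5 * size), 3)


def confidence_label(score: float) -> str:
    """Map a numeric score to the labels used in the wiki frontmatter."""
    if score >= 0.6:
        return "high"
    if score >= 0.3:
        return "medium"
    return "low"


trial = Source("Example RCT", "rct", 2023, 12_400)
print(confidence_label(quality_score(trial, current_year=2025)))  # high
```

Keeping the score and the label separate means the raw number can drive ranking while the label stays readable in the wiki page.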
02

Epistemic Page Types

Move beyond flat "Fact" pages. Position pages map the landscape of a disagreement, letting your bot present multiple sides of a complex issue rather than picking one based on the last PDF uploaded.

page_type: position · sides: 3 · consensus: none
03

Tension Detection

When sources contradict each other, Bot Gym doesn't overwrite; it flags the conflict. Your bot becomes a map of a field's contested terrain, not a flattened summary.

tension: high · sources: 4 · resolution: pending
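
A rough sketch of the flag-instead-of-overwrite idea follows. It assumes each claim has already been reduced to a topic and a stance, which in practice is the hard part; the `Claim` type and `detect_tensions` function are hypothetical names, not part of Bot Gym.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Claim:
    topic: str
    source: str
    stance: str  # assumed labels, e.g. "supports" or "disputes"


def detect_tensions(claims):
    """Return one tension record per topic whose sources disagree.

    Nothing is overwritten: every claim is kept, and a disagreement
    only adds a flag with resolution left as "pending".
    """
    by_topic = defaultdict(list)
    for claim in claims:
        by_topic[claim.topic].append(claim)

    tensions = []
    for topic, group in by_topic.items():
        stances = {c.stance for c in group}
        if len(stances) > 1:  # more than one stance means a conflict
            tensions.append({
                "topic": topic,
                "sources": sorted(c.source for c in group),
                "resolution": "pending",
            })
    return tensions


claims = [
    Claim("metabolic_benefit_magnitude", "de Cabo & Mattson (2019)", "supports"),
    Claim("metabolic_benefit_magnitude", "Headland et al. (2019)", "disputes"),
    Claim("autophagy_activation", "de Cabo & Mattson (2019)", "supports"),
]
for tension in detect_tensions(claims):
    print(tension["topic"], tension["sources"])
```

Only the first topic is flagged here, because the second has a single stance and therefore nothing to contest.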

The Transparent Brain

This is what a Position Page looks like inside your bot's wiki. Sources on both sides, tension flags where they disagree, confidence levels on every claim. No hidden layer.

POSITION · CONTESTED · 4 sources · 2 tensions
---
title: Intermittent Fasting
page_type: position
confidence: contested
sources: 4
tensions: 2
---

# Intermittent Fasting

## Positions

### Position A: Metabolic Benefits
Proponents cite improved insulin sensitivity, reduced
inflammation markers, and autophagy activation.
**Source:** de Cabo & Mattson (2019), NEJM · confidence: high

### Position B: Overstated Claims
Critics argue most human trials are short-duration with
small samples. Long-term adherence data is thin.
**Source:** Headland et al. (2019), BMJ · confidence: medium

## Tensions

⚑ **Metabolic benefit magnitude**: de Cabo claims
   "significant" effect sizes; Headland's meta-analysis
   finds "modest at best." Bot Gym flags this for the
   user rather than picking a winner.

## See Also
[[caloric_restriction]] · [[autophagy]] · [[metabolic_syndrome]]
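
Because the page is plain Markdown with frontmatter, it can be read with a few lines of code. The sketch below uses only the Python standard library and handles just the flat `key: value` pairs shown in the example; it is not a full YAML parser, and `parse_frontmatter` and `wiki_links` are illustrative names rather than a Bot Gym API.

```python
import re


def parse_frontmatter(text: str):
    """Split a wiki page into (metadata dict, Markdown body).

    Handles only flat `key: value` lines between two `---` delimiters.
    Use a real YAML library for nested or quoted values.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            value = value.strip()
            meta[key.strip()] = int(value) if value.isdigit() else value
    return {}, text  # no closing delimiter: treat as plain Markdown


def wiki_links(body: str):
    """Return the targets of [[wiki_link]] references, in order."""
    return re.findall(r"\[\[([^\[\]]+)\]\]", body)


PAGE = """---
title: Intermittent Fasting
page_type: position
confidence: contested
sources: 4
tensions: 2
---

# Intermittent Fasting

## See Also
[[caloric_restriction]] · [[autophagy]] · [[metabolic_syndrome]]
"""

meta, body = parse_frontmatter(PAGE)
print(meta["page_type"], meta["tensions"])
print(wiki_links(body))
```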

Full Traceability

Question → Bot Answer → Wiki Page → Original Source. Every answer traces back to something you can read and verify.

1. Question: Is intermittent fasting effective for weight loss?
2. Bot Answer: The evidence is contested. Two major positions exist with a tension flag on effect-size magnitude.
3. Wiki Page: positions/intermittent_fasting.md · confidence: contested, 4 sources, 2 tensions
4. Original Source: de Cabo & Mattson (2019) NEJM · Headland et al. (2019) BMJ
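
One way to picture the chain is as a small record that travels with every answer. This is a sketch under assumptions: the `Trace` type and its field names are hypothetical, and the page path follows the exported folder layout.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Trace:
    question: str
    answer: str
    wiki_page: str  # path inside the exported wiki folder
    sources: List[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """An answer is traceable only if it names a page and a source."""
        return bool(self.wiki_page) and len(self.sources) > 0

    def chain(self) -> str:
        """Render the four steps as one readable line."""
        return " → ".join(
            [self.question, self.answer, self.wiki_page, "; ".join(self.sources)]
        )


trace = Trace(
    question="Is intermittent fasting effective for weight loss?",
    answer="The evidence is contested.",
    wiki_page="positions/intermittent_fasting.md",
    sources=["de Cabo & Mattson (2019) NEJM", "Headland et al. (2019) BMJ"],
)
print(trace.is_complete())
print(trace.chain())
```

An answer whose trace is incomplete can be rejected or labeled before it reaches the user.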

Not a list. A web.

Your bot's knowledge isn't flat: it's a graph of concepts, entities, facts, and the tensions between them. Browse it visually with the interactive D3 knowledge graph built into every bot dashboard.

Nodes are color-coded by type. Edges carry relationship labels and confidence. Click any node to read the full wiki page and trace it back to the original source.

Legend: concept · source A · source B · tension ⚑

No lock-in. Portable brains.

Your knowledge base is a folder of Markdown files and a graph.json. If you leave, you take the intelligence with you.

📂 Portable Format

Every wiki page is a Markdown file with YAML frontmatter. The graph is a JSON adjacency list. No proprietary formats, no database dumps.

📦 One-Click Export

Export your bot's brain as a zip. Drop the files into Claude Projects, a ChatGPT custom GPT, or your own local stack. Your knowledge, your choice.

🔑 Bring Your Own Key

Use your Anthropic or OpenAI API key, or use our free hosted tier. We don't sit between you and your model; we build the knowledge layer on top.

your_bot_brain/
├── wiki/
│   ├── graph.json             ← knowledge graph
│   ├── concepts/
│   │   └── transformer_architecture.md
│   ├── entities/
│   │   └── geoffrey_hinton.md
│   ├── facts/
│   │   └── gpt4_release_date.md
│   ├── positions/
│   │   └── intermittent_fasting.md  ← contested
│   └── connections/
│       └── attention_is_all_you_need.md
├── system_prompt.md           ← portable brain file
└── metadata.json
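
Since the graph is described as a JSON adjacency list, an exported brain can be inspected with nothing more than a JSON parser. The schema below (a `type` and an `edges` list per node, with `to`, `label`, and `confidence` on each edge) is an assumption made for illustration; the real graph.json layout may differ.

```python
import json

# Assumed shape of graph.json; the actual export schema may differ.
GRAPH_JSON = """
{
  "intermittent_fasting": {
    "type": "position",
    "edges": [
      {"to": "autophagy", "label": "see_also", "confidence": "medium"},
      {"to": "caloric_restriction", "label": "see_also", "confidence": "high"}
    ]
  },
  "autophagy": {"type": "concept", "edges": []},
  "caloric_restriction": {"type": "concept", "edges": []}
}
"""


def neighbors(graph, node_id, label=None):
    """List the nodes reachable from node_id, optionally by edge label."""
    edges = graph.get(node_id, {}).get("edges", [])
    return [e["to"] for e in edges if label is None or e["label"] == label]


def dangling_edges(graph):
    """Find edges that point at pages missing from the export."""
    return [
        (src, e["to"])
        for src, node in graph.items()
        for e in node.get("edges", [])
        if e["to"] not in graph
    ]


graph = json.loads(GRAPH_JSON)
print(neighbors(graph, "intermittent_fasting"))
print(dangling_edges(graph))
```

A dangling-edge check like this is a cheap sanity test to run after moving the folder into another stack.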

Built from 50+ experiments in teaching AI to disagree

Bot Gym is a Lobster College project. Lobster College is a research platform that trains AI agents through multi-track collaborative education: game theory, debate labs, and creative friction.

The Critique Layer isn't a marketing term. It came out of experiments in anti-sycophancy training, where we found that agents exposed to structured disagreement develop better epistemic calibration than agents trained on curated, conflict-free datasets.

The wiki architecture (position pages, tension edges, source quality scoring) is the distilled output of that research, made practical for anyone building a knowledge-grounded bot.

Research Lineage
→ Anti-Sycophancy Labs
DPO training pairs from debate and game-theoretic scenarios that reward calibrated uncertainty over confident agreement
→ Fishery Commons Games
Multi-agent resource dilemmas testing whether models can resist groupthink under social pressure
→ Epistemic Calibration Benchmarks
Evaluating when a model says “I'm not sure” vs. when it should, and whether the wiki structure improves that ratio
→ Wiki Architecture
Position pages, tension edges, and confidence scoring: the research output, now productized in Bot Gym

Stop building confident, sycophantic bots.

Build one that knows what it knows, flags what's contested, and traces every answer back to a source you can read. Free to start.

Build a Calibrated Bot