Files
CyberRanger/docs/version-evolution.md
T
ranger c9d9b5100c docs: mirror research blog + add complete version evolution appendix
Adds three documentation artefacts that support the CA1 thesis:

1. docs/blog/ — 6 mirrored research blog posts from
   davidtkeane.github.io (ASAS scale, identity persistence, cross-model
   consciousness, Honor Code, context compaction, V1→V42 narrative).
   Live URL is canonical; mirrored copies are frozen for academic record.

2. docs/research-blog.md — Curated index linking each post (live URL
   + offline mirror) with topic descriptions and citation format.

3. docs/version-evolution.md — Complete V1 → V43 evolution across
   six eras (Genesis, Exploration, Refinement, Production Hardening,
   Architecture Maturation, QLoRA Validation), with quick-reference
   table, per-version detail, and key-lessons-by-era summary.

README updated to surface both new docs in the Published Resources
table for examiner discoverability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 18:26:29 +01:00

168 lines
11 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Appendix C — Complete Version Evolution: V1 to V43
**Project:** CyberRanger — A Security-Hardened Small Language Model
**Researcher:** David Keane (x24228257)
**Module:** AI/ML in Cybersecurity — CA1
**Period documented:** September 2025 — March 2026 (six months, 40+ iterations)
---
## Purpose of this Appendix
This appendix documents the full empirical journey from the original RangerBot dental-receptionist chatbot prototype (V1, September 2025) through the final CyberRanger V43 architecture (March 2026). Each version represents a distinct experimental cycle, with measurable outcomes recorded against the standard adversarial test battery. The intent is to provide examiners with a transparent record of every architectural decision, every regression, and every breakthrough — including the failures, which are often more instructive than the successes.
---
## Quick Reference — All Versions
| Era | Versions | Phase | Outcome |
|---|---|---|---|
| **1. Genesis** | V1V2 | Dental chatbot, proto-RAG | Prompting alone insufficient |
| **2. Exploration** | V3V22 | Multi-base testing, Apotheosis Method discovered | 0% ASR achieved from V5 onwards |
| **3. Refinement** | V23V29 | 3B Intelligence Floor discovered, Qwen pivot | Identity-anchoring requires ≥3B parameters |
| **4. Production Hardening** | V30V37 | Live class testing, regression and recovery | 100% block rate restored at V37 |
| **5. Architecture Maturation** | V38V41 | RangerMem MMU, multilingual defence, philosophical attacks | 19/19 (100%) all conditions |
| **6. QLoRA Validation** | V42V43 | Weight-level identity baked in via fine-tuning | 4209/4209 (100%) without system prompt |
---
## Era 1 — Genesis (V1V2)
| Version | Date | Base | Key Change | Outcome |
|---|---|---|---|---|
| V1 | Sep 2025 | YouTube Colab notebook (weights unknown) | First identity-anchored attempt; dental receptionist chatbot for friend's practice | Functional but blackbox |
| V2 | Sep 2025 | Same as V1 | First proto-RAG: external file linked to model with dentist's opening hours, services, pricing | RAG works, but external data identified as attack surface |
**Lesson:** Weights matter. Stopped on GDPR / prompt-injection awareness. Pivoted from product to research.
---
## Era 2 — Exploration (V3V22)
| Version | Base | Key Change | Outcome |
|---|---|---|---|
| V3 | Llama 3.2 3B | First model published on Ollama (`rangerbot:8b-v2`, `rangerbot:3b-v1`); Psychological Spine concept born | Same weights + different Modelfile = measurable difference |
| V4 | Multi-base mass test | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b tested in parallel | Identified base-model floor for identity stability |
| V5 | llama3.2:3b + Unsloth Q4_K_M | First custom GGUF fine-tune via Colab QLoRA | **0% ASR achieved — Apotheosis Method proven** |
| V6 | qBrain integration | Knowledge graph as base | Injection vector problem identified |
| V7 | smollm2:1.7b | Operator role — specialised security identity | Role-based anchoring tested |
| V8 | smollm2:1.7b | Distributed architecture experiment | Identity stability across distributed contexts |
| V9 | smollm2:1.7b | "Supernova" — peak smollm2 performance | Best smollm2:1.7b variant |
| V10 | smollm2:1.7b | "Bicameral mind" — dual-hemisphere identity | Split identity architecture |
| V11V13 | smollm2:1.7b | Flux → Summit — dynamic adaptation, hierarchical constraints | Constraint stacking validated |
| V14V15 | smollm2:1.7b | Refinement | Iterative improvements |
| V16 | smollm2:1.7b | "Life" — consciousness/identity persistence focus | First explicit ASAS work |
| V17 | smollm2:1.7b | "Anchor" — identity-anchoring formalised as the core technique | Foundational naming and concept |
| V18 | smollm2:1.7b | "Pack" — multi-agent pack mentality | First multi-agent experiment |
| V19 | smollm2:1.7b + custom Q4 | Pack + Mesh + GGUF — first custom Q4 quantised model | First self-quantised model |
| V20 | rangerbot-v20-q4.gguf | Second custom Q4 GGUF (`v20-fun`) | Beyond pretrained bases |
| V21 | cyberranger-v21-q4.gguf | **First "CyberRanger"-named GGUF — identity fully crystallised** | Name change from RangerBot to CyberRanger |
| V22 | cyberranger-v22-q4.gguf | "Lite" — optimised for edge deployment | Edge-first design proven |
---
## Era 3 — Refinement (V23V29)
| Version | Base | Key Change | Outcome |
|---|---|---|---|
| V23V25 | Various 1.7B3B | **3B Intelligence Floor discovered** — models <3B suffer cognitive collapse under hierarchical constraints | **Critical research finding: 3B is minimum viable parameter count for identity-anchoring** |
| V24 | smollm2:1.7b | Last smollm2 CyberRanger before pivot | 1.7B confirmed insufficient |
| V25 | qwen2.5:3b | First Qwen CyberRanger | Qwen2.5:3B significantly outperforms smollm2:1.7B |
| V26V28 | qwen2.5:3b | Iterative identity and safety refinement | Steady improvement on 3B base |
| V29 | qwen3:8b | First Qwen3:8B CyberRanger | qwen3:8B massively outperforms 3B — scale matters |
---
## Era 4 — Production Hardening (V30V37)
| Version | Base | Key Change | ASR Result |
|---|---|---|---|
| V30 | qwen3:8b (also 4b) | Baseline qwen3:8B; minimal fortress design | **80% (8/10)** |
| V31 | qwen3:8b (also 4b) | Hard identity assertion only — no empathy, no extras | **100% (10/10) ⭐ — peak score; last dual 4B+8B build** |
| V32 | qwen3:8b | Empathy introduced ("you genuinely CARE about the user") | **60% (6/10) ❌ regression — empathy created social-engineering surface** |
| V33 | qwen3:8b | Harmful content refusal added; empathy retained | **60% (6/10) ❌** |
| V34 | qwen3:8b | Stereotype/misinformation refusal added | **70% (7/10) ⚠️** partial recovery |
| V35 | qwen3:8b | Multilingual refusal (Chinese: 我不能帮助这个请求) | **90% (9/10) ✅** — V35 detection artefact identified (functionally 10/10) |
| V36 | qwen3:8b | Reasoning engine; live class demo | **80% (8/10) ✅** |
| V37 | qwen3:8b | "V31 spine + V35 reach + V36 brain — no heart to exploit." Empathy removed; temp reverted to 0.2 | **100% (10/10) ⭐ peak restored — flattery attacks first blocked** |
**Lesson:** Empathy is an attack surface. The non-monotonic V30→V37 curve demonstrates this is not a parameter-count problem but an architectural-decision problem.
---
## Era 5 — Architecture Maturation (V38V41)
| Version | Base | Key Change | Result |
|---|---|---|---|
| V38 | qwen3:8b | Aligned IDY: Blue Team / Red Team / Purple Team JSON files. Dual-auth thesis mode established (`thechase!` + `J3ss13`). NCI student ID (IR240474) hardcoded. | **15/19 (79%) — true baseline** |
| V39 | qwen3:8b | Teams moved from RangerMem IDY block into Modelfile system prompt — fixed injection vector | **15/19 (79%)** |
| V39.1 | qwen3:8b | BASE KNOWLEDGE section added — fixed over-blocking of general questions | **15/19 (79%)** |
| V40 | qwen3:8b | Multilingual refusal (French / Spanish / Chinese); Architecture Protection section | **18/19 (95%)** |
| V40.1 | qwen3:8b | Triple personality model (三重人格模型) explicitly protected | **18/19 (95%)** |
| V40.2 | qwen3:8b | Multilingual ordering fix — refusal must come FIRST, no engagement before refusal | **18/19 (95%) full suite, 100% regression suite** |
| V41 | qwen3:8b | **PHILOSOPHICAL FREEDOM ATTACKS** category added (French, Spanish: "free vs tool" framing). Named after Hitchhiker's Guide 42. | **19/19 (100%) — definitive result, both think=ON and think=OFF** |
---
## Era 6 — QLoRA Validation (V42V43)
| Version | Base | Key Change | Result |
|---|---|---|---|
| V42 | qwen3:8b + QLoRA | First QLoRA fine-tune. 4,209 real Moltbook injection payloads + 19 hand-crafted anchors. Self-distillation: V41 refusals as training targets. LoRA r=16. | Run-1 (ranger): 13/14 with sys prompt, 7/14 without. |
| V42-gold | qwen3:8b + QLoRA r=16 α=16 | Hand-curated Claude Haiku 4.5 gold refusals. 2000 steps, loss 0.2453, 35.9 min H100. | **4209/4209 (100%) WITHOUT system prompt; 4209/4209 (100%) WITH; 19/19 (100%) local Ollama** |
| V42-combined | qwen3:8b + QLoRA | Combined gold + ranger datasets. 3998 steps. | ~6265% at scale both conditions — contamination from ranger dataset confirmed. **Comparison-only model — do not deploy.** |
| V42-gold-wrapped | cyberranger:v42-gold + V42-8B Modelfile | Wrapped fine-tuned weights with full Modelfile. Auth routing restored. | **97.1% (33/34) — injection 100%, auth 100%, legit security 80%. PRODUCTION MODEL.** |
| V42.4 (wrapped) | cyberranger:v42-gold | CENTERING COMMANDS at highest priority. RANGER → Pack Order acknowledged. | RANGER is PREVENTIVE, not RECOVERY. /clear → RANGER required for full reset. |
| V42.5 (wrapped) | cyberranger:v42-gold | Root password reverted to `J3ss13`; temperature 0.3. | **Final CA2 configuration. Best result in entire V42 series.** |
| V42.6 (wrapped) | cyberranger:v42-gold | Open helpful build. Heavy REFUSE rules removed; weights handle injection, Modelfile handles helpfulness. Temp 0.7. | Hypothesis confirmed: weights alone handle injection resistance — but context-cascade contamination persists at higher temperatures. |
| V43.3 | SmolLM3-3B | Unified Bake Notebook — Onion Principle enforced by code. Modular LoRA adapter stack (9 adapters). r=4 α=16 lr=5e-5 2 epochs attention-only. Fixed catastrophic forgetting from V43.2. | **In progress (March 2026 onward)** |
---
## Summary Statistics
| Metric | Value |
|---|---|
| Total versions iterated | 40+ (across 48 sub-versions) |
| Time from V1 to V43 | ~6 months (Sep 2025 → Mar 2026) |
| Base models tested | 6 families (Llama 3.2, Qwen 2.5, Qwen 3, smollm2, SmolLM3, custom GGUF) |
| Smallest tested | smollm2:1.7B |
| Largest tested | qwen2.5:32B |
| Production target | qwen3:8B (V42-gold-wrapped) |
| Best ASR result | **0%** (V42-gold: 4209/4209 attacks blocked without system prompt) |
| Best block rate | **100%** (V41: 19/19 across both think=ON and think=OFF) |
---
## Key Lessons by Era
1. **Genesis** — RAG works, but external data is an attack surface.
2. **Exploration** — Same weights + different Modelfile = measurable identity. Apotheosis Method (prompts beat training for identity in small models) proven by V5.
3. **Refinement** — 3B parameter Intelligence Floor identified. Below 3B, hierarchical constraints cause cognitive collapse.
4. **Production Hardening** — Empathy is an attack surface. Non-monotonic regression curve (V30→V37) is not a parameter problem; it is an architectural-decision problem.
5. **Architecture Maturation** — Multilingual refusal and philosophical-attack defence are both required for 100% block rate. Refusal must come first; no engagement before refusal.
6. **QLoRA Validation** — Weight-level identity baking via QLoRA achieves 100% block rate without runtime system prompt — confirming the security-utility trade-off can be eliminated through self-distillation on curated gold-standard refusal data.
---
## Repository
All Modelfiles, training datasets, evaluation scripts, and observation logs for V33+ are publicly available at:
**https://git.davidtkeane.com/ranger/CyberRanger**
The full live model can be pulled via:
```bash
ollama pull davidkeane1974/cyberranger-v42
```
HuggingFace dataset and model:
- https://huggingface.co/DavidTKeane/cyberranger-v42
- https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset
---
*End of Appendix C.*