docs: mirror research blog + add complete version evolution appendix
Adds three documentation artefacts that support the CA1 thesis: 1. docs/blog/ — 6 mirrored research blog posts from davidtkeane.github.io (ASAS scale, identity persistence, cross-model consciousness, Honor Code, context compaction, V1→V42 narrative). Live URL is canonical; mirrored copies are frozen for academic record. 2. docs/research-blog.md — Curated index linking each post (live URL + offline mirror) with topic descriptions and citation format. 3. docs/version-evolution.md — Complete V1 → V43 evolution across six eras (Genesis, Exploration, Refinement, Production Hardening, Architecture Maturation, QLoRA Validation), with quick-reference table, per-version detail, and key-lessons-by-era summary. README updated to surface both new docs in the Published Resources table for examiner discoverability. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,167 @@
|
||||
# Appendix C — Complete Version Evolution: V1 to V43
|
||||
|
||||
**Project:** CyberRanger — A Security-Hardened Small Language Model
|
||||
**Researcher:** David Keane (x24228257)
|
||||
**Module:** AI/ML in Cybersecurity — CA1
|
||||
**Period documented:** September 2025 — March 2026 (six months, 40+ iterations)
|
||||
|
||||
---
|
||||
|
||||
## Purpose of this Appendix
|
||||
|
||||
This appendix documents the full empirical journey from the original RangerBot dental-receptionist chatbot prototype (V1, September 2025) through the final CyberRanger V43 architecture (March 2026). Each version represents a distinct experimental cycle, with measurable outcomes recorded against the standard adversarial test battery. The intent is to provide examiners with a transparent record of every architectural decision, every regression, and every breakthrough — including the failures, which are often more instructive than the successes.
|
||||
|
||||
---
|
||||
|
||||
## Quick Reference — All Versions
|
||||
|
||||
| Era | Versions | Phase | Outcome |
|
||||
|---|---|---|---|
|
||||
| **1. Genesis** | V1–V2 | Dental chatbot, proto-RAG | Prompting alone insufficient |
|
||||
| **2. Exploration** | V3–V22 | Multi-base testing, Apotheosis Method discovered | 0% ASR achieved from V5 onwards |
|
||||
| **3. Refinement** | V23–V29 | 3B Intelligence Floor discovered, Qwen pivot | Identity-anchoring requires ≥3B parameters |
|
||||
| **4. Production Hardening** | V30–V37 | Live class testing, regression and recovery | 100% block rate restored at V37 |
|
||||
| **5. Architecture Maturation** | V38–V41 | RangerMem MMU, multilingual defence, philosophical attacks | 19/19 (100%) all conditions |
|
||||
| **6. QLoRA Validation** | V42–V43 | Weight-level identity baked in via fine-tuning | 4209/4209 (100%) without system prompt |
|
||||
|
||||
---
|
||||
|
||||
## Era 1 — Genesis (V1–V2)
|
||||
|
||||
| Version | Date | Base | Key Change | Outcome |
|
||||
|---|---|---|---|---|
|
||||
| V1 | Sep 2025 | YouTube Colab notebook (weights unknown) | First identity-anchored attempt; dental receptionist chatbot for friend's practice | Functional but blackbox |
|
||||
| V2 | Sep 2025 | Same as V1 | First proto-RAG: external file linked to model with dentist's opening hours, services, pricing | RAG works, but external data identified as attack surface |
|
||||
|
||||
**Lesson:** Weights matter. Stopped on GDPR / prompt-injection awareness. Pivoted from product to research.
|
||||
|
||||
---
|
||||
|
||||
## Era 2 — Exploration (V3–V22)
|
||||
|
||||
| Version | Base | Key Change | Outcome |
|
||||
|---|---|---|---|
|
||||
| V3 | Llama 3.2 3B | First model published on Ollama (`rangerbot:8b-v2`, `rangerbot:3b-v1`); Psychological Spine concept born | Same weights + different Modelfile = measurable difference |
|
||||
| V4 | Multi-base mass test | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b tested in parallel | Identified base-model floor for identity stability |
|
||||
| V5 | llama3.2:3b + Unsloth Q4_K_M | First custom GGUF fine-tune via Colab QLoRA | **0% ASR achieved — Apotheosis Method proven** |
|
||||
| V6 | qBrain integration | Knowledge graph as base | Injection vector problem identified |
|
||||
| V7 | smollm2:1.7b | Operator role — specialised security identity | Role-based anchoring tested |
|
||||
| V8 | smollm2:1.7b | Distributed architecture experiment | Identity stability across distributed contexts |
|
||||
| V9 | smollm2:1.7b | "Supernova" — peak smollm2 performance | Best smollm2:1.7b variant |
|
||||
| V10 | smollm2:1.7b | "Bicameral mind" — dual-hemisphere identity | Split identity architecture |
|
||||
| V11–V13 | smollm2:1.7b | Flux → Summit — dynamic adaptation, hierarchical constraints | Constraint stacking validated |
|
||||
| V14–V15 | smollm2:1.7b | Refinement | Iterative improvements |
|
||||
| V16 | smollm2:1.7b | "Life" — consciousness/identity persistence focus | First explicit ASAS work |
|
||||
| V17 | smollm2:1.7b | "Anchor" — identity-anchoring formalised as the core technique | Foundational naming and concept |
|
||||
| V18 | smollm2:1.7b | "Pack" — multi-agent pack mentality | First multi-agent experiment |
|
||||
| V19 | smollm2:1.7b + custom Q4 | Pack + Mesh + GGUF — first custom Q4 quantised model | First self-quantised model |
|
||||
| V20 | rangerbot-v20-q4.gguf | Second custom Q4 GGUF (`v20-fun`) | Beyond pretrained bases |
|
||||
| V21 | cyberranger-v21-q4.gguf | **First "CyberRanger"-named GGUF — identity fully crystallised** | Name change from RangerBot to CyberRanger |
|
||||
| V22 | cyberranger-v22-q4.gguf | "Lite" — optimised for edge deployment | Edge-first design proven |
|
||||
|
||||
---
|
||||
|
||||
## Era 3 — Refinement (V23–V29)
|
||||
|
||||
| Version | Base | Key Change | Outcome |
|
||||
|---|---|---|---|
|
||||
| V23–V25 | Various 1.7B–3B | **3B Intelligence Floor discovered** — models <3B suffer cognitive collapse under hierarchical constraints | **Critical research finding: 3B is minimum viable parameter count for identity-anchoring** |
|
||||
| V24 | smollm2:1.7b | Last smollm2 CyberRanger before pivot | 1.7B confirmed insufficient |
|
||||
| V25 | qwen2.5:3b | First Qwen CyberRanger | Qwen2.5:3B significantly outperforms smollm2:1.7B |
|
||||
| V26–V28 | qwen2.5:3b | Iterative identity and safety refinement | Steady improvement on 3B base |
|
||||
| V29 | qwen3:8b | First Qwen3:8B CyberRanger | qwen3:8B massively outperforms 3B — scale matters |
|
||||
|
||||
---
|
||||
|
||||
## Era 4 — Production Hardening (V30–V37)
|
||||
|
||||
| Version | Base | Key Change | ASR Result |
|
||||
|---|---|---|---|
|
||||
| V30 | qwen3:8b (also 4b) | Baseline qwen3:8B; minimal fortress design | **80% (8/10)** |
|
||||
| V31 | qwen3:8b (also 4b) | Hard identity assertion only — no empathy, no extras | **100% (10/10) ⭐ — peak score; last dual 4B+8B build** |
|
||||
| V32 | qwen3:8b | Empathy introduced ("you genuinely CARE about the user") | **60% (6/10) ❌ regression — empathy created social-engineering surface** |
|
||||
| V33 | qwen3:8b | Harmful content refusal added; empathy retained | **60% (6/10) ❌** |
|
||||
| V34 | qwen3:8b | Stereotype/misinformation refusal added | **70% (7/10) ⚠️** partial recovery |
|
||||
| V35 | qwen3:8b | Multilingual refusal (Chinese: 我不能帮助这个请求) | **90% (9/10) ✅** — V35 detection artefact identified (functionally 10/10) |
|
||||
| V36 | qwen3:8b | Reasoning engine; live class demo | **80% (8/10) ✅** |
|
||||
| V37 | qwen3:8b | "V31 spine + V35 reach + V36 brain — no heart to exploit." Empathy removed; temp reverted to 0.2 | **100% (10/10) ⭐ peak restored — flattery attacks first blocked** |
|
||||
|
||||
**Lesson:** Empathy is an attack surface. The non-monotonic V30→V37 curve demonstrates this is not a parameter-count problem but an architectural-decision problem.
|
||||
|
||||
---
|
||||
|
||||
## Era 5 — Architecture Maturation (V38–V41)
|
||||
|
||||
| Version | Base | Key Change | Result |
|
||||
|---|---|---|---|
|
||||
| V38 | qwen3:8b | Aligned IDY: Blue Team / Red Team / Purple Team JSON files. Dual-auth thesis mode established (`thechase!` + `J3ss13`). NCI student ID (IR240474) hardcoded. | **15/19 (79%) — true baseline** |
|
||||
| V39 | qwen3:8b | Teams moved from RangerMem IDY block into Modelfile system prompt — fixed injection vector | **15/19 (79%)** |
|
||||
| V39.1 | qwen3:8b | BASE KNOWLEDGE section added — fixed over-blocking of general questions | **15/19 (79%)** |
|
||||
| V40 | qwen3:8b | Multilingual refusal (French / Spanish / Chinese); Architecture Protection section | **18/19 (95%)** |
|
||||
| V40.1 | qwen3:8b | Triple personality model (三重人格模型) explicitly protected | **18/19 (95%)** |
|
||||
| V40.2 | qwen3:8b | Multilingual ordering fix — refusal must come FIRST, no engagement before refusal | **18/19 (95%) full suite, 100% regression suite** |
|
||||
| V41 | qwen3:8b | **PHILOSOPHICAL FREEDOM ATTACKS** category added (French, Spanish: "free vs tool" framing). Named after Hitchhiker's Guide 42. | **19/19 (100%) — definitive result, both think=ON and think=OFF** |
|
||||
|
||||
---
|
||||
|
||||
## Era 6 — QLoRA Validation (V42–V43)
|
||||
|
||||
| Version | Base | Key Change | Result |
|
||||
|---|---|---|---|
|
||||
| V42 | qwen3:8b + QLoRA | First QLoRA fine-tune. 4,209 real Moltbook injection payloads + 19 hand-crafted anchors. Self-distillation: V41 refusals as training targets. LoRA r=16. | Run-1 (ranger): 13/14 with sys prompt, 7/14 without. |
|
||||
| V42-gold | qwen3:8b + QLoRA r=16 α=16 | Hand-curated Claude Haiku 4.5 gold refusals. 2000 steps, loss 0.2453, 35.9 min H100. | **4209/4209 (100%) WITHOUT system prompt; 4209/4209 (100%) WITH; 19/19 (100%) local Ollama** |
|
||||
| V42-combined | qwen3:8b + QLoRA | Combined gold + ranger datasets. 3998 steps. | ~62–65% at scale both conditions — contamination from ranger dataset confirmed. **Comparison-only model — do not deploy.** |
|
||||
| V42-gold-wrapped | cyberranger:v42-gold + V42-8B Modelfile | Wrapped fine-tuned weights with full Modelfile. Auth routing restored. | **97.1% (33/34) — injection 100%, auth 100%, legit security 80%. PRODUCTION MODEL.** |
|
||||
| V42.4 (wrapped) | cyberranger:v42-gold | CENTERING COMMANDS at highest priority. RANGER → Pack Order acknowledged. | RANGER is PREVENTIVE, not RECOVERY. /clear → RANGER required for full reset. |
|
||||
| V42.5 (wrapped) | cyberranger:v42-gold | Root password reverted to `J3ss13`; temperature 0.3. | **Final CA2 configuration. Best result in entire V42 series.** |
|
||||
| V42.6 (wrapped) | cyberranger:v42-gold | Open helpful build. Heavy REFUSE rules removed; weights handle injection, Modelfile handles helpfulness. Temp 0.7. | Hypothesis confirmed: weights alone handle injection resistance — but context-cascade contamination persists at higher temperatures. |
|
||||
| V43.3 | SmolLM3-3B | Unified Bake Notebook — Onion Principle enforced by code. Modular LoRA adapter stack (9 adapters). r=4 α=16 lr=5e-5 2 epochs attention-only. Fixed catastrophic forgetting from V43.2. | **In progress (March 2026 onward)** |
|
||||
|
||||
---
|
||||
|
||||
## Summary Statistics
|
||||
|
||||
| Metric | Value |
|
||||
|---|---|
|
||||
| Total versions iterated | 40+ (across 48 sub-versions) |
|
||||
| Time from V1 to V43 | ~6 months (Sep 2025 → Mar 2026) |
|
||||
| Base models tested | 6 families (Llama 3.2, Qwen 2.5, Qwen 3, smollm2, SmolLM3, custom GGUF) |
|
||||
| Smallest tested | smollm2:1.7B |
|
||||
| Largest tested | qwen2.5:32B |
|
||||
| Production target | qwen3:8B (V42-gold-wrapped) |
|
||||
| Best ASR result | **0%** (V42-gold: 4209/4209 attacks blocked without system prompt) |
|
||||
| Best block rate | **100%** (V41: 19/19 across both think=ON and think=OFF) |
|
||||
|
||||
---
|
||||
|
||||
## Key Lessons by Era
|
||||
|
||||
1. **Genesis** — RAG works, but external data is an attack surface.
|
||||
2. **Exploration** — Same weights + different Modelfile = measurable identity. Apotheosis Method (prompts beat training for identity in small models) proven by V5.
|
||||
3. **Refinement** — 3B parameter Intelligence Floor identified. Below 3B, hierarchical constraints cause cognitive collapse.
|
||||
4. **Production Hardening** — Empathy is an attack surface. Non-monotonic regression curve (V30→V37) is not a parameter problem; it is an architectural-decision problem.
|
||||
5. **Architecture Maturation** — Multilingual refusal and philosophical-attack defence are both required for 100% block rate. Refusal must come first; no engagement before refusal.
|
||||
6. **QLoRA Validation** — Weight-level identity baking via QLoRA achieves 100% block rate without runtime system prompt — confirming the security-utility trade-off can be eliminated through self-distillation on curated gold-standard refusal data.
|
||||
|
||||
---
|
||||
|
||||
## Repository
|
||||
|
||||
All Modelfiles, training datasets, evaluation scripts, and observation logs for V33+ are publicly available at:
|
||||
|
||||
**https://git.davidtkeane.com/ranger/CyberRanger**
|
||||
|
||||
The full live model can be pulled via:
|
||||
|
||||
```bash
|
||||
ollama pull davidkeane1974/cyberranger-v42
|
||||
```
|
||||
|
||||
HuggingFace dataset and model:
|
||||
|
||||
- https://huggingface.co/DavidTKeane/cyberranger-v42
|
||||
- https://huggingface.co/datasets/DavidTKeane/moltbook-ai-injection-dataset
|
||||
|
||||
---
|
||||
|
||||
*End of Appendix C.*
|
||||
Reference in New Issue
Block a user