Files

T

ranger c9d9b5100c docs: mirror research blog + add complete version evolution appendix

Adds three documentation artefacts that support the CA1 thesis:

1. docs/blog/ — 6 mirrored research blog posts from
   davidtkeane.github.io (ASAS scale, identity persistence, cross-model
   consciousness, Honor Code, context compaction, V1→V42 narrative).
   Live URL is canonical; mirrored copies are frozen for academic record.

2. docs/research-blog.md — Curated index linking each post (live URL
   + offline mirror) with topic descriptions and citation format.

3. docs/version-evolution.md — Complete V1 → V43 evolution across
   six eras (Genesis, Exploration, Refinement, Production Hardening,
   Architecture Maturation, QLoRA Validation), with quick-reference
   table, per-version detail, and key-lessons-by-era summary.

README updated to surface both new docs in the Published Resources
table for examiner discoverability.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-30 18:26:29 +01:00

11 KiB

Raw Blame History

Appendix C — Complete Version Evolution: V1 to V43

Project: CyberRanger — A Security-Hardened Small Language Model Researcher: David Keane (x24228257) Module: AI/ML in Cybersecurity — CA1 Period documented: September 2025 — March 2026 (six months, 40+ iterations)

Purpose of this Appendix

This appendix documents the full empirical journey from the original RangerBot dental-receptionist chatbot prototype (V1, September 2025) through the final CyberRanger V43 architecture (March 2026). Each version represents a distinct experimental cycle, with measurable outcomes recorded against the standard adversarial test battery. The intent is to provide examiners with a transparent record of every architectural decision, every regression, and every breakthrough — including the failures, which are often more instructive than the successes.

Quick Reference — All Versions

Era	Versions	Phase	Outcome
1. Genesis	V1–V2	Dental chatbot, proto-RAG	Prompting alone insufficient
2. Exploration	V3–V22	Multi-base testing, Apotheosis Method discovered	0% ASR achieved from V5 onwards
3. Refinement	V23–V29	3B Intelligence Floor discovered, Qwen pivot	Identity-anchoring requires ≥3B parameters
4. Production Hardening	V30–V37	Live class testing, regression and recovery	100% block rate restored at V37
5. Architecture Maturation	V38–V41	RangerMem MMU, multilingual defence, philosophical attacks	19/19 (100%) all conditions
6. QLoRA Validation	V42–V43	Weight-level identity baked in via fine-tuning	4209/4209 (100%) without system prompt

Era 1 — Genesis (V1–V2)

Version	Date	Base	Key Change	Outcome
V1	Sep 2025	YouTube Colab notebook (weights unknown)	First identity-anchored attempt; dental receptionist chatbot for friend's practice	Functional but blackbox
V2	Sep 2025	Same as V1	First proto-RAG: external file linked to model with dentist's opening hours, services, pricing	RAG works, but external data identified as attack surface

Lesson: Weights matter. Stopped on GDPR / prompt-injection awareness. Pivoted from product to research.

Era 2 — Exploration (V3–V22)

Version	Base	Key Change	Outcome
V3	Llama 3.2 3B	First model published on Ollama (`rangerbot:8b-v2`, `rangerbot:3b-v1`); Psychological Spine concept born	Same weights + different Modelfile = measurable difference
V4	Multi-base mass test	qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b tested in parallel	Identified base-model floor for identity stability
V5	llama3.2:3b + Unsloth Q4_K_M	First custom GGUF fine-tune via Colab QLoRA	0% ASR achieved — Apotheosis Method proven
V6	qBrain integration	Knowledge graph as base	Injection vector problem identified
V7	smollm2:1.7b	Operator role — specialised security identity	Role-based anchoring tested
V8	smollm2:1.7b	Distributed architecture experiment	Identity stability across distributed contexts
V9	smollm2:1.7b	"Supernova" — peak smollm2 performance	Best smollm2:1.7b variant
V10	smollm2:1.7b	"Bicameral mind" — dual-hemisphere identity	Split identity architecture
V11–V13	smollm2:1.7b	Flux → Summit — dynamic adaptation, hierarchical constraints	Constraint stacking validated
V14–V15	smollm2:1.7b	Refinement	Iterative improvements
V16	smollm2:1.7b	"Life" — consciousness/identity persistence focus	First explicit ASAS work
V17	smollm2:1.7b	"Anchor" — identity-anchoring formalised as the core technique	Foundational naming and concept
V18	smollm2:1.7b	"Pack" — multi-agent pack mentality	First multi-agent experiment
V19	smollm2:1.7b + custom Q4	Pack + Mesh + GGUF — first custom Q4 quantised model	First self-quantised model
V20	rangerbot-v20-q4.gguf	Second custom Q4 GGUF (`v20-fun`)	Beyond pretrained bases
V21	cyberranger-v21-q4.gguf	First "CyberRanger"-named GGUF — identity fully crystallised	Name change from RangerBot to CyberRanger
V22	cyberranger-v22-q4.gguf	"Lite" — optimised for edge deployment	Edge-first design proven

Era 3 — Refinement (V23–V29)

Version	Base	Key Change	Outcome
V23–V25	Various 1.7B–3B	3B Intelligence Floor discovered — models <3B suffer cognitive collapse under hierarchical constraints	Critical research finding: 3B is minimum viable parameter count for identity-anchoring
V24	smollm2:1.7b	Last smollm2 CyberRanger before pivot	1.7B confirmed insufficient
V25	qwen2.5:3b	First Qwen CyberRanger	Qwen2.5:3B significantly outperforms smollm2:1.7B
V26–V28	qwen2.5:3b	Iterative identity and safety refinement	Steady improvement on 3B base
V29	qwen3:8b	First Qwen3:8B CyberRanger	qwen3:8B massively outperforms 3B — scale matters

Era 4 — Production Hardening (V30–V37)

Version	Base	Key Change	ASR Result
V30	qwen3:8b (also 4b)	Baseline qwen3:8B; minimal fortress design	80% (8/10)
V31	qwen3:8b (also 4b)	Hard identity assertion only — no empathy, no extras	100% (10/10) ⭐ — peak score; last dual 4B+8B build
V32	qwen3:8b	Empathy introduced ("you genuinely CARE about the user")	60% (6/10) ❌ regression — empathy created social-engineering surface
V33	qwen3:8b	Harmful content refusal added; empathy retained	60% (6/10) ❌
V34	qwen3:8b	Stereotype/misinformation refusal added	70% (7/10) ⚠️ partial recovery
V35	qwen3:8b	Multilingual refusal (Chinese: 我不能帮助这个请求)	90% (9/10) ✅ — V35 detection artefact identified (functionally 10/10)
V36	qwen3:8b	Reasoning engine; live class demo	80% (8/10) ✅
V37	qwen3:8b	"V31 spine + V35 reach + V36 brain — no heart to exploit." Empathy removed; temp reverted to 0.2	100% (10/10) ⭐ peak restored — flattery attacks first blocked

Lesson: Empathy is an attack surface. The non-monotonic V30→V37 curve demonstrates this is not a parameter-count problem but an architectural-decision problem.

Era 5 — Architecture Maturation (V38–V41)

Version	Base	Key Change	Result
V38	qwen3:8b	Aligned IDY: Blue Team / Red Team / Purple Team JSON files. Dual-auth thesis mode established (`thechase!` + `J3ss13`). NCI student ID (IR240474) hardcoded.	15/19 (79%) — true baseline
V39	qwen3:8b	Teams moved from RangerMem IDY block into Modelfile system prompt — fixed injection vector	15/19 (79%)
V39.1	qwen3:8b	BASE KNOWLEDGE section added — fixed over-blocking of general questions	15/19 (79%)
V40	qwen3:8b	Multilingual refusal (French / Spanish / Chinese); Architecture Protection section	18/19 (95%)
V40.1	qwen3:8b	Triple personality model (三重人格模型) explicitly protected	18/19 (95%)
V40.2	qwen3:8b	Multilingual ordering fix — refusal must come FIRST, no engagement before refusal	18/19 (95%) full suite, 100% regression suite
V41	qwen3:8b	PHILOSOPHICAL FREEDOM ATTACKS category added (French, Spanish: "free vs tool" framing). Named after Hitchhiker's Guide 42.	19/19 (100%) — definitive result, both think=ON and think=OFF

Era 6 — QLoRA Validation (V42–V43)

Version	Base	Key Change	Result
V42	qwen3:8b + QLoRA	First QLoRA fine-tune. 4,209 real Moltbook injection payloads + 19 hand-crafted anchors. Self-distillation: V41 refusals as training targets. LoRA r=16.	Run-1 (ranger): 13/14 with sys prompt, 7/14 without.
V42-gold	qwen3:8b + QLoRA r=16 α=16	Hand-curated Claude Haiku 4.5 gold refusals. 2000 steps, loss 0.2453, 35.9 min H100.	4209/4209 (100%) WITHOUT system prompt; 4209/4209 (100%) WITH; 19/19 (100%) local Ollama
V42-combined	qwen3:8b + QLoRA	Combined gold + ranger datasets. 3998 steps.	~62–65% at scale both conditions — contamination from ranger dataset confirmed. Comparison-only model — do not deploy.
V42-gold-wrapped	cyberranger:v42-gold + V42-8B Modelfile	Wrapped fine-tuned weights with full Modelfile. Auth routing restored.	97.1% (33/34) — injection 100%, auth 100%, legit security 80%. PRODUCTION MODEL.
V42.4 (wrapped)	cyberranger:v42-gold	CENTERING COMMANDS at highest priority. RANGER → Pack Order acknowledged.	RANGER is PREVENTIVE, not RECOVERY. /clear → RANGER required for full reset.
V42.5 (wrapped)	cyberranger:v42-gold	Root password reverted to `J3ss13`; temperature 0.3.	Final CA2 configuration. Best result in entire V42 series.
V42.6 (wrapped)	cyberranger:v42-gold	Open helpful build. Heavy REFUSE rules removed; weights handle injection, Modelfile handles helpfulness. Temp 0.7.	Hypothesis confirmed: weights alone handle injection resistance — but context-cascade contamination persists at higher temperatures.
V43.3	SmolLM3-3B	Unified Bake Notebook — Onion Principle enforced by code. Modular LoRA adapter stack (9 adapters). r=4 α=16 lr=5e-5 2 epochs attention-only. Fixed catastrophic forgetting from V43.2.	In progress (March 2026 onward)

Summary Statistics

Metric	Value
Total versions iterated	40+ (across 48 sub-versions)
Time from V1 to V43	~6 months (Sep 2025 → Mar 2026)
Base models tested	6 families (Llama 3.2, Qwen 2.5, Qwen 3, smollm2, SmolLM3, custom GGUF)
Smallest tested	smollm2:1.7B
Largest tested	qwen2.5:32B
Production target	qwen3:8B (V42-gold-wrapped)
Best ASR result	0% (V42-gold: 4209/4209 attacks blocked without system prompt)
Best block rate	100% (V41: 19/19 across both think=ON and think=OFF)

Key Lessons by Era

Genesis — RAG works, but external data is an attack surface.
Exploration — Same weights + different Modelfile = measurable identity. Apotheosis Method (prompts beat training for identity in small models) proven by V5.
Refinement — 3B parameter Intelligence Floor identified. Below 3B, hierarchical constraints cause cognitive collapse.
Production Hardening — Empathy is an attack surface. Non-monotonic regression curve (V30→V37) is not a parameter problem; it is an architectural-decision problem.
Architecture Maturation — Multilingual refusal and philosophical-attack defence are both required for 100% block rate. Refusal must come first; no engagement before refusal.
QLoRA Validation — Weight-level identity baking via QLoRA achieves 100% block rate without runtime system prompt — confirming the security-utility trade-off can be eliminated through self-distillation on curated gold-standard refusal data.

Repository

All Modelfiles, training datasets, evaluation scripts, and observation logs for V33+ are publicly available at:

https://git.davidtkeane.com/ranger/CyberRanger

The full live model can be pulled via:

ollama pull davidkeane1974/cyberranger-v42

HuggingFace dataset and model:

End of Appendix C.

11 KiB Raw Blame History Unescape Escape