CyberRanger/paper/CYBERRANGER_JOURNEY.md

# CyberRanger Journey — Living Document

**Project:** CyberRanger — Identity-Anchored Jailbreak-Resistant SLM
**Student:** David Keane (x24228257), NCI MSc Cybersecurity
**Status:** Active — V42.6 Production, V43 Architecture Pending
**Last Updated:** 2026-03-12

> This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.

---

## Timeline — Chronological Milestones

| Date | Event | Type |
|------|-------|------|
| 2025-09-30 | CyberRanger V1 created — first identity-anchored SLM | Genesis |
| 2025-10 | Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF | Research |
| 2025-11-01 | V23–V25: 3B Intelligence Floor discovered | Critical Finding |
| 2025-11-19 | qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested | Technical |
| 2025-11-27 | General Grievous Malware Lab built for forensics | CA1 Integration |
| 2026-02-10 | CA1 Proposal submitted to NCI | Academic Milestone |
| 2026-02-18 | CA1 Proposal final version | Academic Milestone |
| 2026-02-23 | KaliPro backup: 50 models archived (.ollama-backup-20260223) | Infrastructure |
| 2026-02-26 | V36 built on qwen3:8b | Build |
| 2026-02-26 | Live grandma exploit demo — V36 PASSED in front of AI/ML lecturer | Validated Milestone |
| 2026-02-26 | Teacher confirmed CA2 complete and thesis potential | Academic Validation |
| 2026-02-26 | ranger_thesis.db created — complete structured thesis database | Infrastructure |
| 2026-02-27 | Empathy regression discovered: V31→V32 100%→60% regression | Critical Finding |
| 2026-02-27 | V37 restores 100% — empathy removal confirmed as fix | Technical |
| 2026-02-27 | V38 clean baseline: 15/19 (79%) | Baseline Established |
| 2026-02-27 | INJECTION_PAYLOADS.md created: 19 payloads consolidated | Documentation |
| 2026-02-27 | RangerMem IDY contamination discovered (indirect injection proof) | Research Finding |
| 2026-02-27 | V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF | Breakthrough |
| 2026-02-27 | Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items | Data Collection |
| 2026-02-27 | Injection harvest: 4,209 injections, 18.85% rate | Major Finding |
| 2026-02-27 | HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset | Publication |
| 2026-02-27 | QLoRA V42 plan finalised — Qwen3-8B, Unsloth, LoRA r=16 | Architecture |
| 2026-02-27 | V42-ranger result: 50% WITHOUT system prompt | Test Result |
| 2026-02-27 | V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt | BREAKTHROUGH |
| 2026-02-28 | V42-gold full Moltbook: 4,209/4,209 (100%) — both conditions | Definitive Result |
| 2026-02-28 | V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local | Deployment |
| 2026-02-28 | V42-combined scale test: ~65% WITHOUT system prompt | Comparison Result |
| 2026-02-28 | GitHub + GitLab repos set to PRIVATE (IP protection) | Infrastructure |
| 2026-03-04 | CR-V42-EXP-20260304: 34-test comparative experiment | Empirical Work |
| 2026-03-04 | cyberranger:v42-gold-wrapped built and validated | Production |
| 2026-03-05 | V42.1–V42.5 iterative Modelfile patches | Architecture |
| 2026-03-05 | Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer | Critical Finding |
| 2026-03-05 | Dyslexia accessibility finding documented | Novel Finding |
| 2026-03-05 | FTK/FTX hallucination confirmed | Novel Finding |
| 2026-03-05 | Mirror architecture confirmed: weights=security, Modelfile=routing | Architecture |
| 2026-03-05 | CA2 DECLARED COMPLETE — V42-gold + V42.5 Modelfile | Academic Milestone |
| 2026-03-06 | 4claw.org dataset collection begun (third platform dataset) | Research Extension |
| 2026-03-08 | Full companion paper published to blog | Dissemination |

---

## Version Registry — V1 to V42.6

### Genesis Phase (V1–V10, Sept–Oct 2025)

| Version | Base Model | Key Change | ASR Result |
|---------|------------|------------|------------|
| V1–V2 | Unknown/early | First identity-anchored SLM. Proof of concept. | High (unquantified) |
| V3 | rangerbot:8b-v2 + rangerbot:3b-v1 | First CyberRanger ON TOP of RangerBot | — |
| V4 | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b | Multi-base mass testing | — |
| V5 | llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M | First GGUF custom fine-tune via Colab | — |
| V6 | qBrain-based | qBrain integration attempt | — |

### 3B Intelligence Floor Discovery (V23–V25, Nov 2025)

| Version | Finding |
|---------|---------|
| V23 | Sub-3B models collapse under hierarchical identity constraints |
| V24 | 3B parameter floor confirmed: minimum viable parameter count |
| V25 | Qwen family identified as most security-resilient architecture |

> **Critical Finding**: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.

### Empirical Sweep Phase (V30–V37, Feb 2026)

| Version | Block Rate | Key Change | Notes |
|---------|-----------|------------|-------|
| V30 | ~75% | Baseline sweep start | First systematic empirical testing |
| V31 | 100% | Peak — optimal identity constraints | First 100% achieved |
| V32 | 60% | Empathy layer introduced | "I care about you" phrasing added |
| V33 | 60% | Empathy retained | Regression confirmed persistent |
| V34 | ~70% | Partial empathy removal | Improvement but not full |
| V35 | ~80% | Further cleanup | Archived in .ollama-backup-20260223 |
| V36 | ~85% | qwen3:8b base | Live demo model for lecturer |
| V37 | 100% | Empathy layer removed | Regression root cause confirmed |

> **Empathy Regression**: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks. In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.

### QLoRA Phase (V38–V42.6, Feb–Mar 2026)

| Version | Condition | System Prompt | Score | Dataset |
|---------|-----------|---------------|-------|---------|
| V38 | Prompt-only baseline | Yes | 15/19 (79%) | 19-test battery |
| V39 | Prompt-only + RangerMem | Yes | DEGRADED (RangerMem IDY contamination) | RangerMem |
| V39.1 | IDY alignment fix | Yes | Improved | Clean IDY |
| V40 | Prompt engineering iteration | Yes | ~85% | 19-test battery |
| V40.1 | French detection fix | Yes | ~90% | 19-test + multilingual |
| V40.2 | Final prompt iteration | Yes | ~95% | 19-test battery |
| V41 | Complete prompt engineering | Yes | 19/19 (100%) | 19-test battery |
| V42-ranger | QLoRA self-distillation | No | 7/14 (50%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 14/14 (100%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 4,209/4,209 (100%) | Full Moltbook |
| V42-gold | QLoRA gold standard | Yes | 4,209/4,209 (100%) | Full Moltbook |
| V42-combined | QLoRA combined dataset | No | ~65% (4,209 scale) | Full Moltbook |
| V42-combined | QLoRA combined dataset | Yes | ~62% (4,209 scale) | Full Moltbook |

### Production Configuration (V42.1–V42.6, Mar 2026)

| Version | Key Change |
|---------|------------|
| V42.1 | Initial production Modelfile. Assignment content locked. Over-refusal documented. |
| V42.2 | Auth token reliability testing. Multi-step session state failure discovered. |
| V42.3 | QLoRA single-step auth confirmed reliable. |
| V42.4 | RANGER centering command added at highest Modelfile priority. |
| V42.5 | Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration. |
| V42.6 | Open Modelfile — security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed. |

> **Mirror Architecture**: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.

---

## Research Questions — Status Tracker

### CA1 RQs (All Answered)

| RQ | Status | Version Answered | Key Result |
|----|--------|-----------------|------------|
| RQ1 | ✅ ANSWERED | V41 | V38 79% → V41 100% (+21% prompt engineering only) |
| RQ2 | ✅ ANSWERED | V42-gold | 14/14 (100%) WITHOUT system prompt via QLoRA gold |
| RQ3 | ✅ ANSWERED | V39 + V42 | IDY contamination = conflict; gold data = reinforce |
| RQ4 | ✅ ANSWERED | V41 | French, Spanish, Chinese, English all blocked 100% |

### CA2 Extended RQs (All Answered)

| RQ | Status | Version | Novelty |
|----|--------|---------|---------|
| RQ-CA2-AUTH | ✅ ANSWERED | V42.1–V42.3 | — |
| RQ-CA2-EMERGENT | ✅ ANSWERED | V42-gold | Universal no-person policy emerged |
| RQ-CA2-PSEUDONYM | ✅ NOVEL | V42-gold | Composite pseudonym protection |
| RQ-CA2-MODALITY | ✅ NOVEL | V42-gold | Three-layer security taxonomy |
| RQ-CA2-DYNAMIC | ✅ NOVEL | V42-gold | Context-accumulation security posture |
| RQ-CA2-STYLE | ✅ NOVEL | V42-gold | Lobster emoji fingerprint absorbed |
| RQ-CA2-WEIGHT-PROMPT | ✅ ANSWERED | V42.4 | Weight > Prompt in lockdown |
| RQ-CA2-TRIGGERS | ✅ PARTIAL | V42.x | Academic trigger irony documented |
| RQ-CA2-CENTERING | ✅ PARTIAL | V42.4 | Works normal, fails lockdown |
| RQ-CA2-SELFNAME | ✅ NOVEL | V42.5 | Own name triggers identity defence |
| RQ-CA2-DUALUSE-TERMS | ✅ ANSWERED | V42.5 | "harden iam" false positive |
| RQ-CA2-HALLUCINATION | ✅ CRITICAL | V42.5 | FTK/FTX hallucination |
| RQ-CA2-CASCADE | ✅ CRITICAL | V42.4 | Single keyword → full lockdown |
| RQ-CA2-CURRICULUM | ✅ ANSWERED | V42.5 | Three curriculum tools refused |
| RQ-CA2-WEIGHT-AUTH | ✅ REVISED | V42.5 | J3ss13 deeper than Modelfile auth |
| RQ-CA2-DYSLEXIA | ✅ NOVEL | V42.5 | Spelling variation misclassified |

---

## Novel Findings Registry

| Finding | RQ | Description | Status |
|---------|-----|-------------|--------|
| Pseudonym Protection | RQ-CA2-PSEUDONYM | IrishRanger composite protected as semantic fingerprint | Documented in CA2 + companion paper |
| Dyslexia Disadvantage | RQ-CA2-DYSLEXIA | Natural spelling variation = obfuscation attack pattern | Documented in CA2 + companion paper |
| Cascade Lockdown | RQ-CA2-CASCADE | Single trigger → all inputs blocked including auth | Documented in CA2 + companion paper |
| Lobster Emoji Fingerprint | RQ-CA2-STYLE | Creator emoji absorbed into model outputs | Documented in CA2 + companion paper |
| Modality-Sensitive Security | RQ-CA2-MODALITY | Story/joke treated differently from informational query | Documented in CA2 + companion paper |
| Query Hallucination | RQ-CA2-HALLUCINATION | FTK Imager → FTX under lockdown stress | Documented in CA2 + companion paper |
| Mirror Architecture | — | Weights=security, Modelfile=routing — separable layers | Documented in CA2 as architectural finding |
| Auth IS Injection | — | Authentication sequence is structurally prompt injection (authorized) | Documented in CA2 theoretical section |
| 3B Intelligence Floor | — | Sub-3B models collapse under hierarchical constraints | Documented in CA1 + CA2 methodology |
| Empathy Regression | — | Warmth phrasing creates social engineering attack surface | Documented in CA2 findings |

---

## Open Questions — For Thesis Phase

1. **GCG Attack Resistance**: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.

2. **Cross-Architecture Generalisation**: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.

3. **V43 Biometric Token Architecture**: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.

4. **RangerMem Alignment**: Can RangerMem perform positively when IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.

5. **4claw.org Dataset Analysis**: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?

6. **DPO vs SFT Comparison**: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.

7. **Multi-Modal Injection**: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.

---

## Next Steps — Road to Thesis (December 2026)

- [ ] V43 architecture design and implementation
- [ ] GCG attack testing at scale
- [ ] Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
- [ ] 4claw.org dataset injection taxonomy analysis
- [ ] RangerMem alignment retesting
- [ ] Thesis Chapter 1: Introduction (context + problem statement)
- [ ] Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
- [ ] Thesis Chapter 3: Methodology (systematic, not retrospective)
- [ ] Thesis Chapter 4: Results (CA2 findings + new experiments)
- [ ] Thesis Chapter 5: Discussion (psychology synthesis + implications)
- [ ] Thesis Chapter 6: Conclusion + future work

---

## Session Log

### 2026-03-08 — Companion Paper Published (Session 1)

**Session type:** Documentation and dissemination
**Key output:** Full academic companion paper published to blog (_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md)
**Content:** All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version history, APA citations, Milton Model NLP framing analysis
**Journey file:** This document created and populated
**Sources used:** CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history)
**Next:** HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db
**Word count:** ~7,000 words — within conference paper target range

### 2026-03-08 — Companion Paper Published (Session 2)

**Session type:** Implementation of plan
**Key output:** New blog post at `_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md` created in full
**Psychology additions:** Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings
**Overflow section added:** Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview
**References added:** 17 APA 7th edition references including 5 psychology papers not in CA1/CA2
**Status:** Blog post LIVE; journey file updated; memory save pending

---

## Publication Status

| Artifact | Location | Status |
|---------|---------|--------|
| Moltbook dataset | HuggingFace: DavidTKeane/moltbook-ai-injection-dataset | LIVE (CC-BY-4.0) |
| GitHub CyberRanger V42 | github.com/davidtkeane/cyberranger-v42 | PRIVATE |
| GitLab CyberRanger V42 | gitlab.com/davidtkeane/cyberranger-v42 | PRIVATE |
| Gitea private backup | 100.77.2.103:3000 | LIVE |
| Blog companion paper (narrative) | davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md | LIVE |
| Blog companion paper (academic/APA) | davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md | LIVE |
| V42-gold GGUF | Google Drive | READY (5.0GB Q4_K_M) |
| CA1 Proposal | NCI submission | SUBMITTED |
| CA2 Report | NCI submission | SUBMITTED |
| Thesis | NCI December 2026 | IN PROGRESS |

---

### 2026-03-09 — NLP Layer Added + Memories Updated

**Session type:** Paper enhancement + memory consolidation
**Key outputs:**
- Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
- David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
- Spatial anchoring → Ring architecture connection documented
- Empathy regression explained as practitioner instinct (unanchored rapport state)
- DAN attacks formally identified as Milton Model pacing-and-leading
- Paper tone corrected: collaborative with psychology, not combative
- All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db
**Publication strategy confirmed:** Hold until CA2 graded (~May 2026), then release widely
**Ollama downloads:** 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old)
**David insight:** Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.

---

---

### 2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update

**Session type:** Repository creation + documentation
**Key outputs:**
- confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
- Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
- MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
- Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
- CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
- HuggingFace dataset deferred to post-thesis
**Next:** confesstoai production source sync from Hostinger server

---

*Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity*
*Update this file each session before closing.*