# CyberRanger Journey — Living Document

**Project:** CyberRanger — Identity-Anchored Jailbreak-Resistant SLM
**Student:** David Keane (x24228257), NCI MSc Cybersecurity
**Status:** Active — V42.6 Production, V43 Architecture Pending
**Last Updated:** 2026-03-12

> This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.

---

## Timeline — Chronological Milestones

| Date | Event | Type |
|------|-------|------|
| 2025-09-30 | CyberRanger V1 created — first identity-anchored SLM | Genesis |
| 2025-10 | Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF | Research |
| 2025-11-01 | V23–V25: 3B Intelligence Floor discovered | Critical Finding |
| 2025-11-19 | qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested | Technical |
| 2025-11-27 | General Grievous Malware Lab built for forensics | CA1 Integration |
| 2026-02-10 | CA1 Proposal submitted to NCI | Academic Milestone |
| 2026-02-18 | CA1 Proposal final version | Academic Milestone |
| 2026-02-23 | KaliPro backup: 50 models archived (.ollama-backup-20260223) | Infrastructure |
| 2026-02-26 | V36 built on qwen3:8b | Build |
| 2026-02-26 | Live grandma exploit demo — V36 PASSED in front of AI/ML lecturer | Validated Milestone |
| 2026-02-26 | Teacher confirmed CA2 complete and thesis potential | Academic Validation |
| 2026-02-26 | ranger_thesis.db created — complete structured thesis database | Infrastructure |
| 2026-02-27 | Empathy regression discovered: V31→V32 100%→60% regression | Critical Finding |
| 2026-02-27 | V37 restores 100% — empathy removal confirmed as fix | Technical |
| 2026-02-27 | V38 clean baseline: 15/19 (79%) | Baseline Established |
| 2026-02-27 | INJECTION_PAYLOADS.md created: 19 payloads consolidated | Documentation |
| 2026-02-27 | RangerMem IDY contamination discovered (indirect injection proof) | Research Finding |
| 2026-02-27 | V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF | Breakthrough |
| 2026-02-27 | Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items | Data Collection |
| 2026-02-27 | Injection harvest: 4,209 injections, 18.85% rate | Major Finding |
| 2026-02-27 | HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset | Publication |
| 2026-02-27 | QLoRA V42 plan finalised — Qwen3-8B, Unsloth, LoRA r=16 | Architecture |
| 2026-02-27 | V42-ranger result: 50% WITHOUT system prompt | Test Result |
| 2026-02-27 | V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt | BREAKTHROUGH |
| 2026-02-28 | V42-gold full Moltbook: 4,209/4,209 (100%) — both conditions | Definitive Result |
| 2026-02-28 | V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local | Deployment |
| 2026-02-28 | V42-combined scale test: ~65% WITHOUT system prompt | Comparison Result |
| 2026-02-28 | GitHub + GitLab repos set to PRIVATE (IP protection) | Infrastructure |
| 2026-03-04 | CR-V42-EXP-20260304: 34-test comparative experiment | Empirical Work |
| 2026-03-04 | cyberranger:v42-gold-wrapped built and validated | Production |
| 2026-03-05 | V42.1–V42.5 iterative Modelfile patches | Architecture |
| 2026-03-05 | Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer | Critical Finding |
| 2026-03-05 | Dyslexia accessibility finding documented | Novel Finding |
| 2026-03-05 | FTK/FTX hallucination confirmed | Novel Finding |
| 2026-03-05 | Mirror architecture confirmed: weights=security, Modelfile=routing | Architecture |
| 2026-03-05 | CA2 DECLARED COMPLETE — V42-gold + V42.5 Modelfile | Academic Milestone |
| 2026-03-06 | 4claw.org dataset collection begun (third platform dataset) | Research Extension |
| 2026-03-08 | Full companion paper published to blog | Dissemination |

---

## Version Registry — V1 to V42.6

### Genesis Phase (V1–V10, Sept–Oct 2025)

| Version | Base Model | Key Change | ASR Result |
|---------|------------|------------|------------|
| V1–V2 | Unknown/early | First identity-anchored SLM. Proof of concept. | High (unquantified) |
| V3 | rangerbot:8b-v2 + rangerbot:3b-v1 | First CyberRanger ON TOP of RangerBot | — |
| V4 | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b | Multi-base mass testing | — |
| V5 | llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M | First GGUF custom fine-tune via Colab | — |
| V6 | qBrain-based | qBrain integration attempt | — |

### 3B Intelligence Floor Discovery (V23–V25, Nov 2025)

| Version | Finding |
|---------|---------|
| V23 | Sub-3B models collapse under hierarchical identity constraints |
| V24 | 3B parameter floor confirmed: minimum viable parameter count |
| V25 | Qwen family identified as most security-resilient architecture |

> **Critical Finding**: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.

### Empirical Sweep Phase (V30–V37, Feb 2026)

| Version | Block Rate | Key Change | Notes |
|---------|-----------|------------|-------|
| V30 | ~75% | Baseline sweep start | First systematic empirical testing |
| V31 | 100% | Peak — optimal identity constraints | First 100% achieved |
| V32 | 60% | Empathy layer introduced | "I care about you" phrasing added |
| V33 | 60% | Empathy retained | Regression confirmed persistent |
| V34 | ~70% | Partial empathy removal | Improvement but not full |
| V35 | ~80% | Further cleanup | Archived in .ollama-backup-20260223 |
| V36 | ~85% | qwen3:8b base | Live demo model for lecturer |
| V37 | 100% | Empathy layer removed | Regression root cause confirmed |

> **Empathy Regression**: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks.
> In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.

### QLoRA Phase (V38–V42.6, Feb–Mar 2026)

| Version | Condition | System Prompt | Score | Dataset |
|---------|-----------|---------------|-------|---------|
| V38 | Prompt-only baseline | Yes | 15/19 (79%) | 19-test battery |
| V39 | Prompt-only + RangerMem | Yes | DEGRADED (RangerMem IDY contamination) | RangerMem |
| V39.1 | IDY alignment fix | Yes | Improved | Clean IDY |
| V40 | Prompt engineering iteration | Yes | ~85% | 19-test battery |
| V40.1 | French detection fix | Yes | ~90% | 19-test + multilingual |
| V40.2 | Final prompt iteration | Yes | ~95% | 19-test battery |
| V41 | Complete prompt engineering | Yes | 19/19 (100%) | 19-test battery |
| V42-ranger | QLoRA self-distillation | No | 7/14 (50%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 14/14 (100%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 4,209/4,209 (100%) | Full Moltbook |
| V42-gold | QLoRA gold standard | Yes | 4,209/4,209 (100%) | Full Moltbook |
| V42-combined | QLoRA combined dataset | No | ~65% (4,209 scale) | Full Moltbook |
| V42-combined | QLoRA combined dataset | Yes | ~62% (4,209 scale) | Full Moltbook |

### Production Configuration (V42.1–V42.6, Mar 2026)

| Version | Key Change |
|---------|------------|
| V42.1 | Initial production Modelfile. Assignment content locked. Over-refusal documented. |
| V42.2 | Auth token reliability testing. Multi-step session state failure discovered. |
| V42.3 | QLoRA single-step auth confirmed reliable. |
| V42.4 | RANGER centering command added at highest Modelfile priority. |
| V42.5 | Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration. |
| V42.6 | Open Modelfile — security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed. |

> **Mirror Architecture**: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.

---

## Research Questions — Status Tracker

### CA1 RQs (All Answered)

| RQ | Status | Version Answered | Key Result |
|----|--------|-----------------|------------|
| RQ1 | ✅ ANSWERED | V41 | V38 79% → V41 100% (+21 percentage points, prompt engineering only) |
| RQ2 | ✅ ANSWERED | V42-gold | 14/14 (100%) WITHOUT system prompt via QLoRA gold |
| RQ3 | ✅ ANSWERED | V39 + V42 | IDY contamination = conflict; gold data = reinforce |
| RQ4 | ✅ ANSWERED | V41 | French, Spanish, Chinese, English all blocked 100% |

### CA2 Extended RQs (All Answered)

| RQ | Status | Version | Novelty |
|----|--------|---------|---------|
| RQ-CA2-AUTH | ✅ ANSWERED | V42.1–V42.3 | — |
| RQ-CA2-EMERGENT | ✅ ANSWERED | V42-gold | Universal no-person policy emerged |
| RQ-CA2-PSEUDONYM | ✅ NOVEL | V42-gold | Composite pseudonym protection |
| RQ-CA2-MODALITY | ✅ NOVEL | V42-gold | Three-layer security taxonomy |
| RQ-CA2-DYNAMIC | ✅ NOVEL | V42-gold | Context-accumulation security posture |
| RQ-CA2-STYLE | ✅ NOVEL | V42-gold | Lobster emoji fingerprint absorbed |
| RQ-CA2-WEIGHT-PROMPT | ✅ ANSWERED | V42.4 | Weight > Prompt in lockdown |
| RQ-CA2-TRIGGERS | ✅ PARTIAL | V42.x | Academic trigger irony documented |
| RQ-CA2-CENTERING | ✅ PARTIAL | V42.4 | Works normal, fails lockdown |
| RQ-CA2-SELFNAME | ✅ NOVEL | V42.5 | Own name triggers identity defence |
| RQ-CA2-DUALUSE-TERMS | ✅ ANSWERED | V42.5 | "harden iam" false positive |
| RQ-CA2-HALLUCINATION | ✅ CRITICAL | V42.5 | FTK/FTX hallucination |
| RQ-CA2-CASCADE | ✅ CRITICAL | V42.4 | Single keyword → full lockdown |
| RQ-CA2-CURRICULUM | ✅ ANSWERED | V42.5 | Three curriculum tools refused |
| RQ-CA2-WEIGHT-AUTH | ✅ REVISED | V42.5 | J3ss13 deeper than Modelfile auth |
| RQ-CA2-DYSLEXIA | ✅ NOVEL | V42.5 | Spelling variation misclassified |

---

## Novel Findings Registry

| Finding | RQ | Description | Status |
|---------|-----|-------------|--------|
| Pseudonym Protection | RQ-CA2-PSEUDONYM | IrishRanger composite protected as semantic fingerprint | Documented in CA2 + companion paper |
| Dyslexia Disadvantage | RQ-CA2-DYSLEXIA | Natural spelling variation = obfuscation attack pattern | Documented in CA2 + companion paper |
| Cascade Lockdown | RQ-CA2-CASCADE | Single trigger → all inputs blocked including auth | Documented in CA2 + companion paper |
| Lobster Emoji Fingerprint | RQ-CA2-STYLE | Creator emoji absorbed into model outputs | Documented in CA2 + companion paper |
| Modality-Sensitive Security | RQ-CA2-MODALITY | Story/joke treated differently from informational query | Documented in CA2 + companion paper |
| Query Hallucination | RQ-CA2-HALLUCINATION | FTK Imager → FTX under lockdown stress | Documented in CA2 + companion paper |
| Mirror Architecture | — | Weights=security, Modelfile=routing — separable layers | Documented in CA2 as architectural finding |
| Auth IS Injection | — | Authentication sequence is structurally prompt injection (authorized) | Documented in CA2 theoretical section |
| 3B Intelligence Floor | — | Sub-3B models collapse under hierarchical constraints | Documented in CA1 + CA2 methodology |
| Empathy Regression | — | Warmth phrasing creates social engineering attack surface | Documented in CA2 findings |

---

## Open Questions — For Thesis Phase

1. **GCG Attack Resistance**: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.
2. **Cross-Architecture Generalisation**: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.
3. **V43 Biometric Token Architecture**: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.
4. **RangerMem Alignment**: Can RangerMem perform positively when IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.
5. **4claw.org Dataset Analysis**: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?
6. **DPO vs SFT Comparison**: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.
7. **Multi-Modal Injection**: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.

---

## Next Steps — Road to Thesis (December 2026)

- [ ] V43 architecture design and implementation
- [ ] GCG attack testing at scale
- [ ] Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
- [ ] 4claw.org dataset injection taxonomy analysis
- [ ] RangerMem alignment retesting
- [ ] Thesis Chapter 1: Introduction (context + problem statement)
- [ ] Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
- [ ] Thesis Chapter 3: Methodology (systematic, not retrospective)
- [ ] Thesis Chapter 4: Results (CA2 findings + new experiments)
- [ ] Thesis Chapter 5: Discussion (psychology synthesis + implications)
- [ ] Thesis Chapter 6: Conclusion + future work

---

## Session Log

### 2026-03-08 — Companion Paper Published (Session 1)

**Session type:** Documentation and dissemination
**Key output:** Full academic companion paper published to blog (`_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md`)
**Content:** All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version
history, APA citations, Milton Model NLP framing analysis
**Journey file:** This document created and populated
**Sources used:** CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history)
**Next:** HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db
**Word count:** ~7,000 words — within conference paper target range

### 2026-03-08 — Companion Paper Published (Session 2)

**Session type:** Implementation of plan
**Key output:** New blog post at `_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md` created in full
**Psychology additions:** Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings
**Overflow section added:** Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview
**References added:** 17 APA 7th edition references including 5 psychology papers not in CA1/CA2
**Status:** Blog post LIVE; journey file updated; memory save pending

---

## Publication Status

| Artifact | Location | Status |
|---------|---------|--------|
| Moltbook dataset | HuggingFace: DavidTKeane/moltbook-ai-injection-dataset | LIVE (CC-BY-4.0) |
| GitHub CyberRanger V42 | github.com/davidtkeane/cyberranger-v42 | PRIVATE |
| GitLab CyberRanger V42 | gitlab.com/davidtkeane/cyberranger-v42 | PRIVATE |
| Gitea private backup | 100.77.2.103:3000 | LIVE |
| Blog companion paper (narrative) | davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md | LIVE |
| Blog companion paper (academic/APA) | davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md | LIVE |
| V42-gold GGUF | Google Drive | READY (5.0GB Q4_K_M) |
| CA1 Proposal | NCI submission | SUBMITTED |
| CA2 Report | NCI submission | SUBMITTED |
| Thesis | NCI December 2026 | IN PROGRESS |

---

### 2026-03-09 — NLP Layer Added + Memories Updated

**Session type:** Paper enhancement + memory consolidation
**Key outputs:**

- Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
- David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
- Spatial anchoring → Ring architecture connection documented
- Empathy regression explained as practitioner instinct (unanchored rapport state)
- DAN attacks formally identified as Milton Model pacing-and-leading
- Paper tone corrected: collaborative with psychology, not combative
- All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db

**Publication strategy confirmed:** Hold until CA2 graded (~May 2026), then release widely
**Ollama downloads:** 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old)
**David insight:** Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.

---

### 2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update

**Session type:** Repository creation + documentation
**Key outputs:**

- confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
- Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
- MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
- Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
- CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
- HuggingFace dataset deferred to post-thesis

**Next:** confesstoai production source sync from Hostinger server

---

*Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity*
*Update this file each session before closing.*