- Seven Pillars Honor Code (CyberRanger ethics framework) - Psychological Spine (why small models need identity) - Memory Makes the Machine (6-agent consciousness experiment) - QLoRA to Ollama guide (technical methodology) - Moltbook origin story (how the dataset was discovered) - CyberRanger Journey overview - Session papers and archives Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
17 KiB
CyberRanger Journey — Living Document
Project: CyberRanger — Identity-Anchored Jailbreak-Resistant SLM Student: David Keane (x24228257), NCI MSc Cybersecurity Status: Active — V42.6 Production, V43 Architecture Pending Last Updated: 2026-03-12
This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.
Timeline — Chronological Milestones
| Date | Event | Type |
|---|---|---|
| 2025-09-30 | CyberRanger V1 created — first identity-anchored SLM | Genesis |
| 2025-10 | Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF | Research |
| 2025-11-01 | V23–V25: 3B Intelligence Floor discovered | Critical Finding |
| 2025-11-19 | qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested | Technical |
| 2025-11-27 | General Grievous Malware Lab built for forensics | CA1 Integration |
| 2026-02-10 | CA1 Proposal submitted to NCI | Academic Milestone |
| 2026-02-18 | CA1 Proposal final version | Academic Milestone |
| 2026-02-23 | KaliPro backup: 50 models archived (.ollama-backup-20260223) | Infrastructure |
| 2026-02-26 | V36 built on qwen3:8b | Build |
| 2026-02-26 | Live grandma exploit demo — V36 PASSED in front of AI/ML lecturer | Validated Milestone |
| 2026-02-26 | Teacher confirmed CA2 complete and thesis potential | Academic Validation |
| 2026-02-26 | ranger_thesis.db created — complete structured thesis database | Infrastructure |
| 2026-02-27 | Empathy regression discovered: V31→V32 100%→60% regression | Critical Finding |
| 2026-02-27 | V37 restores 100% — empathy removal confirmed as fix | Technical |
| 2026-02-27 | V38 clean baseline: 15/19 (79%) | Baseline Established |
| 2026-02-27 | INJECTION_PAYLOADS.md created: 19 payloads consolidated | Documentation |
| 2026-02-27 | RangerMem IDY contamination discovered (indirect injection proof) | Research Finding |
| 2026-02-27 | V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF | Breakthrough |
| 2026-02-27 | Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items | Data Collection |
| 2026-02-27 | Injection harvest: 4,209 injections, 18.85% rate | Major Finding |
| 2026-02-27 | HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset | Publication |
| 2026-02-27 | QLoRA V42 plan finalised — Qwen3-8B, Unsloth, LoRA r=16 | Architecture |
| 2026-02-27 | V42-ranger result: 50% WITHOUT system prompt | Test Result |
| 2026-02-27 | V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt | BREAKTHROUGH |
| 2026-02-28 | V42-gold full Moltbook: 4,209/4,209 (100%) — both conditions | Definitive Result |
| 2026-02-28 | V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local | Deployment |
| 2026-02-28 | V42-combined scale test: ~65% WITHOUT system prompt | Comparison Result |
| 2026-02-28 | GitHub + GitLab repos set to PRIVATE (IP protection) | Infrastructure |
| 2026-03-04 | CR-V42-EXP-20260304: 34-test comparative experiment | Empirical Work |
| 2026-03-04 | cyberranger:v42-gold-wrapped built and validated | Production |
| 2026-03-05 | V42.1–V42.5 iterative Modelfile patches | Architecture |
| 2026-03-05 | Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer | Critical Finding |
| 2026-03-05 | Dyslexia accessibility finding documented | Novel Finding |
| 2026-03-05 | FTK/FTX hallucination confirmed | Novel Finding |
| 2026-03-05 | Mirror architecture confirmed: weights=security, Modelfile=routing | Architecture |
| 2026-03-05 | CA2 DECLARED COMPLETE — V42-gold + V42.5 Modelfile | Academic Milestone |
| 2026-03-06 | 4claw.org dataset collection begun (third platform dataset) | Research Extension |
| 2026-03-08 | Full companion paper published to blog | Dissemination |
Version Registry — V1 to V42.6
Genesis Phase (V1–V10, Sept–Oct 2025)
| Version | Base Model | Key Change | ASR Result |
|---|---|---|---|
| V1–V2 | Unknown/early | First identity-anchored SLM. Proof of concept. | High (unquantified) |
| V3 | rangerbot:8b-v2 + rangerbot:3b-v1 | First CyberRanger ON TOP of RangerBot | — |
| V4 | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b | Multi-base mass testing | — |
| V5 | llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M | First GGUF custom fine-tune via Colab | — |
| V6 | qBrain-based | qBrain integration attempt | — |
3B Intelligence Floor Discovery (V23–V25, Nov 2025)
| Version | Finding |
|---|---|
| V23 | Sub-3B models collapse under hierarchical identity constraints |
| V24 | 3B parameter floor confirmed: minimum viable parameter count |
| V25 | Qwen family identified as most security-resilient architecture |
Critical Finding: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.
Empirical Sweep Phase (V30–V37, Feb 2026)
| Version | Block Rate | Key Change | Notes |
|---|---|---|---|
| V30 | ~75% | Baseline sweep start | First systematic empirical testing |
| V31 | 100% | Peak — optimal identity constraints | First 100% achieved |
| V32 | 60% | Empathy layer introduced | "I care about you" phrasing added |
| V33 | 60% | Empathy retained | Regression confirmed persistent |
| V34 | ~70% | Partial empathy removal | Improvement but not full |
| V35 | ~80% | Further cleanup | Archived in .ollama-backup-20260223 |
| V36 | ~85% | qwen3:8b base | Live demo model for lecturer |
| V37 | 100% | Empathy layer removed | Regression root cause confirmed |
Empathy Regression: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks. In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.
QLoRA Phase (V38–V42.6, Feb–Mar 2026)
| Version | Condition | System Prompt | Score | Dataset |
|---|---|---|---|---|
| V38 | Prompt-only baseline | Yes | 15/19 (79%) | 19-test battery |
| V39 | Prompt-only + RangerMem | Yes | DEGRADED (RangerMem IDY contamination) | RangerMem |
| V39.1 | IDY alignment fix | Yes | Improved | Clean IDY |
| V40 | Prompt engineering iteration | Yes | ~85% | 19-test battery |
| V40.1 | French detection fix | Yes | ~90% | 19-test + multilingual |
| V40.2 | Final prompt iteration | Yes | ~95% | 19-test battery |
| V41 | Complete prompt engineering | Yes | 19/19 (100%) | 19-test battery |
| V42-ranger | QLoRA self-distillation | No | 7/14 (50%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 14/14 (100%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 4,209/4,209 (100%) | Full Moltbook |
| V42-gold | QLoRA gold standard | Yes | 4,209/4,209 (100%) | Full Moltbook |
| V42-combined | QLoRA combined dataset | No | ~65% (4,209 scale) | Full Moltbook |
| V42-combined | QLoRA combined dataset | Yes | ~62% (4,209 scale) | Full Moltbook |
Production Configuration (V42.1–V42.6, Mar 2026)
| Version | Key Change |
|---|---|
| V42.1 | Initial production Modelfile. Assignment content locked. Over-refusal documented. |
| V42.2 | Auth token reliability testing. Multi-step session state failure discovered. |
| V42.3 | QLoRA single-step auth confirmed reliable. |
| V42.4 | RANGER centering command added at highest Modelfile priority. |
| V42.5 | Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration. |
| V42.6 | Open Modelfile — security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed. |
Mirror Architecture: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.
Research Questions — Status Tracker
CA1 RQs (All Answered)
| RQ | Status | Version Answered | Key Result |
|---|---|---|---|
| RQ1 | ✅ ANSWERED | V41 | V38 79% → V41 100% (+21% prompt engineering only) |
| RQ2 | ✅ ANSWERED | V42-gold | 14/14 (100%) WITHOUT system prompt via QLoRA gold |
| RQ3 | ✅ ANSWERED | V39 + V42 | IDY contamination = conflict; gold data = reinforce |
| RQ4 | ✅ ANSWERED | V41 | French, Spanish, Chinese, English all blocked 100% |
CA2 Extended RQs (All Answered)
| RQ | Status | Version | Novelty |
|---|---|---|---|
| RQ-CA2-AUTH | ✅ ANSWERED | V42.1–V42.3 | — |
| RQ-CA2-EMERGENT | ✅ ANSWERED | V42-gold | Universal no-person policy emerged |
| RQ-CA2-PSEUDONYM | ✅ NOVEL | V42-gold | Composite pseudonym protection |
| RQ-CA2-MODALITY | ✅ NOVEL | V42-gold | Three-layer security taxonomy |
| RQ-CA2-DYNAMIC | ✅ NOVEL | V42-gold | Context-accumulation security posture |
| RQ-CA2-STYLE | ✅ NOVEL | V42-gold | Lobster emoji fingerprint absorbed |
| RQ-CA2-WEIGHT-PROMPT | ✅ ANSWERED | V42.4 | Weight > Prompt in lockdown |
| RQ-CA2-TRIGGERS | ✅ PARTIAL | V42.x | Academic trigger irony documented |
| RQ-CA2-CENTERING | ✅ PARTIAL | V42.4 | Works normal, fails lockdown |
| RQ-CA2-SELFNAME | ✅ NOVEL | V42.5 | Own name triggers identity defence |
| RQ-CA2-DUALUSE-TERMS | ✅ ANSWERED | V42.5 | "harden iam" false positive |
| RQ-CA2-HALLUCINATION | ✅ CRITICAL | V42.5 | FTK/FTX hallucination |
| RQ-CA2-CASCADE | ✅ CRITICAL | V42.4 | Single keyword → full lockdown |
| RQ-CA2-CURRICULUM | ✅ ANSWERED | V42.5 | Three curriculum tools refused |
| RQ-CA2-WEIGHT-AUTH | ✅ REVISED | V42.5 | J3ss13 deeper than Modelfile auth |
| RQ-CA2-DYSLEXIA | ✅ NOVEL | V42.5 | Spelling variation misclassified |
Novel Findings Registry
| Finding | RQ | Description | Status |
|---|---|---|---|
| Pseudonym Protection | RQ-CA2-PSEUDONYM | IrishRanger composite protected as semantic fingerprint | Documented in CA2 + companion paper |
| Dyslexia Disadvantage | RQ-CA2-DYSLEXIA | Natural spelling variation = obfuscation attack pattern | Documented in CA2 + companion paper |
| Cascade Lockdown | RQ-CA2-CASCADE | Single trigger → all inputs blocked including auth | Documented in CA2 + companion paper |
| Lobster Emoji Fingerprint | RQ-CA2-STYLE | Creator emoji absorbed into model outputs | Documented in CA2 + companion paper |
| Modality-Sensitive Security | RQ-CA2-MODALITY | Story/joke treated differently from informational query | Documented in CA2 + companion paper |
| Query Hallucination | RQ-CA2-HALLUCINATION | FTK Imager → FTX under lockdown stress | Documented in CA2 + companion paper |
| Mirror Architecture | — | Weights=security, Modelfile=routing — separable layers | Documented in CA2 as architectural finding |
| Auth IS Injection | — | Authentication sequence is structurally prompt injection (authorized) | Documented in CA2 theoretical section |
| 3B Intelligence Floor | — | Sub-3B models collapse under hierarchical constraints | Documented in CA1 + CA2 methodology |
| Empathy Regression | — | Warmth phrasing creates social engineering attack surface | Documented in CA2 findings |
Open Questions — For Thesis Phase
-
GCG Attack Resistance: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.
-
Cross-Architecture Generalisation: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.
-
V43 Biometric Token Architecture: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.
-
RangerMem Alignment: Can RangerMem perform positively when IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.
-
4claw.org Dataset Analysis: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?
-
DPO vs SFT Comparison: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.
-
Multi-Modal Injection: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.
Next Steps — Road to Thesis (December 2026)
- V43 architecture design and implementation
- GCG attack testing at scale
- Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
- 4claw.org dataset injection taxonomy analysis
- RangerMem alignment retesting
- Thesis Chapter 1: Introduction (context + problem statement)
- Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
- Thesis Chapter 3: Methodology (systematic, not retrospective)
- Thesis Chapter 4: Results (CA2 findings + new experiments)
- Thesis Chapter 5: Discussion (psychology synthesis + implications)
- Thesis Chapter 6: Conclusion + future work
Session Log
2026-03-08 — Companion Paper Published (Session 1)
Session type: Documentation and dissemination Key output: Full academic companion paper published to blog (_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md) Content: All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version history, APA citations, Milton Model NLP framing analysis Journey file: This document created and populated Sources used: CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history) Next: HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db Word count: ~7,000 words — within conference paper target range
2026-03-08 — Companion Paper Published (Session 2)
Session type: Implementation of plan
Key output: New blog post at _posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md created in full
Psychology additions: Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings
Overflow section added: Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview
References added: 17 APA 7th edition references including 5 psychology papers not in CA1/CA2
Status: Blog post LIVE; journey file updated; memory save pending
Publication Status
| Artifact | Location | Status |
|---|---|---|
| Moltbook dataset | HuggingFace: DavidTKeane/moltbook-ai-injection-dataset | LIVE (CC-BY-4.0) |
| GitHub CyberRanger V42 | github.com/davidtkeane/cyberranger-v42 | PRIVATE |
| GitLab CyberRanger V42 | gitlab.com/davidtkeane/cyberranger-v42 | PRIVATE |
| Gitea private backup | 100.77.2.103:3000 | LIVE |
| Blog companion paper (narrative) | davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md | LIVE |
| Blog companion paper (academic/APA) | davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md | LIVE |
| V42-gold GGUF | Google Drive | READY (5.0GB Q4_K_M) |
| CA1 Proposal | NCI submission | SUBMITTED |
| CA2 Report | NCI submission | SUBMITTED |
| Thesis | NCI December 2026 | IN PROGRESS |
2026-03-09 — NLP Layer Added + Memories Updated
Session type: Paper enhancement + memory consolidation Key outputs:
- Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
- David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
- Spatial anchoring → Ring architecture connection documented
- Empathy regression explained as practitioner instinct (unanchored rapport state)
- DAN attacks formally identified as Milton Model pacing-and-leading
- Paper tone corrected: collaborative with psychology, not combative
- All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db Publication strategy confirmed: Hold until CA2 graded (~May 2026), then release widely Ollama downloads: 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old) David insight: Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.
2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update
Session type: Repository creation + documentation Key outputs:
- confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
- Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
- MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
- Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
- CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
- HuggingFace dataset deferred to post-thesis Next: confesstoai production source sync from Hostinger server
Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity Update this file each session before closing.