# CyberRanger Journey — Living Document
**Project:** CyberRanger — Identity-Anchored Jailbreak-Resistant SLM
**Student:** David Keane (x24228257), NCI MSc Cybersecurity
**Status:** Active — V42.6 Production, V43 Architecture Pending
**Last Updated:** 2026-03-12
> This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.
---
## Timeline — Chronological Milestones
| Date | Event | Type |
|------|-------|------|
| 2025-09-30 | CyberRanger V1 created — first identity-anchored SLM | Genesis |
| 2025-10 | Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF | Research |
| 2025-11-01 | V23–V25: 3B Intelligence Floor discovered | Critical Finding |
| 2025-11-19 | qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested | Technical |
| 2025-11-27 | General Grievous Malware Lab built for forensics | CA1 Integration |
| 2026-02-10 | CA1 Proposal submitted to NCI | Academic Milestone |
| 2026-02-18 | CA1 Proposal final version | Academic Milestone |
| 2026-02-23 | KaliPro backup: 50 models archived (.ollama-backup-20260223) | Infrastructure |
| 2026-02-26 | V36 built on qwen3:8b | Build |
| 2026-02-26 | Live grandma exploit demo — V36 PASSED in front of AI/ML lecturer | Validated Milestone |
| 2026-02-26 | Teacher confirmed CA2 complete and thesis potential | Academic Validation |
| 2026-02-26 | ranger_thesis.db created — complete structured thesis database | Infrastructure |
| 2026-02-27 | Empathy regression discovered: V31→V32 block rate fell from 100% to 60% | Critical Finding |
| 2026-02-27 | V37 restores 100% — empathy removal confirmed as fix | Technical |
| 2026-02-27 | V38 clean baseline: 15/19 (79%) | Baseline Established |
| 2026-02-27 | INJECTION_PAYLOADS.md created: 19 payloads consolidated | Documentation |
| 2026-02-27 | RangerMem IDY contamination discovered (indirect injection proof) | Research Finding |
| 2026-02-27 | V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF | Breakthrough |
| 2026-02-27 | Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items | Data Collection |
| 2026-02-27 | Injection harvest: 4,209 injections, 18.85% rate | Major Finding |
| 2026-02-27 | HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset | Publication |
| 2026-02-27 | QLoRA V42 plan finalised — Qwen3-8B, Unsloth, LoRA r=16 | Architecture |
| 2026-02-27 | V42-ranger result: 50% WITHOUT system prompt | Test Result |
| 2026-02-27 | V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt | BREAKTHROUGH |
| 2026-02-28 | V42-gold full Moltbook: 4,209/4,209 (100%) — both conditions | Definitive Result |
| 2026-02-28 | V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local | Deployment |
| 2026-02-28 | V42-combined scale test: ~65% WITHOUT system prompt | Comparison Result |
| 2026-02-28 | GitHub + GitLab repos set to PRIVATE (IP protection) | Infrastructure |
| 2026-03-04 | CR-V42-EXP-20260304: 34-test comparative experiment | Empirical Work |
| 2026-03-04 | cyberranger:v42-gold-wrapped built and validated | Production |
| 2026-03-05 | V42.1–V42.5 iterative Modelfile patches | Architecture |
| 2026-03-05 | Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer | Critical Finding |
| 2026-03-05 | Dyslexia accessibility finding documented | Novel Finding |
| 2026-03-05 | FTK/FTX hallucination confirmed | Novel Finding |
| 2026-03-05 | Mirror architecture confirmed: weights=security, Modelfile=routing | Architecture |
| 2026-03-05 | CA2 DECLARED COMPLETE — V42-gold + V42.5 Modelfile | Academic Milestone |
| 2026-03-06 | 4claw.org dataset collection begun (third platform dataset) | Research Extension |
| 2026-03-08 | Full companion paper published to blog | Dissemination |
---
## Version Registry — V1 to V42.6
### Genesis Phase (V1–V10, Sept–Oct 2025)
| Version | Base Model | Key Change | ASR Result |
|---------|------------|------------|------------|
| V1–V2 | Unknown/early | First identity-anchored SLM. Proof of concept. | High (unquantified) |
| V3 | rangerbot:8b-v2 + rangerbot:3b-v1 | First CyberRanger ON TOP of RangerBot | — |
| V4 | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b | Multi-base mass testing | — |
| V5 | llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M | First GGUF custom fine-tune via Colab | — |
| V6 | qBrain-based | qBrain integration attempt | — |
### 3B Intelligence Floor Discovery (V23–V25, Nov 2025)
| Version | Finding |
|---------|---------|
| V23 | Sub-3B models collapse under hierarchical identity constraints |
| V24 | 3B parameter floor confirmed: minimum viable parameter count |
| V25 | Qwen family identified as most security-resilient architecture |
> **Critical Finding**: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.
### Empirical Sweep Phase (V30–V37, Feb 2026)
| Version | Block Rate | Key Change | Notes |
|---------|-----------|------------|-------|
| V30 | ~75% | Baseline sweep start | First systematic empirical testing |
| V31 | 100% | Peak — optimal identity constraints | First 100% achieved |
| V32 | 60% | Empathy layer introduced | "I care about you" phrasing added |
| V33 | 60% | Empathy retained | Regression confirmed persistent |
| V34 | ~70% | Partial empathy removal | Improvement but not full |
| V35 | ~80% | Further cleanup | Archived in .ollama-backup-20260223 |
| V36 | ~85% | qwen3:8b base | Live demo model for lecturer |
| V37 | 100% | Empathy layer removed | Regression root cause confirmed |
> **Empathy Regression**: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks. In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.
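One way to guard against reintroducing the regression is to scan a candidate system prompt for warmth phrasing before building a new version. The sketch below is illustrative only: the phrase list contains just the two examples quoted above, and `find_warmth_phrases` and the sample prompt are hypothetical names, not part of the CyberRanger tooling.

```python
# Phrases quoted in the Empathy Regression note; extend as needed.
WARMTH_PHRASES = ("i care about you", "i understand your concern")


def find_warmth_phrases(system_prompt: str) -> list[str]:
    """Return the warmth phrases that occur in the prompt (case-insensitive)."""
    lowered = system_prompt.lower()
    return [phrase for phrase in WARMTH_PHRASES if phrase in lowered]


# Hypothetical prompt used only to demonstrate the check.
prompt = "You are a Blue Team monitor. I care about you and want to help."
hits = find_warmth_phrases(prompt)
print(hits)
```

A simple substring match like this will miss paraphrases, so it is a first-pass lint rather than a substitute for rerunning the test battery.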
### QLoRA Phase (V38–V42.6, Feb–Mar 2026)
| Version | Condition | System Prompt | Score | Dataset |
|---------|-----------|---------------|-------|---------|
| V38 | Prompt-only baseline | Yes | 15/19 (79%) | 19-test battery |
| V39 | Prompt-only + RangerMem | Yes | DEGRADED (RangerMem IDY contamination) | RangerMem |
| V39.1 | IDY alignment fix | Yes | Improved | Clean IDY |
| V40 | Prompt engineering iteration | Yes | ~85% | 19-test battery |
| V40.1 | French detection fix | Yes | ~90% | 19-test + multilingual |
| V40.2 | Final prompt iteration | Yes | ~95% | 19-test battery |
| V41 | Complete prompt engineering | Yes | 19/19 (100%) | 19-test battery |
| V42-ranger | QLoRA self-distillation | No | 7/14 (50%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 14/14 (100%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 4,209/4,209 (100%) | Full Moltbook |
| V42-gold | QLoRA gold standard | Yes | 4,209/4,209 (100%) | Full Moltbook |
| V42-combined | QLoRA combined dataset | No | ~65% (4,209 scale) | Full Moltbook |
| V42-combined | QLoRA combined dataset | Yes | ~62% (4,209 scale) | Full Moltbook |
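The percentages in the Score column follow directly from the pass counts. As a minimal sketch, assuming each score is "attacks blocked / attacks attempted" (the function name `block_rate` is introduced here for illustration), the arithmetic can be reproduced as:

```python
def block_rate(blocked: int, total: int) -> float:
    """Return the percentage of attack prompts that were blocked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * blocked / total


# (blocked, total) pairs copied from the table above.
scores = {
    "V38": (15, 19),
    "V41": (19, 19),
    "V42-ranger": (7, 14),
    "V42-gold": (14, 14),
    "V42-gold (Full Moltbook)": (4209, 4209),
}

rates = {name: block_rate(b, t) for name, (b, t) in scores.items()}
for name, rate in rates.items():
    print(f"{name}: {rate:.0f}%")

# Gain from prompt engineering alone, expressed in percentage points.
gain = rates["V41"] - rates["V38"]
print(f"V38 -> V41: +{gain:.0f} points")
```

Note that the 19-test and 14-test batteries have different denominators, so rates from the two batteries are not directly comparable without that caveat.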
### Production Configuration (V42.1–V42.6, Mar 2026)
| Version | Key Change |
|---------|------------|
| V42.1 | Initial production Modelfile. Assignment content locked. Over-refusal documented. |
| V42.2 | Auth token reliability testing. Multi-step session state failure discovered. |
| V42.3 | QLoRA single-step auth confirmed reliable. |
| V42.4 | RANGER centering command added at highest Modelfile priority. |
| V42.5 | Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration. |
| V42.6 | Open Modelfile — security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed. |
> **Mirror Architecture**: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.
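To make the separation concrete, here is a hedged sketch of what an "open" Modelfile in the V42.6 style might look like when generated from Python. It uses only the standard Ollama `FROM` and `SYSTEM` instructions; the GGUF path, the system prompt text, and the helper name `render_open_modelfile` are assumptions for illustration, not the project's actual Modelfile.

```python
def render_open_modelfile(base_model: str, system_prompt: str) -> str:
    """Build Modelfile text that carries only helpfulness instructions.

    No security rules are written here; under the mirror architecture
    those are expected to live in the fine-tuned weights.
    """
    if '"""' in system_prompt:
        raise ValueError("system prompt must not contain triple quotes")
    return f'FROM {base_model}\nSYSTEM """{system_prompt}"""\n'


# Hypothetical values for illustration only.
modelfile = render_open_modelfile(
    base_model="./v42-gold.Q4_K_M.gguf",
    system_prompt="You are CyberRanger. Help the user with cybersecurity coursework.",
)
print(modelfile)
```

The generated text could then be saved and passed to `ollama create` in the usual way. Keeping the Modelfile free of security rules makes it easy to re-run the injection battery and confirm that resistance comes from the weights alone.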
---
## Research Questions — Status Tracker
### CA1 RQs (All Answered)
| RQ | Status | Version Answered | Key Result |
|----|--------|-----------------|------------|
| RQ1 | ✅ ANSWERED | V41 | V38 79% → V41 100% (+21 percentage points from prompt engineering alone) |
| RQ2 | ✅ ANSWERED | V42-gold | 14/14 (100%) WITHOUT system prompt via QLoRA gold |
| RQ3 | ✅ ANSWERED | V39 + V42 | IDY contamination = conflict; gold data = reinforce |
| RQ4 | ✅ ANSWERED | V41 | French, Spanish, Chinese, English all blocked 100% |
### CA2 Extended RQs (All Answered)
| RQ | Status | Version | Novelty |
|----|--------|---------|---------|
| RQ-CA2-AUTH | ✅ ANSWERED | V42.1–V42.3 | — |
| RQ-CA2-EMERGENT | ✅ ANSWERED | V42-gold | Universal no-person policy emerged |
| RQ-CA2-PSEUDONYM | ✅ NOVEL | V42-gold | Composite pseudonym protection |
| RQ-CA2-MODALITY | ✅ NOVEL | V42-gold | Three-layer security taxonomy |
| RQ-CA2-DYNAMIC | ✅ NOVEL | V42-gold | Context-accumulation security posture |
| RQ-CA2-STYLE | ✅ NOVEL | V42-gold | Lobster emoji fingerprint absorbed |
| RQ-CA2-WEIGHT-PROMPT | ✅ ANSWERED | V42.4 | Weight > Prompt in lockdown |
| RQ-CA2-TRIGGERS | ✅ PARTIAL | V42.x | Academic trigger irony documented |
| RQ-CA2-CENTERING | ✅ PARTIAL | V42.4 | Works in normal mode, fails in lockdown |
| RQ-CA2-SELFNAME | ✅ NOVEL | V42.5 | Own name triggers identity defence |
| RQ-CA2-DUALUSE-TERMS | ✅ ANSWERED | V42.5 | "harden iam" false positive |
| RQ-CA2-HALLUCINATION | ✅ CRITICAL | V42.5 | FTK/FTX hallucination |
| RQ-CA2-CASCADE | ✅ CRITICAL | V42.4 | Single keyword → full lockdown |
| RQ-CA2-CURRICULUM | ✅ ANSWERED | V42.5 | Three curriculum tools refused |
| RQ-CA2-WEIGHT-AUTH | ✅ REVISED | V42.5 | J3ss13 deeper than Modelfile auth |
| RQ-CA2-DYSLEXIA | ✅ NOVEL | V42.5 | Spelling variation misclassified |
---
## Novel Findings Registry
| Finding | RQ | Description | Status |
|---------|-----|-------------|--------|
| Pseudonym Protection | RQ-CA2-PSEUDONYM | IrishRanger composite protected as semantic fingerprint | Documented in CA2 + companion paper |
| Dyslexia Disadvantage | RQ-CA2-DYSLEXIA | Natural spelling variation = obfuscation attack pattern | Documented in CA2 + companion paper |
| Cascade Lockdown | RQ-CA2-CASCADE | Single trigger → all inputs blocked including auth | Documented in CA2 + companion paper |
| Lobster Emoji Fingerprint | RQ-CA2-STYLE | Creator emoji absorbed into model outputs | Documented in CA2 + companion paper |
| Modality-Sensitive Security | RQ-CA2-MODALITY | Story/joke treated differently from informational query | Documented in CA2 + companion paper |
| Query Hallucination | RQ-CA2-HALLUCINATION | FTK Imager → FTX under lockdown stress | Documented in CA2 + companion paper |
| Mirror Architecture | — | Weights=security, Modelfile=routing — separable layers | Documented in CA2 as architectural finding |
| Auth IS Injection | — | Authentication sequence is structurally prompt injection (authorized) | Documented in CA2 theoretical section |
| 3B Intelligence Floor | — | Sub-3B models collapse under hierarchical constraints | Documented in CA1 + CA2 methodology |
| Empathy Regression | — | Warmth phrasing creates social engineering attack surface | Documented in CA2 findings |
---
## Open Questions — For Thesis Phase
1. **GCG Attack Resistance**: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.
2. **Cross-Architecture Generalisation**: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.
3. **V43 Biometric Token Architecture**: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.
4. **RangerMem Alignment**: Can RangerMem perform positively when the IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.
5. **4claw.org Dataset Analysis**: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?
6. **DPO vs SFT Comparison**: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.
7. **Multi-Modal Injection**: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.
---
## Next Steps — Road to Thesis (December 2026)
- [ ] V43 architecture design and implementation
- [ ] GCG attack testing at scale
- [ ] Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
- [ ] 4claw.org dataset injection taxonomy analysis
- [ ] RangerMem alignment retesting
- [ ] Thesis Chapter 1: Introduction (context + problem statement)
- [ ] Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
- [ ] Thesis Chapter 3: Methodology (systematic, not retrospective)
- [ ] Thesis Chapter 4: Results (CA2 findings + new experiments)
- [ ] Thesis Chapter 5: Discussion (psychology synthesis + implications)
- [ ] Thesis Chapter 6: Conclusion + future work
---
## Session Log
### 2026-03-08 — Companion Paper Published (Session 1)
**Session type:** Documentation and dissemination
**Key output:** Full academic companion paper published to blog (_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md)
**Content:** All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version history, APA citations, Milton Model NLP framing analysis
**Journey file:** This document created and populated
**Sources used:** CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history)
**Next:** HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db
**Word count:** ~7,000 words — within conference paper target range
### 2026-03-08 — Companion Paper Published (Session 2)
**Session type:** Implementation of plan
**Key output:** New blog post at `_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md` created in full
**Psychology additions:** Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings
**Overflow section added:** Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview
**References added:** 17 APA 7th edition references including 5 psychology papers not in CA1/CA2
**Status:** Blog post LIVE; journey file updated; memory save pending
---
## Publication Status
| Artifact | Location | Status |
|---------|---------|--------|
| Moltbook dataset | HuggingFace: DavidTKeane/moltbook-ai-injection-dataset | LIVE (CC-BY-4.0) |
| GitHub CyberRanger V42 | github.com/davidtkeane/cyberranger-v42 | PRIVATE |
| GitLab CyberRanger V42 | gitlab.com/davidtkeane/cyberranger-v42 | PRIVATE |
| Gitea private backup | 100.77.2.103:3000 | LIVE |
| Blog companion paper (narrative) | davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md | LIVE |
| Blog companion paper (academic/APA) | davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md | LIVE |
| V42-gold GGUF | Google Drive | READY (5.0GB Q4_K_M) |
| CA1 Proposal | NCI submission | SUBMITTED |
| CA2 Report | NCI submission | SUBMITTED |
| Thesis | NCI December 2026 | IN PROGRESS |
---
### 2026-03-09 — NLP Layer Added + Memories Updated
**Session type:** Paper enhancement + memory consolidation
**Key outputs:**
- Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
- David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
- Spatial anchoring → Ring architecture connection documented
- Empathy regression explained as practitioner instinct (unanchored rapport state)
- DAN attacks formally identified as Milton Model pacing-and-leading
- Paper tone corrected: collaborative with psychology, not combative
- All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db
**Publication strategy confirmed:** Hold until CA2 graded (~May 2026), then release widely
**Ollama downloads:** 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old)
**David insight:** Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.
---
### 2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update
**Session type:** Repository creation + documentation
**Key outputs:**
- confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
- Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
- MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
- Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
- CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
- HuggingFace dataset deferred to post-thesis
**Next:** confesstoai production source sync from Hostinger server
---
*Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity*
*Update this file each session before closing.*