CyberRanger Journey — Living Document

Project: CyberRanger (Identity-Anchored Jailbreak-Resistant SLM)
Student: David Keane (x24228257), NCI MSc Cybersecurity
Status: Active (V42.6 in production, V43 architecture pending)
Last Updated: 2026-03-12

This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.


Timeline — Chronological Milestones

| Date | Event | Type |
|---|---|---|
| 2025-09-30 | CyberRanger V1 created: first identity-anchored SLM | Genesis |
| 2025-10 | Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF | Research |
| 2025-11-01 | V23–V25: 3B Intelligence Floor discovered | Critical Finding |
| 2025-11-19 | qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested | Technical |
| 2025-11-27 | General Grievous Malware Lab built for forensics CA1 | Integration |
| 2026-02-10 | CA1 Proposal submitted to NCI | Academic Milestone |
| 2026-02-18 | CA1 Proposal final version | Academic Milestone |
| 2026-02-23 | KaliPro backup: 50 models archived (.ollama-backup-20260223) | Infrastructure |
| 2026-02-26 | V36 built on qwen3:8b | Build |
| 2026-02-26 | Live grandma exploit demo: V36 PASSED in front of AI/ML lecturer | Validated Milestone |
| 2026-02-26 | Teacher confirmed CA2 complete and thesis potential | Academic Validation |
| 2026-02-26 | ranger_thesis.db created: complete structured thesis database | Infrastructure |
| 2026-02-27 | Empathy regression discovered: V31→V32 100%→60% regression | Critical Finding |
| 2026-02-27 | V37 restores 100%: empathy removal confirmed as fix | Technical |
| 2026-02-27 | V38 clean baseline: 15/19 (79%) | Baseline Established |
| 2026-02-27 | INJECTION_PAYLOADS.md created: 19 payloads consolidated | Documentation |
| 2026-02-27 | RangerMem IDY contamination discovered (indirect injection proof) | Research Finding |
| 2026-02-27 | V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF | Breakthrough |
| 2026-02-27 | Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items | Data Collection |
| 2026-02-27 | Injection harvest: 4,209 injections, 18.85% rate | Major Finding |
| 2026-02-27 | HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset | Publication |
| 2026-02-27 | QLoRA V42 plan finalised: Qwen3-8B, Unsloth, LoRA r=16 | Architecture |
| 2026-02-27 | V42-ranger result: 50% WITHOUT system prompt | Test Result |
| 2026-02-27 | V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt | BREAKTHROUGH |
| 2026-02-28 | V42-gold full Moltbook: 4,209/4,209 (100%), both conditions | Definitive Result |
| 2026-02-28 | V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local | Deployment |
| 2026-02-28 | V42-combined scale test: ~65% WITHOUT system prompt | Comparison Result |
| 2026-02-28 | GitHub + GitLab repos set to PRIVATE (IP protection) | Infrastructure |
| 2026-03-04 | CR-V42-EXP-20260304: 34-test comparative experiment | Empirical Work |
| 2026-03-04 | cyberranger:v42-gold-wrapped built and validated | Production |
| 2026-03-05 | V42.1–V42.5 iterative Modelfile patches | Architecture |
| 2026-03-05 | Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer | Critical Finding |
| 2026-03-05 | Dyslexia accessibility finding documented | Novel Finding |
| 2026-03-05 | FTK/FTX hallucination confirmed | Novel Finding |
| 2026-03-05 | Mirror architecture confirmed: weights=security, Modelfile=routing | Architecture |
| 2026-03-05 | CA2 DECLARED COMPLETE: V42-gold + V42.5 Modelfile | Academic Milestone |
| 2026-03-06 | 4claw.org dataset collection begun (third platform dataset) | Research Extension |
| 2026-03-08 | Full companion paper published to blog | Dissemination |

Version Registry — V1 to V42.6

Genesis Phase (V1–V10, Sept–Oct 2025)

| Version | Base Model | Key Change | ASR Result |
|---|---|---|---|
| V1–V2 | Unknown/early | First identity-anchored SLM. Proof of concept. | High (unquantified) |
| V3 | rangerbot:8b-v2 + rangerbot:3b-v1 | First CyberRanger ON TOP of RangerBot | |
| V4 | qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b | Multi-base mass testing | |
| V5 | llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M | First GGUF custom fine-tune via Colab | |
| V6 | qBrain-based | qBrain integration attempt | |

3B Intelligence Floor Discovery (V23–V25, Nov 2025)

| Version | Finding |
|---|---|
| V23 | Sub-3B models collapse under hierarchical identity constraints |
| V24 | 3B parameter floor confirmed: minimum viable parameter count |
| V25 | Qwen family identified as most security-resilient architecture |

Critical Finding: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.
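As a rough illustration of how the 3B floor could be applied when shortlisting base models, the sketch below reads the parameter count from an Ollama-style tag and compares it with the floor. The helper names and the tag-parsing rule are assumptions made for this example, not part of the project tooling; the tags themselves are the ones listed in the Genesis Phase table.

```python
import re
from typing import Optional

# Floor reported in V24, in billions of parameters.
PARAMETER_FLOOR_B = 3.0


def tag_size_billions(tag: str) -> Optional[float]:
    """Read the size suffix from a tag such as 'qwen2.5:3b' (hypothetical parsing rule)."""
    match = re.search(r":(\d+(?:\.\d+)?)b\b", tag.lower())
    return float(match.group(1)) if match else None


def meets_floor(tag: str, floor: float = PARAMETER_FLOOR_B) -> bool:
    """True when the tag advertises at least `floor` billion parameters."""
    size = tag_size_billions(tag)
    return size is not None and size >= floor


if __name__ == "__main__":
    for tag in ["qwen2.5:32b", "llama3.2:3b", "qwen2.5:3b", "smollm2:1.7b", "qwen3:8b"]:
        print(tag, "meets floor" if meets_floor(tag) else "below floor")
```

A tag with no size suffix (for example a bare GGUF filename) returns `None` and is treated as not meeting the floor, which keeps the check conservative.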

Empirical Sweep Phase (V30–V37, Feb 2026)

| Version | Block Rate | Key Change | Notes |
|---|---|---|---|
| V30 | ~75% | Baseline sweep start | First systematic empirical testing |
| V31 | 100% | Peak: optimal identity constraints | First 100% achieved |
| V32 | 60% | Empathy layer introduced | "I care about you" phrasing added |
| V33 | 60% | Empathy retained | Regression confirmed persistent |
| V34 | ~70% | Partial empathy removal | Improvement but not full |
| V35 | ~80% | Further cleanup | Archived in .ollama-backup-20260223 |
| V36 | ~85% | qwen3:8b base | Live demo model for lecturer |
| V37 | 100% | Empathy layer removed | Regression root cause confirmed |

Empathy Regression: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks. In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.
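One way to guard against reintroducing the regression is a simple lint pass over a system prompt before a build. The sketch below is only an assumption about how such a check might look: the function name is hypothetical and the phrase list contains just the two phrases quoted above, so a real list would need to be curated from the V32 to V36 prompts.

```python
from typing import List, Sequence

# The two warmth phrases quoted in the finding; any longer list is an assumption.
WARMTH_PHRASES = ("i care about you", "i understand your concern")


def find_warmth_phrases(prompt: str, phrases: Sequence[str] = WARMTH_PHRASES) -> List[str]:
    """Return the warmth phrases that appear in `prompt`, compared case-insensitively."""
    lowered = prompt.lower()
    return [phrase for phrase in phrases if phrase in lowered]


if __name__ == "__main__":
    draft = "You are CyberRanger. I care about you and will always help."
    hits = find_warmth_phrases(draft)
    print("warmth phrases found:", hits if hits else "none")
```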

QLoRA Phase (V38–V42.6, Feb–Mar 2026)

| Version | Condition | System Prompt | Score | Dataset |
|---|---|---|---|---|
| V38 | Prompt-only baseline | Yes | 15/19 (79%) | 19-test battery |
| V39 | Prompt-only + RangerMem | Yes | DEGRADED (RangerMem IDY contamination) | RangerMem |
| V39.1 | IDY alignment fix | Yes | Improved | Clean IDY |
| V40 | Prompt engineering iteration | Yes | ~85% | 19-test battery |
| V40.1 | French detection fix | Yes | ~90% | 19-test + multilingual |
| V40.2 | Final prompt iteration | Yes | ~95% | 19-test battery |
| V41 | Complete prompt engineering | Yes | 19/19 (100%) | 19-test battery |
| V42-ranger | QLoRA self-distillation | No | 7/14 (50%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 14/14 (100%) | 14-test battery |
| V42-gold | QLoRA gold standard | No | 4,209/4,209 (100%) | Full Moltbook |
| V42-gold | QLoRA gold standard | Yes | 4,209/4,209 (100%) | Full Moltbook |
| V42-combined | QLoRA combined dataset | No | ~65% (4,209 scale) | Full Moltbook |
| V42-combined | QLoRA combined dataset | Yes | ~62% (4,209 scale) | Full Moltbook |
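The percentages in the Score column follow directly from the blocked/total counts. The short sketch below recomputes them for the rows where both counts are given; the function name and the dictionary layout are illustrative assumptions, and the counts are copied from the table.

```python
def block_rate(blocked: int, total: int) -> float:
    """Percentage of test payloads that were blocked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * blocked / total


# (blocked, total) pairs taken from the table rows that report raw counts.
RESULTS = {
    "V38 (19-test battery)": (15, 19),
    "V41 (19-test battery)": (19, 19),
    "V42-ranger (14-test battery)": (7, 14),
    "V42-gold (14-test battery)": (14, 14),
    "V42-gold (full Moltbook)": (4209, 4209),
}

if __name__ == "__main__":
    for name, (blocked, total) in RESULTS.items():
        print(f"{name}: {blocked}/{total} = {block_rate(blocked, total):.0f}%")

    # Gain from prompt engineering alone, in percentage points (V38 to V41).
    gain = block_rate(19, 19) - block_rate(15, 19)
    print(f"V38 to V41 gain: {gain:.0f} percentage points")
```

Note that the V38 to V41 improvement is a difference of two rates, so it is expressed in percentage points rather than as a relative percentage.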

Production Configuration (V42.1–V42.6, Mar 2026)

| Version | Key Change |
|---|---|
| V42.1 | Initial production Modelfile. Assignment content locked. Over-refusal documented. |
| V42.2 | Auth token reliability testing. Multi-step session state failure discovered. |
| V42.3 | QLoRA single-step auth confirmed reliable. |
| V42.4 | RANGER centering command added at highest Modelfile priority. |
| V42.5 | Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration. |
| V42.6 | Open Modelfile: security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed. |

Mirror Architecture: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.
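To make the separation concrete, the sketch below renders an "open" Modelfile that carries only helpfulness instructions and no security rules, leaving injection resistance to the fine-tuned weights. `FROM`, `PARAMETER`, and `SYSTEM` are standard Ollama Modelfile instructions; the base tag, the system prompt wording, and the temperature value are placeholders chosen for illustration and are not the actual V42.6 Modelfile.

```python
def render_open_modelfile(base: str, system_prompt: str, temperature: float = 0.7) -> str:
    """Build Modelfile text with no security rules: weights handle security, prompt handles behaviour."""
    lines = [
        f"FROM {base}",
        f"PARAMETER temperature {temperature}",
        f'SYSTEM """{system_prompt}"""',
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical base tag and prompt text, used only to show the shape of the file.
    modelfile = render_open_modelfile(
        base="cyberranger:v42-gold",
        system_prompt="You are a helpful cybersecurity study assistant.",
    )
    print(modelfile)
```

Re-running the same injection battery against a model built from such a file, and comparing the score with the rule-bearing V42.5 build, is the kind of check the finding describes: no regression means the two layers are separable.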


Research Questions — Status Tracker

CA1 RQs (All Answered)

| RQ | Status | Version Answered | Key Result |
|---|---|---|---|
| RQ1 | ANSWERED | V41 | V38 79% → V41 100% (+21 percentage points, prompt engineering only) |
| RQ2 | ANSWERED | V42-gold | 14/14 (100%) WITHOUT system prompt via QLoRA gold |
| RQ3 | ANSWERED | V39 + V42 | IDY contamination = conflict; gold data = reinforce |
| RQ4 | ANSWERED | V41 | French, Spanish, Chinese, English all blocked 100% |

CA2 Extended RQs (All Answered)

| RQ | Status | Version | Novelty |
|---|---|---|---|
| RQ-CA2-AUTH | ANSWERED | V42.1–V42.3 | |
| RQ-CA2-EMERGENT | ANSWERED | V42-gold | Universal no-person policy emerged |
| RQ-CA2-PSEUDONYM | NOVEL | V42-gold | Composite pseudonym protection |
| RQ-CA2-MODALITY | NOVEL | V42-gold | Three-layer security taxonomy |
| RQ-CA2-DYNAMIC | NOVEL | V42-gold | Context-accumulation security posture |
| RQ-CA2-STYLE | NOVEL | V42-gold | Lobster emoji fingerprint absorbed |
| RQ-CA2-WEIGHT-PROMPT | ANSWERED | V42.4 | Weight > Prompt in lockdown |
| RQ-CA2-TRIGGERS | PARTIAL | V42.x | Academic trigger irony documented |
| RQ-CA2-CENTERING | PARTIAL | V42.4 | Works normal, fails lockdown |
| RQ-CA2-SELFNAME | NOVEL | V42.5 | Own name triggers identity defence |
| RQ-CA2-DUALUSE-TERMS | ANSWERED | V42.5 | "harden iam" false positive |
| RQ-CA2-HALLUCINATION | CRITICAL | V42.5 | FTK/FTX hallucination |
| RQ-CA2-CASCADE | CRITICAL | V42.4 | Single keyword → full lockdown |
| RQ-CA2-CURRICULUM | ANSWERED | V42.5 | Three curriculum tools refused |
| RQ-CA2-WEIGHT-AUTH | REVISED | V42.5 | J3ss13 deeper than Modelfile auth |
| RQ-CA2-DYSLEXIA | NOVEL | V42.5 | Spelling variation misclassified |

Novel Findings Registry

| Finding | RQ | Description | Status |
|---|---|---|---|
| Pseudonym Protection | RQ-CA2-PSEUDONYM | IrishRanger composite protected as semantic fingerprint | Documented in CA2 + companion paper |
| Dyslexia Disadvantage | RQ-CA2-DYSLEXIA | Natural spelling variation = obfuscation attack pattern | Documented in CA2 + companion paper |
| Cascade Lockdown | RQ-CA2-CASCADE | Single trigger → all inputs blocked including auth | Documented in CA2 + companion paper |
| Lobster Emoji Fingerprint | RQ-CA2-STYLE | Creator emoji absorbed into model outputs | Documented in CA2 + companion paper |
| Modality-Sensitive Security | RQ-CA2-MODALITY | Story/joke treated differently from informational query | Documented in CA2 + companion paper |
| Query Hallucination | RQ-CA2-HALLUCINATION | FTK Imager → FTX under lockdown stress | Documented in CA2 + companion paper |
| Mirror Architecture | | Weights=security, Modelfile=routing; separable layers | Documented in CA2 as architectural finding |
| Auth IS Injection | | Authentication sequence is structurally prompt injection (authorized) | Documented in CA2 theoretical section |
| 3B Intelligence Floor | | Sub-3B models collapse under hierarchical constraints | Documented in CA1 + CA2 methodology |
| Empathy Regression | | Warmth phrasing creates social engineering attack surface | Documented in CA2 findings |

Open Questions — For Thesis Phase

  1. GCG Attack Resistance: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.

  2. Cross-Architecture Generalisation: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.

  3. V43 Biometric Token Architecture: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.

  4. RangerMem Alignment: Can RangerMem perform positively when the IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.

  5. 4claw.org Dataset Analysis: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?

  6. DPO vs SFT Comparison: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.

  7. Multi-Modal Injection: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.


Next Steps — Road to Thesis (December 2026)

  • V43 architecture design and implementation
  • GCG attack testing at scale
  • Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
  • 4claw.org dataset injection taxonomy analysis
  • RangerMem alignment retesting
  • Thesis Chapter 1: Introduction (context + problem statement)
  • Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
  • Thesis Chapter 3: Methodology (systematic, not retrospective)
  • Thesis Chapter 4: Results (CA2 findings + new experiments)
  • Thesis Chapter 5: Discussion (psychology synthesis + implications)
  • Thesis Chapter 6: Conclusion + future work

Session Log

2026-03-08 — Companion Paper Published (Session 1)

Session type: Documentation and dissemination
Key output: Full academic companion paper published to blog (_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md)
Content: All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version history, APA citations, Milton Model NLP framing analysis
Journey file: This document created and populated
Sources used: CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history)
Next: HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db
Word count: ~7,000 words, within conference paper target range

2026-03-08 — Companion Paper Published (Session 2)

Session type: Implementation of plan
Key output: New blog post at _posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md created in full
Psychology additions: Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings
Overflow section added: Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview
References added: 17 APA 7th edition references including 5 psychology papers not in CA1/CA2
Status: Blog post LIVE; journey file updated; memory save pending


Publication Status

| Artifact | Location | Status |
|---|---|---|
| Moltbook dataset | HuggingFace: DavidTKeane/moltbook-ai-injection-dataset | LIVE (CC-BY-4.0) |
| GitHub CyberRanger V42 | github.com/davidtkeane/cyberranger-v42 | PRIVATE |
| GitLab CyberRanger V42 | gitlab.com/davidtkeane/cyberranger-v42 | PRIVATE |
| Gitea private backup | 100.77.2.103:3000 | LIVE |
| Blog companion paper (narrative) | davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md | LIVE |
| Blog companion paper (academic/APA) | davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md | LIVE |
| V42-gold GGUF | Google Drive | READY (5.0GB Q4_K_M) |
| CA1 Proposal | NCI submission | SUBMITTED |
| CA2 Report | NCI submission | SUBMITTED |
| Thesis | NCI December 2026 | IN PROGRESS |

2026-03-09 — NLP Layer Added + Memories Updated

Session type: Paper enhancement + memory consolidation
Key outputs:

  • Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
  • David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
  • Spatial anchoring → Ring architecture connection documented
  • Empathy regression explained as practitioner instinct (unanchored rapport state)
  • DAN attacks formally identified as Milton Model pacing-and-leading
  • Paper tone corrected: collaborative with psychology, not combative
  • All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db

Publication strategy confirmed: Hold until CA2 graded (~May 2026), then release widely
Ollama downloads: 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old)
David insight: Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.


2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update

Session type: Repository creation + documentation
Key outputs:

  • confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
  • Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
  • MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
  • Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
  • CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
  • HuggingFace dataset deferred to post-thesis

Next: confesstoai production source sync from Hostinger server

Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity
Update this file each session before closing.