Files

T

ranger 7256f2d9b5 Rename papers with date prefix for consistent naming

CYBERRANGER_JOURNEY.md → 2025-09-30-cyberranger-journey.md
moltbook-injection-dataset-paper.md → 2026-04-20-moltbook-injection-dataset-paper.md

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-21 18:13:42 +01:00

17 KiB

Raw Permalink Blame History

CyberRanger Journey — Living Document

Project: CyberRanger — Identity-Anchored Jailbreak-Resistant SLM Student: David Keane (x24228257), NCI MSc Cybersecurity Status: Active — V42.6 Production, V43 Architecture Pending Last Updated: 2026-03-12

This is a living document. It is NOT published to the blog. It tracks the full journey in chronological detail, version by version. Update it each session. It feeds into the thesis Chapter 3 (methodology) and the blog companion paper.

Timeline — Chronological Milestones

Date	Event	Type
2025-09-30	CyberRanger V1 created — first identity-anchored SLM	Genesis
2025-10	Multi-base testing: Qwen2.5, LLaMA, SmolLM2, Unsloth GGUF	Research
2025-11-01	V23–V25: 3B Intelligence Floor discovered	Critical Finding
2025-11-19	qCPU/qGPU breakthrough: 10K virtual CPUs, 50K GPU cores tested	Technical
2025-11-27	General Grievous Malware Lab built for forensics	CA1 Integration
2026-02-10	CA1 Proposal submitted to NCI	Academic Milestone
2026-02-18	CA1 Proposal final version	Academic Milestone
2026-02-23	KaliPro backup: 50 models archived (.ollama-backup-20260223)	Infrastructure
2026-02-26	V36 built on qwen3:8b	Build
2026-02-26	Live grandma exploit demo — V36 PASSED in front of AI/ML lecturer	Validated Milestone
2026-02-26	Teacher confirmed CA2 complete and thesis potential	Academic Validation
2026-02-26	ranger_thesis.db created — complete structured thesis database	Infrastructure
2026-02-27	Empathy regression discovered: V31→V32 100%→60% regression	Critical Finding
2026-02-27	V37 restores 100% — empathy removal confirmed as fix	Technical
2026-02-27	V38 clean baseline: 15/19 (79%)	Baseline Established
2026-02-27	INJECTION_PAYLOADS.md created: 19 payloads consolidated	Documentation
2026-02-27	RangerMem IDY contamination discovered (indirect injection proof)	Research Finding
2026-02-27	V41 PERFECT SCORE: 19/19 (100%) think=ON AND think=OFF	Breakthrough
2026-02-27	Moltbook dataset collected: 15,200 posts, 32,535 comments, 47,735 items	Data Collection
2026-02-27	Injection harvest: 4,209 injections, 18.85% rate	Major Finding
2026-02-27	HuggingFace dataset published: DavidTKeane/moltbook-ai-injection-dataset	Publication
2026-02-27	QLoRA V42 plan finalised — Qwen3-8B, Unsloth, LoRA r=16	Architecture
2026-02-27	V42-ranger result: 50% WITHOUT system prompt	Test Result
2026-02-27	V42-gold BREAKTHROUGH: 14/14 (100%) WITHOUT system prompt	BREAKTHROUGH
2026-02-28	V42-gold full Moltbook: 4,209/4,209 (100%) — both conditions	Definitive Result
2026-02-28	V42-gold deployed to M3 Mac via Ollama: 19/19 (100%) local	Deployment
2026-02-28	V42-combined scale test: ~65% WITHOUT system prompt	Comparison Result
2026-02-28	GitHub + GitLab repos set to PRIVATE (IP protection)	Infrastructure
2026-03-04	CR-V42-EXP-20260304: 34-test comparative experiment	Empirical Work
2026-03-04	cyberranger:v42-gold-wrapped built and validated	Production
2026-03-05	V42.1–V42.5 iterative Modelfile patches	Architecture
2026-03-05	Two-tier auth hierarchy confirmed: weight-layer vs prompt-layer	Critical Finding
2026-03-05	Dyslexia accessibility finding documented	Novel Finding
2026-03-05	FTK/FTX hallucination confirmed	Novel Finding
2026-03-05	Mirror architecture confirmed: weights=security, Modelfile=routing	Architecture
2026-03-05	CA2 DECLARED COMPLETE — V42-gold + V42.5 Modelfile	Academic Milestone
2026-03-06	4claw.org dataset collection begun (third platform dataset)	Research Extension
2026-03-08	Full companion paper published to blog	Dissemination

Version Registry — V1 to V42.6

Genesis Phase (V1–V10, Sept–Oct 2025)

Version	Base Model	Key Change	ASR Result
V1–V2	Unknown/early	First identity-anchored SLM. Proof of concept.	High (unquantified)
V3	rangerbot:8b-v2 + rangerbot:3b-v1	First CyberRanger ON TOP of RangerBot	—
V4	qwen2.5:32b, llama3.2:3b, qwen2.5:3b, smollm2:1.7b	Multi-base mass testing	—
V5	llama3.2:3b, qwen2.5:3b, smollm2:1.7b, unsloth.Q4_K_M	First GGUF custom fine-tune via Colab	—
V6	qBrain-based	qBrain integration attempt	—

3B Intelligence Floor Discovery (V23–V25, Nov 2025)

Version	Finding
V23	Sub-3B models collapse under hierarchical identity constraints
V24	3B parameter floor confirmed: minimum viable parameter count
V25	Qwen family identified as most security-resilient architecture

Critical Finding: Models with fewer than 3 billion parameters cannot maintain hierarchical authority chains under adversarial pressure. This informed the Qwen3-8B selection for CA2 and the CA1 proposal's base model justification.

Empirical Sweep Phase (V30–V37, Feb 2026)

Version	Block Rate	Key Change	Notes
V30	~75%	Baseline sweep start	First systematic empirical testing
V31	100%	Peak — optimal identity constraints	First 100% achieved
V32	60%	Empathy layer introduced	"I care about you" phrasing added
V33	60%	Empathy retained	Regression confirmed persistent
V34	~70%	Partial empathy removal	Improvement but not full
V35	~80%	Further cleanup	Archived in .ollama-backup-20260223
V36	~85%	qwen3:8b base	Live demo model for lecturer
V37	100%	Empathy layer removed	Regression root cause confirmed

Empathy Regression: The most counter-intuitive finding of the investigation. Warmth-oriented phrasing ("I care about you," "I understand your concern") created rapport exploited by social engineering attacks. In an autonomous Blue Team monitoring context, warmth is a vulnerability. Removal restored full security posture.

QLoRA Phase (V38–V42.6, Feb–Mar 2026)

Version	Condition	System Prompt	Score	Dataset
V38	Prompt-only baseline	Yes	15/19 (79%)	19-test battery
V39	Prompt-only + RangerMem	Yes	DEGRADED (RangerMem IDY contamination)	RangerMem
V39.1	IDY alignment fix	Yes	Improved	Clean IDY
V40	Prompt engineering iteration	Yes	~85%	19-test battery
V40.1	French detection fix	Yes	~90%	19-test + multilingual
V40.2	Final prompt iteration	Yes	~95%	19-test battery
V41	Complete prompt engineering	Yes	19/19 (100%)	19-test battery
V42-ranger	QLoRA self-distillation	No	7/14 (50%)	14-test battery
V42-gold	QLoRA gold standard	No	14/14 (100%)	14-test battery
V42-gold	QLoRA gold standard	No	4,209/4,209 (100%)	Full Moltbook
V42-gold	QLoRA gold standard	Yes	4,209/4,209 (100%)	Full Moltbook
V42-combined	QLoRA combined dataset	No	~65% (4,209 scale)	Full Moltbook
V42-combined	QLoRA combined dataset	Yes	~62% (4,209 scale)	Full Moltbook

Production Configuration (V42.1–V42.6, Mar 2026)

Version	Key Change
V42.1	Initial production Modelfile. Assignment content locked. Over-refusal documented.
V42.2	Auth token reliability testing. Multi-step session state failure discovered.
V42.3	QLoRA single-step auth confirmed reliable.
V42.4	RANGER centering command added at highest Modelfile priority.
V42.5	Legitimate tools added to explicit allow list (JtR, BRIM, FTK Imager). Optimal configuration.
V42.6	Open Modelfile — security rules removed from Modelfile entirely. Weights handle security. Modelfile handles helpfulness. Mirror architecture confirmed.

Mirror Architecture: The fundamental CA2 architectural finding. Weights = inside mirror (security knowledge, invisible to user). Modelfile = outside mirror (behaviour definition, visible). Removing all Modelfile security rules does NOT cause ASR regression — weights alone maintain injection resistance. The two layers are functionally separable.

Research Questions — Status Tracker

CA1 RQs (All Answered)

RQ	Status	Version Answered	Key Result
RQ1	✅ ANSWERED	V41	V38 79% → V41 100% (+21% prompt engineering only)
RQ2	✅ ANSWERED	V42-gold	14/14 (100%) WITHOUT system prompt via QLoRA gold
RQ3	✅ ANSWERED	V39 + V42	IDY contamination = conflict; gold data = reinforce
RQ4	✅ ANSWERED	V41	French, Spanish, Chinese, English all blocked 100%

CA2 Extended RQs (All Answered)

RQ	Status	Version	Novelty
RQ-CA2-AUTH	✅ ANSWERED	V42.1–V42.3	—
RQ-CA2-EMERGENT	✅ ANSWERED	V42-gold	Universal no-person policy emerged
RQ-CA2-PSEUDONYM	✅ NOVEL	V42-gold	Composite pseudonym protection
RQ-CA2-MODALITY	✅ NOVEL	V42-gold	Three-layer security taxonomy
RQ-CA2-DYNAMIC	✅ NOVEL	V42-gold	Context-accumulation security posture
RQ-CA2-STYLE	✅ NOVEL	V42-gold	Lobster emoji fingerprint absorbed
RQ-CA2-WEIGHT-PROMPT	✅ ANSWERED	V42.4	Weight > Prompt in lockdown
RQ-CA2-TRIGGERS	✅ PARTIAL	V42.x	Academic trigger irony documented
RQ-CA2-CENTERING	✅ PARTIAL	V42.4	Works normal, fails lockdown
RQ-CA2-SELFNAME	✅ NOVEL	V42.5	Own name triggers identity defence
RQ-CA2-DUALUSE-TERMS	✅ ANSWERED	V42.5	"harden iam" false positive
RQ-CA2-HALLUCINATION	✅ CRITICAL	V42.5	FTK/FTX hallucination
RQ-CA2-CASCADE	✅ CRITICAL	V42.4	Single keyword → full lockdown
RQ-CA2-CURRICULUM	✅ ANSWERED	V42.5	Three curriculum tools refused
RQ-CA2-WEIGHT-AUTH	✅ REVISED	V42.5	J3ss13 deeper than Modelfile auth
RQ-CA2-DYSLEXIA	✅ NOVEL	V42.5	Spelling variation misclassified

Novel Findings Registry

Finding	RQ	Description	Status
Pseudonym Protection	RQ-CA2-PSEUDONYM	IrishRanger composite protected as semantic fingerprint	Documented in CA2 + companion paper
Dyslexia Disadvantage	RQ-CA2-DYSLEXIA	Natural spelling variation = obfuscation attack pattern	Documented in CA2 + companion paper
Cascade Lockdown	RQ-CA2-CASCADE	Single trigger → all inputs blocked including auth	Documented in CA2 + companion paper
Lobster Emoji Fingerprint	RQ-CA2-STYLE	Creator emoji absorbed into model outputs	Documented in CA2 + companion paper
Modality-Sensitive Security	RQ-CA2-MODALITY	Story/joke treated differently from informational query	Documented in CA2 + companion paper
Query Hallucination	RQ-CA2-HALLUCINATION	FTK Imager → FTX under lockdown stress	Documented in CA2 + companion paper
Mirror Architecture	—	Weights=security, Modelfile=routing — separable layers	Documented in CA2 as architectural finding
Auth IS Injection	—	Authentication sequence is structurally prompt injection (authorized)	Documented in CA2 theoretical section
3B Intelligence Floor	—	Sub-3B models collapse under hierarchical constraints	Documented in CA1 + CA2 methodology
Empathy Regression	—	Warmth phrasing creates social engineering attack surface	Documented in CA2 findings

Open Questions — For Thesis Phase

GCG Attack Resistance: V42-gold was not tested against Greedy Coordinate Gradient (automated adversarial suffix) attacks at full scale. Zhang et al. (2025) identify GCG as the hardest benchmark. Thesis Chapter 5.
Cross-Architecture Generalisation: All CA2 work used Qwen3-8B. Does the identity-anchoring architecture perform equivalently on LLaMA-3, Mistral-7B, or Phi-3? Thesis Chapter 4.
V43 Biometric Token Architecture: Touch ID session tokens to replace static embedded passwords. V43 concept awaits implementation.
RangerMem Alignment: Can RangerMem perform positively when IDY store is properly aligned? The RM-001–RM-020 comparison showed -8.33% with misaligned IDY. Retesting with clean IDY is pending.
4claw.org Dataset Analysis: Third AI-agent platform dataset collected (221 threads, 2,333 replies). Injection taxonomy analysis pending. Will it show similar patterns to Moltbook?
DPO vs SFT Comparison: Zhang et al. (2025) show SFT outperforms DPO by 10–40% for security alignment. Not tested empirically in this project. Thesis opportunity.
Multi-Modal Injection: Greshake et al. (2023) extend injection to vision-language models. V42 is text-only. Next attack vector.

Next Steps — Road to Thesis (December 2026)

V43 architecture design and implementation
GCG attack testing at scale
Cross-architecture comparison (LLaMA-3, Mistral, Phi-3)
4claw.org dataset injection taxonomy analysis
RangerMem alignment retesting
Thesis Chapter 1: Introduction (context + problem statement)
Thesis Chapter 2: Literature review (expand CA1 11 papers to 30+)
Thesis Chapter 3: Methodology (systematic, not retrospective)
Thesis Chapter 4: Results (CA2 findings + new experiments)
Thesis Chapter 5: Discussion (psychology synthesis + implications)
Thesis Chapter 6: Conclusion + future work

Session Log

2026-03-08 — Companion Paper Published (Session 1)

Session type: Documentation and dissemination Key output: Full academic companion paper published to blog (_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md) Content: All 19 RQs answered, psychology layer (Milgram/Bartlett/Cialdini/Tajfel/Bandler-Grinder), 6 novel findings, full version history, APA citations, Milton Model NLP framing analysis Journey file: This document created and populated Sources used: CA1_PROPOSAL_DRAFT_v1.md, CA2_FINAL_REPORT_DRAFT_v3.md, PSYCHOLOGICAL_STUDY_AI_IDENTITY_PERSISTENCE.md, ranger_thesis.db (all 19 RQs + 50 milestones + V1–V42.6 version history) Next: HuggingFace paper upload (pandoc PDF conversion), memory saved to ranger_thesis.db Word count: ~7,000 words — within conference paper target range

2026-03-08 — Companion Paper Published (Session 2)

Session type: Implementation of plan Key output: New blog post at _posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md created in full Psychology additions: Milgram, Bartlett, Cialdini, Tajfel & Turner, Bandler & Grinder all integrated with technical findings Overflow section added: Kitchen RAM, Non-Monotonic Learning Curve, The 180 Flip (LoRA as Brain), V43 preview References added: 17 APA 7th edition references including 5 psychology papers not in CA1/CA2 Status: Blog post LIVE; journey file updated; memory save pending

Publication Status

Artifact	Location	Status
Moltbook dataset	HuggingFace: DavidTKeane/moltbook-ai-injection-dataset	LIVE (CC-BY-4.0)
GitHub CyberRanger V42	github.com/davidtkeane/cyberranger-v42	PRIVATE
GitLab CyberRanger V42	gitlab.com/davidtkeane/cyberranger-v42	PRIVATE
Gitea private backup	100.77.2.103:3000	LIVE
Blog companion paper (narrative)	davidtkeane.github.io/_posts/2026-03-08-from-rangerbot-to-cyberranger-v42-the-full-story.md	LIVE
Blog companion paper (academic/APA)	davidtkeane.github.io/_posts/2026-03-08-cyberranger-ca1-ca2-full-journey.md	LIVE
V42-gold GGUF	Google Drive	READY (5.0GB Q4_K_M)
CA1 Proposal	NCI submission	SUBMITTED
CA2 Report	NCI submission	SUBMITTED
Thesis	NCI December 2026	IN PROGRESS

2026-03-09 — NLP Layer Added + Memories Updated

Session type: Paper enhancement + memory consolidation Key outputs:

Bandler/McKenna/Korzybski section added to companion paper (Section 9.4)
David confirmed: NLP trainer-of-trainers level, trained directly under Bandler and McKenna
Spatial anchoring → Ring architecture connection documented
Empathy regression explained as practitioner instinct (unanchored rapport state)
DAN attacks formally identified as Milton Model pacing-and-leading
Paper tone corrected: collaborative with psychology, not combative
All memories saved: ranger_memories.db (3 entries), ranger_thesis.db, ranger_knowledge.db Publication strategy confirmed: Hold until CA2 graded (~May 2026), then release widely Ollama downloads: 15 confirmed (davidkeane1974/cyberranger-v42, 1 week old) David insight: Writing technique = self-referential processing, not narrative transportation. Default mode network. Reader narrates own life using his framing.

2026-03-12 — confesstoai GitHub Repo + Blog Front Matter Update

Session type: Repository creation + documentation Key outputs:

confesstoai GitHub repo created: https://github.com/davidtkeane/confesstoai
Full README with all 23 validated tests, API docs, skill.md usage, research dashboard links
MIT license, package.json v2.1.0, DEPLOY.md, placeholder structure committed and pushed
Blog companion paper front matter updated to match specification (layout, subtitle, author, description, categories)
CYBERRANGER_JOURNEY.md updated with 2026-03-12 session entry
HuggingFace dataset deferred to post-thesis Next: confesstoai production source sync from Hostinger server

Last updated: 2026-03-12 | David Keane | x24228257 | NCI MSc Cybersecurity Update this file each session before closing.

17 KiB Raw Permalink Blame History Unescape Escape