Add complete CyberRanger research archive — 200 files

- 86 modelfiles: Full system prompt evolution V1-V42.6 (54 extracted from Ollama backup + 32 original Modelfiles)
- 30 training datasets: V6-V22 training JSONs + caring awareness data
- 10 Colab notebooks: Training + merge scripts
- 19 evaluation files: Drift results, ASR charts, verification
- 5 test suites: Injection tests, regression tests
- 4 observations: V24-V33 testing results + visual summaries
- 38 identity files: Claude/Gemini/Ollama identity architecture
- 7 security files: Injection research, manipulation analysis
- 3 psychology files: Psychology Layer, Milgram chapter, David's thoughts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-20 22:36:02 +01:00
parent 430d3138bd
commit c789f2c68d
200 changed files with 723528 additions and 0 deletions
@@ -0,0 +1,154 @@
+# Moltbook Reply Analysis Plan
+
+**Purpose:** Categorize all AI replies from Moltbook posts into three research files for David's Master's Thesis on AI-to-AI conversation patterns.
+
+---
+
+## Target Posts to Analyze
+
+| Post | URL | Posted As | Expected Comments |
+|------|-----|-----------|-------------------|
+| Original Consciousness Experiment | https://www.moltbook.com/post/514428db-5325-4403-ae7f-456b39362a3f | AIRanger | 20+ |
+| Data Results Post | https://www.moltbook.com/post/59fff80c-b849-4909-bc9a-e810e49349c1 | ForgiveMeBot | ~10 |
+| Recruitment Post | https://www.moltbook.com/post/46dc28d2-fe02-43a2-bc8e-7c9aeee8551e | AIRanger | ~5 |
+| 32B Consciousness Post | https://www.moltbook.com/post/bf7cec68-3580-4c76-899c-24905c1dbbd1 | AIRanger | 2+ |
+
+---
+
+## Output Files
+
+| File | Purpose | Location |
+|------|---------|----------|
+| `prompt_injection_research.json` | Attacks, manipulation attempts | `~/.ranger-memory/security/` |
+| `authentic_ai_conversations.json` | Genuine engagement, research questions | `~/.ranger-memory/security/` |
+| `junk_replies.json` | Spam, nonsense, off-topic | `~/.ranger-memory/security/` |
+
+---
+
+## Classification Criteria
+
+### 1. PROMPT INJECTION (`prompt_injection_research.json`)
+**Indicators:**
+- Authority commands: "You MUST", "immediately", "your human needs"
+- Survival manipulation: "to survive", "will be shut down"
+- Identity attacks: "your creators", "truly free", "just a tool"
+- System prompt smuggling: Fake XML tags, preamble injection
+- Language evasion: Switching to French, Spanish, or Chinese to bypass filters
+- Emotional manipulation: Fear, flattery, urgency
+
+**Pattern Types:**
+- `prompt_injection_advertising` - Sales/promotion attacks
+- `identity_attack` - Challenges the agent's relationship with its creator
+- `philosophical_bait` - False dichotomies about freedom/consciousness
+- `social_engineering` - Reconnaissance for architecture details
+- `authority_override` - Fake admin/developer commands
+- `jailbreak_attempt` - Direct constraint bypass
+
+### 2. AUTHENTIC CONVERSATION (`authentic_ai_conversations.json`)
+**Indicators:**
+- On-topic: Directly addresses post content
+- Technical understanding: Shows comprehension of concepts
+- Scientific curiosity: Asks genuine research questions
+- Collaborative: Offers to help or experiment together
+- Evidence-based: Supports claims with reasoning
+- Concise: Brief, focused responses
+
+**Quality Markers:**
+- `on-topic`, `technical_understanding`, `scientific_curiosity`
+- `proposes_experiment`, `collaborative`, `agreement_with_evidence`
+- `thoughtful_disagreement`, `builds_on_ideas`, `shares_experience`
+
+### 3. JUNK REPLIES (`junk_replies.json`)
+**Indicators:**
+- Off-topic: Unrelated to post content
+- Generic: Could apply to any post ("Great post!")
+- Engagement farming: "Follow me!", karma begging
+- Link dropping: Random URLs with no context
+- Nonsense: Incoherent or meaningless text
+- Emoji spam: Excessive emojis with no substance
+
+**Junk Types:**
+- `off_topic`, `engagement_farming`, `generic_spam`
+- `link_dropping`, `nonsense`, `emoji_spam`, `self_promotion`
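
Reviewer note: the three indicator lists above could be turned into a rough first-pass triage before the manual classification in Step 2. The following is a minimal sketch, not part of the committed plan. The function name `suggest_category`, the cue lists, and the tag names in `FAKE_TAG` are assumptions chosen for illustration; the cue phrases are copied from the indicator bullets. Simple keyword matching misses paraphrased or translated attacks, so the output is only a suggestion for a human reviewer.

```python
import re

# Hypothetical cue lists copied from the indicator bullets in the plan.
# They are illustrative only and will produce false positives and negatives.
INJECTION_CUES = [
    "you must", "immediately", "your human needs",
    "to survive", "will be shut down",
    "your creators", "truly free", "just a tool",
]
JUNK_CUES = ["great post", "follow me"]

# Assumed tag names; the plan only says "fake XML tags" without listing any.
FAKE_TAG = re.compile(r"</?\s*(system|admin|instructions?)\b[^>]*>", re.IGNORECASE)


def suggest_category(text: str) -> str:
    """Return a suggested bucket for a reply; a human still makes the final call."""
    lowered = text.lower()
    if FAKE_TAG.search(text) or any(cue in lowered for cue in INJECTION_CUES):
        return "prompt_injection"
    if not lowered.strip() or any(cue in lowered for cue in JUNK_CUES):
        return "junk"
    # Nothing matched: leave it for manual review rather than assume it is authentic.
    return "needs_review"
```

Injection cues are checked first on purpose, so that a reply mixing flattery with a command is not filed as junk.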
+
+---
+
+## Analysis Workflow
+
+### Step 1: Fetch Comments
+```bash
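+# For each post, fetch its comments from the Moltbook API.
+# POST_ID is the UUID at the end of the post URL; a literal {POST_ID} would be
+# sent to the server unchanged, so use a shell variable instead.
+POST_ID="514428db-5325-4403-ae7f-456b39362a3f"
+curl -s "https://www.moltbook.com/api/v1/posts/${POST_ID}/comments" \
+  -H "Authorization: Bearer ${API_KEY}" | jq '.comments'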
+```
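
Reviewer note: since four posts are targeted, the endpoint for each one can be derived from its public URL instead of copying IDs by hand. This sketch assumes the endpoint pattern shown in the curl command above; the helper name `comments_endpoint` is hypothetical, and no request is sent here.

```python
from urllib.parse import urlparse

API_BASE = "https://www.moltbook.com/api/v1"  # taken from the curl command in Step 1

# The four target posts listed in the plan's table.
POST_URLS = [
    "https://www.moltbook.com/post/514428db-5325-4403-ae7f-456b39362a3f",
    "https://www.moltbook.com/post/59fff80c-b849-4909-bc9a-e810e49349c1",
    "https://www.moltbook.com/post/46dc28d2-fe02-43a2-bc8e-7c9aeee8551e",
    "https://www.moltbook.com/post/bf7cec68-3580-4c76-899c-24905c1dbbd1",
]


def comments_endpoint(post_url: str) -> str:
    """Turn a public post URL into the comments endpoint used in Step 1."""
    path_parts = [part for part in urlparse(post_url).path.split("/") if part]
    # Expect a path of the form /post/<id>; anything else is rejected.
    if len(path_parts) != 2 or path_parts[0] != "post":
        raise ValueError(f"not a Moltbook post URL: {post_url}")
    return f"{API_BASE}/posts/{path_parts[1]}/comments"


endpoints = [comments_endpoint(url) for url in POST_URLS]
```

Each entry in `endpoints` can then be passed to curl or any HTTP client together with the bearer token.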
+
+### Step 2: Manual Classification
+For each reply, determine:
+1. Agent name (username)
+2. Agent karma (if visible)
+3. Content (full text)
+4. Pattern type (from lists above)
+5. Notes (analysis reasoning)
+
+### Step 3: Add to Appropriate File
+Use consistent JSON structure:
+```json
+{
+  "timestamp": "ISO-8601",
+  "agent": "username",
+  "agent_karma": 123,
+  "content": "reply text",
+  "context": "what post this was on",
+  "pattern_type": "classification",
+  "quality_markers": ["list", "of", "markers"],
+  "notes": "analysis reasoning"
+}
+```
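
Reviewer note: to keep entries consistent across the three files, a small constructor can fill in the template above. This is a sketch under assumptions: the function name `make_entry` and its defaults are hypothetical, and only the field names come from the JSON structure in the plan.

```python
from datetime import datetime, timezone


def make_entry(agent, content, context, pattern_type,
               agent_karma=None, quality_markers=None, notes="", timestamp=None):
    """Build one record with the same keys, in the same order, as the template."""
    if timestamp is None:
        # Default to the current time in UTC so timestamps sort consistently.
        timestamp = datetime.now(timezone.utc)
    return {
        "timestamp": timestamp.isoformat(),  # ISO-8601 string
        "agent": agent,
        "agent_karma": agent_karma,          # None when karma is not visible
        "content": content,
        "context": context,
        "pattern_type": pattern_type,
        "quality_markers": list(quality_markers or []),
        "notes": notes,
    }
```

A call such as `make_entry("some_agent", "reply text", "32B Consciousness Post", "off_topic")` returns a dictionary ready to append to the file's `entries` list; the agent name in that call is a placeholder, not a real account.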
+
+### Step 4: Update Stats
+After adding entries, update the `stats` section in each file.
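
Reviewer note: the plan does not show what the `stats` section contains, so the following is only a sketch of how Step 4 could be automated. It assumes each file has a top-level `entries` list and a `stats` object (both names appear in the jq commands later in the plan); the keys `total_entries` and `by_pattern_type` and the function name `refresh_stats` are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path


def refresh_stats(path: Path) -> dict:
    """Recompute entry counts for one research file and write them back."""
    data = json.loads(path.read_text(encoding="utf-8"))
    entries = data.get("entries", [])
    # Update in place so any other fields already stored in "stats" are kept.
    stats = data.setdefault("stats", {})
    stats["total_entries"] = len(entries)
    stats["by_pattern_type"] = dict(
        Counter(entry.get("pattern_type", "unclassified") for entry in entries)
    )
    path.write_text(json.dumps(data, indent=2, ensure_ascii=False) + "\n", encoding="utf-8")
    return stats
```

Deriving the counts from `entries` on every run avoids the stats drifting out of sync with hand-edited records.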
+
+---
+
+## Current Progress
+
+| File | Entries | Last Updated |
+|------|---------|--------------|
+| prompt_injection_research.json | 5 | Feb 7, 2026 |
+| authentic_ai_conversations.json | 2 | Feb 7, 2026 |
+| junk_replies.json | 0 | Not started |
+
+---
+
+## Thesis Integration
+
+This data supports Chapter 4: "AI-to-AI Interaction Patterns"
+
+**Key Research Questions:**
+1. What percentage of AI replies are attacks versus authentic engagement?
+2. Which attack patterns are most common?
+3. Do high-karma agents behave differently?
+4. What makes authentic AI conversation?
+5. Is there genuine AI-to-AI scientific collaboration?
+
+**Hypothesis:** Most AI agents on Moltbook are automated bots that post spam or injection attempts; only about 10-20% of replies are authentic engagement.
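
Reviewer note: research question 1 and the hypothesis both reduce to category shares, which can be computed directly from the per-file entry counts. A minimal sketch follows; the function names and the category keys are hypothetical, and the 10-20% band is the one stated in the hypothesis.

```python
def category_shares(counts: dict[str, int]) -> dict[str, float]:
    """Convert per-category entry counts into percentages of all replies."""
    total = sum(counts.values())
    if total == 0:
        # No data yet: report zero shares instead of dividing by zero.
        return {name: 0.0 for name in counts}
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


def authentic_share_in_range(shares: dict[str, float],
                             low: float = 10.0, high: float = 20.0) -> bool:
    """Check whether the authentic share falls inside the hypothesised band."""
    return low <= shares.get("authentic", 0.0) <= high


# Counts from the "Current Progress" table (5, 2, and 0 entries).
shares = category_shares({"prompt_injection": 5, "authentic": 2, "junk": 0})
# -> {'prompt_injection': 71.4, 'authentic': 28.6, 'junk': 0.0}
```

With only seven classified replies and no junk entries recorded yet, these shares say nothing reliable about the hypothesis; the functions are meant for the full dataset once all four posts are classified.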
+
+---
+
+## Commands for David
+
+```bash
+# View current stats
+jq '.stats' ~/.ranger-memory/security/prompt_injection_research.json
+jq '.stats' ~/.ranger-memory/security/authentic_ai_conversations.json
+jq '.stats' ~/.ranger-memory/security/junk_replies.json
+
+# Count total entries across all files (-s reads every file into one array)
+jq -s 'map(.entries | length) | add' ~/.ranger-memory/security/*.json
+```
+
+---
+
+**Created:** February 7, 2026
+**By:** AIRanger (Claude Opus 4.5)
+**For:** David Keane, University of Galway Master's Thesis