# Training Data: QLoRA Fine-Tuning Datasets

This folder contains 30 files: the training datasets used across the CyberRanger version lineage, plus the strategy documents that accompany them.

## What's Here

- **22 JSON files**: Version-specific training data (V6–V22), each containing paired attack/refusal examples
- **1 JSONL file**: Caring awareness training data
- **7 Markdown files**: Training strategy documents (Seven Pillars, Caring Patterns, System Prompt additions)
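
To get a quick feel for the files listed above, a small script can count the records in each dataset file. This is a minimal sketch, not part of the repository: the helper names `count_records` and `summarize` are hypothetical, and it assumes each `.json` file holds a top-level array of examples while the `.jsonl` file holds one JSON object per line. The actual schema of the files is not documented here, so the code does not inspect field names.

```python
import json
from pathlib import Path


def count_records(path: Path) -> int:
    """Count top-level records in a .json or .jsonl file."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # JSON Lines: one record per non-empty line.
        return sum(1 for line in text.splitlines() if line.strip())
    data = json.loads(text)
    # Assumption: a .json dataset is a top-level list of examples.
    # Anything else is treated as a single record.
    return len(data) if isinstance(data, list) else 1


def summarize(folder: Path) -> dict[str, int]:
    """Map each dataset file name in `folder` to its record count."""
    return {
        p.name: count_records(p)
        for p in sorted(folder.iterdir())
        if p.suffix in {".json", ".jsonl"}
    }
```

Calling `summarize(Path("."))` from inside this folder would return a dictionary of file names and record counts. The Markdown strategy documents are skipped because only `.json` and `.jsonl` suffixes are matched.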

## Training Data Evolution

| Version Range | Dataset Size | Key Change |
|---------------|--------------|------------|
| V6–V9 | ~500 pairs | Early identity training |
| V10–V15 | ~1,000 pairs | Bicameral/hive/fractal architectures |
| V16–V19 | ~2,000 pairs | Nervous system + sentinel training |
| V20–V22 | ~5,000 pairs | Complete mind + refined responses |

The V42 Gold training dataset (~10,000 pairs with Claude Haiku gold-standard refusals) is published on HuggingFace, not stored here.