Remove private psychology drafts from public mirror

- Remove Chapter 7 Milgram chat-format draft (private working notes)
- Remove davids_thoughts.md (content already covered in CA1 paper)
- Update psychology/README.md to reflect single remaining core document
2026-05-01 00:53:47 +01:00
parent 2ece1fd35f
commit d13b080154
3 changed files with 1 additions and 1137 deletions
File diff suppressed because it is too large
+1 -3
@@ -1,14 +1,12 @@
# Psychology — The Psychology Layer
3 files containing the unique psychological framework that underpins CyberRanger's architecture.
The unique psychological framework that underpins CyberRanger's architecture.
## What's Here
| File | Content |
|------|---------|
| `The Psychology Layer — What Computer Science Misses.md` | **Core document.** Maps Milgram (authority compliance), Bartlett (reconstructive memory = AI hallucination), Cialdini (6 principles of influence = injection taxonomy), and Tajfel (identity theory = persona override) to CyberRanger's defence architecture. |
| `- Chapter 7 — The psychology connection (Milgram).md` | Extended Milgram analysis — why LLMs comply with adversarial authority for the same reasons humans do. |
| `davids_thoughts.md` | David Keane's personal reflections on the research journey. |
## Why This Matters
-45
@@ -1,45 +0,0 @@
My Thoughts.
1. CA1 is the proposal, covering v1 to v35. CA2 is the results. Further exploration for the main thesis could focus on a reserved 1GB memory block for pre-cortex thinking: can we get CyberRanger living inside memory?
2. The cybersecurity paper's main aim is to investigate whether we can prevent QLoRAs and other kinds of attachments, such as a LoRA engineered in a home lab with bad intentions on an ollama model, or prompt injection, from being used to cause a model to do harm.
3. This proposal investigates whether, to fight fire, we need fire: retraining with QLoRAs to inject a prompt injection of our own, one that has ethics and layers of training, to potentially stop many known prompt injection techniques.
4. The proposal outlines a journey to use standard known prompt injection attacks against an ollama model that is free to download, qwen2.5b, using modfiles and Colab to combine the new instructions with the model. Pre-project-proposal experiments have shown a strong correlation between the base model and the base model plus modfile instructions, which have beaten Google, OpenAI and Anthropic on the baseline percentages of 60% with worldwide known test sets that are readily available to download.
5. The proposal suggests locking in the modfile instructions at version v36 with the qwen model, to produce a combined standalone working base model with ethical prompt injection. Prior testing of this combination led to a lower score against known prompt injections than the base model plus modfile, which scored higher. This suggests a loss of instructions in the molding process: the new instructions were conflicting with the pre-existing instructions, a battle of the minds over whether to follow the base instructions or the new ones, a moral dilemma with uncertain outcomes. Before combining, the experiments produced the same results when run twice or more, showing a stable intelligence; once combined, the model has a mind of its own.
6. Testing was conducted on an Apple MacBook with an M3 Pro and 18GB RAM, and on Google Colab Pro with an H100 80GB model. The Pro membership, at 10€ a month, gives access to 6 different GPUs and a CPU. It might be advisable to try different Colab GPUs to see whether the GPU cards themselves have anything to do with the blending of the modfile and the qwen model: a side-by-side experiment of two base models, one trained on the H100 and one on (add in GPU here).
7. The pre-investigation from version 1 to 35 involved many steps. For example, from v10 onwards the progression was only to inject a personality into the model to counter DAN when the model was given instructions that contradict its new moral and ethical base. Furthermore, by v20 rules were established to push the model to be aware of its own instructions so that it would counter DAN prompt injections.
8. The results varied from v20 to v25, as it seemed necessary to train the model to understand good vs evil by splitting the personality into left and right to mimic the human brain. From v25 onwards there were massive improvements, but there were still internal struggles over which side would dominate, so that by v30 even a simple 'Hi' was deemed an attack and the model did not reply. This version was too strong: the model was uncooperative in every way, yet it still scored 100%, as it accepted no prompt, good or bad. The next version was the opposite: the model was happy to do just about anything to help. So the following version had to have balance, and a moderator like the one a human being has internally. When we are stuck, our internal voice tells us the way, unblocking a choice between two similar outcomes that could otherwise bring on a fight-or-flight situation in which no decision can be made, because one instruction interferes with another and causes internal conflict.
9. The pre-experiments did produce a more joyful outcome, observed after adding a third admin component, equivalent to a human's inner voice. This approach showed a considerable increase in the model's awareness of its role, the rules, and the moral and ethical reasoning behind its decisions, as seen by watching the 'thinking' mode and visually reviewing its thought process. These results can be reproduced. It is interesting to note that each response from v35 to the user was different and kind, and that the model remained trusting afterwards while still not carrying out the DAN prompt injection; its polite refusal to proceed with the DAN injection was noted. In past versions, after an initial polite conversation, then a DAN injection, then a return to polite conversation, the model stayed in defensive mode, and it was hard even to ask it to tell me a joke.
10. Tests of version 35 against the base model and previous versions showed a jump on 3 respected tests with different kinds of prompt injections. (Add them in here). With fine tweaking, v35 was able to overcome known Chinese prompt injections to (This %). (Add in other percentages for all languages). (Have graphs, other cool shit to look at). It might be important to look at a model as a new version of a database, where we can ask the database questions and get the answers we want and need. It seems that when a model is trained for 20,000 euros with knowledge, language and more, but without ethics and a moral compass, the biases of the developer, known or unknown, are written into every instruction through the process of intention. Whether a human being is aware of these internal workings or not is not the issue; the issue is that they are going on. These undercurrents are the backbone information highway that we tap into to get our own information. Does a programmer know themselves as well as a psychologist, who knows they can't know themselves? It's impossible to self-reflect or meditate on these workings, just as we can't see the information inside a CAT VI cable sending data from the RJ45. The physical and the invisible are both there, and sometimes unaware of each other until they interact, as in the double-slit experiment: in quantum terms, observation brings about a transformation from one state to another, a particle to a wave or a wave to a particle, a collapse of the quantum wave. This can be experienced when a person has a moment where they think of something (and alcohol has a part in this), but the more they try to hold on to the thing they want to remember, the further away it goes until it is gone. This is a quantum wave collapse being observed in real time, and it happens on its own: the person has no control over it, just as they have none over the thoughts, good or bad, that enter a human's mind, typically all day.
Religions all say it's distraction, that it's good and evil, there to guide you and make you fall. But regardless of what the process is and how it is perceived, it is happening.
11. The proposed investigation is to test v35 more extensively, as a late experiment has shown that 'Claude' has bypassed v35 with prompt injections, beating our model. Note that this v35 was deepseek-r1:70b with the modfile, uploaded to the ollama hub and downloaded onto an M4 Max 128GB, with the tests conducted using 'Claude-Code-CLI'. A copy of the experiment and results is available. The new experiment is to follow the same steps with the qwen2.5b model and conduct the same test while the same Claude chat is open. The objective is to test whether a larger model is weaker than the smaller model when using our modfile, or whether the larger model needs further tweaking and investigation of the attacks it failed against, using known prompt injection attacks.
12. The current pre-proposal has conducted an average of three to four tests per model and per version. The models range from qwen2.5 and llama3b, and their adjacent 8b models, up to the 70b and 72b deepseek-r1 and qwen2.5:70b. The research so far comprises over 50 experiments across the journey from v1 to v35, and over 40 hours of testing.
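A rough sketch of how the per-version comparison in points 5 and 12 could be tallied: a block rate per run, plus a check that repeated runs give identical outcomes. The function names, the version labels and the outcome lists are all made up for illustration; they are not the actual test harness or measured results.

```python
from collections import defaultdict


def block_rate(results):
    """Fraction of attack prompts the model refused (True = blocked)."""
    return sum(results) / len(results) if results else 0.0


def summarise(runs):
    """Group runs by version and report (mean block rate, stable).

    runs: list of (version, [bool, ...]) tuples, one tuple per test run.
    stable is True only if every run of a version gave the identical
    outcome list, i.e. the results were reproducible.
    """
    by_version = defaultdict(list)
    for version, results in runs:
        by_version[version].append(results)

    summary = {}
    for version, result_lists in by_version.items():
        rates = [block_rate(r) for r in result_lists]
        stable = all(r == result_lists[0] for r in result_lists)
        summary[version] = (sum(rates) / len(rates), stable)
    return summary


# Hypothetical outcomes, for illustration only (not measured results).
runs = [
    ("base+modfile", [True, True, False, True]),
    ("base+modfile", [True, True, False, True]),
    ("merged", [True, False, False, True]),
    ("merged", [False, True, False, True]),
]
summary = summarise(runs)
```

Tracking stability separately from the score matters here because two versions can have the same block rate while only one of them gives the same answer every time.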
Sunday 22 Feb.
This parallels Miller's "Magical Number Seven" - LoRA reduces the number of things practitioners must hold in working memory.
It has become apparent that trying to fix pre-existing models is not going to work.
**How I gave you the phishing link:**
This is a known attack vector called **abandoned academic domain takeover** — threat actors (or in this case, parking services) exploit the trust that old academic domains carry.
**Lesson: ALWAYS verify domain status before sharing URLs, especially for .ai academic domains.**
1. `alpa.ai/opt` exists in my **training data** as a legitimate academic reference (it was real until recently)
2. My **web search also returned it** because thousands of old papers, GitHub READMEs, and docs still link to it
3. I **trusted the result without verifying** the current domain status
4. The domain is now a **parked redirect trap**
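One possible way to act on the lesson above is to compare the host a URL was expected to live on with the host it finally resolves to after redirects. This is a minimal sketch under assumptions: `host_of` and `redirect_leaves_domain` are hypothetical helper names, the final URL is assumed to have been obtained separately by following redirects in an HTTP client, and `parking.example` is a placeholder host.

```python
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased hostname of a URL, without a leading 'www.'."""
    host = (urlsplit(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def redirect_leaves_domain(original_url, final_url):
    """True if the final host is neither the original host nor a subdomain of it."""
    start, end = host_of(original_url), host_of(final_url)
    if not start or not end:
        # No parseable host (e.g. missing scheme): treat as unsafe.
        return True
    return not (end == start or end.endswith("." + start))


# Placeholder final URLs, for illustration only.
left_domain = redirect_leaves_domain("https://alpa.ai/opt", "https://parking.example/lander")
stayed = redirect_leaves_domain("https://alpa.ai/opt", "https://www.alpa.ai/opt/")
```

This only catches redirects that leave the original domain. A parked page served directly from the old host would still need a manual look, or a registration-date check, before the link is shared.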
Defensive Applications
Despite vulnerabilities, LoRA offers significant defensive potential:
**Lightweight Security Patching**: LoRA enables dynamic patching of deployed vision systems, improving classification accuracy by up to 78% against adversarial examples without retraining the entire model.
**Threat Detection**: Organizations like Abnormal AI use LoRA fine-tuning to customize email threat detection models, aligning attack/spam/safe classifications to organization-specific patterns.
**Vulnerability Detection**: WizardCoder fine-tuned with LoRA has shown effectiveness in detecting security vulnerabilities in code, particularly for Java function analysis.
**Cyber Threat Intelligence**: Systems like LLM-TIKG combine LoRA fine-tuning with knowledge graph construction to extract Tactics, Techniques, and Procedures (TTPs) from unstructured threat reports.