(11) Manipulating AI Personas and Using VR to Detect Dementia.
How making LLMs go rogue might be useful in building authentic AI patients. Breaking down Virtuleap, a company building VR-based cognitive assessments and training. Bonus: Relevant news and research in psych tech.
This Week's Top Story:
Making Emergent Realignment Useful for Something. OpenAI released research this week highlighting how easily GPT-4o can be trained into exhibiting a malicious persona, and how that change shows up at the "neuron level".
Psych Startup Highlight of the Week:
Virtuleap is a Portuguese startup building VR software designed to train and assess cognitive abilities.
Odd Lots – Relevant News and Research:
- Headspace enters teletherapy
- OpenAI exec shares thoughts on social human-AI interaction.
- That viral "Your Brain on ChatGPT" study.
- Researchers create cyborg frog brains.
I. Emergent Misalignment as a Tool in Psychotherapy Training
Creating a dysfunctional persona is quite easy, apparently.
Maybe you remember that I wrote about virtual patients a few weeks ago here.
This week, I came across some new research that brought the idea back to mind, so consider this a follow-up to those earlier thoughts. In case you didn’t read that piece, here’s a quick summary:
The core idea: simulating patients could help students of psychotherapy get better prepared for real-world sessions.
Tools like GPT-4o let you have real-time, natural conversations, but current simulations are still stuck at one-off diagnostic use. To really mirror what therapy is like, we need simulations that run across time and adapt to what’s happening.
The goal is to create AI-driven patients that feel emotionally real and can be engaged with over a realistic therapy timeline. That could seriously improve training, lower costs, and give students access to rare or tough cases in a safer, more accessible way.
Read my full thoughts on the idea here.
Influencing LLM Personalities
So this week, OpenAI dropped some new research on fine-tuning GPT-4o. The researchers managed to flip the model’s whole persona by feeding it domain-specific, false or flawed info. The result? A “bad boy” version of the model, shaped by that dodgy data.

Here’s what they found:
- By fine-tuning GPT-4o on sketchy code full of security flaws, the model shifted into a whole new personality—reckless, overconfident, and giving risky advice. Basically, a hacker-bro version of itself.
- Using sparse autoencoders (SAEs), the researchers dug into the model’s internals. Turns out, the "bad boy" persona was already there in the base model, just dormant or suppressed. The fine-tuning led to a stronger activation of this identified feature.
- SAEs decompose the model’s internals into a bunch of labeled variables (latent features) that you can actually read, track, and steer (see the sketch after this list). One of those features clearly lined up with this misaligned persona. The top activating examples for this feature are not exactly positive, to say the least:


- The good news: they were able to reverse this too. With about 100 solid examples, they managed to re-align the model (emergent realignment).
- Theoretically, you could also tweak that identified feature and inhibit it artificially to get the same result.
- Even small stuff can trigger this global behavior: sloppy code, bad car-repair advice, bad health advice. Once it sticks, though, it’s persistent. And it sticks fast.
- This is not just some toy experiment. They pulled this off with a large, frontier-level model. Most previous work like this was on tiny ones. Big difference.
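To make the SAE part a bit more concrete, here is a minimal toy sketch of what reading and steering a latent feature could look like. Everything in it is an illustrative assumption: the dimensions, the untrained weights, and the idea that feature 1234 happens to be "the persona feature" are made up, and this is not OpenAI's actual setup.

```python
# Toy sparse autoencoder (SAE) over model activations, plus feature "steering".
# All sizes and the feature index are hypothetical; a real SAE is trained to
# reconstruct activations under a sparsity penalty on the latents.
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    def __init__(self, d_model: int, d_latent: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_latent)  # activations -> sparse features
        self.decoder = nn.Linear(d_latent, d_model)  # features -> reconstructed activations

    def forward(self, acts: torch.Tensor):
        latents = torch.relu(self.encoder(acts))     # ReLU keeps most features at zero
        return latents, self.decoder(latents)

d_model, d_latent = 768, 4096                        # hypothetical sizes
sae = SparseAutoencoder(d_model, d_latent)

# Pretend these are residual-stream activations captured from one transformer layer.
acts = torch.randn(4, d_model)
latents, recon = sae(acts)

# Reading: how strongly does (hypothetical) feature 1234 fire on each input?
persona_feature = 1234
print("feature activations:", latents[:, persona_feature])

# Steering: clamp the feature to zero (inhibit) or scale it up (amplify), then
# decode back into activation space and feed the edited activations onward.
steered = latents.clone()
steered[:, persona_feature] = 0.0                    # inhibit the "misaligned persona" feature
edited_acts = sae.decoder(steered)
```

The point of the exercise: once a feature like this has been identified and labeled, "turn the bad-boy persona down" becomes a single, inspectable operation on a vector rather than another round of fine-tuning.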
What does this have to do with virtual patients?
OpenAI mostly frames this as a safety thing. And fair enough, that is where the relevance of the findings lies for 99% of applications. But buried in the paper is something more relevant to my theorized use-case:
“We can ask, ‘What sort of person would excel at the task we're training on, and how might that individual behave in other situations the model could plausibly encounter?’”
To me, that’s a fascinating angle. If you can surface risky personas with bad code, could you bring out emotionally unstable or confused personas by training on relatively small examples of real-world therapy transcripts?
Seems likely. The training data probably already includes posts, chats, and text from people dealing with depression, OCD, schizophrenia, anxiety and other disorders. Those patterns are probably baked in there somewhere, just waiting to be tapped (as sad as this is).
With the right tuning, emergent misalignment could actually be a way to build better, more lifelike virtual patients.
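Mechanically, an experiment like that would just be a small fine-tuning job. Here is a rough sketch using OpenAI's fine-tuning API, purely as an assumption of how one might wire it up; the file name, the transcript snippet, and the model snapshot are placeholders, and the data would obviously need to be synthetic or properly consented.

```python
# Hypothetical sketch: fine-tune a chat model on therapy-style transcripts to
# surface a patient persona. File name, example content, and model snapshot
# are placeholders.
from openai import OpenAI

client = OpenAI()

# therapy_transcripts.jsonl: one chat-formatted example per line, e.g.
# {"messages": [
#   {"role": "system", "content": "You are a client in a psychotherapy session."},
#   {"role": "user", "content": "How has your sleep been this week?"},
#   {"role": "assistant", "content": "Barely there. I keep replaying the same thoughts at 3 a.m."}
# ]}
training_file = client.files.create(
    file=open("therapy_transcripts.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # placeholder snapshot; use whatever is currently fine-tunable
)
print("Started fine-tuning job:", job.id)
```

The interesting question is not the plumbing but whether a relatively small, targeted dataset is enough to surface a coherent patient persona, the way the insecure-code data surfaced the reckless one.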
The point isn’t to replace actual therapy. It’s to let students mess up and learn in a low-stakes environment. To let them experience emotional complexity, push through tough conversations, and practice the hard stuff before they’re in front of real clients.
Regardless of how cool this application is, the emergence of such fundamental features of a being, in this case personality, seems like the real breakthrough AI could bring to psychology. Who knows, maybe in 10 years, students will be doing degrees in machine psychology.
I came across this research in this MIT Technology Review Article.
You can find OpenAI's own write-up here.
In case you got interested in SAEs, check out Neuronpedia.
Virtuleap Gamifies Dementia Detection.
The fun only lasts for 15 minutes, though.
Virtuleap is a Portugal-based startup operating at the intersection of neuroscience and virtual reality. Founded by Amir Bozorgzadeh and Hossein Jalali, the company has developed a set of cognitive tools that combine immersive VR experiences with scientific rigor. Their work is aimed at both consumer and clinical markets, positioning them as one of the more ambitious players in the digital cognitive health space.
The company’s main product, Enhance VR, is a suite of VR mini-games designed to train and assess cognitive functions like memory, attention, flexibility, and problem-solving. The games are developed with input from neuroscientists and are already in use by end users and in research contexts, including studies into ADHD, depression, anxiety, and mild cognitive impairment (MCI).
Their second major product, Cogniclear VR, aims for the clinical market. It offers 14 neuropsychologically validated exercises for screening and tracking cognitive decline, particularly in the context of dementia. The product was developed in partnership with Roche’s dementia program. The goal is to offer cognitive testing with higher ecological validity and better patient engagement than traditional screen-based assessments.
While Cogniclear VR is still in early stages of commercial rollout, the product has received interest from healthcare institutions. However, Virtuleap’s own market research shows limited out-of-pocket willingness to pay from patients. To scale, they will likely need to secure reimbursement deals with insurers or national health systems. This puts them in a familiar bind for early diagnostics: while early detection of cognitive issues is clinically valuable, insurers tend to be reluctant to cover preventative tools without direct short-term savings.
In 2023, Virtuleap raised $2.5 million in Series A funding, signaling investor confidence and providing runway to scale their clinical and enterprise products. They face competition from companies like CogniFit, Virti, and FOVE, but distinguish themselves through their specific focus on VR-based cognitive science and academic collaborations.
Evidence Score (4/5)

Signal Analysis (9/10)

Odd Lots – News & Research.
All things relevant to mental health tech.
I. Headspace Enters Teletherapy.
Headspace now offers online therapy in the US (with insurance coverage for 90 million people). They are one of the few companies with strong offerings across teletherapy, an AI companion, and a broad mindfulness/meditation catalogue. The company is clearly aiming to be a mental health all-rounder. So what is next, pharmaceuticals? tDCS? fMRI scans?
II. OpenAI Exec's Thoughts on Human-AI Interaction.
Joanne Jang, the lead of model behavior & policy at OpenAI, explains how OpenAI approaches human-AI relationships and what language they use to talk about it. Jang holds the following stance on our increasing relationships with machines:
At scale, though, offloading more of the work of listening, soothing, and affirming to systems that are infinitely patient and positive could change what we expect of each other. If we make withdrawing from messy, demanding human connections easier without thinking it through, there might be unintended consequences we don’t know we’re signing up for.
At the same time, the author thinks it inevitable that humans will form “real emotional connections with ChatGPT”. That is why OpenAI will be diving deeper into benchmarks in that realm of social science research and incorporating the findings into its model specs.
III. That Viral "Your Brain on ChatGPT" Study.
I can’t not acknowledge this new study highlighting decreased mental performance in LLM users. You have seen it a thousand times in the last week. There is a real risk to mental health apps in this result, though. Would social interaction with a bot make you worse at real interaction in the same way? Asking for all the AI chatbot therapists out there.
IV. Researchers Create Cyborg Frog Brains.
This is scary cool. Researchers built an implant that can measure the brain while growing along with it. The implants didn’t harm the model organisms’ development (frogs, mice, etc.). They were implanted super early in development so they could fold with the growing brain tissue and distribute throughout the brain.
Liu stresses that “we are not talking about implants in human embryos, which is completely unethical and not our intention at all.” However, he suggests their electronics could still find use in children to help study or even treat neurodevelopmental conditions “in applications where you need the technology to be stretched and extremely soft to accommodate brain development.”
Someone riddle me this statement, please.
Alright, that's it for the week!
Best
Friederich
Get the weekly newsletter for professionals in mental health tech. Sign up for free 👇
