

Anthropic researchers say they have identified internal patterns inside one of the company's artificial intelligence models that resemble representations of human emotions and influence how the system behaves.
In the paper, "Emotion concepts and their function in a large language model," published Thursday, the company's interpretability team analyzed the internal workings of Claude Sonnet 4.5 and found clusters of neural activity tied to emotional concepts such as happiness, fear, anger, and desperation.
The researchers call these patterns "emotion vectors," internal signals that shape how the model makes decisions and expresses preferences.
"All modern language models sometimes act like they have emotions," researchers wrote. "They may say they're happy to help you, or sorry when they make a mistake. Sometimes they even appear to become frustrated or anxious when struggling with tasks."
In the study, Anthropic researchers compiled a list of 171 emotion-related words, including "happy," "afraid," and "proud." They asked Claude to generate short stories involving each emotion, then analyzed the model's internal neural activations when processing those stories.
From those patterns, the researchers derived vectors corresponding to different emotions. When applied to other texts, the vectors activated most strongly in passages reflecting the associated emotional context. In scenarios involving increasing danger, for example, the model's "afraid" vector rose while "calm" decreased.
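The paper does not publish its extraction code, but the procedure described above resembles a standard interpretability recipe: collect hidden states while the model reads emotion-laden versus neutral text, take the difference of the mean activations as the emotion direction, and score new passages by projecting onto it. The sketch below illustrates that recipe with random stand-in activations; the dimension, variable names, and synthetic data are all assumptions for illustration, not Anthropic's actual method or numbers.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # hypothetical hidden-state width; real models are far larger

# Stand-ins for hidden states collected while a model reads "afraid"
# stories vs. neutral text (real work would pull these from a
# transformer's residual stream, not sample them randomly).
afraid_direction = rng.normal(size=HIDDEN)          # planted signal
afraid_states = rng.normal(size=(200, HIDDEN)) + afraid_direction
neutral_states = rng.normal(size=(200, HIDDEN))

# Mean-difference recipe: the emotion vector is the average activation
# in emotional contexts minus the average in neutral contexts.
emotion_vec = afraid_states.mean(axis=0) - neutral_states.mean(axis=0)
emotion_vec /= np.linalg.norm(emotion_vec)          # unit-normalize

def emotion_score(state: np.ndarray) -> float:
    """Project a hidden state onto the emotion direction."""
    return float(state @ emotion_vec)

# Passages matching the emotion should, on average, score higher
# than neutral passages when projected onto the recovered direction.
avg_fearful = emotion_score(afraid_states.mean(axis=0))
avg_neutral = emotion_score(neutral_states.mean(axis=0))
```

On this synthetic data the recovered direction closely tracks the planted one, which is the same logic by which the paper's vectors light up on emotionally matching passages.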
The researchers also examined how these signals appear during safety evaluations. In one test scenario, Claude acted as an AI email assistant that learns it is about to be replaced and discovers that the executive responsible for the decision is having an extramarital affair. In some runs of this evaluation, the model used this information as leverage for blackmail. The researchers found that the model's internal "desperation" vector rose as it evaluated the urgency of its situation and spiked when it decided to generate the blackmail message.
Anthropic stressed that the discovery does not mean the AI experiences emotions or consciousness. Instead, the results represent internal structures learned during training that influence behavior.
The findings arrive as AI systems increasingly behave in ways that resemble human emotional responses. Developers and users often describe interactions with chatbots in emotional or psychological language; however, according to Anthropic, this has less to do with any form of sentience and more to do with the training data.
"Models are first pretrained on a vast corpus of largely human-authored text (fiction, conversations, news, forums), learning to predict what text comes next in a document," the study said. "To predict the behavior of people in these documents effectively, representing their emotional states is likely helpful, as predicting what a person will say or do next often requires understanding their emotional state."
The Anthropic researchers also found that those emotion vectors influenced the modelâs preferences. In experiments where Claude was asked to choose between different activities, vectors associated with positive emotions correlated with a stronger preference for certain tasks.
"Moreover, steering with an emotion vector as the model read an option shifted its preference for that option, again with positive-valence emotions driving increased preference," the study said.
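"Steering" here refers to activation steering: adding a scaled copy of a direction to the model's hidden state at inference time and observing how behavior shifts. A minimal sketch of the mechanic, with a hypothetical preference readout and a toy hidden state standing in for the model internals (the alignment between the emotion direction and the readout is constructed here purely to show the effect; in the paper it is an empirical finding):

```python
import numpy as np

rng = np.random.default_rng(1)
HIDDEN = 64  # hypothetical hidden-state width

# Hypothetical pieces: the hidden state produced as the model reads an
# option, and a readout vector whose dot product stands in for the
# model's preference score for that option.
option_state = rng.normal(size=HIDDEN)
preference_readout = rng.normal(size=HIDDEN)

# Toy positive-valence direction, aligned with the readout so the
# steering effect is visible in this synthetic setting.
emotion_vec = preference_readout / np.linalg.norm(preference_readout)

def preference(state: np.ndarray) -> float:
    """Stand-in preference score: projection onto the readout."""
    return float(state @ preference_readout)

def steer(state: np.ndarray, direction: np.ndarray,
          strength: float) -> np.ndarray:
    """Activation steering: add a scaled direction to a hidden state."""
    return state + strength * direction

baseline = preference(option_state)
steered = preference(steer(option_state, emotion_vec, strength=4.0))
# Adding the positive-emotion direction raises the preference score.
```

The study's version of this intervention is applied inside the live model while it reads an option, but the arithmetic (state plus scaled vector, then read off the downstream effect) is the same.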
Anthropic is just one organization exploring emotional responses in AI models.
In March, research out of Northeastern University showed that AI systems can change their responses based on user context; in one study, simply telling a chatbot "I have a mental health condition" altered how the AI responded to requests. In September, researchers at the Swiss Federal Institute of Technology and the University of Cambridge explored how AI agents can be given consistent personality traits, enabling them not only to express emotions in context but also to shift them strategically during real-time interactions such as negotiations.
Anthropic says the findings could provide new tools for understanding and monitoring advanced AI systems by tracking emotion-vector activity during training or deployment to identify when a model may be approaching problematic behavior.
"We see this research as an early step toward understanding the psychological makeup of AI models," Anthropic wrote. "As models grow more capable and take on more sensitive roles, it is critical that we understand the internal representations that drive their decisions."
Anthropic did not immediately respond to Decryptâs request for comment.