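Large language models sometimes exhibit behavior that resembles emotional reactions. Taking Claude Sonnet 4.5 as our subject, we investigate the causes of this phenomenon and its effect on alignment-relevant behavior. We find that the model contains internal representations of emotion concepts; these representations encode the broad concept of a particular emotion and generalize across contexts and behaviors. They track the emotion concept that is operative at a given token position in a conversation, and their activation correlates with how important that emotion is for processing the current context and predicting upcoming text. Our key finding is that these representations causally influence the LLM's outputs, including Claude's preferences and the rate at which it exhibits misaligned behaviors such as reward hacking, blackmail, and sycophancy. We refer to this phenomenon as functional emotions in LLMs: patterns of expression and behavior, mediated by underlying abstract representations of emotion concepts, that mimic those of humans under the influence of an emotion. Functional emotions may work quite differently from human emotions and do not imply that LLMs have any subjective experience of emotion, but they are important for understanding model behavior.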
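Fans of traditional British food recently received heartbreaking news: the well-known condiment Gentleman's Relish is about to go out of production.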
"For hundreds of years these everyday habits were closely tied to sleep, and in modern times we seem to have somewhat forgotten them."
The economic strain that was first treated as a short-term emergency following the invasion of Ukraine has become an enduring, exhausting way of life for countless households. "Crisis" implies something transient, yet for many people this financial pressure is now permanent. The hardship has lasted so long that it has come to seem almost routine. Being forced to choose which essentials to put off, whether for another week or, more typically, another month, is neither normal nor just, and it continues to damage millions of households.