Natural behaviour is learned through dopamine-mediated reinforcement
Jonathan Kasdin, Kimberly L. Stachenfeld, Vikram Gadagkar; Nature; 2025-03-12
『Abstract』Many natural motor skills, such as speaking or locomotion, are acquired through a process of trial-and-error learning over the course of development. It has long been hypothesized, motivated by observations in artificial learning experiments, that dopamine has a crucial role in this process. Dopamine in the basal ganglia is thought to guide reward-based trial-and-error learning by encoding reward prediction errors, decreasing after worse-than-predicted reward outcomes and increasing after better-than-predicted ones. Our previous work in adult zebra finches, in which we changed the perceived song quality with distorted auditory feedback, showed that dopamine in Area X, the singing-related basal ganglia, encodes performance prediction error: dopamine is suppressed after worse-than-predicted (distorted syllables) and activated after better-than-predicted (undistorted syllables) performance. However, it remains unknown whether the learning of natural behaviours, such as developmental vocal learning, occurs through dopamine-based reinforcement. Here we tracked song learning trajectories in juvenile zebra finches and used fibre photometry to monitor concurrent dopamine activity in Area X. We found that dopamine was activated after syllable renditions that were closer to the eventual adult version of the song, compared with recent renditions, and suppressed after renditions that were further away. Furthermore, the relationship between dopamine and song fluctuations revealed that dopamine predicted the future evolution of song, suggesting that dopamine drives behaviour. Finally, dopamine activity was explained by the contrast between the quality of the current rendition and the recent history of renditions, consistent with dopamine’s hypothesized role in encoding prediction errors in an actor–critic reinforcement-learning model.
Reinforcement-learning algorithms have emerged as a powerful class of model to explain learning in reward-based laboratory tasks, as well as for driving autonomous learning in artificial intelligence. Our results suggest that complex natural behaviours in biological systems can also be acquired through dopamine-mediated reinforcement learning.
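The idea that dopamine tracks the contrast between the current rendition and the recent history of renditions can be illustrated with a toy calculation. The sketch below is not the authors' analysis: the function name `performance_prediction_errors`, the `window` parameter, the use of a simple running mean as the prediction, and the quality scores are all assumptions made for illustration.

```python
from collections import deque


def performance_prediction_errors(qualities, window=5):
    """Toy prediction error: quality minus the mean of recent renditions.

    `qualities` is a sequence of per-rendition quality scores (hypothetical
    units, e.g. similarity to the eventual adult song). `window` is an
    assumed history length, not a value taken from the paper.
    """
    recent = deque(maxlen=window)  # holds at most `window` past scores
    errors = []
    for q in qualities:
        # The prediction is the mean of the recent history; with no
        # history yet, the prediction equals the outcome (zero error).
        prediction = sum(recent) / len(recent) if recent else q
        errors.append(q - prediction)
        recent.append(q)
    return errors


# Hypothetical quality scores for four renditions of one syllable.
qualities = [0.25, 0.75, 0.5, 1.0]
errors = performance_prediction_errors(qualities, window=2)
print(errors)
```

In this toy setting a positive error corresponds to a better-than-recent rendition (where the abstract reports dopamine activation) and a negative error to a worse-than-recent one (suppression).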
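『Summary』
The study shows that complex natural behaviours in biological systems, such as developmental vocal learning in zebra finches, can be acquired through dopamine-mediated reinforcement learning.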
『Remarks』