认知神经科学研究报告【20260001】
基于多突触发放神经元的塔防智能体进化与认知研究
Evolutionary Cognition of Tower Defense Agents Using Multi-Synaptic Firing Neurons
摘要
本研究构建了一类基于多突触发放(MSF)神经元的脉冲神经网络智能体,在塔防游戏环境中通过遗传算法与奖励调制STDP(RSTDP)协同进化。智能体包含运动组、大脑组和交流组,模拟生物体的感知、决策与通信功能。实验持续1031.55秒,共34代,种群规模16。结果显示,智能体成功学会了收集能源、规避塔伤害,最佳适应度(健康+能量)达到理论最大值200,平均适应度稳定在154左右。大脑组脉冲发放率最高(9.59 Hz),运动组次之(5.79 Hz),交流组未激活。大脑脉冲率与能量呈微弱负相关(r=-0.005),整体活动与生存状态关联不显著。进化过程使种群趋于一致,学习收敛。本研究为脉冲神经网络在复杂任务中的适应性提供了实验依据,并揭示了神经网络结构分化与功能特化的关系。
Abstract
This study constructs a class of spiking neural network agents based on Multi-Synaptic Firing (MSF) neurons, which evolve collaboratively through genetic algorithms and reward-modulated STDP (RSTDP) in a tower defense game environment. Each agent comprises three neural groups: motor, brain, and communication, simulating perception, decision-making, and communication functions in biological organisms. The experiment lasted 1031.55 seconds across 34 generations with a population size of 16. Results show that agents successfully learned to collect energy and avoid tower damage, achieving a maximum fitness (health + energy) of 200 (theoretical maximum) and a stable average fitness of about 154. The brain group exhibited the highest spike rate (9.59 Hz), followed by the motor group (5.79 Hz), while the communication group remained inactive. Brain spike rate showed a weak negative correlation with energy (r=-0.005), and overall activity was not significantly associated with survival status. The evolutionary process drove the population toward convergence, indicating stable learning. This study provides experimental evidence for the adaptability of spiking neural networks in complex tasks and reveals the relationship between neural network structural differentiation and functional specialization.
1. 引言
脉冲神经网络(Spiking Neural Networks, SNN)因其生物合理性和时间信息处理能力,在认知建模与神经形态计算领域受到广泛关注。传统SNN多采用LIF神经元,而多突触发放(Multi-Synaptic Firing, MSF)神经元模型通过引入多延迟突触,能够编码更丰富的时空模式,更接近生物突触的多样性。本研究将MSF神经元与遗传算法(Genetic Algorithm, GA)和奖励调制STDP(Reward-modulated STDP, RSTDP)相结合,构建了具备感知、运动、决策和通信能力的智能体,并使其在塔防游戏中自主进化,以探究其在动态环境中的适应性及神经分化机制。
1. Introduction
Spiking Neural Networks (SNNs) have attracted significant attention in cognitive modeling and neuromorphic computing due to their biological plausibility and temporal information processing capabilities. While traditional SNNs often use LIF neurons, the Multi-Synaptic Firing (MSF) neuron model, which incorporates multiple delayed synapses, can encode richer spatiotemporal patterns and better reflect the diversity of biological synapses. This study combines MSF neurons with Genetic Algorithms (GA) and Reward-modulated STDP (RSTDP) to construct agents capable of perception, motion, decision-making, and communication, allowing them to evolve autonomously in a tower defense game, thereby investigating their adaptability in dynamic environments and neural differentiation mechanisms.
2. 实验方法
2.1 智能体架构
每个智能体包含三个神经网络组:
- 运动组:输入环境传感器(最近敌人距离/方向、能源点距离/方向、自身健康/能量),输出转向和速度。
- 大脑组:输入状态向量,输出高维动作(攻击、撤退、收集、空闲),用于决策学习。
- 交流组:接收其他智能体的广播信号,输出自身广播信号,实现群体通信。
所有突触采用固定结构:每个连接包含3个固定延迟(1, 2, 3 ms),连接密度30%,初始权重在[-1,1]随机初始化。各网络组神经元数量:运动组12(输入6,输出2),大脑组20(输入8,输出4),交流组8(输入4,输出4)。
2.2 学习机制
- 在线学习(RSTDP):每个时间步根据奖励信号更新突触权重。奖励来自收集能源(+10)和受到塔伤害(-5)。
- 离线进化(GA):每30秒为一世代,根据适应度(健康+能量)选择前50%精英,复制并加入高斯噪声(标准差0.1)生成后代,替换后50%个体。
2.3 实验环境
- 世界尺寸:80×40逻辑坐标。
- 智能体:两队各8个,初始位于左下和右上。
- 能源点:10个,排布于世界中央。
- 塔:2座,位于世界中央上下边界。
- 传感器范围:50,通信范围:10。
- 总模拟时间:1031.55秒,共34代。
2. Methods
2.1 Agent Architecture
Each agent comprises three neural groups:
- Motor group: inputs from environmental sensors (nearest enemy distance/direction, energy node distance/direction, own health/energy), outputs steering and speed.
- Brain group: inputs state vector, outputs high-dimensional actions (attack, retreat, collect, idle) for decision learning.
- Communication group: receives broadcast signals from other agents, outputs its own broadcast signals for group communication.
All synapses have a fixed structure: each connection includes three fixed delays (1, 2, 3 ms), connection density 30%, initial weights randomly initialized in [-1,1]. Neuron counts: motor group 12 (6 inputs, 2 outputs), brain group 20 (8 inputs, 4 outputs), communication group 8 (4 inputs, 4 outputs).
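The fixed synaptic structure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the delays, connection density, and weight range come from the text, while all function and variable names are hypothetical.

```python
import random

DELAYS_MS = (1, 2, 3)   # three fixed delays per connection (from the text)
DENSITY = 0.30          # 30% connection probability (from the text)

def init_connections(n_pre, n_post, rng=random):
    """Return a list of (pre, post, delay_ms, weight) tuples.

    Each pre/post pair is connected with probability DENSITY; a realized
    connection carries one synapse per fixed delay, with weights drawn
    uniformly from [-1, 1].
    """
    conns = []
    for pre in range(n_pre):
        for post in range(n_post):
            if rng.random() < DENSITY:
                for d in DELAYS_MS:
                    conns.append((pre, post, d, rng.uniform(-1.0, 1.0)))
    return conns
```

For the motor group, for example, `init_connections(6, 2)` would wire the 6 sensor inputs to the 2 outputs under these assumptions.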
2.2 Learning Mechanisms
- Online learning (RSTDP): synaptic weights are updated at each time step based on reward signals. Rewards come from collecting energy (+10) and taking tower damage (-5).
- Offline evolution (GA): every 30 seconds constitutes a generation. The top 50% elites are selected based on fitness (health + energy), copied, and mutated by adding Gaussian noise (std=0.1) to produce offspring that replace the bottom 50%.
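The two learning loops above can be sketched as below. Only the reward values, the 50% elite fraction, and the mutation std (0.1) are taken from the text; the learning rate, the eligibility-trace formulation, and all names are illustrative assumptions.

```python
import random

def rstdp_update(w, eligibility, reward, lr=0.1):
    """Reward-modulated STDP step: the eligibility trace (accumulated from
    pre/post spike-timing coincidences) is gated by the scalar reward."""
    return w + lr * reward * eligibility

def evolve(population, fitness, sigma=0.1, rng=random):
    """One GA generation: rank genomes by fitness, keep the top 50% as
    elites, and replace the bottom 50% with Gaussian-mutated elite copies."""
    ranked = sorted(population, key=fitness, reverse=True)
    elites = ranked[: len(ranked) // 2]
    offspring = [[w + rng.gauss(0.0, sigma) for w in genome] for genome in elites]
    return elites + offspring
```

In this sketch a genome is simply a flat list of synaptic weights, and `fitness` would be the health + energy score accumulated over the 30-second generation.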
2.3 Experimental Environment
- World size: 80×40 logical coordinates.
- Agents: two teams of 8 each, initially placed at bottom-left and top-right.
- Energy nodes: 10, arranged along the center.
- Towers: 2, located at the center of the top and bottom boundaries.
- Sensor range: 50, communication range: 10.
- Total simulation time: 1031.55 seconds, 34 generations.
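The environment parameters listed above can be gathered into one configuration object; the values are those stated in the text, while the class and field names are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class WorldConfig:
    """Simulation parameters as reported in Section 2.3."""
    width: float = 80.0            # world size in logical coordinates
    height: float = 40.0
    agents_per_team: int = 8       # two teams
    energy_nodes: int = 10
    towers: int = 2
    sensor_range: float = 50.0
    comm_range: float = 10.0
    generation_seconds: float = 30.0  # one GA generation per 30 s
```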
3. 实验结果
3.1 适应度演化
- 最佳适应度:200(满健康+满能量),在进化过程中的某一世代达到。
- 平均适应度:从152稳步上升至154,标准差逐渐缩小(见图1),表明种群收敛,学习稳定。
3.2 健康与能量变化
- 能量值在游戏初期迅速下降(能源被收集),后期趋于平稳。
- 两队健康值交替波动,反映竞争与塔伤害的动态影响。
3.3 神经网络活动
- 平均发放率:运动组5.79 Hz,大脑组9.59 Hz,交流组0.00 Hz(图2)。
- 大脑组活动显著高于运动组,符合其决策核心角色。
- 交流组未激活,可能因通信范围较小或权重未学到有效模式。
3.4 相关性分析
- 大脑脉冲率与健康:nan(健康数据为常数,相关系数无法计算)。
- 大脑脉冲率与能量:微弱负相关(r=-0.005, p<0.001),说明能量高时大脑活动略低,但无实际意义。
- 广播信号与健康无数据(因交流组未激活)。
3. Results
3.1 Fitness Evolution
- Maximum fitness: 200 (full health + full energy), achieved during the run (exact generation unspecified).
- Average fitness: increased from 152 to 154, with decreasing standard deviation (Fig. 1), indicating population convergence and stable learning.
3.2 Health and Energy Dynamics
- Energy declined rapidly at the beginning (energy nodes were collected) and then stabilized.
- Health of the two teams fluctuated alternately, reflecting competition and tower damage.
3.3 Neural Activity
- Mean spike rates: motor group 5.79 Hz, brain group 9.59 Hz, communication group 0.00 Hz (Fig. 2).
- Brain group activity was significantly higher than motor group, consistent with its decision-making role.
- Communication group remained inactive, possibly due to short communication range or failure to learn effective patterns.
3.4 Correlation Analysis
- Brain spike rate vs. health: nan (health data were constant, so the correlation could not be computed).
- Brain spike rate vs. energy: weak negative correlation (r=-0.005, p<0.001), suggesting slightly lower brain activity when energy is high, though negligible.
- Broadcast signal vs. health: no data (communication group inactive).
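The nan result for the health correlation follows directly from the definition of Pearson's r: a constant series has zero variance, so the denominator vanishes. A minimal sketch (function name is illustrative, not from the original code):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient; returns nan when either series is
    constant (zero variance), mirroring the brain-rate vs. health result."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0.0 or sy == 0.0:
        return float("nan")  # undefined: one variable never varies
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (sx * sy)
```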
4. 讨论
4.1 适应度提升与收敛
最佳适应度达到理论上限200,说明智能体掌握了最优生存策略:同时维持满健康和满能量。平均适应度仅小幅提升,但方差缩小,表明种群在进化压力下趋于一致,形成稳定策略。
4.2 神经网络功能分化
大脑组发放率最高,印证其核心决策角色;运动组次之,负责执行;交流组未激活,可能因任务中通信并非必要,或通信范围/权重学习未成功。这反映了神经网络结构分化的自然结果——大脑承担更多计算,运动执行相对简单,通信仅在必要时启用。
4.3 学习局限与未来方向
- 交流组未激活:未来可增加通信范围、调整奖励设计(如鼓励合作),或引入通信的显式收益。
- 大脑与健康/能量相关性弱:可能因个体差异大,或健康能量受外部因素主导。可引入更精细的奖励结构,强化状态-动作关联。
- 数据记录:当前仅34代数据,长期进化趋势需更多世代验证。
4. Discussion
4.1 Fitness Improvement and Convergence
The maximum fitness reached the theoretical upper limit of 200, indicating that agents learned the optimal survival strategy: maintaining full health and energy simultaneously. Average fitness improved only slightly, but the variance decreased, suggesting the population converged under evolutionary pressure, forming a stable strategy.
4.2 Neural Network Functional Differentiation
The brain group exhibited the highest spike rate, confirming its central decision-making role; the motor group ranked second, responsible for execution; the communication group remained inactive, possibly because communication was not essential for the task, or the communication range/weight learning was unsuccessful. This reflects natural functional differentiation in neural networks — the brain undertakes more computation, motor execution is simpler, and communication activates only when needed.
4.3 Limitations and Future Directions
- Communication group inactivity: future work could increase communication range, adjust reward design (e.g., encourage cooperation), or introduce explicit benefits for communication.
- Weak correlation between brain activity and health/energy: may be due to high individual variability or external factors dominating health and energy. More refined reward structures could strengthen state-action associations.
- Data recording: only 34 generations of data; long-term evolutionary trends require more generations for verification.
5. 结论
本研究成功实现了基于MSF神经元的SNN智能体在塔防游戏中的进化与学习。智能体通过遗传算法与RSTDP协同优化,学会了能源收集与塔伤害规避,达到最优适应度。神经网络功能分化明显,大脑组承担主要决策,运动组执行,交流组未激活但仍为未来社会性行为研究奠定基础。实验结果验证了MSF神经元在复杂任务中的适应性,并为神经形态计算提供了新的实验范式。
5. Conclusion
This study successfully implemented SNN agents based on MSF neurons that evolved and learned in a tower defense game. Through the synergistic optimization of genetic algorithms and RSTDP, agents learned to collect energy and avoid tower damage, achieving optimal fitness. Neural functional differentiation was evident, with the brain group performing primary decision-making, the motor group executing actions, and the communication group remaining inactive yet providing a foundation for future studies on social behavior. The experimental results validate the adaptability of MSF neurons in complex tasks and offer a new experimental paradigm for neuromorphic computing.
致谢
感谢开源社区提供的工具支持(SFML图形库、Matplotlib/Pandas数据可视化、FFmpeg视频编码),以及所有参与调试与讨论的同行。本研究中的代码实现与算法设计受益于神经形态计算与进化机器人领域的公开文献,在此一并致谢。
We would like to thank the open-source community for their tool support (SFML graphics library, Matplotlib/Pandas for data visualization, FFmpeg for video encoding), as well as all colleagues who took part in debugging and discussion. The code implementation and algorithm design in this study benefited from published literature in the fields of neuromorphic computing and evolutionary robotics, to which we also express our gratitude.