TY  - JOUR
A2  - Wang, Weitian
AU  - Zeng, Fanyu
AU  - Wang, Chen
PY  - 2020
DA  - 2020/10/15
TI  - Asynchronous Proximal Policy Optimization in Artificial Agents
SP  - 8702962
VL  - 2020
AB  - Vanilla policy gradient methods suffer from high variance, leading to unstable policies during training, where the policy’s performance fluctuates drastically between iterations. To address this issue, we analyze the policy optimization process of the navigation method based on deep reinforcement learning (DRL) that uses asynchronous gradient descent for optimization. A variant navigation method (asynchronous proximal policy optimization navigation, Apponav) is presented that can guarantee monotonic policy improvement during policy optimization. Our experiments are conducted in DeepMind Lab, and the results show that artificial agents with Apponav perform better than the compared algorithm.
SN  - 1687-9600
UR  - https://doi.org/10.1155/2020/8702962
DO  - 10.1155/2020/8702962
JF  - Journal of Robotics
PB  - Hindawi
KW  -
ER  -