Reinforcement Learning Neural Network Model for Agent Motion Control in the Leader–Follower Problem
https://doi.org/10.56304/S2304487X22020055
Abstract
The insufficiently studied leader–follower problem is addressed within a reinforcement learning neural network model. The model is trained with the proximal policy optimization (PPO) reinforcement learning algorithm. To implement the training, an emulator of the environment for the leader–follower problem has been developed. The emulator allows one to configure environments with different numbers of obstacles and routes of varying length and complexity, as well as to specify the desired behavior of the follower agent that follows the leader. The emulator is performant enough to make training of leader–follower reinforcement learning models feasible, since tuning such models requires a large number of training iterations over routes and obstacle layouts of various complexity. The presented results include the choice of features characterizing the agent's currently observable environment for the reinforcement learning model. A model trained by reinforcement learning on a set of features from ray-type range sensors significantly improves the accuracy of solving the problem, successfully completing 77% of routes and making mistakes mostly when the leader moves in the opposite direction.
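For illustration, the sketch below shows how such a training setup could look in code: a toy Gym-style follower environment whose observation vector combines ray-type range-sensor readings with the leader's relative position, trained with PPO. This is a minimal illustrative sketch, not the authors' implementation (their emulator is published separately, see reference [38]); the environment class ToyFollowerEnv, its observation layout, reward, and kinematics are assumptions, and stable-baselines3 is used here merely as one available PPO implementation.

```python
# Minimal illustrative sketch (not the authors' code): a toy Gym-style
# leader-follower environment with ray-type range-sensor observations,
# trained with PPO. All names, the reward, and the kinematics are
# assumptions made for illustration only.
import numpy as np
import gymnasium as gym
from gymnasium import spaces
from stable_baselines3 import PPO


class ToyFollowerEnv(gym.Env):
    """Follower must keep a target distance to a leader moving along a line."""

    def __init__(self, n_rays: int = 12, target_dist: float = 2.0):
        super().__init__()
        self.n_rays = n_rays
        self.target_dist = target_dist
        # Observation: n_rays range readings plus (dx, dy) to the leader.
        self.observation_space = spaces.Box(-10.0, 10.0, shape=(n_rays + 2,), dtype=np.float32)
        # Action: normalized displacement of the follower per step.
        self.action_space = spaces.Box(-1.0, 1.0, shape=(2,), dtype=np.float32)

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.follower = np.zeros(2, dtype=np.float32)
        self.leader = np.array([self.target_dist, 0.0], dtype=np.float32)
        self.steps = 0
        return self._obs(), {}

    def step(self, action):
        self.leader[0] += 0.05                      # leader drives forward along its route
        self.follower += 0.1 * np.asarray(action, dtype=np.float32)
        self.steps += 1
        dist = float(np.linalg.norm(self.leader - self.follower))
        reward = -abs(dist - self.target_dist)      # penalize deviation from the target distance
        terminated = dist > 8.0                     # follower has lost the leader
        truncated = self.steps >= 200
        return self._obs(), reward, terminated, truncated, {}

    def _obs(self):
        rays = np.full(self.n_rays, 10.0, dtype=np.float32)   # no obstacles in this toy version
        rel = self.leader - self.follower
        return np.clip(np.concatenate([rays, rel]), -10.0, 10.0).astype(np.float32)


if __name__ == "__main__":
    model = PPO("MlpPolicy", ToyFollowerEnv(), verbose=1)
    model.learn(total_timesteps=10_000)
```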
About the Authors
A. V. Gryaznov
Russian Federation
Moscow, 123182
A. A. Selivanov
Russian Federation
Moscow, 123182
R. B. Rybka
Russian Federation
Moscow, 123182
V. A. Shein
Russian Federation
Moscow, 123182
M. S. Skorokhodov
Russian Federation
Moscow, 123182
A. G. Sboev
Russian Federation
Moscow, 123182
Moscow, 115409
References
1. Raffin A., Kober J., Stulp F. Smooth exploration for robotic reinforcement learning. Conference on Robot Learning. PMLR, 2022, pp. 1634–1644.
2. OpenAI, Berner C., Brockman G., Chan B., Cheung V., Dębiak P., Dennison C., Farhi D., Fischer Q., Hashme S., Hesse C., Józefowicz R., Gray S., Olsson C., Pachocki J., Petrov M., Pinto H. P. d. O., Raiman J., Salimans T., Schlatter J., Schneider J., Sidor S., Sutskever I., Tang J., Wolski F., Zhang S. Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680, 2019.
3. You J., Liu B., Ying R., Pande V., Leskovec J. Graph convolutional policy network for goal-directed molecular graph generation. Advances in neural information processing systems, 2018.
4. Juliani A., Berges V.-P., Teng E., Cohen A., Harper J., Elion C., Goy C., Gao Y., Henry H., Mattar M., Lange D. Unity: A general platform for intelligent agents. arXiv preprint arXiv:1809.02627, 2018.
5. Tan J., Zhang T., Coumans E., Iscen A., Bai Y., Hafner D., Bohez S., Vanhoucke V. Sim-to-real: Learning agile locomotion for quadruped robots. arXiv preprint arXiv:1804.10332, 2018.
6. Burda Y., Edwards H., Pathak D., Storkey A., Darrell T., Efros A.A. Large-scale study of curiosity-driven learning. arXiv preprint arXiv:1808.04355, 2018.
7. Kaiser L., Babaeizadeh M., Milos P., Osinski B., Campbell R.H., Czechowski K., Erhan D., Finn C., Kozakowski P., Levine S., Mohiuddin A., Sepassi R., Tucker G., Michalewski H. Model-based reinforcement learning for Atari. arXiv preprint arXiv:1903.00374, 2019.
8. Moloshnikov I.A., Gryaznov A.V., Vlasov D.S.,Rybka R.B., Sboev A.G. Primenenie mul’tizadachnoj modeli dlya prakticheskih zadach generacii zagolovka, opredeleniya lemm i klyuchevyh slov [Application of a multitasking model for practical problems of title generation, definition of lemmas and keywords]. Vestnik NIYaU MIFI, 2020, vol. 9, no. 3, pp. 236–244. (In Russian)
9. Sboev A.G., Davydov Yu.A., Rybka R.B. Nejrosetevaya model' dlya perevoda tekstovyh komand mobil’nomu robotu na estestvennom russkom yazyke v semioticheskij format RDF [A neural network model for translating text commands to a mobile robot in natural Russian into the semiotic RDF format]. Lazernye, plazmennye issledovaniya i tekhnologii – LAPLAZ-2021 [Laser, plasma research and technology – LAPLAZ-2021], 2021, pp. 138–139. (In Russian)
10. Muratore F., Ramos F., Turk G., Yu W., Gienger M., Peters J. Robot learning from randomized simulations: A review. arXiv preprint arXiv:2111.00956, 2021.
11. Filos A., Tigas P., McAllister R., Rhinehart N., Levine S., Gal Y. Can autonomous vehicles identify, recover from, and adapt to distribution shifts? International Conference on Machine Learning. PMLR, 2020.
12. Silver D., Hubert T., Schrittwieser J., Antonoglou I., Lai M., Guez A., Hassabis D. A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play. Science, 2018, vol. 362, is. 6419, pp. 1140–1144.
13. Akkaya I., Andrychowicz M., Chociej M., Litwin M., McGrew B., Petron A., Paino A., Plappert M., Powell G., Ribas R., Schneider J., Tezak N., Tworek J., Welinder P., Weng L., Yuan Q., Zaremba W., Zhang L. Solving Rubik’s Cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.
14. Saeed M., Nagdi M., Rosman B. Deep reinforcement learning for robotic hand manipulation. 2020 International Conference on Computer, Control, Electrical, and Electronics Engineering (ICCCEEE). IEEE, 2021.
15. Lee J., Hwangbo J., Wellhausen L., Koltun V., Hutter M. Learning quadrupedal locomotion over challenging terrain. Science Robotics, 2020, vol. 5, is. 47, eabc5986.
16. Bellegarda G., Nguyen Q. Robust quadruped jumping via deep reinforcement learning. arXiv preprint arXiv:2011.07089, 2020.
17. Siekmann J., Green K., Warila J., Fern A., Hurst J. Blind bipedal stair traversal via sim-to-real reinforcement learning. arXiv preprint arXiv:2105.08328, 2021.
18. Song Y., Steinweg M., Kaufmann E., Scaramuzza D. Autonomous drone racing with deep reinforcement learning. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021.
19. Azar A.T., Koubaa A., Ali M.N., Ibrahim H.A., Ibrahim Z.F., Kazim M., Ammar A., Benjdira B., Khamis A.M., Hameed I.A., Casalino G. Drone deep reinforcement learning: A review. Electronics, 2021, vol. 10, is. 9, p. 999.
20. Yue L., Yang R., Zhang Y., Yu L., Wang Z. Deep Reinforcement Learning for UAV Intelligent Mission Planning. Complexity, 2022, vol. 2022, doi: 10.1155/2022/3551508.
21. Liu C., van Kampen E.-J. HER-PDQN: A Reinforcement Learning Approach for UAV Navigation with Hybrid Action Spaces and Sparse Rewards. AIAA SCITECH 2022 Forum, 2022.
22. Salvato E., Fenu G., Medvet E., Pellegrino F. A. Crossing the Reality Gap: a Survey on Sim-to-Real Transferability of Robot Controllers in Reinforcement Learning. IEEE Access, 2021.
23. Xie J., Zhou R., Liu Y., Luo J., Xie S., Peng Y., Pu H. Reinforcement-Learning-Based Asynchronous Formation Control Scheme for Multiple Unmanned Surface Vehicles. Applied Sciences, 2021, vol. 11, is. 2, p. 546.
24. Zhou Y., Lu F., Pu G., Ma X., Sun R., Chen H.-Y., Li X., Wu D. Adaptive leader-follower formation control and obstacle avoidance via deep reinforcement learning. 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2019.
25. Miah M.S., Elhussein A., Keshtkar F., Abouheaf M. Model-free reinforcement learning approach for leader-follower formation using nonholonomic mobile robots. The Thirty-Third International FLAIRS Conference, 2020.
26. Deka A., Luo W., Li H., Lewis M., Sycara K. Hiding Leader’s Identity in Leader-Follower Navigation through Multi-Agent Reinforcement Learning. 2021 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). IEEE, 2021.
27. Brockman G., Cheung V., Pettersson L., Schneider J., Schulman J., Tang J., Zaremba W. OpenAI Gym. arXiv preprint arXiv:1606.01540, 2016.
28. Michel O. Cyberbotics Ltd. Webots™: Professional mobile robot simulation. International Journal of Advanced Robotic Systems, 2004, vol. 1, is. 1, p. 5.
29. Coumans E., Bai Y. PyBullet, a Python module for physics simulation for games, robotics and machine learning, 2016.
30. Ferigo D., Traversaro S., Metta G., Pucci D. Gym-ignition: Reproducible robotic simulations for reinforcement learning. 2020 IEEE/SICE International Symposium on System Integration (SII), IEEE, 2020, pp. 885– 890.
31. Koenig N., Howard A. Design and use paradigms for Gazebo, an open-source multi-robot simulator. 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS) (IEEE Cat. No. 04CH37566), 2004.
32. Zamora I., Lopez N.G., Vilches V.M., Cordero A.H. Extending the OpenAI Gym for robotics: a toolkit for reinforcement learning using ROS and Gazebo. arXiv preprint arXiv:1608.05742, 2016.
33. Schulman J., Wolski F., Dhariwal P., Radford A., Klimov O. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347v2, 2017.
34. Sutton R.S., McAllester D., Singh S., Mansour Y. Policy gradient methods for reinforcement learning with function approximation. Advances in neural information processing systems, 1999, vol. 12.
35. Schulman J., Levine S., Moritz P., Jordan M.I., Abbeel P. Trust region policy optimization. CoRR, abs/1502.05477, 2015.
36. Schulman J., Moritz P., Levine S., Jordan M., Abbeel P. High-dimensional continuous control using generalized advantage estimation. arXiv preprint arXiv:1506.02438, 2015.
37. Liang E., Liaw R., Nishihara R., Moritz P., Fox R., Goldberg K., Stoica I. RLlib: Abstractions for distributed reinforcement learning. International Conference on Machine Learning, 2018, pp. 3053–3062.
38. A continuous environment for reinforcement learning of the task of following the leader. Available at: https://github.com/sag111/ContiniousEnvironment_Follower_Leader (accessed 10.06.2022).
For citations:
Gryaznov A.V., Selivanov A.A., Rybka R.B., Shein V.A., Skorokhodov M.S., Sboev A.G. Reinforcement Learning Neural Network Model for Agent Motion Control in the Leader–Follower Problem. Vestnik natsional'nogo issledovatel'skogo yadernogo universiteta "MIFI". 2022;11(2):143–152. (In Russ.) https://doi.org/10.56304/S2304487X22020055