The Multi-Domain Operations (MDO) concept describes how U.S. Army forces, as part of the Joint Force, will militarily compete, penetrate, disintegrate, and exploit their adversaries in the future.
Human social behavior is extremely fluid and, at times, irrational. When designing autonomous agents that will perform as warfighters, one has to question the risks this will bring to humanity.
Autonomous military robots
Multi-Domain Operations, the U.S. Army’s future operating concept, requires autonomous agents with learning components to operate alongside the warfighter. New Army research makes the training of reinforcement learning policies less unpredictable, so that these policies are more practically applicable to physical systems, especially ground robots.
These learning components will allow autonomous agents to reason and adapt to changing battlefield conditions, said Army researcher Dr. Alec Koppel from the U.S. Army Combat Capabilities Development Command, now known as DEVCOM, Army Research Laboratory.
According to Koppel, the underlying adaptation and re-planning mechanism consists of reinforcement learning-based policies. Making these policies efficiently obtainable is critical to making the Multi-Domain Operations operating concept a reality.
Koppel says that policy gradient methods in reinforcement learning are the foundation for scalable algorithms for continuous spaces, but existing techniques cannot incorporate broader decision-making goals such as risk sensitivity, safety constraints, exploration, and divergence to a prior.
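To make the idea of a policy gradient method concrete, here is a minimal sketch of the classic REINFORCE update on a two-armed bandit. It is not the Army team's algorithm. The function names, reward values, learning rate, and running-average baseline are all assumptions chosen for illustration.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over action preferences.
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(mean_rewards, steps=2000, lr=0.1, seed=0):
    """Illustrative REINFORCE on a bandit (hypothetical setup, not from the article)."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(mean_rewards))  # policy parameters: one logit per action
    baseline = 0.0                       # running average reward, reduces variance
    for t in range(1, steps + 1):
        probs = softmax(theta)
        action = rng.choice(len(theta), p=probs)
        reward = mean_rewards[action] + rng.normal(0.0, 0.1)
        # For a softmax policy, grad of log pi(action) is one_hot(action) - probs.
        grad_log_pi = -probs
        grad_log_pi[action] += 1.0
        # Stochastic gradient ascent on expected reward.
        theta += lr * (reward - baseline) * grad_log_pi
        baseline += (reward - baseline) / t
    return softmax(theta)

# Assumed mean rewards: the second arm is better on average.
final_probs = reinforce_bandit(np.array([0.2, 0.8]))
print(final_probs)
```

Each update uses only one sampled reward, which is why such methods scale to large or continuous spaces but can need many samples to converge.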
Designing autonomous behaviors when the relationship between dynamics and goals is complex may be addressed with reinforcement learning, which has gained attention recently for solving previously intractable tasks, including strategy games such as Go and chess and video games such as Atari and StarCraft II, Dr. Koppel said.
Prevailing practice, unfortunately, demands astronomical sample complexity, such as thousands of years of simulated gameplay, he said. This sample complexity renders many common training mechanisms inapplicable to the data-starved settings required by the MDO context for the Next-Generation Combat Vehicle (NGCV).
Dr. Alec Koppel and his research team developed new policy search schemes for general utilities and established their sample complexity. They observed that the resulting schemes reduce the volatility of reward accumulation, yield efficient exploration of unknown domains, and provide a mechanism for incorporating prior experience.
“This research contributes an augmentation of the classical Policy Gradient Theorem in reinforcement learning,” Koppel said. “It presents new policy search schemes for general utilities, whose sample complexity is also established. These innovations are impactful to the U.S. Army through their enabling of reinforcement learning objectives beyond the standard cumulative return, such as risk sensitivity, safety constraints, exploration, and divergence to a prior.” Notably, in the context of ground robots, he said, data is costly to acquire.
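For readers who want the math behind the quote, the following is a sketch in standard textbook notation, which is an assumption here since the article defines no symbols. It states the classical objective and Policy Gradient Theorem, then shows how a general utility F of the discounted state-action occupancy measure generalizes the cumulative return.

```latex
% Classical objective: expected discounted cumulative return
J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^t\, r(s_t, a_t)\Big]

% Classical Policy Gradient Theorem
\nabla_\theta J(\theta) = \mathbb{E}_{\pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^t\,
    Q^{\pi_\theta}(s_t, a_t)\, \nabla_\theta \log \pi_\theta(a_t \mid s_t)\Big]

% Occupancy measure; the cumulative return is linear in it
\lambda^{\pi_\theta}(s, a) = \sum_{t=0}^{\infty} \gamma^t\, \Pr(s_t = s,\, a_t = a \mid \pi_\theta),
\qquad J(\theta) = \langle r, \lambda^{\pi_\theta} \rangle

% General utility: any (possibly nonlinear) function F of the occupancy measure
\max_\theta\; F(\lambda^{\pi_\theta}),
\qquad
\nabla_\theta F(\lambda^{\pi_\theta}) = \sum_{s, a}
    \frac{\partial F}{\partial \lambda(s, a)}\Big|_{\lambda = \lambda^{\pi_\theta}}
    \nabla_\theta \lambda^{\pi_\theta}(s, a)
```

When F is linear, its partial derivatives are just the rewards r(s, a) and the classical case is recovered. For a nonlinear F, such as an entropy term that rewards exploration or a divergence term that penalizes straying from a prior, the partial derivatives behave like a reward that changes with the current policy.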
“Reducing the volatility of reward accumulation, ensuring one explores an unknown domain in an efficient manner, or incorporating prior experience, all contribute towards breaking existing sample efficiency barriers of prevailing practice in reinforcement learning by alleviating the amount of random sampling one requires in order to complete policy optimization,” Dr. Koppel said.
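One common way to encode "reducing the volatility of reward accumulation" is a mean-variance utility, which subtracts a penalty proportional to the variance of the return. The sketch below assumes that form. The article does not say which risk measure the researchers used, and the function name, sample returns, and risk weight are hypothetical.

```python
import numpy as np

def mean_variance_utility(returns, risk_weight):
    """Average return minus a variance penalty (assumed risk-sensitive utility)."""
    returns = np.asarray(returns, dtype=float)
    return returns.mean() - risk_weight * returns.var()

# Hypothetical episode returns from two policies.
steady = [9.0, 10.0, 11.0, 10.0]    # lower average, low volatility
volatile = [0.0, 25.0, 2.0, 21.0]   # higher average, high volatility

# With no risk penalty, the volatile policy scores higher on average return.
risk_neutral = (mean_variance_utility(steady, 0.0),
                mean_variance_utility(volatile, 0.0))

# With a risk penalty, the steady policy is preferred.
risk_averse = (mean_variance_utility(steady, 0.1),
               mean_variance_utility(volatile, 0.1))

print(risk_neutral, risk_averse)
```

The standard cumulative return corresponds to a risk weight of zero. Raising the weight trades some average reward for more predictable behavior, which matters when every real-world trial on a ground robot is expensive.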
Dr. Alec Koppel has dedicated his efforts to making his findings applicable to innovative technology for soldiers on the battlefield.
“I am optimistic that reinforcement-learning equipped autonomous robots will be able to assist the warfighter in exploration, reconnaissance, and risk assessment on the future battlefield,” Dr. Koppel said. According to Koppel, making this vision a reality is what motivates his choice of research problems.
The next step for this research is to incorporate the broader decision-making goals enabled by general utilities in reinforcement learning into multi-agent settings and investigate how interactive settings between reinforcement learning agents give rise to synergistic and antagonistic reasoning among teams.
According to Dr. Alec Koppel, the technology that results from this research will be capable of reasoning under uncertainty in team scenarios. The research was conducted in collaboration with Princeton University, the University of Alberta, and Google DeepMind. It was a spotlight talk at NeurIPS 2020, a premier conference that fosters the exchange of neural information processing systems research in its biological, technological, mathematical, and theoretical aspects.
The risk of autonomous killer robots
The ability to develop technology for building autonomous robots that can potentially kill raises many questions.
Experts in machine learning and military technology say it is now technologically possible to build robots that make decisions. These autonomous robots will have the capacity to decide whom to target and kill without a human controller being involved.
Other technologies, such as facial recognition and decision-making algorithms, are becoming increasingly powerful. When all of these technologies are put together, building such killer robots becomes easier.
These fully autonomous weapons would bring new technical and moral dilemmas that should not be ignored.
Facial recognition and object recognition are capabilities that are likely to become essential parts of the toolkit for lethal autonomous weapons systems (LAWS).
As the military experiments with robots that will be part of the battlefield and potentially used as killer weapons as soon as 2028, perhaps it is about time we questioned their further development.
Should we be asking whether the wars of the future will become more technology-driven and less human? Will autonomous killer robots turn against humans? What consequences would this bring for humanity?