There are many real-world scenarios that require multiple robotic arms to collaborate flexibly, safely, and dynamically with each other and with humans, such as construction sites, workshops, and factories. A solution to this problem requires the arms not only to reach their target end-effector poses with high precision, but also to avoid collisions while doing so.
Existing single-arm motion planning techniques, such as RRT, do not account for dynamic obstacles in the workspace and thus cannot be naively applied to control multiple arms. Motion planning can instead be applied in a centralized manner over all arms in the system, but its runtime grows exponentially with the total degrees of freedom. Fast runtimes are essential if the arms are to replan dynamically at all, which is a prerequisite for collaboration with humans. Prior work on multiple robotic arms reformulates the problem as sequential single-arm motion planning, where each arm in turn treats the other arms as obstacles and plans its own motion. However, this approach also suffers from slow runtimes and is ill suited for dynamic tasks.
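To make the sequential reformulation concrete, the toy sketch below plans arms one at a time on an integer line, treating the swept cells of previously planned arms as static obstacles. The 1-D "arm" model and all function names are illustrative assumptions, not the implementation from any prior work.

```python
# Sketch of prioritized/sequential multi-arm planning: each arm plans
# in turn, and later arms must avoid the paths claimed by earlier arms.

def plan_single_arm(start, goal, obstacles):
    """Toy straight-line planner on an integer line; fails if any
    waypoint lands on an obstacle cell."""
    direction = 1 if goal >= start else -1
    path = list(range(start, goal + direction, direction))
    if any(p in obstacles for p in path):
        return None  # planner failure: blocked by an obstacle
    return path

def sequential_plan(arms):
    """arms: list of (start, goal). Plans in list order; returns one
    path per arm, or None if no solution exists under this ordering."""
    swept = set()   # cells already claimed by earlier arms
    paths = []
    for start, goal in arms:
        path = plan_single_arm(start, goal, swept)
        if path is None:
            return None
        swept.update(path)
        paths.append(path)
    return paths
```

Note that the result depends on the chosen arm ordering, one reason the decoupled approach can fail on problems a centralized planner would solve.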
We formulate this problem as a Markov Decision Process (MDP) and aim to solve it using Reinforcement Learning (RL). The multi-arm setting provides sparse rewards, which poses an exploration problem for typical RL algorithms. We plan to guide exploration by first initializing the policies with behaviour cloning on expert trajectories, then fine-tuning with RL, where additional expert demonstrations are supplied on failed tasks. One reason RL techniques are not yet widely adopted in industry is their lack of safety guarantees: RL algorithms compute a policy that maximizes the expected discounted future reward, but do not bound the probability of visiting dangerous states, which in our case are states with collisions between the arms. While we have been able to significantly lower collision rates, a near-0% collision rate remains a goal of our work.
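A minimal sketch of the training scheme above: pretrain a policy by behaviour cloning on expert (state, action) pairs, then iterate rollouts and aggregate fresh expert labels whenever a rollout fails, in a DAgger-style loop. For brevity the RL update is replaced here by re-cloning on the aggregated dataset; the linear policy, the toy expert, and all names are illustrative assumptions, not the project's actual models.

```python
import numpy as np

def expert(state):
    # Toy expert: drive the state straight toward the origin.
    return -0.5 * state

def behaviour_clone(states, actions, lr=0.1, epochs=200):
    """Fit action = W @ state by gradient descent on squared error."""
    W = np.zeros((actions.shape[1], states.shape[1]))
    for _ in range(epochs):
        pred = states @ W.T
        grad = (pred - actions).T @ states / len(states)
        W -= lr * grad
    return W

def rollout_failed(W, state, steps=20, tol=1e-2):
    """A rollout 'fails' if the state has not reached the goal (origin)."""
    for _ in range(steps):
        state = state + state @ W.T
    return np.linalg.norm(state) > tol

rng = np.random.default_rng(0)
states = rng.normal(size=(64, 2))
actions = np.array([expert(s) for s in states])
W = behaviour_clone(states, actions)          # BC pretraining

# Fine-tuning loop: on failure, aggregate new expert labels and re-clone.
for _ in range(5):
    s0 = rng.normal(size=2)
    if rollout_failed(W, s0):
        new_states = rng.normal(size=(64, 2))
        new_actions = np.array([expert(s) for s in new_states])
        states = np.vstack([states, new_states])
        actions = np.vstack([actions, new_actions])
        W = behaviour_clone(states, actions)
```

The key idea carried over from the text is the feedback loop: failed tasks trigger more demonstrations, concentrating expert data exactly where the current policy is weakest.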
Another problem we wish to tackle is decentralizability. Robotic arms that can work effectively alone or with any number of other arms in the workspace are more generally useful than arms that can only work with a fixed number of partners, since the former allows arms to move in and out of each other's workspaces. Further, decentralized control can be up to O(n) faster than centralized control, which allows arms to react faster to dynamic obstacles and targets in the system. However, the major hurdle in decentralized control is obtaining controllers capable of coordinating. We hope that by supplying arms with high-quality expert demonstrations and the ability to communicate, the policies will learn a "convention" that allows them to reach a consensus in real time.
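The following toy sketch illustrates what such a convention might look like in a decentralized setting: each arm runs the same local policy on its own observation plus broadcast messages, and a fixed hand-coded rule (the lower-ID arm claims a contested cell) stands in for the consensus a trained policy would learn. The grid-world model and all names are illustrative assumptions.

```python
# Decentralized coordination sketch: every arm broadcasts its intended
# next cell, then each arm locally applies a shared tie-breaking
# convention. There is no central planner; per-arm work is O(n).

def intended(pos, goal):
    """Greedy next cell toward goal on a grid: reduce x first, then y."""
    x, y = pos
    gx, gy = goal
    if x != gx:
        return (x + (1 if gx > x else -1), y)
    if y != gy:
        return (x, y + (1 if gy > y else -1))
    return pos

def decentralized_step(positions, goals):
    """One synchronous round of broadcast-then-move."""
    msgs = [intended(p, g) for p, g in zip(positions, goals)]
    out = []
    for i, m in enumerate(msgs):
        if any(msgs[j] == m for j in range(i)):
            out.append(positions[i])   # convention: yield to lower-ID arm
        else:
            out.append(m)
    return out

# Two arms whose shortest paths both pass through the shared cell (1, 1).
positions = [(0, 1), (1, 0)]
goals = [(2, 1), (1, 2)]
for _ in range(5):
    positions = decentralized_step(positions, goals)
```

In the example, both arms claim the shared cell in the first round; the convention lets arm 0 pass first and arm 1 wait, so both reach their goals without a collision and without any centralized arbitration.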
Direct Supervisor: Shuran Song