Reinforcement Learning with ML-Agents

Link to Web

Jie Guan


The Goal

This project attempts to develop a robotic vacuum that automatically catches a can in a scene. As the agent in the scene, the vacuum decides which way its body faces and where to move by controlling its wheels, and it can also move its nozzle toward the can. As a 2D experimental game, the player can use the keyboard to move the view and the mouse to pick up and throw the can, and then observe how the vacuum behaves. In this project, I want to use machine learning to explore the potential of AI-generated animation and the possibility of stories emerging from the vacuum. Unity ML-Agents was chosen to build the project, and reinforcement learning is used to train the vacuum agent in the scene.

The whole experiment can be downloaded for Windows here.


To use the ML-Agents Toolkit, I needed to install Python 3.6 and TensorFlow, which the toolkit depends on. I had no experience with the terminal, so I spent some time learning it, but I didn't find it hard to set this up on my MacBook Pro. However, setting up Python and TensorFlow on my Windows 10 desktop took much longer. Because Windows doesn't come with the same terminal tools, I downloaded Anaconda, which lets me activate a Python environment whenever I need it. I also wanted TensorFlow to run its calculations on my GPU rather than my CPU, so I installed the GPU version of TensorFlow on Windows.

I imported ML-Agents into Unity, played the example scenes, and read the documentation to understand how to use the toolkit. Soccer is an excellent example: from it I learned the relationship between the Academy, the Agent and the Brain, which are the required elements for training and running a game object, and how to use them. By reading the agent's code, I learned how to collect data about the agent and how to reward or punish it.
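The relationship between the Academy, Agent and Brain can be pictured as a simple loop: on each step the Academy asks every agent for its observations, the agent's brain turns them into an action, and the agent applies that action and collects reward. The sketch below is plain Python, not the ML-Agents C# API; the class names, the 1-D world and the hand-written policy standing in for a trained model are all hypothetical, chosen only to show who calls whom.

```python
from typing import Callable, List

Observation = List[float]
Action = List[float]


class Brain:
    """Decides an action from an observation. In ML-Agents a trained model
    plays this role; here it is just a plain function (hypothetical)."""

    def __init__(self, policy: Callable[[Observation], Action]) -> None:
        self.policy = policy

    def decide(self, obs: Observation) -> Action:
        return self.policy(obs)


class Agent:
    """A toy 1-D agent: observes its offset to a target, moves, and is rewarded."""

    def __init__(self, brain: Brain, target: float) -> None:
        self.brain = brain
        self.position = 0.0
        self.target = target
        self.reward = 0.0

    def collect_observations(self) -> Observation:
        return [self.target - self.position]

    def agent_action(self, action: Action) -> None:
        step = max(-1.0, min(1.0, action[0]))  # clamp to [-1, 1] like a continuous action
        self.position += step
        if abs(self.target - self.position) < 0.5:
            self.reward += 1.0  # rewarded on every step spent at the target


class Academy:
    """Drives the loop: each step, every agent observes, its brain decides, it acts."""

    def __init__(self, agents: List[Agent]) -> None:
        self.agents = agents
        self.step_count = 0

    def step(self) -> None:
        for agent in self.agents:
            obs = agent.collect_observations()
            action = agent.brain.decide(obs)
            agent.agent_action(action)
        self.step_count += 1


# A hand-written policy standing in for a trained model: move toward the target.
greedy = Brain(lambda obs: [obs[0]])
academy = Academy([Agent(greedy, target=3.0)])
for _ in range(5):
    academy.step()
```

After five steps the agent has reached the target at 3.0 and has been rewarded on the three steps it spent there. During real training, the brain's policy is what the learning algorithm adjusts.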

Because my vacuum is a mechanical robot whose body parts must be connected by joints, I studied the Crawler example to learn how to use Unity's Configurable Joint to design the vacuum's movement.

The Process

First, I set up a simple scene containing a vacuum, a cube, four walls and a floor. In reinforcement learning terms, the vacuum is the agent: it contains a brain and performs the possible movements. The cube is the target, and the vacuum learns to touch it with its nozzle.

The four wheels and the arms are connected to the body through Configurable Joints in Unity. In the wheels' Configurable Joint settings, Angular Y Motion is free so that the wheels can roll forward and backward, and Angular X Motion is limited to the steering angle. The arms and the nozzle are limited in both Angular X and Y Motion, so they can only rotate within the designed range.
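The per-axis settings above can be modelled as a small data structure: each angular axis is either locked, limited to a range, or free, and a requested rotation is constrained accordingly. This is a simplified Python sketch, not Unity's `ConfigurableJoint`; Unity's Y limit is actually symmetric, whereas this sketch gives every axis its own low/high bounds, and the limit values are hypothetical placeholders for whatever is set in the Inspector.

```python
from dataclasses import dataclass
from enum import Enum


class Motion(Enum):
    # Mirrors the three per-axis options Unity offers: Locked, Limited, Free.
    LOCKED = "locked"
    LIMITED = "limited"
    FREE = "free"


@dataclass
class AxisSetting:
    motion: Motion
    low: float = 0.0   # lower limit in degrees (used only when LIMITED)
    high: float = 0.0  # upper limit in degrees (used only when LIMITED)

    def constrain(self, angle: float) -> float:
        """Return the rotation this axis would actually allow."""
        if self.motion is Motion.LOCKED:
            return 0.0
        if self.motion is Motion.LIMITED:
            return max(self.low, min(self.high, angle))
        return angle  # FREE: any rotation is allowed


@dataclass
class JointSetting:
    angular_x: AxisSetting
    angular_y: AxisSetting


# Hypothetical limits; the real values live in the Unity Inspector.
wheel = JointSetting(
    angular_x=AxisSetting(Motion.LIMITED, low=-30.0, high=30.0),  # steering
    angular_y=AxisSetting(Motion.FREE),                            # rolling
)
arm = JointSetting(
    angular_x=AxisSetting(Motion.LIMITED, low=-45.0, high=45.0),
    angular_y=AxisSetting(Motion.LIMITED, low=-20.0, high=20.0),
)
```

For example, `wheel.angular_x.constrain(50.0)` returns `30.0` because steering is clamped, while `wheel.angular_y.constrain(720.0)` returns `720.0` because the wheel may keep spinning.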

In the Hierarchy, there are two empty objects that hold the Agent and Academy scripts. For the Agent, we drag the vacuum's child objects and the target onto the script's fields in the Inspector on the right. For the Agent and the Academy, we also need to create a Learning Brain.

Next, it is time to write the code for the Academy and the Agent. I did not spend much time on the Academy; I just used the template from the example, and it works for my project. The vacuum is the agent in this project, so I need to write the code for training and running it.

The Agent code can be found here.

First, I use the CollectObservations function to collect the position, rotation and velocity of the vacuum's body and nozzle; the agent uses this data to learn how to control the vacuum's movement.
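The observation step essentially flattens the state of each tracked part into one vector of numbers. Below is a hedged Python sketch, not the project's C# code: the `BodyState` type is hypothetical, rotations are assumed to be Euler angles in [-180, 180] degrees, and expressing positions relative to the target is an assumed design choice (the text does not say how the target's location is observed).

```python
from dataclasses import dataclass
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class BodyState:
    position: Vec3
    rotation: Vec3  # Euler angles in degrees, assumed to lie in [-180, 180]
    velocity: Vec3


def collect_observations(body: BodyState, nozzle: BodyState, target_pos: Vec3) -> List[float]:
    """Flatten body and nozzle state into one observation vector (9 values per part)."""
    obs: List[float] = []
    for part in (body, nozzle):
        # Position relative to the target, so the policy does not depend on where the target spawns.
        obs.extend(p - t for p, t in zip(part.position, target_pos))
        # Scale angles to roughly [-1, 1]; neural networks train better on small inputs.
        obs.extend(r / 180.0 for r in part.rotation)
        obs.extend(part.velocity)
    return obs


body = BodyState((1.0, 0.0, 2.0), (0.0, 90.0, 0.0), (0.5, 0.0, 0.0))
nozzle = BodyState((1.5, 0.5, 2.5), (0.0, 90.0, 0.0), (0.5, 0.0, 0.0))
observation = collect_observations(body, nozzle, target_pos=(4.0, 0.0, 2.0))
```

With two parts and nine values each, the vector has 18 entries; the observation size configured for the brain has to match whatever length the real code produces.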

Then, I use vectorAction in AgentAction to define which parts the agent can control in order to reach the goal. Because the vacuum uses Configurable Joints, SetJointTargetRotation controls the rotation of the wheels and arms, and SetJointStrength controls the strength of the joint drive, which determines how quickly the joint reaches that rotation.
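Continuous actions arrive as numbers in roughly [-1, 1], so they have to be rescaled into each joint's rotation limits and into a drive force. The Python sketch below assumes a linear mapping, similar in spirit to the joint driver in the Crawler example but not copied from it; the two-actions-per-joint layout, the limits and `max_force` are hypothetical.

```python
from typing import List, Tuple


def lerp(a: float, b: float, t: float) -> float:
    """Linear interpolation between a and b for t in [0, 1]."""
    return a + (b - a) * t


def to_unit(action: float) -> float:
    """Clamp an action to [-1, 1] and rescale it to [0, 1]."""
    return (max(-1.0, min(1.0, action)) + 1.0) * 0.5


def action_to_target_angle(action: float, low: float, high: float) -> float:
    """Map an action onto the joint's limit range [low, high] (the target rotation)."""
    return lerp(low, high, to_unit(action))


def action_to_strength(action: float, max_force: float) -> float:
    """Map an action onto a drive force in [0, max_force] (the joint strength)."""
    return to_unit(action) * max_force


def decode_actions(vector_action: List[float],
                   joint_limits: List[Tuple[float, float]],
                   max_force: float) -> List[Tuple[float, float]]:
    """Hypothetical layout: [angle0, strength0, angle1, strength1, ...]."""
    commands = []
    for i, (low, high) in enumerate(joint_limits):
        angle = action_to_target_angle(vector_action[2 * i], low, high)
        strength = action_to_strength(vector_action[2 * i + 1], max_force)
        commands.append((angle, strength))
    return commands


commands = decode_actions([0.5, 1.0, -1.0, -1.0],
                          joint_limits=[(-30.0, 30.0), (-45.0, 45.0)],
                          max_force=200.0)
```

Here the first joint is driven to 15 degrees at full strength (200.0) and the second to its lower limit of -45 degrees with zero strength. Mapping actions onto the limit range means the policy can never ask a joint for a rotation it is not allowed to reach.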

Now the vacuum can rotate its arms and wheels through its joints, but it also needs power to move. I use the AddForce and AddTorque functions to let the agent control the position and rotation of its body.
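To see what applying a force and a torque does to the body, here is a toy planar rigid body in Python, stepped with semi-implicit Euler integration. It is not Unity's physics solver: `max_force`, `max_torque`, the mass, inertia and time step are hypothetical values, and the idea that one action pushes along the facing direction while another turns the body is an assumed action layout.

```python
import math
from dataclasses import dataclass


@dataclass
class PlanarBody:
    x: float = 0.0
    z: float = 0.0
    heading: float = 0.0   # radians; 0 means facing +z
    vx: float = 0.0
    vz: float = 0.0
    yaw_rate: float = 0.0  # radians per second
    mass: float = 1.0
    inertia: float = 1.0


def step_body(body: PlanarBody, move_action: float, turn_action: float,
              max_force: float = 10.0, max_torque: float = 2.0, dt: float = 0.02) -> None:
    """Apply one step of force and torque, then integrate the motion."""
    move = max(-1.0, min(1.0, move_action))
    turn = max(-1.0, min(1.0, turn_action))
    # Comparable to AddForce: push along the direction the body is facing.
    fx = math.sin(body.heading) * move * max_force
    fz = math.cos(body.heading) * move * max_force
    # Comparable to AddTorque about the vertical axis.
    torque = turn * max_torque
    # Semi-implicit Euler: update velocities first, then positions with the new velocities.
    body.vx += fx / body.mass * dt
    body.vz += fz / body.mass * dt
    body.yaw_rate += torque / body.inertia * dt
    body.x += body.vx * dt
    body.z += body.vz * dt
    body.heading += body.yaw_rate * dt


forward = PlanarBody()
step_body(forward, move_action=1.0, turn_action=0.0)
turning = PlanarBody()
step_body(turning, move_action=0.0, turn_action=1.0)
```

A pure move action pushes the body along +z without rotating it, while a pure turn action spins it in place. Because forces change velocity rather than position directly, the agent has to learn to accelerate and brake, which is part of why its movement can look unexpected.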

I also set up rewards with the AddReward function: when the vacuum's body and nozzle face and move toward the target, the agent is rewarded, and when they face or move away from it, the agent is penalized. When the distance between the target and the nozzle is less than 2f, the agent gets a large reward and the target is reset to a new position within the scene.
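A reward like the one described can be built from dot products: the dot product of a unit facing (or velocity) vector with the unit direction to the target is positive when pointing toward it and negative when pointing away. The Python sketch below works on the floor plane only; the 2.0 catch distance comes from the text, while `CATCH_REWARD`, `SHAPING_SCALE` and the function signature are hypothetical, and the caller is assumed to move the target when the returned flag is true.

```python
import math
from typing import Tuple

Vec2 = Tuple[float, float]  # (x, z) on the floor plane

CATCH_DISTANCE = 2.0  # from the text: nozzle within 2 units of the target
CATCH_REWARD = 1.0    # hypothetical size of the "big reward"
SHAPING_SCALE = 0.01  # hypothetical weight of the per-step shaping terms


def sub(a: Vec2, b: Vec2) -> Vec2:
    return (a[0] - b[0], a[1] - b[1])


def dot(a: Vec2, b: Vec2) -> float:
    return a[0] * b[0] + a[1] * b[1]


def unit(a: Vec2) -> Vec2:
    n = math.hypot(a[0], a[1])
    return (a[0] / n, a[1] / n) if n > 0 else (0.0, 0.0)


def step_reward(body_pos: Vec2, body_forward: Vec2, body_velocity: Vec2,
                nozzle_pos: Vec2, nozzle_forward: Vec2,
                target_pos: Vec2) -> Tuple[float, bool]:
    """Return (reward for this step, whether the target should be reset)."""
    to_target = unit(sub(target_pos, body_pos))
    nozzle_to_target = unit(sub(target_pos, nozzle_pos))
    # Facing terms: +1 when pointing straight at the target, -1 when pointing away.
    reward = SHAPING_SCALE * dot(unit(body_forward), to_target)
    reward += SHAPING_SCALE * dot(unit(nozzle_forward), nozzle_to_target)
    # Moving term: positive when the velocity points toward the target, negative when away.
    reward += SHAPING_SCALE * dot(body_velocity, to_target)
    if math.hypot(*sub(target_pos, nozzle_pos)) < CATCH_DISTANCE:
        return reward + CATCH_REWARD, True
    return reward, False


approach = step_reward((0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 1.0), (0.0, 1.0), (0.0, 10.0))
retreat = step_reward((0.0, 0.0), (0.0, -1.0), (0.0, -2.0), (0.0, 1.0), (0.0, -1.0), (0.0, 10.0))
catch = step_reward((0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 9.0), (0.0, 1.0), (0.0, 10.0))
```

Keeping the shaping terms small relative to the catch reward is a common choice: the per-step signals guide early exploration, but reaching the target remains what the agent optimizes for.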

To start training, enable the Control option for the brain in the Academy, then run the training command in the terminal.

The video below shows the training process of my agent.

After training, we get a TensorFlow model, which we then assign to the brain.


For me, this project is an experiment in machine learning using reinforcement learning, and Unity's ML-Agents Toolkit simplified the process. I used TensorFlow to train a virtual vacuum robot to move forward and collect a can, and the vacuum learned its movement by itself. Through this project, I came to understand the basic framework of reinforcement learning and the relationship between the agent, the environment and the goal. In Unity, I improved my C# coding skills and gained confidence in setting up training environments with the ML-Agents Toolkit. The process feels a lot like training a dog: when it reaches the goal, it gets a reward. However, I found that reinforcement learning does not let me control how my agent reaches the target, and the agent often performs unexpected movements that go beyond anything we see in real life.

As for the potential of this method, in the future I would like to add more agents and targets to the environment to increase its complexity and observe what stories the agents might create. This would be like a simulated world in a virtual environment, where each agent has its own goal and interacts with the others. I would also like to explore simulated training for real robots, transferring what is learned in a virtual game engine to the real world. I believe this could be extremely useful for training self-driving cars, smart-city systems and other things that are hard to train in the real world.
