Learning Visual Feedback Control for Dynamic Cloth Folding — What The Paper Does Not Tell You — Part 3 — Domain Randomization

This is the final installment in this series of blog posts, in which we discuss how domain randomization was used to address technical challenges in the creation of the paper. The paper was ultimately accepted for publication at the 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2022) and was a finalist for the Best Paper Award.

Domain randomization is a technique for improving the performance of machine learning models in real-world environments. The basic idea is that by periodically randomizing aspects of the environment, the model learns to account for a range of different attribute values rather than just one version of the environment.

In the context of Sim2Real transfer, the hope is that the range of attribute values that the model has encountered and learned to act on in simulation will include the attribute values of the real environment.

In this blog post, we will explore how domain randomization was used to address the challenges of aligning the simulated and real environments in terms of visual attributes and the dynamic behavior of cloth in a robotic task.

Bridging the Sim2Real gap

We used domain randomization to address two main challenges: finding good values for the attributes that affect the materials of objects, the cameras, and the lighting, and assessing the similarity of the cloth's dynamic behavior across multiple time steps.

The real-world lab environment. Photo by the author.


To implement domain randomization for visual attributes, we randomized the textures of the cloth, surface, robot hand, and background, as well as the position and orientation of the camera and the position and attributes of multiple lights. For cloth dynamics, we randomly adjusted the values of the parameters that control the dynamics of the cloth.

Randomizing visual attributes

The elements that affect the visual observations in the simulation are the textures of objects, the lights, and the cameras. Given the experimental lab setup, only the textures of the cloth, the surface the cloth lies on, the robot hand, and the background needed to be randomized in the simulation, since other parts, such as the robot base, are not visible in the observations.

The camera in the simulation was placed at approximately the same position as in the lab, but its position and rotation were randomized around that pose, since achieving an identical camera pose across the two environments is hard.

For the lights, multiple sources were added, and their positions and other attributes were randomized, since the lighting conditions in the lab would not stay constant. In MuJoCo, the visual aspects of the simulated environment are defined in XML files, but they can also be changed programmatically from time step to time step during simulation, as sketched below.
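To make this concrete, here is a minimal sketch of the mechanism using the official `mujoco` Python bindings: attributes parsed from the XML live in `MjModel` arrays and can simply be overwritten between renders. The scene file name and the choice of re-tinting every geom are illustrative, not the paper's exact setup.

```python
import mujoco
import numpy as np

model = mujoco.MjModel.from_xml_path("cloth_fold.xml")  # hypothetical scene file
data = mujoco.MjData(model)
rng = np.random.default_rng(seed=0)

# Visual attributes defined in the XML are plain arrays on the model, so
# randomizing them is just an assignment; the renderer picks up the new
# values on the next frame.
model.geom_rgba[:, :3] = rng.uniform(0.0, 1.0, size=(model.ngeom, 3))
```

The same pattern applies to the texture, camera, and light randomizations described below.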

For the textures, MuJoCo supports both textures created programmatically and textures loaded from an image file, which are then rendered onto the object the texture is attached to. Both approaches were used in the texture randomization process: at the start of each episode, it was randomly decided whether to generate a random texture or to use an image file.

The generated textures are checkerboard grids of uniformly random resolution, with two uniformly sampled colors alternating between neighboring cells. The image-based textures were created from photos taken of the corresponding real-world objects.
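A minimal sketch of such a generator in NumPy; the resolution range and the 50/50 split between generated and image-based textures are assumptions for illustration:

```python
import numpy as np

def random_checker_texture(rng: np.random.Generator, max_res: int = 32) -> np.ndarray:
    """Checkerboard of uniformly random resolution, with two uniformly
    sampled RGB colors alternating between neighboring cells."""
    res = int(rng.integers(2, max_res + 1))        # assumed resolution range
    c0 = rng.integers(0, 256, size=3)              # first random color
    c1 = rng.integers(0, 256, size=3)              # second random color
    ii, jj = np.meshgrid(np.arange(res), np.arange(res), indexing="ij")
    checker = ((ii + jj) % 2 == 0)[..., None]      # alternating cell mask
    return np.where(checker, c0, c1).astype(np.uint8)

rng = np.random.default_rng()
# Per episode: coin flip between a generated texture and a real photo.
texture = (random_checker_texture(rng) if rng.random() < 0.5
           else None)  # else-branch: load one of the photographed textures
```

How the resulting array is uploaded to the renderer depends on the MuJoCo version and bindings in use.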

For the cloths, the texture images were close-up photos of the real-world cloths used in the experiments. For the table, the images were taken of the foam surface used to protect the robot from hitting the actual table underneath.

The position and orientation of the camera were also randomized at the beginning of each simulation episode. The camera was initialized at a constant position relative to the robot's base, always pointed at a lookat point that initially lies at the center of the cloth. The position of the lookat point itself was uniformly randomized at the beginning of each episode in the x,y-plane within a maximum radius of 3 cm, which changes the camera's orientation accordingly.

To randomize the camera position, a number uniformly sampled from the range [-0.5 m, 0.5 m] was added to each coordinate of the original position. The rotation of the camera was also randomized, but only around the z-axis of the camera's coordinate frame, which points at the lookat point, since the camera position and the lookat point constrain the other two axes.

The rotation around the z-axis was changed by an angle uniformly sampled from [-0.9 rad, 0.9 rad]. The final camera-related attribute to be randomized is the field of view. The field of view of the RealSense infrared camera is 14 degrees, so the corresponding value in the simulation was uniformly sampled between 13 and 15 degrees at every episode.
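Putting the camera randomization together, here is a sketch against `MjModel`'s `cam_pos`, `cam_quat`, and `cam_fovy` fields. The look-at construction and function names are mine, and it assumes a camera attached to the world body that never looks straight up or down:

```python
import numpy as np

def lookat_quat(cam_pos, target, roll):
    """Quaternion (w, x, y, z) for a camera at cam_pos looking at target,
    rolled by `roll` radians about the viewing axis. MuJoCo cameras look
    along their local -z axis."""
    z = cam_pos - target
    z /= np.linalg.norm(z)                        # camera z points away from target
    x = np.cross([0.0, 0.0, 1.0], z)
    x /= np.linalg.norm(x)                        # camera right
    y = np.cross(z, x)                            # camera up
    c, s = np.cos(roll), np.sin(roll)
    x, y = c * x + s * y, -s * x + c * y          # roll about the viewing axis
    R = np.stack([x, y, z], axis=1)
    w = np.sqrt(1.0 + R[0, 0] + R[1, 1] + R[2, 2]) / 2.0
    return np.array([w, (R[2, 1] - R[1, 2]) / (4 * w),
                        (R[0, 2] - R[2, 0]) / (4 * w),
                        (R[1, 0] - R[0, 1]) / (4 * w)])

def randomize_camera(model, rng, cam_id, base_pos, cloth_center):
    # Lookat point: uniform over a 3 cm radius disk in the x,y-plane.
    r, ang = 0.03 * np.sqrt(rng.random()), rng.uniform(0.0, 2.0 * np.pi)
    lookat = cloth_center + np.array([r * np.cos(ang), r * np.sin(ang), 0.0])
    pos = base_pos + rng.uniform(-0.5, 0.5, size=3)    # per-dimension offset
    model.cam_pos[cam_id] = pos
    model.cam_quat[cam_id] = lookat_quat(pos, lookat, rng.uniform(-0.9, 0.9))
    model.cam_fovy[cam_id] = rng.uniform(13.0, 15.0)   # degrees
```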

There were also three lights in the simulation environment. In MuJoCo, there is always a directional headlight that does not cast shadows, centered at the camera in use and pointed along the z-axis of the camera frame, i.e. where the camera is looking.

To include shadows in the environment, two additional directional lights were added to the simulation. Their positions were uniformly sampled within a bounding box spanning 2 meters in each dimension around the robot's base, and their directions were sampled uniformly as well.
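A sketch of the per-episode light randomization, again writing directly into `MjModel` fields (`light_pos`, `light_dir`); the light indices are an assumption:

```python
import numpy as np

def randomize_lights(model, rng, robot_base_pos, light_ids=(1, 2)):
    """Randomize the two shadow-casting lights: positions within a 2 m
    bounding box around the robot base, directions uniform on the sphere."""
    for lid in light_ids:
        model.light_pos[lid] = robot_base_pos + rng.uniform(-1.0, 1.0, size=3)
        d = rng.normal(size=3)                 # isotropic via normalized Gaussian
        model.light_dir[lid] = d / np.linalg.norm(d)
```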

The images captured in the simulation do not suffer from effects such as the blur seen in the RGB observation above, which is inherent to real-world cameras due to the image size, the distance to the object of interest, and the lighting conditions. To include the same effect in the images captured in the simulation, random Gaussian blur was added to the images at every time step, as shown below. The RGB images are also converted to grayscale and scaled down before the random blur is applied.
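A sketch of this per-step preprocessing using OpenCV; the output resolution and the blur strength range are illustrative values, not the paper's:

```python
import cv2
import numpy as np

def preprocess_observation(rgb, rng, out_size=(84, 84), max_sigma=2.0):
    """Grayscale -> downscale -> random Gaussian blur, applied every step."""
    gray = cv2.cvtColor(rgb, cv2.COLOR_RGB2GRAY)
    small = cv2.resize(gray, out_size, interpolation=cv2.INTER_AREA)
    sigma = rng.uniform(0.0, max_sigma)
    if sigma > 1e-3:
        # ksize=(0, 0) lets OpenCV derive the kernel size from sigma.
        small = cv2.GaussianBlur(small, ksize=(0, 0), sigmaX=sigma)
    return small
```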

Randomizing dynamics

The randomization of cloth dynamics followed a slightly different approach compared to the appearance randomization. Here, the goal was to first narrow down sets of cloth parameters that produce behavior similar to real-world cloths, and then to sample across the identified parameter sets during RL training.

Visual and quantitative inspection showed that when the dynamics parameters are sampled completely at random, most of the resulting behavior is very unrealistic compared to what one would see in the real world. It would therefore be a waste of computational resources to teach the agent to manipulate cloths that are far removed from real-world cloths.

Identifying the sets of useful cloth parameters began by uniformly sampling values for the parameters d, k, and b of the virtual spring-dampers that MuJoCo uses to model the cloth.

Each virtual spring-damper in the cloth grid used identical values for d, k, and b based on the sample. Effectively, the goal of this process is to filter out ranges of parameters that produce over-damped or overly jittery cloths.

Then, cloths simulated with the sampled parameters were manipulated using a successful demonstration trajectory collected during the robot identification discussed in the previous blog post. Out of the sampled cloth parameters, the top 20 combinations of d, k, and b, i.e. those achieving the highest RL reward at the end of a fold, were picked, giving the set of simulated cloths used for the actual RL training. During RL training, the cloth to use for each manipulation episode was uniformly sampled from this set, as sketched below.
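The selection procedure boils down to a sample-score-keep loop. In the sketch below, `rollout_reward` is a hypothetical callable standing in for replaying the demonstration trajectory on a cloth simulated with the given parameters and returning the fold reward; the sample count and parameter bounds are placeholders, not the paper's values:

```python
import numpy as np

def identify_cloth_params(rollout_reward, rng, n_samples=1000, top_k=20,
                          bounds=((0.0, 1.0), (0.0, 1.0), (0.0, 1.0))):
    """Sample (d, k, b) uniformly within `bounds`, score each candidate by
    replaying the demo trajectory, and keep the `top_k` best parameter sets."""
    candidates = [tuple(rng.uniform(lo, hi) for lo, hi in bounds)
                  for _ in range(n_samples)]
    return sorted(candidates, key=rollout_reward, reverse=True)[:top_k]

# During RL training, the cloth for each episode is then drawn uniformly:
# params = cloth_set[rng.integers(len(cloth_set))]
```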

Conclusion

In this blog post, the use of domain randomization was discussed as a means of addressing technical challenges in the creation of our paper on visual feedback control for dynamic cloth folding.

Domain randomization is a technique that improves the performance of machine learning models in real-world environments: by periodically randomizing aspects of the environment, the model learns to account for a range of attribute values rather than just one version of the environment.

In our case, domain randomization was used to align the simulated and real environments in terms of visual attributes and the dynamic behavior of cloth in a robotic task.

The links to all of the blog posts in the series can be found here: Part 1, Part 2, and Part 3.