Ant Antics: Part 2

--

Simplifying the example code and interacting with the NN

Here is the pared-down ant.py that I'll be using moving forward. Notice that it doesn't have any real observations; the framework requires a minimum of 1 observation, so I set the observation count to 1 and leave it as a dummy value.
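For orientation, the observation side boils down to something like the sketch below. This is a minimal sketch, assuming the IsaacGymEnvs-style Ant task (where the buffer lives on the task class as self.obs_buf and the count comes from cfg["env"]["numObservations"]); the standalone names here (num_envs, compute_observations) are just for illustration, not copied from my file:

import torch

num_envs = 4096          # assumption: however many parallel ants the task runs
num_observations = 1     # the minimum allowed, used as a single dummy slot

# the policy still needs an observation buffer of a valid shape,
# even though it carries no information about the ant's state
obs_buf = torch.zeros((num_envs, num_observations))

def compute_observations():
    # every env sees the same constant input each step, so the network
    # can only learn open-loop (state-blind) behaviour
    obs_buf[:] = 0.0
    return obs_buf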

Even without any meaningful input observations, the policy is still able to learn some rudimentary tricks:

ones = torch.ones_like(reset_buf)
zeros = torch.zeros_like(reset_buf)

# per-env sum of absolute joint velocities, shape (num_envs,)
abs_dof_vel = torch.abs(dof_vel)
sum_abs_vel = torch.sum(abs_dof_vel, dim=-1)

# moves slowly for a while: reward any env whose joints are still moving
# reward = torch.where(sum_abs_vel > 0.5, ones, zeros)

# stops moving ASAP: reward an env once its joints have (nearly) stopped
# reward = torch.where(sum_abs_vel < 0.5, ones, zeros)

# sits up tall: reward the torso height (z component of the root state)
# reward = root_states[:, 2]

# squats down low: penalize torso height instead
# reward = -1 * root_states[:, 2]
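
For context, here is roughly how one of these variants (the "sits up tall" one) plugs into the reward step. This is a sketch only: the stock Ant example uses a @torch.jit.script free function in this general shape, but the trimmed-down argument list below is my assumption, not the real signature:

import torch

@torch.jit.script
def compute_ant_reward(root_states: torch.Tensor,
                       reset_buf: torch.Tensor,
                       progress_buf: torch.Tensor,
                       max_episode_length: float):
    # reward torso height: root_states[:, 2] is the z position of each ant's root
    reward = root_states[:, 2]

    # reset on episode timeout only (no fall detection in this stripped-down version)
    reset = torch.where(progress_buf >= max_episode_length - 1,
                        torch.ones_like(reset_buf), reset_buf)
    return reward, reset

Swapping in any of the other reward lines above is just a matter of replacing the reward expression and re-running training.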

--
