Ant Antics: Part 2
May 2, 2022
Simplifying the example code and interacting with the NN
Here is the pared-down ant.py that I'll be using moving forward. Notice that it doesn't take any meaningful observations: Isaac Gym requires a minimum of one observation, so I set the observation count to 1.
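As a hedged sketch of what "one observation" means in practice (the exact config key varies by Isaac Gym version; `num_envs` here is an arbitrary placeholder): with the observation count pinned at the minimum of 1, the observation buffer is just a `(num_envs, 1)` tensor that can be left at zero, so the policy receives no useful state.

```python
import torch

# Hypothetical sketch: observation count pinned at the minimum of 1.
# The buffer is filled with zeros, so the policy sees no real state.
num_envs = 4
obs_buf = torch.zeros((num_envs, 1))
print(obs_buf.shape)  # torch.Size([4, 1])
```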
Even without any input observations, the policy is still able to learn some rudimentary tricks:
ones = torch.ones_like(reset_buf)
abs_dof_vel = torch.abs(dof_vel[:, :])
sum_abs_vel = torch.sum(abs_dof_vel)

# moves slowly for a while
# reward = torch.where(sum_abs_vel > 0.5, ones, ones*0)

# stops moving ASAP
# reward = torch.where(sum_abs_vel < 0.5, ones, ones*0)

# sits up tall
# reward = root_states[:, 2]

# squats down low
# reward = -1 * root_states[:, 2]
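The reward variants above can be sketched as a standalone snippet, with dummy tensors standing in for Isaac Gym's `reset_buf`, `dof_vel`, and `root_states` (the shapes and the per-environment `dim=-1` sum are my assumptions, not the original code):

```python
import torch

# Dummy stand-ins for the Isaac Gym buffers (shapes are assumptions):
num_envs, num_dofs = 4, 8
reset_buf = torch.zeros(num_envs)
dof_vel = torch.randn(num_envs, num_dofs)
root_states = torch.randn(num_envs, 13)  # Isaac Gym packs 13 floats per env

ones = torch.ones_like(reset_buf)
abs_dof_vel = torch.abs(dof_vel)
sum_abs_vel = torch.sum(abs_dof_vel, dim=-1)  # per-env total joint speed

# "stops moving ASAP": reward 1 only while total joint speed is small
reward_still = torch.where(sum_abs_vel < 0.5, ones, ones * 0)

# "sits up tall": reward the torso height (z component of the root state)
reward_tall = root_states[:, 2]

print(reward_still.shape, reward_tall.shape)  # one reward per environment
```

Each variant produces one scalar reward per environment, which is what the training loop consumes each step.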