Ant Antics: Part 2

--

Simplifying the example code and interacting with the NN

Here is the pared-down ant.py that I'll be using moving forward. Notice that it doesn't have any real observations; the framework requires a minimum of 1 observation, so I set the observation count to 1 and leave it as a dummy value.
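For orientation, the observation side boils down to something like the sketch below. This is a minimal sketch, assuming the IsaacGymEnvs-style Ant task (where the buffer lives on the task class as self.obs_buf and the count comes from cfg["env"]["numObservations"]); the standalone names here (num_envs, compute_observations) are just for illustration, not copied from my file:

import torch

num_envs = 4096          # assumption: however many parallel ants the task runs
num_observations = 1     # the minimum allowed, used as a single dummy slot

# the policy still needs an observation buffer of a valid shape,
# even though it carries no information about the ant's state
obs_buf = torch.zeros((num_envs, num_observations))

def compute_observations():
    # every env sees the same constant input each step, so the network
    # can only learn open-loop (state-blind) behaviour
    obs_buf[:] = 0.0
    return obs_buf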

Even without any meaningful input observations, the policy is still able to learn some rudimentary tricks:

ones = torch.ones_like(reset_buf)
zeros = torch.zeros_like(reset_buf)

# per-env sum of absolute joint velocities, shape (num_envs,)
abs_dof_vel = torch.abs(dof_vel)
sum_abs_vel = torch.sum(abs_dof_vel, dim=-1)

# moves slowly for a while: reward any env whose joints are still moving
# reward = torch.where(sum_abs_vel > 0.5, ones, zeros)

# stops moving ASAP: reward an env once its joints have (nearly) stopped
# reward = torch.where(sum_abs_vel < 0.5, ones, zeros)

# sits up tall: reward the torso height (z component of the root state)
# reward = root_states[:, 2]

# squats down low: penalize torso height instead
# reward = -1 * root_states[:, 2]
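
For context, here is roughly how one of these variants (the "sits up tall" one) plugs into the reward step. This is a sketch only: the stock Ant example uses a @torch.jit.script free function in this general shape, but the trimmed-down argument list below is my assumption, not the real signature:

import torch

@torch.jit.script
def compute_ant_reward(root_states: torch.Tensor,
                       reset_buf: torch.Tensor,
                       progress_buf: torch.Tensor,
                       max_episode_length: float):
    # reward torso height: root_states[:, 2] is the z position of each ant's root
    reward = root_states[:, 2]

    # reset on episode timeout only (no fall detection in this stripped-down version)
    reset = torch.where(progress_buf >= max_episode_length - 1,
                        torch.ones_like(reset_buf), reset_buf)
    return reward, reset

Swapping in any of the other reward lines above is just a matter of replacing the reward expression and re-running training.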

--
