
Indeed, I wish someone could talk about how the value/policy thing works.


As I understand it, the value network takes the place of the heuristic for scoring a given board layout, and the policy network takes the place of the heuristic for ordering moves from most to least promising.

When searching the game tree, only the N most promising moves at each ply are examined (as ranked by the policy network), and the leaves of the game tree are scored by the value network.
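To make that concrete, here is a rough sketch of the scheme as I described it, using a plain depth-limited negamax search over a toy take-away game (remove 1 to 3 stones, taking the last stone wins). Everything in it is an assumption for illustration: `toy_policy` and `toy_value` are hand-written stand-ins for the two networks, and `top_n`, `search` and `best_move` are made-up names. A real engine may well organize its search differently, so read this as the idea, not anyone's actual implementation.

```python
from typing import Callable, Dict, List

State = int  # toy game: stones left; a move removes 1, 2 or 3 stones


def legal_moves(state: State) -> List[int]:
    return [m for m in (1, 2, 3) if m <= state]


def apply_move(state: State, move: int) -> State:
    return state - move


def is_terminal(state: State) -> bool:
    return state == 0


def toy_policy(state: State) -> Dict[int, float]:
    """Stand-in for a policy network: a prior probability per legal move."""
    # Hypothetical heuristic: favor moves that leave a multiple of 4.
    scores = {m: (2.0 if (state - m) % 4 == 0 else 1.0) for m in legal_moves(state)}
    total = sum(scores.values())
    return {m: s / total for m, s in scores.items()}


def toy_value(state: State) -> float:
    """Stand-in for a value network: estimated outcome in [-1, 1]
    from the point of view of the player about to move."""
    return -1.0 if state % 4 == 0 else 1.0


Policy = Callable[[State], Dict[int, float]]
Value = Callable[[State], float]


def top_candidates(state: State, top_n: int, policy: Policy) -> List[int]:
    """Order moves by policy prior and keep only the top_n most promising."""
    priors = policy(state)
    return sorted(priors, key=priors.get, reverse=True)[:top_n]


def search(state: State, depth: int, top_n: int, policy: Policy, value: Value) -> float:
    """Negamax score of `state` for the player to move."""
    if is_terminal(state):
        return -1.0  # the opponent took the last stone, so the mover has lost
    if depth == 0:
        return value(state)  # leaf: ask the value function instead of searching on
    return max(
        -search(apply_move(state, m), depth - 1, top_n, policy, value)
        for m in top_candidates(state, top_n, policy)
    )


def best_move(state: State, depth: int, top_n: int,
              policy: Policy = toy_policy, value: Value = toy_value) -> int:
    """Pick the candidate move whose resulting position is worst for the opponent."""
    return max(
        top_candidates(state, top_n, policy),
        key=lambda m: -search(apply_move(state, m), depth - 1, top_n, policy, value),
    )


if __name__ == "__main__":
    print(best_move(5, depth=3, top_n=2))
    print(best_move(7, depth=3, top_n=2))
```

The point of the split is visible in the two places the stand-ins are called: the policy only decides which branches get expanded, and the value only decides what a leaf is worth once the depth budget runs out.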



