Author here. I agree with you - the number of metrics I can experiment with in Pong is limited. Chess and Go are next for me.
Overall, the simplicity of this project has helped me test the waters before diving into more complex territory. The underlying pipeline isn't bad: collecting events, periodically generating metrics from them, prioritizing those metrics, generating commentary text, queuing the outputs, and then synthesizing speech should serve as the core for similar work.
It's also given me some intuition for how to construct an "ecosystem" of data around live action to add a layer of realism to the narratives.
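As a rough illustration of the pipeline described above (collect events, derive metrics, prioritize them, generate text, queue it, synthesize speech), here is a minimal Python sketch. It is not the author's code. Event, Metric, CommentaryPipeline, the ace-counting metric, and the generate_text/synthesize_speech callables are all hypothetical stand-ins for a real metric library, an LLM call, and a TTS engine.

```python
import heapq
import itertools
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Event:
    kind: str    # hypothetical event type, e.g. "point" or "ace"
    player: str


@dataclass(order=True)
class Metric:
    priority: int                     # lower value = more urgent (heapq is a min-heap)
    seq: int                          # tie-breaker that preserves insertion order
    text: str = field(compare=False)  # human-readable fact handed to the text generator


class CommentaryPipeline:
    def __init__(self, generate_text, synthesize_speech):
        self.events = []          # raw events collected since the last tick
        self.metrics = []         # heap of pending metrics, most important first
        self.outputs = deque()    # generated commentary waiting for speech synthesis
        self._seq = itertools.count()
        self.generate_text = generate_text          # stand-in for an LLM call
        self.synthesize_speech = synthesize_speech  # stand-in for a TTS engine

    def collect(self, event):
        self.events.append(event)

    def tick(self):
        """Run periodically: turn buffered events into metrics, then commentary."""
        aces = {}
        for e in self.events:
            if e.kind == "ace":
                aces[e.player] = aces.get(e.player, 0) + 1
        self.events.clear()
        for player, n in aces.items():
            # More aces in this window -> more negative priority -> popped first.
            text = f"{player} served {n} ace{'s' if n != 1 else ''}"
            heapq.heappush(self.metrics, Metric(-n, next(self._seq), text))
        if self.metrics:
            top = heapq.heappop(self.metrics)
            self.outputs.append(self.generate_text(top.text))

    def speak_next(self):
        """Synthesize the oldest queued line, or return None if nothing is queued."""
        return self.synthesize_speech(self.outputs.popleft()) if self.outputs else None


pipe = CommentaryPipeline(
    generate_text=lambda fact: f"What a stretch: {fact}!",
    synthesize_speech=lambda text: text.upper(),  # placeholder for real audio output
)
for kind, player in [("ace", "A"), ("ace", "A"), ("point", "B"), ("ace", "B")]:
    pipe.collect(Event(kind, player))
pipe.tick()
print(pipe.speak_next())
```

Lower-priority metrics (here, B's single ace) stay in the heap for a later tick; a real system would probably also expire metrics that have gone stale.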
This is a great premise, and that underlying pipeline you mention sounds like a generally useful system for live commentary with the appropriate abstractions.
I'm curious to know more about how you retrieve data from this ecosystem to add color. You mentioned nearest-neighbor search; is that over game state? How is the data stored and queried?
The code starts by simulating 15 tournament years (e.g., 2010 to 2024), each containing 4 Grand Slam tournaments played in a knockout format. There are 64 players in the pool, all starting with an initial Elo rating.
Players compete in the tournaments, with match outcomes driven by their Elo ratings, and ratings are updated after each match. Players are ranked solely by Elo. By the end of the simulation, this produces a wealth of data: for each game, details such as points scored, points allowed, fastest ball speed, number of aces, point-by-point results, and more are simulated.
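For readers unfamiliar with Elo, the simulation loop described above could look roughly like the sketch below. It uses the standard Elo expected score E_A = 1 / (1 + 10^((R_B - R_A) / 400)) and update R_A' = R_A + K(S_A - E_A). The K-factor of 32, the starting rating of 1500, sampling each winner with probability E_A, random bracket seeding, and all function names are assumptions, and the per-point details (ball speed, aces, etc.) are left out.

```python
import random

K = 32                # assumed K-factor
INITIAL_ELO = 1500.0  # assumed starting rating for every player


def expected_score(ra, rb):
    """Probability that a player rated ra beats a player rated rb (standard Elo)."""
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))


def play_match(a, b, ratings, rng):
    """Sample a winner from the Elo win probability, then update both ratings."""
    ea = expected_score(ratings[a], ratings[b])
    sa = 1.0 if rng.random() < ea else 0.0
    ratings[a] += K * (sa - ea)
    ratings[b] -= K * (sa - ea)  # zero-sum: b gains exactly what a loses
    return a if sa == 1.0 else b


def knockout(players, ratings, rng):
    """Single-elimination bracket; the number of players must be a power of two."""
    bracket = list(players)
    rng.shuffle(bracket)
    matches = []
    while len(bracket) > 1:
        winners = []
        for a, b in zip(bracket[0::2], bracket[1::2]):
            w = play_match(a, b, ratings, rng)
            matches.append((a, b, w))
            winners.append(w)
        bracket = winners
    return bracket[0], matches


def simulate(years=range(2010, 2025), slams_per_year=4, n_players=64, seed=0):
    rng = random.Random(seed)
    players = [f"P{i:02d}" for i in range(n_players)]
    ratings = {p: INITIAL_ELO for p in players}
    history = []
    for year in years:
        for slam in range(slams_per_year):
            champion, matches = knockout(players, ratings, rng)
            history.append({"year": year, "slam": slam,
                            "champion": champion, "matches": matches})
    ranking = sorted(players, key=ratings.get, reverse=True)  # ranked purely by Elo
    return history, ratings, ranking


history, ratings, ranking = simulate()
print(len(history), ranking[:3])
```

Each 64-player knockout yields 63 matches, so 15 years of 4 slams gives 60 tournaments and 3,780 match records to attach simulated game details to.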
We can then cache and use this information for a ton of color commentary. For example, we can identify the GOATs of the game, highlight players who are performing exceptionally well, pinpoint underdogs, find matches similar to the one currently being played, etc.
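Once match records are cached, queries like these reduce to simple aggregations. A hedged sketch, assuming a hypothetical MatchRecord schema with pre-match ratings and a stage label where "F" marks a final. "GOAT" is approximated here as most titles, "in form" as most wins since a given year, and "underdog" as a win over an opponent rated at least min_gap points higher; these definitions are illustrative choices, not the author's.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class MatchRecord:
    year: int
    stage: str          # hypothetical labels, e.g. "R64", ..., "SF", "F" (final)
    winner: str
    loser: str
    winner_elo: float   # ratings as they stood before the match
    loser_elo: float


def goats(records, top=3):
    """Most titles, i.e. most wins in finals."""
    return Counter(r.winner for r in records if r.stage == "F").most_common(top)


def in_form(records, since_year, top=3):
    """Most match wins from since_year onward."""
    return Counter(r.winner for r in records if r.year >= since_year).most_common(top)


def upsets(records, min_gap=100):
    """Wins by a player rated at least min_gap points below the opponent."""
    return [r for r in records if r.loser_elo - r.winner_elo >= min_gap]


records = [
    MatchRecord(2010, "F", "A", "B", 1600, 1550),
    MatchRecord(2011, "F", "A", "C", 1620, 1500),
    MatchRecord(2012, "SF", "C", "D", 1470, 1500),
    MatchRecord(2012, "F", "C", "A", 1480, 1640),
]
print(goats(records))
print(in_form(records, since_year=2012))
print(upsets(records))
```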
That said, I'm just scratching the surface. Imagine a function that considers age alongside Elo. You could then simulate performance as a function of age as well, and show things like the younger generation overtaking older players, or veterans still competing past their prime. With a function like this, you could simulate matches spanning the past 75-100 years and generate a ton of interesting data to analyze.
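One way to prototype the age idea is to adjust the rating that goes into the win-probability formula. The sketch below is entirely hypothetical: the peak age of 27, the quadratic penalty of 4 Elo points per squared year away from the peak, and the function names are made-up parameters chosen only to show the shape of the idea, plugged into the standard Elo expected-score formula.

```python
PEAK_AGE = 27       # hypothetical age at which a player performs best
AGE_PENALTY = 4.0   # hypothetical Elo points lost per squared year away from the peak


def age_adjustment(age):
    """Rating offset: zero at PEAK_AGE, falling off quadratically on either side."""
    return -AGE_PENALTY * (age - PEAK_AGE) ** 2


def effective_rating(elo, age):
    return elo + age_adjustment(age)


def win_probability(elo_a, age_a, elo_b, age_b):
    """Standard Elo expected score, computed on age-adjusted ratings."""
    ra = effective_rating(elo_a, age_a)
    rb = effective_rating(elo_b, age_b)
    return 1.0 / (1.0 + 10 ** ((rb - ra) / 400))


# A 37-year-old veteran rated 1800 against a 22-year-old rated 1700:
# the veteran's effective rating drops to 1400, the youngster's to 1600.
p = win_probability(1800, 37, 1700, 22)
print(f"veteran wins with probability {p:.2f}")
```

In a multi-decade simulation, advancing every player's age by one per simulated year would let newer cohorts overtake older ones without any hand-written storyline.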
Data by itself isn't fun; you also need good metrics to surface fun correlations (see https://en.wikipedia.org/wiki/Baseball_statistics). The metrics don't have to be perfect; after all, humans aren't perfect. The key is engagement.
To find similar games, I cache all historical matches in a k-d tree and run a nearest-neighbor (NN) search over it, which is quite fast!
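A k-d tree with nearest-neighbor search fits in a few dozen lines, as sketched below. This is a plain-Python illustration, not the author's implementation (in practice a library such as SciPy's scipy.spatial.KDTree would likely be used). The per-match feature vector, e.g. (points scored, points allowed, fastest ball speed, aces), is an assumption, and features should be scaled to comparable ranges first, since otherwise the largest-valued feature dominates the Euclidean distance.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    point: tuple                     # feature vector of one historical match
    match_id: str
    axis: int                        # coordinate this node splits on
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(items, depth=0):
    """Build a k-d tree from (feature_tuple, match_id) pairs, splitting at the median."""
    if not items:
        return None
    axis = depth % len(items[0][0])  # cycle through the dimensions level by level
    items = sorted(items, key=lambda it: it[0][axis])
    mid = len(items) // 2
    point, match_id = items[mid]
    return Node(point, match_id, axis,
                build(items[:mid], depth + 1),
                build(items[mid + 1:], depth + 1))


def nearest(node, target, best=None):
    """Return (distance, match_id) of the stored match closest to target."""
    if node is None:
        return best
    d = math.dist(node.point, target)
    if best is None or d < best[0]:
        best = (d, node.match_id)
    diff = target[node.axis] - node.point[node.axis]
    near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
    best = nearest(near, target, best)
    # Only search the other side if the splitting plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


history = [((1.0, 2.0), "m1"), ((5.0, 5.0), "m2"), ((9.0, 1.0), "m3"),
           ((4.0, 7.0), "m4"), ((8.0, 8.0), "m5")]
tree = build(history)
print(nearest(tree, (8.5, 7.5)))
```

The pruning test is what makes this fast: whole subtrees are skipped whenever the splitting plane is farther away than the best match found so far.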
Some commentary can also be generated dynamically at runtime, for example locker-room whispers. In such cases, it is important to give GPT a sufficiently long historical window so that it doesn't generate contradictory information.
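One simple way to give the model that historical window is to keep a bounded buffer of what has already been said and prepend it to every generation request. In the sketch below, CommentaryHistory, the 20-line default window, the player names, and the prompt wording are hypothetical; the resulting string would be passed to whatever chat-completion call is in use, which is not shown.

```python
from collections import deque


class CommentaryHistory:
    """Rolling window of statements already made, fed back into each prompt."""

    def __init__(self, max_lines=20):
        self.lines = deque(maxlen=max_lines)  # oldest lines fall off automatically

    def add(self, line):
        self.lines.append(line)

    def build_prompt(self, request):
        history = "\n".join(f"- {line}" for line in self.lines)
        return (
            "You are a live sports commentator.\n"
            "Facts already stated on air (stay consistent with them):\n"
            f"{history}\n\n"
            f"Task: {request}"
        )


h = CommentaryHistory(max_lines=3)
h.add("P07 is nursing a sore shoulder.")
h.add("P07 and P12 have split their last four meetings.")
print(h.build_prompt("Write a short locker-room whisper about P07."))
```

The window size is a trade-off: a longer window reduces contradictions but costs more prompt tokens per generation.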