I wonder if the decision to make o3-mini available to free users in the (hopefully) near future is a response to this really good, cheap, open reasoning model.
I understand you were trying to make “up and to the right” = “best”, but the inverted x-axis really confused me at first. Not a huge fan.
Also, I wonder how you’re calculating costs, because while a 3:1 ratio roughly makes sense for traditional LLMs… it doesn’t really work for “reasoning” models, which implicitly use several hundred to several thousand additional output tokens for their reasoning step. It’s almost like a “fixed” overhead, regardless of the input or output size around that reasoning step. (“Fixed” is in quotes because some reasoning chains are longer than others.)
I would also argue that token-heavy use cases are dominated by large input:output ratios, on the order of 100:1 or 1000:1 tokens. Token-light use cases are your typical chatbot, where the user and model exchange roughly equal numbers of tokens… and probably not that many per message.
It’s hard to come up with an optimal formula… one would almost need to offer a dynamic chart where the user can enter their own ratio of input:output, and choose a number for the reasoning token overhead. (Or, select from several predefined options like “chatbot”, “summarization”, “coding assistant”, where those would pre-select some reasonable defaults.)
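A minimal sketch of that dynamic estimator idea. All prices, preset names, and overhead numbers below are hypothetical placeholders (not any real model's pricing), just to show how the knobs would interact:

```python
# Hypothetical presets for the "chatbot" / "summarization" / "coding assistant"
# idea above. Every number here is an illustrative assumption, not real data.
PRESETS = {
    "chatbot":          {"in_out_ratio": 1.0,   "reasoning_overhead": 0},
    "summarization":    {"in_out_ratio": 100.0, "reasoning_overhead": 0},
    "coding assistant": {"in_out_ratio": 10.0,  "reasoning_overhead": 1500},
}

def cost_per_request(output_tokens, in_out_ratio, reasoning_overhead,
                     price_in_per_m, price_out_per_m):
    """Dollar cost of one request, assuming reasoning tokens are billed
    at the output-token rate (as most providers do today)."""
    input_tokens = output_tokens * in_out_ratio
    billed_output = output_tokens + reasoning_overhead
    return (input_tokens * price_in_per_m +
            billed_output * price_out_per_m) / 1_000_000

# Example: summarization preset at a hypothetical $1/M input, $4/M output.
p = PRESETS["summarization"]
print(cost_per_request(100, p["in_out_ratio"], p["reasoning_overhead"], 1.0, 4.0))
# → 0.0104
```

A chart version would just sweep `in_out_ratio` (or `reasoning_overhead`) on the x-axis and plot this function per model.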
I mean, the sheet is public (https://docs.google.com/spreadsheets/d/1x9bQVlm7YJ33HVb3AGb9...), so go fiddle with it yourself, but you'll soon see most models have approximately the same input:output token cost ratio (roughly 4), and changing the input:output ratio assumption doesn't affect the overall macro chart trends in the slightest, because I'm plotting over several OoMs here and your criticisms have an impact of <1 OoM (an input:output token cost ratio of ~4, with variance even lower than that).
Actually, the 100:1 ratio starts to trend back toward parity now because of the reasoning tokens, so the truth is somewhere between 3:1 and 100:1.
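A quick back-of-the-envelope illustration of that drift, using made-up but plausible numbers (the 2,000-token reasoning chain is an assumption, not a measurement):

```python
# A nominally 100:1 summarization-style request: lots of input, tiny output.
input_tokens = 10_000
visible_output = 100

# Assumed "fixed" reasoning overhead, billed as output tokens.
reasoning_overhead = 2_000

billed_output = visible_output + reasoning_overhead
effective_ratio = input_tokens / billed_output
print(round(effective_ratio, 1))  # → 4.8
```

So a request that looks like 100:1 on visible tokens bills more like ~5:1, which is why the overall trend lands between 3:1 and 100:1.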