> Plus, how exactly did Deepseek lie. The model size, data size are all known. Calculating the number of FLOPS is an exercise in arithmetics, which is perhaps the secret Deepseek has because it seemingly eludes people.
Model parameter count and training set token count are fixed. But other things such as epochs are not.
In the same amount of time, you could have 1 epoch or 100 epochs depending on how many GPUs you have.
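To make the point concrete, here is a back-of-envelope sketch using the standard ~6·N·D approximation for dense-transformer training compute (all parameter/token/throughput numbers below are illustrative, not DeepSeek's actual figures):

```python
# Back-of-envelope training-compute arithmetic using the common
# ~6 FLOPs-per-parameter-per-token heuristic for dense transformers.
# All concrete numbers here are illustrative assumptions.

def training_flops(params: float, tokens: float, epochs: int = 1) -> float:
    """Approximate total training FLOPs: ~6 * N * D * epochs."""
    return 6 * params * tokens * epochs

def gpu_hours_needed(total_flops: float, peak_flops_per_gpu: float,
                     utilization: float) -> float:
    """Wall-clock GPU-hours at a given sustained utilization (MFU)."""
    sustained = peak_flops_per_gpu * utilization
    return total_flops / sustained / 3600

# Hypothetical run: 70B params, 2T tokens, one epoch.
flops = training_flops(70e9, 2e12)
# Assume ~1e15 peak FLOP/s per GPU at 40% sustained utilization.
hours = gpu_hours_needed(flops, 1e15, 0.40)
print(f"total FLOPs: {flops:.2e}")
print(f"GPU-hours:   {hours:,.0f}")
# The same FLOP budget spread over 10x the GPUs finishes in 1/10 the
# wall-clock time -- which is why model size and token count alone
# don't pin down how many epochs (or GPUs) a given timeline implies.
```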
Also, what if their claim on GPU count is accurate, but they are using better GPUs they aren't supposed to have? For example, they claim 1,000 GPUs for 1 month total. They claim to have H800s, but what if they are using illegal H100s/H200s, B100s, etc? The GPU count could be correct, but their total compute is substantially higher.
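A quick sketch of why GPU *count* alone underdetermines total compute: the same 1,000-GPU, 1-month cluster can deliver very different effective FLOP budgets depending on the chip. The per-GPU figures below are rough illustrations, not vendor-verified specs; note the H800's main cut versus the H100 is interconnect bandwidth, which tends to show up as lower achievable utilization in multi-node training rather than lower peak FLOPs, while newer chips also raise peak throughput.

```python
# Effective compute from the same GPU count, under assumed (illustrative)
# per-chip peak throughput and achievable utilization.

GPU_COUNT = 1_000
SECONDS = 30 * 24 * 3600          # one month of wall clock

# (assumed peak FLOP/s, assumed achievable utilization) -- illustrative only
gpus = {
    "H800": (1.0e15, 0.30),       # interconnect-limited in multi-node runs
    "H100": (1.0e15, 0.40),
    "B100": (1.8e15, 0.40),
}

for name, (peak, util) in gpus.items():
    effective = GPU_COUNT * SECONDS * peak * util
    print(f"{name}: ~{effective:.2e} effective FLOPs")
```

So a truthful "1,000 GPUs for a month" can hide a 2x or larger compute gap depending on which GPUs those actually are.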
It's clearly an incredible model, they absolutely cooked, and I love it. No complaints here. But the likelihood that some numbers are fudged is not 0%. And I don't even blame them; they are likely forced into this by US export laws and such.
> In the same amount of time, you could have 1 epoch or 100 epochs depending on how many GPUs you have.
This just isn't true for RL and related algorithms: adding more GPUs/agents runs into diminishing returns, and is not equivalent to letting a single agent go through more steps.
It should be trivially easy to reproduce the results no? Just need to wait for one of the giant companies with many times the GPUs to reproduce the results.
I don't expect a #180 AUM hedge fund to have as many GPUs as Meta, Microsoft, or Google.
AUM isn't a good proxy for quantitative hedge fund performance, many strategies are quite profitable and don't scale with AUM. For what it's worth, they seemed to have some excellent returns for many years for any market, let alone the difficult Chinese markets.