>> While it's a good idea in principle to publish failures, in practice it's a bit trickier. So a particular model didn't work. Does that mean the model is fundamentally flawed? Or that you weren't smart enough to engineer it just right? Or that you didn't throw enough computing power at it?
And yet the field seems to accept that a research team might train a bunch of competing models on a given dataset, compare them against their favourite model, and "show" that theirs performs better - even though there's no way to know whether they simply didn't tune the competing models as carefully as their own.