> The detectors demonstrated near-perfect accuracy for US 8-th grade essays

I am genuinely confused.

It seems that all the test data provided were real human essays. The ones given to GPT as native-English-speaker essays come from a prominent machine-learning dataset of common essays, which seems likely to have been included in any training data, whereas the non-native ones were taken from a random forum.

Can someone help me understand whether I've identified the flaws correctly here? If so, does this paper add anything beyond confirming that, at this point in time, GPT detectors have absolutely no value?