Hacker News

If anything the opposite should be true. GPT-4 at least has near perfect English. If your sample displays non-native traits probably it wasn't generated by GPT!


GPT-4 has a bit of a bias towards overly formal, "textbook" English, which could be similar to what many non-native speakers learned.


I think the real reason is that in order to fully learn a language one has to absorb the culture it's used in. I appreciate this because I'm somewhat multilingual. Elegant words in one language often have no good parallel in another, and an ugly approximation must be used instead. Also, what sounds pleasant in one language can sound artificial and stilted in translation.


I recently tried to translate such heavily culture-embedded texts into English so that they stay funny and rich, using GPT-4, and it did an almost perfect job there. Much better than any other translation tools.


You can dictate the style of writing in your prompt, though. For example, I have a few prompts saved in my notes, each for a particular use case. Except for a few comments I write on social media (like this one), none of my English output is written directly by me, and it never looks or feels AI-generated. All thanks to the writing-style prompts.


Speculation on my part, but I believe we non-native English speakers write more formally and with less natural flow.

I also wonder what proportion of English writing (in general) is produced by non-native speakers, and whether we might be disproportionately represented in training data.


Yes, I'd speculate along with you that this is not "bias" but just probability space: sampled input, styled output.

Had the training cutoff come before SEO and "content generation" farms, and before the shift in the balance of published academic writing, the embedding space would be different.


I think people here misunderstand the whole point: non-native speakers do not necessarily make more mistakes in general; it's about the perplexity of word choices and structure. In a subsequent experiment in the paper, they used ChatGPT to increase perplexity in the original non-native texts and decrease it in the native texts, and the exact opposite pattern was observed.


Perplexity?


This is a measure of how well the supplied text matches what the model itself would have produced.

A low perplexity means the text isn't massively different from what it might have output itself (which might be an indicator that it was produced by a model), whereas a high perplexity suggests it's the kind of semi-random nonsense you'd expect from a student. ;)
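To make that concrete, here's a toy sketch (mine, not the paper's method). Perplexity is the exponentiated average negative log-probability a model assigns to the observed tokens. A real detector would use an LLM's token probabilities; this uses a made-up unigram model over a tiny corpus, purely for illustration:

```python
import math
from collections import Counter

def perplexity(tokens, probs):
    """Perplexity = exp of the average negative log-probability
    the model assigns to each observed token."""
    nll = -sum(math.log(probs[t]) for t in tokens) / len(tokens)
    return math.exp(nll)

# Toy unigram "model" estimated from a tiny corpus (illustration only).
corpus = "the cat sat on the mat the cat sat".split()
counts = Counter(corpus)
total = sum(counts.values())
probs = {w: c / total for w, c in counts.items()}

# Text close to the model's expectations scores low perplexity...
low = perplexity("the cat sat".split(), probs)
# ...while rarer word choices score higher.
high = perplexity("mat on mat on".split(), probs)
assert low < high
```

So "low perplexity, probably machine-generated" is the detection heuristic: the model is rarely surprised by its own kind of output, and (per the paper) is also less surprised by the more constrained vocabulary of non-native writers.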


Poster was illustrating the point being made.


No, the poster was using a word from the linked article. Click "Download PDF" to read past the abstract.


If half the commenters had skimmed the article instead of commenting after reading just the title or, at best, the abstract, they would have answered their own questions.


GPT-4 picks the most likely words according to the model. It aims for cliche and the most obvious generalisations given the previous context.
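Roughly speaking, this is a decoding question. A toy sketch with made-up logits (not GPT-4's actual decoder): greedy decoding always emits the single most probable token, while a higher sampling temperature flattens the distribution and gives less formulaic words a chance:

```python
import math

def softmax(logits, temperature=1.0):
    """Convert logits to probabilities; lower temperature sharpens
    the distribution toward the most likely token."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical next-token logits after "The weather today is ..."
vocab = ["nice", "sunny", "effervescent"]
logits = [2.0, 1.5, -1.0]

# Greedy decoding always yields the most probable (often cliché) word.
greedy = vocab[max(range(len(vocab)), key=lambda i: logits[i])]
assert greedy == "nice"

# A higher temperature flattens the distribution, boosting the odds
# of sampling rarer, less formulaic word choices.
flat = softmax(logits, temperature=5.0)
sharp = softmax(logits, temperature=0.5)
assert flat[2] > sharp[2]
```

That's why default settings tend toward cliché: the highest-probability continuation is, by construction, the most expected one.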


"It aims for cliche and the most obvious generalisations given the previous context."

Only if your prompts are themselves generic.

You can give it examples of a text whose style you wish it to emulate and it will do it.

You can also prompt it to alter what it wrote if you don't like something. For example, if you thought some part was too cliche and obvious you can point that out and have it alter that part.

If you have a back-and-forth conversation with it about what you like and don't like, what you want or don't want, with examples, the results can be much better than what you'd get with a generic prompt.

Incidentally, for creative writing I've found Claude to be much better than GPT-4. I haven't tried the version of Claude with the 100k-token context length yet, but that should allow you to give it many more examples, and hopefully that will translate to even better output.

That said, none of these LLMs are perfect, nor do they yet pose a threat to really good (never mind great) human authors, but they're a pretty effective tool.



