Yes, I'd speculate along with you that this is not "bias" but just the probability space: the input is sampled from what was written, and the output is styled to match.
Had the training cutoff fallen before the rise of SEO and "content generation" farms, and before the shift in the balance of published academic writing, the embedding space would be different.