If this is the general quality of new code in the Neovim code base, I would be fairly worried.
In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.
The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?
You're correct. It's new code that's calling into old, unrefactored code. Hence the casts. convert_input() is new, string_convert_ext() is old. As is mentioned in the article, I'm working on a refactoring of string_convert_ext() that will obviate the casting (and incidentally also some allocating).
"The problem isn't big-endian, it's just platforms where sizeof(size_t) != sizeof(int)."
To be honest, my memory is failing me, so I would appreciate it if someone with a fresher understanding of the standards or BE compiler behavior would clarify.
Does a cast from (pointer to int32_t) to (pointer to int64_t) change the memory location referred to? My assumption is that it does not. If it does not, then there is a problem with this code on big-endian systems that does not exist on little-endian systems:
On a little-endian system, you are only reading/writing the low bits of the lengths. If anything won't fit in an int, that can be a problem. It seems plausible that other things (buffer sizes?) guarantee that can never occur, though it would still be better code not to rely on it.
On a big-endian system, you are only reading/writing the high bits of the lengths as if they were the low bits. That will almost always be wrong.
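To make that concrete, here is a minimal sketch (not Neovim code; the names are made up, and I've assumed a 64-bit size_t with a 32-bit int). The store through the cast pointer is simulated with memcpy, since it is byte-for-byte what the store does but without the formal undefined behavior of the raw cast:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an old-style API (compare
 * string_convert_ext()) that reports a length through an int *. */
static void old_api_report_len(int *lenp)
{
    *lenp = 42;  /* writes only sizeof(int) bytes */
}

/* Simulates a caller that passes a size_t's address cast to int *.
 * The memcpy below models what `old_api_report_len((int *)&len)`
 * does at the byte level. */
static size_t buggy_caller(void)
{
    size_t len;
    memset(&len, 0xFF, sizeof len);  /* stale garbage in every byte */

    int written = 42;
    memcpy(&len, &written,
           sizeof written < sizeof len ? sizeof written : sizeof len);
    return len;  /* only the first sizeof(int) bytes were updated */
}

/* True when the cast corrupts the value, i.e. when the untouched
 * half of len (high bytes on LE, low bytes on BE) keeps its
 * stale contents. */
static int bug_manifests(void)
{
    return buggy_caller() != 42;
}
```

On a little-endian 64-bit machine the stale bytes end up in the high half (a huge but "almost right" value); on big-endian they end up in the low half, so the result is wrong in both halves' worth of ways.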
> In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.
At first I thought you were referring to casting `size_t` to `int` (and the converse) in general, but now I notice you were referring to casting pointers to `int` to pointers to `size_t`. You are right, of course, and we should be more careful about that. Nevertheless, this specific case will soon disappear.
To answer the last part of your question:
> The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?
I alluded to it in the article itself, and I may only guess as to what Bram's goals were, but I reckon:
1) Speed: for an important conversion like Latin-1 to UTF-8, Bram might've wanted some extra juice. This is not uncommon in the Vim codebase. Possibly one can't (or couldn't, when Vim was written) rely on iconv being well-engineered. Also note that Vim was made to run on some pretty bare-bones systems.
2) Compatibility: as it stands, (Neo)vim can work just fine without iconv and still do the most useful transforms. This is actually good, because iconv is not available everywhere, or is buggy where it is (switches for disabling it have been requested in vanilla Vim).
I haven't grokked the other encoding sides of vim (file and output), but it's possible I'll discover some things that might preclude an iconv-only approach.
In general they appear to have a problem with mixing lengths between ints and size_ts (a decently common problem). In that particular instance they could run into trouble if they compile for 64-bit (64-bit size_t, 32-bit int) and string_convert_ext gives a negative length, since int is signed.
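A quick sketch of that failure mode (the function names are made up, not Neovim's): a negative int length, as old-style APIs often use to signal failure, silently wraps to a huge value when assigned straight into a size_t:

```c
#include <stddef.h>
#include <stdint.h>

/* If an old-style function signals failure with a negative length,
 * widening it to size_t without a check turns -1 into SIZE_MAX. */
static size_t naive_widen(int len)
{
    return (size_t)len;  /* -1 becomes SIZE_MAX */
}

/* A checked version: map failure to an empty length instead. */
static size_t checked_widen(int len)
{
    return len < 0 ? 0 : (size_t)len;
}
```

The naive version is how a "negative length" turns into an attempt to copy ~18 quintillion bytes.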
We compile for 64-bit and 32-bit on every commit with travis CI and for many files activate extra warnings such as `-Wconversion`. We are acutely aware of the problem. However, we can't refactor the entirety of the (quite large) Vim codebase in one go. Whenever we add new code or replace old code that calls into more old code (often), we have to decide whether we will also refactor said old code to be more modern.
It's not perfect, but you'll see we've put at least some thought into it and we welcome new contributions.
It's a problem of manpower and/or time, not necessarily of misunderstanding C and making rookie mistakes about types as the grandparent seems to imply.
There, had to get that off my chest. (I'm not saying Neovim's code, old or new, is perfect, but it's probably better than what the tone of the comment implied.)
Thanks for the heads up. I went back and pressed submit several times because in my incognito window my comment didn't show up. (I was trying to post while noprocrast was on, didn't expect something I wrote to hit HN). I wonder now what I should do to not become dead anymore? I deleted the superfluous comments, hopefully that's enough. I'll look up what it means to be dead on HN.
Just posting to agree with the "decently common problem" comment. I am sure that back in the day I could write:
int len = strlen( some_c_string );
Without getting a warning! I don't really need to be reminded that the random little strings I create in my code need to be less than 2 billion characters long!
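For what it's worth, with `-Wconversion` a modern compiler does flag that line now. The usual workaround, assuming you really do want an int, is an explicit cast, ideally with the "my strings are small" assumption made checkable (this helper is just an illustration, not code from any real project):

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* `int len = strlen(s);` draws a -Wconversion warning today.
 * The explicit cast silences it, and the assert documents the
 * assumption that the string fits in an int. */
static int small_strlen(const char *s)
{
    size_t n = strlen(s);
    assert(n <= (size_t)INT_MAX);  /* the "< 2 billion chars" assumption */
    return (int)n;
}
```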
Casting smaller doesn't hurt if the numbers are small and you're little endian - because the ptr will still point at the meat of the scalar (instead of the high bytes with zeros in them as it would if big-endian)
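A small sketch of that point (simulated portably with memcpy, since the raw cast is formally undefined behavior): reading the first `sizeof(int)` bytes of a small `size_t` yields the value itself on little-endian, but the zero-filled high bytes on big-endian:

```c
#include <string.h>

/* Reads a size_t through its first sizeof(int) bytes, as
 * dereferencing (int *)&v would. */
static int read_head_as_int(size_t v)
{
    int r = 0;
    memcpy(&r, &v, sizeof r < sizeof v ? sizeof r : sizeof v);
    return r;
}

/* Runtime endianness probe: does the first byte hold the low bits? */
static int is_little_endian(void)
{
    unsigned x = 1;
    unsigned char first;
    memcpy(&first, &x, 1);
    return first == 1;
}
```

On little-endian, `read_head_as_int(7)` lands on "the meat of the scalar" and returns 7; on a big-endian machine with a wider size_t it would read the zero high bytes and return 0.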