If this is the general quality of new code in the Neovim code base, I would be fairly worried.
In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.
The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?
You're correct. It's new code that's calling into old, unrefactored code. Hence the casts. convert_input() is new, string_convert_ext() is old. As is mentioned in the article, I'm working on a refactoring of string_convert_ext() that will obviate the casting (and incidentally also some allocating).
"The problem isn't big-endian, it's just platforms where sizeof(size_t) != sizeof(int)."
To be honest, my memory is failing me, so I would appreciate it if someone with a fresher understanding of the standards or BE compiler behavior would clarify.
Does a cast from (pointer to int32_t) to (pointer to int64_t) change the memory location referred to? My assumption is that it does not. If it does not, then there is a problem with this code on big-endian systems that does not exist on little-endian systems:
On a little-endian system, you are only reading/writing the low bits of the lengths. If anything won't fit in an int, that can be a problem. It seems plausible that other things (buffer sizes?) guarantee that can never occur, though it would still be better code not to rely on it.
On a big-endian system, you are only reading/writing the high bits of the lengths as if they were the low bits. That will almost always be wrong.
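To make that concrete, here is a minimal sketch (not Neovim code; the names are made up, and I've assumed a 64-bit size_t with a 32-bit int). The store through the cast pointer is simulated with memcpy, since it is byte-for-byte what the store does but without the formal undefined behavior of the raw cast:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an old-style API (compare
 * string_convert_ext()) that reports a length through an int *. */
static void old_api_report_len(int *lenp)
{
    *lenp = 42;  /* writes only sizeof(int) bytes */
}

/* Simulates a caller that passes a size_t's address cast to int *.
 * The memcpy below models what `old_api_report_len((int *)&len)`
 * does at the byte level. */
static size_t buggy_caller(void)
{
    size_t len;
    memset(&len, 0xFF, sizeof len);  /* stale garbage in every byte */

    int written = 42;
    memcpy(&len, &written,
           sizeof written < sizeof len ? sizeof written : sizeof len);
    return len;  /* only the first sizeof(int) bytes were updated */
}

/* True when the cast corrupts the value, i.e. when the untouched
 * half of len (high bytes on LE, low bytes on BE) keeps its
 * stale contents. */
static int bug_manifests(void)
{
    return buggy_caller() != 42;
}
```

On a little-endian 64-bit machine the stale bytes end up in the high half (a huge but "almost right" value); on big-endian they end up in the low half, so the result is wrong in both halves' worth of ways.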
> In the first piece of code, which is apparently new to Neovim, there's a bug waiting to happen: casting size_t pointers to int pointers will fail on big-endian platforms where sizeof(size_t) != sizeof(int), and in general, the number of casts sprinkled everywhere is indicative of sloppy coding.
At first I thought you were referring to casting `size_t` to `int` (and the converse) in general, but now I notice you were referring to casting pointers to `int` to pointers to `size_t`. You are right, of course, and we should be more careful about that. Nevertheless, this specific case will soon disappear.
To answer the last part of your question:
> The whole area of code also seems to be poorly reimplementing iconv, which is actually supported in the code, but never enabled in Neovim for some reason, according to the footnote - why have two code paths to do the same thing?
I alluded to it in the article itself, and I may only guess as to what Bram's goals were, but I reckon:
1) Speed: for an important conversion like Latin-1 to UTF-8, Bram might've wanted some extra juice. This is not uncommon in the Vim codebase. Possibly one can't (or couldn't, when Vim was written) rely on iconv being well-engineered. Also note that Vim was made to run on some pretty bare-bones systems.
2) Compatibility: as it stands, (Neo)vim can work just fine without iconv and still do the most useful transforms. This is actually good, because iconv is not available everywhere, or is buggy where it is (switches for disabling it have been requested in vanilla Vim).
I haven't grokked the other encoding sides of vim (file and output), but it's possible I'll discover some things that might preclude an iconv-only approach.
In general they appear to have a problem with mixing lengths between ints and size_ts (a decently common problem). In that particular instance they could run into trouble if they compile for 64-bit (64-bit size_t, 32-bit int) and string_convert_ext gives a negative length, since int is signed.
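A quick sketch of that failure mode (the function names are made up, not Neovim's): a negative int length, as old-style APIs often use to signal failure, silently wraps to a huge value when assigned straight into a size_t:

```c
#include <stddef.h>
#include <stdint.h>

/* If an old-style function signals failure with a negative length,
 * widening it to size_t without a check turns -1 into SIZE_MAX. */
static size_t naive_widen(int len)
{
    return (size_t)len;  /* -1 becomes SIZE_MAX */
}

/* A checked version: map failure to an empty length instead. */
static size_t checked_widen(int len)
{
    return len < 0 ? 0 : (size_t)len;
}
```

The naive version is how a "negative length" turns into an attempt to copy ~18 quintillion bytes.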
We compile for 64-bit and 32-bit on every commit with travis CI and for many files activate extra warnings such as `-Wconversion`. We are acutely aware of the problem. However, we can't refactor the entirety of the (quite large) Vim codebase in one go. Whenever we add new code or replace old code that calls into more old code (often), we have to decide whether we will also refactor said old code to be more modern.
It's not perfect, but you'll see we've put at least some thought into it and we welcome new contributions.
It's a problem of manpower and/or time, not necessarily of misunderstanding C and making rookie mistakes about types as the grandparent seems to imply.
There, had to get that off my chest. (I'm not saying Neovim's code, old or new, is perfect, but it's probably better than what the tone of the comment implied.)
Thanks for the heads up. I went back and pressed submit several times because in my incognito window my comment didn't show up. (I was trying to post while noprocrast was on, didn't expect something I wrote to hit HN). I wonder now what I should do to not become dead anymore? I deleted the superfluous comments, hopefully that's enough. I'll look up what it means to be dead on HN.
Just posting to agree with the "decently common problem" comment. I am sure that back in the day I could write:
int len = strlen( some_c_string );
Without getting a warning! I don't really need to be reminded that the random little strings I create in my code need to be less than 2 billion characters long!
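For what it's worth, with `-Wconversion` a modern compiler does flag that line now. The usual workaround, assuming you really do want an int, is an explicit cast, ideally with the "my strings are small" assumption made checkable (this helper is just an illustration, not code from any real project):

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* `int len = strlen(s);` draws a -Wconversion warning today.
 * The explicit cast silences it, and the assert documents the
 * assumption that the string fits in an int. */
static int small_strlen(const char *s)
{
    size_t n = strlen(s);
    assert(n <= (size_t)INT_MAX);  /* the "< 2 billion chars" assumption */
    return (int)n;
}
```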
Casting smaller doesn't hurt if the numbers are small and you're little endian - because the ptr will still point at the meat of the scalar (instead of the high bytes with zeros in them as it would if big-endian)
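A small sketch of that point (simulated portably with memcpy, since the raw cast is formally undefined behavior): reading the first `sizeof(int)` bytes of a small `size_t` yields the value itself on little-endian, but the zero-filled high bytes on big-endian:

```c
#include <string.h>

/* Reads a size_t through its first sizeof(int) bytes, as
 * dereferencing (int *)&v would. */
static int read_head_as_int(size_t v)
{
    int r = 0;
    memcpy(&r, &v, sizeof r < sizeof v ? sizeof r : sizeof v);
    return r;
}

/* Runtime endianness probe: does the first byte hold the low bits? */
static int is_little_endian(void)
{
    unsigned x = 1;
    unsigned char first;
    memcpy(&first, &x, 1);
    return first == 1;
}
```

On little-endian, `read_head_as_int(7)` lands on "the meat of the scalar" and returns 7; on a big-endian machine with a wider size_t it would read the zero high bytes and return 0.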