What I really enjoy about this merge request are these comments:
> Rename _Suffix_fn to _Fx, again following the papers. Note that in the usage below, there is a semantic change: _Suffix_fn stored 0-based values, while _Fx stores 1-based values. This undoes a micro-optimization (_Suffix_fn avoided unnecessarily translating between the 0-based and 1-based domains), but with the increased usage of f in the Rytter correction, I wanted greater correspondence with the published algorithms in order to verify the implementation.
And
> Rename 1-based _Idx to 1-based _Jx. (While the code was correct, I found it confusing that _Idx was 0-based in other loops but 1-based in this loop.)
Naming is one of the most strangely difficult aspects of programming, but if there is one rule it's be consistent. I really hate code that uses a variable name to mean one thing in one place and other somewhere else. I just learned what that meant! Why change it?! And yet this is common, and sometimes a comment about it in code review might get a response like "the code works, why does it matter?" This. This is why it matters. It causes confusion, which makes code difficult to verify, which causes bugs.
I've implement a few string search algorithms and translating between 1- and 0- based offset functions was the source of a huge number of bugs, as you might expect. For some reason a lot of sources present the algorithms as 1-based.
>For some reason a lot of sources present the algorithms as 1-based.
Because that's a natural mathematical model. "the n-th element" being at position n is handy and natural. What's nuts is that array indexing based on pointer offsets has made its way into nearly all high-level languages.
No it's not; it's a artifact of humans numbering items based on "total I have, including this one" rather than the more sensible but contrary to a implementation detail of human neuropsychology "number of items preceeding this one".
> "the n-th element" being at position n is handy and natural.
One might say that computing doesn't care about ordinal numbers, only cardinal numbers.
On an infinite tape, there is no "first byte", only "a byte placed where the tape already was when you started" (i.e. at offset=0.)
Similarly, there is no "first byte" of a random-access memory, because the ordering of memory is not guaranteed (i.e. some memories have bit-planes, some memories have banks, some memories are big/little-endian, etc.) The only thing you can say about a memory (and thereby about a RAM-word machine, or about an array) is what exists at address 0 of it, at address 1 of it, etc. It would be incorrect to describe address 0 as "the first byte" of memory, as this would impose a canonical iteration order.
Speaking of things that make code needlessly difficult to verify, we really need to start insisting on correct (zero-based) indexing in academic algorithms.
> Rename _Suffix_fn to _Fx, again following the papers. Note that in the usage below, there is a semantic change: _Suffix_fn stored 0-based values, while _Fx stores 1-based values. This undoes a micro-optimization (_Suffix_fn avoided unnecessarily translating between the 0-based and 1-based domains), but with the increased usage of f in the Rytter correction, I wanted greater correspondence with the published algorithms in order to verify the implementation.
And
> Rename 1-based _Idx to 1-based _Jx. (While the code was correct, I found it confusing that _Idx was 0-based in other loops but 1-based in this loop.)
Naming is one of the most strangely difficult aspects of programming, but if there is one rule it's be consistent. I really hate code that uses a variable name to mean one thing in one place and other somewhere else. I just learned what that meant! Why change it?! And yet this is common, and sometimes a comment about it in code review might get a response like "the code works, why does it matter?" This. This is why it matters. It causes confusion, which makes code difficult to verify, which causes bugs.