The table you show is not actually a linked list. Traversing a table with self-references in order requires a recursive query, and doesn't benefit from indexes for ordering.
There is also the question of how many writes you do: finding a fraction between two other positions lets you do just one INSERT or UPDATE (effectively the same thing in MVCC), whereas rewiring a linked list means altering at least two rows. That may be okay, but it's not ideal.
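A minimal sketch of the fractional-position idea, using an in-memory SQLite database from Python (the `items` table and its `pos` column are hypothetical, just for illustration): moving an item is a single UPDATE that sets its position to the midpoint of its new neighbours.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table items (id integer primary key, text text, pos real)")
con.executemany("insert into items values (?, ?, ?)",
                [(1, "a", 1.0), (2, "b", 2.0), (3, "c", 3.0)])

# Move item 3 between items 1 and 2: one UPDATE, setting pos to the
# midpoint of the two neighbouring positions.
con.execute("update items set pos = (1.0 + 2.0) / 2 where id = 3")

rows = [r[0] for r in con.execute("select text from items order by pos")]
print(rows)  # ['a', 'c', 'b']
```

The known trade-off: repeated midpoint inserts in the same gap eventually exhaust float precision, at which point you renumber the list.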
For the query to retrieve the items back in order, logically it goes like:
select * from items order by `prev` -- `next` doesn't help
-- also hope that NULL sorts at the beginning
So it's no better than Approach 1 in the article; it's actually worse, with an extraneous column in storage and head-scratching for the next maintainer of the system.
Exercise for the reader: when item #3 is moved between items #1 and #2, how many UPDATE statements are needed?
No, this query doesn't retrieve the items in the correct order in general. It only works for the particular data in GP because the row ordering coincides with the user's ordering there. Try ordering by `prev` on the table below, which is a different valid representation of "steal underpants" -> "???" -> "profit".
id | text | prev | next
1 | "profit" | 2 | NULL
2 | "???" | 3 | 1
3 | "steal underpants" | NULL | 2
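This is easy to check concretely. A sketch with SQLite in memory (the `next` column is renamed `next_` to sidestep keyword clashes; note also that whether NULL sorts first or last varies by engine -- SQLite puts it first):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table items (id integer primary key, text text, prev integer, next_ integer)")
con.executemany("insert into items values (?, ?, ?, ?)", [
    (1, "profit", 2, None),
    (2, "???", 3, 1),
    (3, "steal underpants", None, 2),
])

rows = [r[0] for r in con.execute("select text from items order by prev")]
# Not the intended "steal underpants" -> "???" -> "profit":
print(rows)  # ['steal underpants', 'profit', '???']
```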
Yep, that's fine - it highlights the complexity of implementing ordering with a linked list. The fundamental point is that both columns aren't needed; a single one is enough to determine the order.
with recursive todo_list_sorted as (
    select *
    from todo_list
    where prev_id is null

    union all

    select tl1.*
    from todo_list tl1
    join todo_list_sorted tl2 on tl1.prev_id = tl2.id
)
select * from todo_list_sorted
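For what it's worth, that CTE does recover the order. A sketch running it against the underpants data from above with SQLite in memory (table and column names taken from the query; the data itself is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table todo_list (id integer primary key, text text, prev_id integer)")
con.executemany("insert into todo_list values (?, ?, ?)", [
    (1, "profit", 2),
    (2, "???", 3),
    (3, "steal underpants", None),
])

rows = [r[0] for r in con.execute("""
    with recursive todo_list_sorted as (
        select * from todo_list where prev_id is null
        union all
        select tl.* from todo_list tl
        join todo_list_sorted tl2 on tl.prev_id = tl2.id
    )
    select text from todo_list_sorted
""")]
print(rows)  # ['steal underpants', '???', 'profit']
```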
Because you need to optimize for reads. This setup doesn't make it easy to, say, select only the first 5.
I'd say, unless the list is really large, just rewrite all the "pos" values in a batch of UPDATE queries (in a transaction, of course). Also, don't forget the unique constraint on whatever identifies the list plus the pos.
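A sketch of that rewrite-all-positions approach, assuming a hypothetical `items` table with a unique (`list_id`, `pos`) constraint: the positions are negated first so the constraint isn't tripped mid-update (some engines check it row by row), and the whole rewrite runs in one transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table items (
        list_id integer,
        pos integer,
        text text,
        unique (list_id, pos)
    )
""")
con.executemany("insert into items values (?, ?, ?)",
                [(1, 1, "a"), (1, 2, "b"), (1, 3, "c")])

# Move "c" to the front of list 1 by rewriting every pos value.
new_order = ["c", "a", "b"]
with con:  # commits on success, rolls back on error
    # Negate first so intermediate states never collide on (list_id, pos).
    con.execute("update items set pos = -pos where list_id = 1")
    for i, t in enumerate(new_order, start=1):
        con.execute("update items set pos = ? where list_id = 1 and text = ?", (i, t))

rows = [r[0] for r in con.execute("select text from items where list_id = 1 order by pos")]
print(rows)  # ['c', 'a', 'b']
```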
If you have a hashtable-based index on the PK, lookup by it is O(1), so getting a list of the first k elements is O(k).
So what you'll end up with is:
Insert/Update/Delete - O(1) (insert/update/delete the row itself, plus update up to 2 other rows -- each individual operation is O(1) and there's a fixed number of them)
Get list of first k - O(k)
Can't be better than this?
Elegant, fast, robust, not inventing a wheel?
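A sketch of the traversal being described, against a hypothetical `todo_list` table with a `next_id` pointer: k successive primary-key lookups, so O(k) lookups total if each one is O(1). (Worth noting that most relational databases use B-tree rather than hash indexes for the PK, so each lookup is typically O(log n); and each lookup here is a separate query round trip.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table todo_list (id integer primary key, text text, next_id integer)")
con.executemany("insert into todo_list values (?, ?, ?)", [
    (1, "steal underpants", 2),
    (2, "???", 3),
    (3, "profit", None),
])

def first_k(con, head_id, k):
    """Walk the list with one PK lookup per element: k lookups for k items."""
    out, cur = [], head_id
    while cur is not None and len(out) < k:
        text, cur = con.execute(
            "select text, next_id from todo_list where id = ?", (cur,)).fetchone()
        out.append(text)
    return out

print(first_k(con, 1, 2))  # ['steal underpants', '???']
```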
It's less about whether the index is fast, and more about performing self-joins (often hierarchical, depending on the DB engine and what's supported) an arbitrary number of times. When it comes to DB queries, n queries is not really acceptable for a page of n items.
That's not how databases are generally optimized. You're essentially doing random access, extracting a single tuple each time, in a data structure optimized for locality. Think of it more like a memory-mapped region.
Nowhere in that article was there anything about hardware-related issues. I believe what you're talking about has to do with how HDDs store DB files on disk, and how it's (much) faster to read sequentially from such a disk than to seek for every record. Hence all sorts of optimizations that DBs do - like B+trees, etc. If your dataset fits in memory, it's a non-issue: the engine will never reach for the disk anyway. Even if it doesn't fit in memory (in which case I would start any optimization by adding more memory, if possible), it's much less of an issue with SSDs. But again - the article has a more science-y than applied/practical tone. In some scenarios (large datasets on HDDs), random seeks may be an issue.