The table you show is not actually a linked list. Traversing a table with self-references in order requires a recursive query, and doesn't benefit from indexes for ordering.
There is also the question of how many writes you do: finding a fraction between two other positions lets you do just one INSERT or UPDATE (effectively the same thing in MVCC), whereas rewiring a linked list means altering at least two rows. That may be okay, but it's not ideal.
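A minimal sketch of the fractional-position idea, using an in-memory SQLite database from Python (the `items` table and its `pos` column are hypothetical, just for illustration): moving an item is a single UPDATE that sets its position to the midpoint of its new neighbours.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table items (id integer primary key, text text, pos real)")
con.executemany("insert into items values (?, ?, ?)",
                [(1, "a", 1.0), (2, "b", 2.0), (3, "c", 3.0)])

# Move item 3 between items 1 and 2: one UPDATE, setting pos to the
# midpoint of the two neighbouring positions.
con.execute("update items set pos = (1.0 + 2.0) / 2 where id = 3")

rows = [r[0] for r in con.execute("select text from items order by pos")]
print(rows)  # ['a', 'c', 'b']
```

The known trade-off: repeated midpoint inserts in the same gap eventually exhaust float precision, at which point you renumber the list.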
For the query to retrieve the items back in order, logically it goes like:
select * from items order by `prev` -- `next` doesn't help
-- also hope that NULL sorts at the beginning
So it's no better than Approach 1 in the article; it's actually worse, with an extraneous column in storage and head-scratching for the next maintainer of the system.
Exercise for the reader: when item #3 is moved between items #1 and #2, how many UPDATE statements are needed?
No, this query doesn't retrieve the items in the correct order in general. It only works for the particular data in GP because the row ordering coincides with the user's ordering there. Try ordering by `prev` on the table below, which is a different valid representation of "steal underpants" -> "???" -> "profit".
id | text | prev | next
1 | "profit" | 2 | NULL
2 | "???" | 3 | 1
3 | "steal underpants" | NULL | 2
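This is easy to check concretely. A sketch with SQLite in memory (the `next` column is renamed `next_` to sidestep keyword clashes; note also that whether NULL sorts first or last varies by engine -- SQLite puts it first):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table items (id integer primary key, text text, prev integer, next_ integer)")
con.executemany("insert into items values (?, ?, ?, ?)", [
    (1, "profit", 2, None),
    (2, "???", 3, 1),
    (3, "steal underpants", None, 2),
])

rows = [r[0] for r in con.execute("select text from items order by prev")]
# Not the intended "steal underpants" -> "???" -> "profit":
print(rows)  # ['steal underpants', 'profit', '???']
```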
Yep, that's fine - it highlights the complexity of implementing ordering with a linked list. The fundamental point is that both columns aren't needed; a single one is enough to determine the order.
with recursive todo_list_sorted as (
    select *
    from todo_list
    where prev_id is null

    union all

    select tl1.*
    from todo_list tl1
    join todo_list_sorted tl2 on tl1.prev_id = tl2.id
)
select * from todo_list_sorted
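For what it's worth, that CTE does recover the order. A sketch running it against the underpants data from above with SQLite in memory (table and column names taken from the query; the data itself is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table todo_list (id integer primary key, text text, prev_id integer)")
con.executemany("insert into todo_list values (?, ?, ?)", [
    (1, "profit", 2),
    (2, "???", 3),
    (3, "steal underpants", None),
])

rows = [r[0] for r in con.execute("""
    with recursive todo_list_sorted as (
        select * from todo_list where prev_id is null
        union all
        select tl.* from todo_list tl
        join todo_list_sorted tl2 on tl.prev_id = tl2.id
    )
    select text from todo_list_sorted
""")]
print(rows)  # ['steal underpants', '???', 'profit']
```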
Because you need to optimize for reads. This setup doesn't make it easy to, say, select only the first 5.
I'd say, unless the list is really large, just rewrite all the "pos" values in a batch of UPDATE queries (in a transaction, of course). Also, don't forget the unique constraint on whatever identifies the list plus the pos.
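A sketch of that rewrite-all-positions approach, assuming a hypothetical `items` table with a unique (`list_id`, `pos`) constraint: the positions are negated first so the constraint isn't tripped mid-update (some engines check it row by row), and the whole rewrite runs in one transaction.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    create table items (
        list_id integer,
        pos integer,
        text text,
        unique (list_id, pos)
    )
""")
con.executemany("insert into items values (?, ?, ?)",
                [(1, 1, "a"), (1, 2, "b"), (1, 3, "c")])

# Move "c" to the front of list 1 by rewriting every pos value.
new_order = ["c", "a", "b"]
with con:  # commits on success, rolls back on error
    # Negate first so intermediate states never collide on (list_id, pos).
    con.execute("update items set pos = -pos where list_id = 1")
    for i, t in enumerate(new_order, start=1):
        con.execute("update items set pos = ? where list_id = 1 and text = ?", (i, t))

rows = [r[0] for r in con.execute("select text from items where list_id = 1 order by pos")]
print(rows)  # ['c', 'a', 'b']
```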
If you have a hashtable-based index on the PK, lookup by it is O(1), so getting a list of the first k elements is O(k).
So what you'll end up with is:
Insert/Update/Delete - O(1) (insert/update/delete the row itself, plus update up to 2 other rows -- each individual operation is O(1) and there's a fixed number of them)
Get list of first k - O(k)
Can't be better than this?
Elegant, fast, robust, not inventing a wheel?
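A sketch of the traversal being described, against a hypothetical `todo_list` table with a `next_id` pointer: k successive primary-key lookups, so O(k) lookups total if each one is O(1). (Worth noting that most relational databases use B-tree rather than hash indexes for the PK, so each lookup is typically O(log n); and each lookup here is a separate query round trip.)

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("create table todo_list (id integer primary key, text text, next_id integer)")
con.executemany("insert into todo_list values (?, ?, ?)", [
    (1, "steal underpants", 2),
    (2, "???", 3),
    (3, "profit", None),
])

def first_k(con, head_id, k):
    """Walk the list with one PK lookup per element: k lookups for k items."""
    out, cur = [], head_id
    while cur is not None and len(out) < k:
        text, cur = con.execute(
            "select text, next_id from todo_list where id = ?", (cur,)).fetchone()
        out.append(text)
    return out

print(first_k(con, 1, 2))  # ['steal underpants', '???']
```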
It's less about whether the index is fast, and more about performing self-joins (often hierarchical, depending on the DB engine and what's supported) an arbitrary number of times. When it comes to DB queries, n queries is not really acceptable for a page of n items.
That's not how databases are generally optimized. You're essentially doing random access, extracting a single tuple each time, in a data structure optimized for locality. Think of it more like a memory-mapped region.
Nowhere in that article was there anything about hardware-related issues. I believe what you're talking about has to do with how HDDs store DB files on disk, and how it's (much) faster to read sequentially from such a disk than to seek for every record. Hence all sorts of optimizations that DBs do - like B+trees, etc. If your dataset fits in memory, it's a non-issue: the engine will never reach for the disk anyway. Even if it doesn't fit in memory (in which case I would start any optimization by adding more memory, if possible), it's much less of an issue with SSDs. But again - the article has a more science-y than applied/practical tone. In some scenarios (large datasets on HDDs), random seeks may be an issue.