Solving the ABA Problem in Rust with Tagged Pointers

LegionMammal978 · on Feb 14, 2025

The slotmap crate [0], which stores an unordered object list in a Vec with persistent indices, implements a similar tagging scheme to avoid ABA problems. It's neat to see how this can be applied to raw pointers as well.

[0] https://docs.rs/slotmap/latest/slotmap/

damon_dam · on Feb 14, 2025

A much better solution to the ABA problem is to use load-linked and store-conditional instructions [1] on platforms where they are available (ARM, RISC-V, etc.)

Unfortunately x86 is not one of them. That's when CMPXCHG16B on x86-64 comes in handy, as per the linked article. But if you're going to write library code in 2025, it should at least support x86 and ARM.

Which brings up a more interesting argument IMO: platform fragmentation. Rust has approximately 1.5 advantages over C/C++. To displace them in a reasonable time, it probably needed 4 or 5. Its most significant contribution for the foreseeable future is platform fragmentation.

[1] https://en.wikipedia.org/wiki/Load-link/store-conditional

explodingwaffle · on Feb 14, 2025

Who says you can’t make a library that does both? Rust makes it pretty easy to conditionally compile code based on architecture.

It could even be possible to make some sort of “ABA primitive” and use that for these sort of data structures. This could well exist: I’ve not looked. These sorts of things really aren’t that common in my experience.

On LR/SC: to any atomics experts listening, isn’t it technically “obstruction-free” (as per the Wikipedia definitions at least) rather than lock-free? (though in practice this makes basically no difference and still counts as lock-free in the C++ (and Rust) sense) Just something that stuck out last time I got sucked into this rabbit hole.

adrian_b · on Feb 15, 2025

Compare-and-swap and LR/SC are not per se "obstruction-free" or "lock-free".

They are the primitives with which you can implement shared data structures that are "lock-free" or "obstruction-free".

Anything that can be implemented with compare-and-swap can be implemented with LL/SC, and vice-versa.

The only difference between compare-and-swap and LL/SC is how they detect that the memory word has not been modified since the previous reading.

Compare-and-swap just compares the current value with the old value, while LL/SC uses a monitoring circuit implemented in the cache memory controller, which records if any store has happened to that memory location.

Therefore LL/SC is free of the ABA problem, while the existence of the ABA problem has been recognized already since the first moment when compare-and-swap has been invented.

Compare-and-swap has been invented by IBM, who has introduced this instruction in IBM System/370, in 1973. Simultaneously with compare-and-swap, IBM has introduced the instruction compare-double-and-swap, for solving the ABA problem by using a version counter.

Intel has added compare-and-swap renamed as CMPXCHG to 80486 in 1989, and compare-double-and-swap, renamed as CMPXCHG8B, to Pentium in 1993. On x86-64, CMPXCHG8B, i.e. compare-double-and-swap, has become CMPXCHG16B.

LL/SC has been invented in 1987, in the S-1 Advanced Architecture Processor, at the Lawrence Livermore National Laboratory. Then it has been added in 1989 to MIPS II, from where it has spread several years later to most RISC ISAs.

Using either compare-double-and-swap or LL/SC is equivalent, because both are free of the ABA problem.

However there are many cases when the optimistic access to shared data structures that can be implemented with compare-and-swap or LL/SC results in lower performance than access based on mutual exclusion or on dynamic partitioning of the shared data structure (both being implemented with atomic instructions, like atomic exchange or atomic fetch-and-add).

This is why the 64-bit ARM ISA, Aarch64, had to correct their initial mistake of providing only LL/SC, by adding a set of atomic instructions, including atomic exchange and atomic fetch-and-add, in the first revision of the ISA, Armv8.1-A.

damon_dam · on Feb 14, 2025

> Who says you can’t make a library that does both?

Of course you can. I just meant that the linked article didn't.

> On LR/SC: to any atomics experts listening, isn’t it technically “obstruction-free” (as per the Wikipedia definitions at least) rather than lock-free?

The better criterion IMO is loop-free, which makes it a little easier to understand. Consider the following spin-locking code (with overabundant memory barriers):

   do { p = *a; } while (p == 0x1 || !atomic_compare_and_swap(p, 0x1, a));
   memory_barrier();
   // do stuff that looks at *p
   q->next = p;
   memory_barrier();
   atomic_store(q, a);

Here's the equivalent LL/SC version:

   do {
     p = ll(a);
     memory_barrier();
     // do stuff that looks at *p
     q->next = p;
     memory_barrier();
   } while (!sc(q, a));

The pointer-tagging version is also obviously not loop-free. Which is faster, in which cases, and by how much?

The oversimplified answer is that LL/SC is probably slightly faster than spin-locking on most platforms and cases, but pointer-tagging might not be.

gpderetta · on Feb 14, 2025

My understanding is that all architectures that matter do idiom recognition for ll/sc to guarantee forward progress when ll/sc is used to implement CAS and other common lock-free and wait free patterns, at least as a fallback.

dzaima · on Feb 14, 2025

e.g. this is the guarantee in RISC-V: https://github.com/riscv/riscv-isa-manual/blob/70040578316b9...

saghm · on Feb 14, 2025

> Which brings up a more interesting argument IMO: platform fragmentation. Rust has approximately 1.5 advantages over C/C++. To displace them in a reasonable time, it probably needed 4 or 5. Its most significant contribution for the foreseeable future is platform fragmentation.

I'm honestly not sure what any of this means. How are you counting "advantages", and how did you come up with the requisite number being 4 or 5? It's not clear to me why all advantages would be equal in magnitude. Maybe this is what you intended to convey by not using whole numbers, although it seems like trying to estimate how advantages compare like this in a way that was precise enough to be informative would be difficult, to say the least.

To be clear, I don't disagree with you that C/C++ are not in any immediate risk of being displaced (and I'd go further and argue that C and C++ are distinct enough in how they're used that displacing one wouldn't necessarily imply displacing the other as well). I just don't think I've ever seen things quantified in this way before, and it's confusing enough to me that I don't want to discount the possibility that I'm misunderstanding something.

damon_dam · on Feb 14, 2025

> How are you counting "advantages", and how did you come up with the requisite number being 4 or 5?

The vagueness was intentional. There is of course no homogeneous way of combining advantages into a cardinal measure. It's just a rhetorical device.

The point is that it falls short of the amount needed. Another, more subtle point is that I didn't count disadvantages. The argument applies even if you think Rust doesn't have any.

wyager · on Feb 14, 2025

> Rust has approximately 1.5 advantages over C/C++

I can think of at least 3 that deserve a whole integral bullet point:

* ADTs

* Hindley-Milner/typeclass type system

* Lifetimes and affine types

And a bunch of minor ones that count for something like 0.1-0.5 of an advantage.

I would guess we're on track for a majority-of-new-code switchover point somewhere around 2030-2035.

milesrout · on Feb 14, 2025

Maybe he meant net advantages. Rust has some advantages (ADTs, no constructors, no overloading, no preprocessor, well thought out set of string types) and some disadvantages (compile times, debug performance, heaps of wrapper cruft to step through when debugging, explicit lifetimes, pervasive type inference, ad-hoc polymorphism, no ABI stability, poor support for dynamic linking, anemic standard library, pervasive one-element-thinking encouraging millions of separate allocations all over the place, culture of simplifying code by adding copying and reallocation everywhere because the borrow checker makes sharing verbose and difficult ("just use clone"), general complexity, dramatic and dogmatic community with a penchant for social media bullying (eg serde, actix)).

What is an advantage or a disadvantage depends on who you are. Personally, I find the disadvantages (as I see them) outweigh the advantages. Others believe that some of those disadvantages are actually good things. And some of them depend on whether you are comparing to C or C++: compared to C, Rust has many disadvantages, but C++ has many of the same disadvantages, and often is worse.

I don't think Rust will ever have more new code being written in it than C and C++ combined. I doubt it will overtake either individually.

amluto · on Feb 15, 2025

> compile times

C++, especially “modern” C++, can have truly horrible compile times. The lack of a usable module system makes it worse.

Variadic templates helped some. Fold expressions will help some. I expect concepts and if constexpr and such to help some: relying on SFINAE never did the compiler any favors.

But, in general, C++ template programming is quite nasty for compile times.

milesrout · on Feb 15, 2025

Yes as I said "And some of them depend on whether you are comparing to C or C++: compared to C, Rust has many disadvantages, but C++ has many of the same disadvantages, and often is worse."

C++ compile times can be as bad as Rust's or worse. It does depend on how you structure things and how much you use templates. You don't need to have 10 layers of "zero cost abstraction" around everything. But that is the "proper" way to use it and what is now taught, so there is a lot of it out there and the standard library does it pervasively.

Variadic templates don't fix much compared to just writing straight line code or doing a tiny bit more work at runtime using stdarg.h.

amluto · on Feb 15, 2025

Variadic templates help quite a lot compared to what older libraries did or even still do to approximate them. Check out boost::variant’s shenanigans — I recall a single-line file that includes <boost/variant.hpp> and doesn’t even instantiate the variant template taking several seconds to compile.

kobebrookskC3 · on Feb 15, 2025

all those advantages and disadvantages you list are basically insignificant compared to rust being memory safe by default and c/c++ not. governments aren't bugging projects about any of those, but are about memory safety.

milesrout · on Feb 15, 2025

You might think so. I strongly disagree. I think Rust's supposed memory safety advantages are a distraction. If people really cared about that above all else they would use a memory-safe language, which Rust is not, such as Java or Python or Haskell or something. There are tons.

Governments aren't "bugging" anyone about memory safety and if they were, they wouldn't want you to use Rust. Rust isn't memory-safe.

kobebrookskC3 · on Feb 15, 2025

> If people really cared about that above all else they would use a memory-safe language, which Rust is not, such as Java or Python or Haskell or something.

people already use languages like java or python or js in most cases! it's only in rare cases it made sense to write stuff in c or c++. now with rust, that niche becomes even smaller.

> Governments aren't "bugging" anyone about memory safety

idk this report sounds like bugging from governments (plural) to me: https://www.cisa.gov/sites/default/files/2023-12/The-Case-fo...

no one who isn't being obtuse would say rust isn't memory safe. the linked report also deems rust a memory safe language.

oneshtein · on Feb 15, 2025

Safe Rust is memory-safe enough for practical needs, while maintaining top performance.

menaerus · on Feb 15, 2025

Is it still safe even if it depends on the dependencies which are not safe?

whytevuhuni · on Feb 15, 2025

Note that memory-safe languages like Java and Python will also depend on libraries which are not safe, e.g. in their implementation of the interpreter/VM, standard library, native libraries like numpy, etc.

In practice unsafe Rust tends to be less than 1% of the code (or around 5% in low-level things like kernels), which is a similar ratio of trusted/untrusted code. The top-level application code is generally expected to have 0%.

milesrout · on Feb 15, 2025

This statistic is based on the lines of code bounded by the "unsafe" keyword but that is not the full picture. Unsafe code relies for its safety on the logical correctness of safe code. In theory, as "proper practice", the trust boundary should be the crate. In other words, the safety of a particular "unsafe" block should be verifiable by checking the correctness of code within the same crate. In practice, this is not true: much unsafe code relies for safety on correctness properties in external code, especially from the standard library, like the fact that "Vec" stores elements contiguously in memory in order of index.

That isn't to say the concept is useless. But it is not the case that you just need to inspect code marked "unsafe". A change to code not marked "unsafe" can break the invariants and assumptions that unsafe blocks elsewhere rely on.

damon_dam · on Feb 15, 2025

Agreed. I would add to your argument as follows.

If you stick with "21st Century C++" [1], you never have to use a pointer. At some point, there will be (say) a -W20 flag that produces a warning if you use a pointer or other potentially-unsafe idiom without marking it with a [[20th_century]] attribute.

That attribute is equivalent to Rust's unsafe keyword. At that point, the safety difference will boil down to opt-in versus opt-out. But you can already opt-in today by policing yourself.

That's quite far from the radical language difference it's made out to be.

[1] https://news.ycombinator.com/item?id=42946321

whytevuhuni · on Feb 15, 2025

I've used both, and I'd say it's still quite a difference.

There's many examples, but one I stumble on often is functions that accept a std::string_view, and return a part of that string. It's quite dangerous to make it return a std::string_view as well, especially given how easy it is to create temporaries (e.g. get_word(sentence1 + sentence2) would return a string_view to temporary data). Similar story with functions that return element references out of collections, functions that choose an element out of two, etc.

In Rust on the other hand, you can get very reckless with very long chains of functions that return references out of other references, especially with parsers, iterators, and lambdas/closures.

The Send/Sync traits for thread safety are also unmatched in C++, tagged union pattern matching will also heavily restrict invalid states, and so will Rust's affine(ish) types, which force a single owner for resources, as opposed to C++'s somewhat hard-to-use move semantics.

The standard library also prefers safety over speed by default, e.g. with std::string_view's indexing being UB, std::optional's dereference being UB, and so on.

damon_dam · on Feb 15, 2025

I don't really disagree with what you wrote. To expand upon my argument, the issues you mention can also be self-policed, or maybe enforced with higher safety levels of the presumptive -W20 switch:

- No reference variables in user code (reference parameters are OK.)

- Any classes that hold reference member variables can only be instantiated on the stack. The objects the references refer to must either be stack objects or reference parameters to the instantiating function.

- For durable links to sub-objects, use handles (tuples of std::shared_ptr to the main object plus a sub-object link or identifier.)

- Alternatively, require externally-accessible sub-objects to be held in shared_ptr's by the main object. They can then have superset lifetimes when needed.

I'll stick to the memory safety topic, but similar solutions exist for some of the other topics you mention, e.g. by avoiding std::optional.

whytevuhuni · on Feb 15, 2025

That's right, unsafety extends to the whole module, and anything else that relies on the module's internal implementation. Generally all the way up to the abstraction's public API, whatever that abstraction is (a Vec, a Mutex, etc).

But it's nevertheless still not different from other languages that are deemed memory safe.

kobebrookskC3 · on Feb 16, 2025

modules are meant to be the boundary for a safe abstraction, but sometimes you get fun holes like https://github.com/rust-lang/rust/issues/128872 where you can implement Copy for a type in a different module in the same crate, even if the fields are private

whytevuhuni · on Feb 16, 2025

I'll admit, that's a pretty funny bug. Thankfully it seems like the Rust devs are pretty keen on fixing it somehow.

kobebrookskC3 · on Feb 15, 2025

seems pretty safe in practice https://security.googleblog.com/2024/09/eliminating-memory-s...

menaerus · on Feb 15, 2025

I know it's an unpopular opinion but tbh those Google security blogs always read more as a marketing material than engineering effort. Why I think so it's because statistics always has two side of the coins - if you want to make something look better or worse than it really is, it's trivial to do so. There's simply not enough data or methodology or different angles taken for me personally to take these blogs as a very relevant source of information but YMMV. We are all different.

oneshtein · on Feb 15, 2025

Any and all safe code depends on unsafe code to do something, including Java and Python. Yep, unsafe code may contains bugs, but their surface is much smaller, so, at same ratio of bugs to code, numbers of bugs is much smaller.

menaerus · on Feb 16, 2025

If you're open to a bit different and perhaps thought provoking perspective let me say that for your hypothesis to be true we would need to assume that the distribution of memory-safety bugs is uniform across the codebase. From my experience this is not true, it usually follows uni-modal or at most bi-modal distribution, so it follows that the surface is not much smaller.

oneshtein · on Feb 15, 2025

* unsafe blocks for unsafe code

* Memory safety in safe code

* Fearless concurrency.

* match operator and destructuring

robocat · on Feb 15, 2025

Would C, could C, implement those features?

amluto · on Feb 14, 2025

The tagged pointer example says, in the safety notes:

> Pointer Dereferencing: Pointers are only dereferenced after successful CAS operations

I think a rather stronger argument should be needed. But the code doesn’t even follow what the notes say:

            let current = self.head.load(Ordering::Acquire);
            if current.ptr.is_null() {
                return None;
            }

            let next = unsafe { (*current.ptr).next };

No CAS. And this sure looks wrong — the current head could be popped and freed before being dereferenced.

edit: put the correct code in. Thanks, LegionMammal978

LegionMammal978 · on Feb 14, 2025

Those lines don't actually dereference anything, except for (*new_node) which is owned by the function. Immediately following them is the CAS to ensure that current is still at self.head. If it gets deallocated, then the CAS fails, and it tries again with the new self.head.

amluto · on Feb 14, 2025

Whoops, I pasted entirely the wrong code!

            let current = self.head.load(Ordering::Acquire);
            if current.ptr.is_null() {
                return None;
            }

            let next = unsafe { (*current.ptr).next };

On the one hand, one might say “well, it’s fine if we read the wrong value here if the old head was popped in the mean time”, but it’s still UB, and it can even segfault if the old head is freed and unmapped.

empath75 · on Feb 14, 2025

Miri does catch a problem here:

> error: Undefined Behavior: memory access failed: expected a pointer to 8 bytes of memory, but got 0x2f5f40[noalloc] which is a dangling pointer (it has no provenance) --> src/main.rs:263:33 | 263 | let next = unsafe { (*current.ptr).next }; | ^^^^^^^^^^^^^^^^^^^ memory access failed: expected a pointer to 8 bytes of memory, but got 0x2f5f40[noalloc] which is a dangling pointer (it has no provenance) | = help: this indicates a bug in the program: it performed an invalid operation, and caused Undefined Behavior = help: see https://doc.rust-lang.org/nightly/reference/behavior-conside... for further information = note: BACKTRACE on thread `unnamed-2`: = note: inside `LockFreeStack::pop` at src/main.rs:263:33: 263:52

amluto · on Feb 14, 2025

I think this is an unrelated problem: https://doc.rust-lang.org/std/ptr/index.html#provenance

Fixing it cleanly seems like it might need a new AtomicPtrAndU64 or the ability to have an atomic 128-bit struct, and I have no idea whether the latter is even conceptually compatible with miri.

Phelinofist · on Feb 14, 2025

Can't the ABA problem circumvented quite easily by making the shared state immutable? Every modification will create a new instance, like so (in JAVA):

  class State {
    private int value;
    private String anotherValue;

    State(int value, String anotherValue) {
      this.value = value;
      this.anotherValue = anotherValue;
    }

    State changeValue(int newValue) {
      return new State(newValue, anotherValue);
    }

    // same for anotherValue
  }
  
  AtomicReference<State> state = new AtomicReference<>(new State(1, "a");
  
  State oldState;
  State newState;
  for(;;) {
    oldState = this.state.get();
    newState = oldState.changeValue(2);
    if(this.state.compareAndSet(oldState, newState)) {
      break;
    }
  }

Since object identity is used for comparison, and every transition is a new object, the ABA problem cannot happen because - even if the state are equal - the comparison will fail.

wat10000 · on Feb 14, 2025

The problem isn’t the mutability of the State object, but the mutability of the pointer to it.

Your approach works in Java, but it only works if you have a garbage collector that can ensure the memory is not freed until there are no extant pointers to it, or you never free the objects. The ABA problem occurs when one writer reads the old pointer, another writer swaps out the pointer and frees the old object, a third writer allocates a new object that happens to be placed at the exact same location as the first one, then the first writer does its compare-and-swap.

whizzter · on Feb 14, 2025

In Java if you hold an reference to an "old" object then that will be kept alive until all references are exhausted, thus an identity comparison comparing pointers will never succeed as long as someone holds an reference (that is compared to).

In languages with more manual management, the pointer itself if used can be reused and succeed in bad cases (though from my cursory knowledge of Rust this seems to indicate quite a bit of unsafe behaviours).

gpderetta · on Feb 14, 2025

In any language with GC, a lot of complex concurrency problems become much simpler (but of course now you have to deal with the GC).

catlifeonmars · on Feb 15, 2025

What was surprising to me here is that two threads share the same stack. Is this a rust thing or is this common in programming languages?

wrs · on Feb 15, 2025

This is about “stack”, the general purpose data structure, not “stack”, the place in memory you put local variables.

drivebyhooting · on Feb 14, 2025

For 32bit or 64bit tagged pointer is there a big risk of version overflow?

convolvatron · on Feb 14, 2025

oh certainly. this is one of those slightly unsatisfactory consistency stories where its not a closed answer, its just arbitrarily unlikely that a conflict will arise. we've already got a _really_ small window where one thread has the old pointer and the other thread frees it, it goes through the heap, and is assigned the same address as the old version. now exponentiate the likelihood of that happening with the size of the tag space.

amluto · on Feb 15, 2025

> we've already got a _really_ small window where one thread has the old pointer and the other thread frees it, it goes through the heap, and is assigned the same address as the old version

Plenty of allocator implementations will do this, very efficiently, almost deterministically reusing the same address, if you free and then allocate something of similar size.

gpderetta · on Feb 14, 2025

You just need to make it less likely than random RAM corruption:)

Someone · on Feb 14, 2025

There’s a risk, but far from a big one. You’d need exactly 2³² or even 2⁶⁴ pointer updates (or a multiple of it) between the time you read the pointer and the time you try writing it for running into problems. That’s not likely to be a problem even on a 1000-core CPU.

So, technically, this isn’t solving the ABA problem, it’s just hiding it very, very well.