Hi there, upb author here, thanks for the shout-out!
The stock protobuf implementation is pretty close to optimal given what it does. Where upb is faster, it is faster by doing less.
For example, I can beat protobuf if you don't write the parsed data into a tree structure (or only write a few fields). I can beat it if your input has unknown fields and you don't care about preserving them. But if what you need is a tree structure that contains 100% of the input data, protobuf is hard to beat speed-wise (though there are still a few tricks, like arena allocation, that can beat it given some usage patterns).
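To make "doing less" concrete, here is a rough Python sketch of a scanner that walks the protobuf wire format and keeps only the field numbers you ask for. It is not upb's code or API; `read_varint` and `scan_fields` are hypothetical names made up for illustration. It assumes well-formed input, ignores the deprecated group wire types, and keeps only the last occurrence of a repeated field.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, new_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def scan_fields(buf, wanted):
    """Return {field_number: raw value} for the wanted fields only.

    Hypothetical helper: every other field is stepped over without being
    copied or stored, so no message tree is built and unknown fields are
    not preserved.
    """
    found = {}
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field_number, wire_type = tag >> 3, tag & 0x7
        keep = field_number in wanted
        if wire_type == 0:
            # Varint: it has to be decoded just to find where it ends.
            value, pos = read_varint(buf, pos)
        else:
            if wire_type == 1:      # fixed 64-bit
                length = 8
            elif wire_type == 5:    # fixed 32-bit
                length = 4
            elif wire_type == 2:    # length-delimited (string, bytes, submessage)
                length, pos = read_varint(buf, pos)
            else:
                raise ValueError(f"unsupported wire type {wire_type}")
            # Copy the payload only if the caller asked for this field.
            value = bytes(buf[pos:pos + length]) if keep else None
            pos += length
        if keep:
            found[field_number] = value
    return found


# Field 1 = varint 150, field 2 = "testing", field 3 = varint 1.
message = b"\x08\x96\x01\x12\x07testing\x18\x01"
only_first = scan_fields(message, {1})
```

Here `only_first` ends up holding just field 1; the string in field 2 is never copied. A parser that has to materialize all three fields into a tree, plus any unknown ones, has strictly more work to do.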