More

petergeoghegan · 2026-01-25T15:42:12 1769355732

> The section on multi-column indexes mirrors how I was taught and how I’ve generally handled such indexes in the past. But is it still true for more recent PG versions?

No, it isn't. PostgreSQL 18 added support for index skip scan:

https://youtu.be/RTXeA5svapg?si=_6q3mj1sJL8oLEWC&t=1366

It's actually possible to use a multicolumn index with a query that only has operators on its lower-order columns in earlier versions. But that requires a full index scan, which is usually very inefficient.

dlt · 2026-01-25T16:40:52 1769359252

Hi Peter, author here. Thanks for weighing in with the extra context on index skip scan, and huge thanks for adding this to Postgres.

I’m going to revise the multi-column index section to be more precise about when leftmost-prefix rules apply, and I’ll include a note on how skip scan changes the picture

petergeoghegan · 2025-09-22T16:31:57 1758558717

Can you provide more detail/a reference?

I've done extensive work on improving the Postgres B-Tree code, over quite a number of releases. I'm not aware of any problems with high-insert workloads in particular. I have personally fixed a number of subtle issues that could lead to lower space utilization with such workloads [1][2] in the past, though.

if there's a remaining problem in this area, then I'd very much like to know about it.

[1] https://www.youtube.com/watch?v=p5RaATILoiE [2] https://speakerdeck.com/peterg/nbtree-arch-pgcon

dangoodmanUT · 2025-09-23T02:41:00 1758595260

In a previous use case, when using postgres as a WAL-like append only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink. I had eventually switched to an API on top of Badger (golang KV), which afforded me an order of magnitude lower latency at ~30% of the resources IIRC. I'm sure there might have been some tuning I could have done to improve it.

I've also heard similar behaviors exhibited from other folks who had similar high-write workloads on postgres.

Sorry, I don't have anything super tangible to provide off the top of my head, or metrics/code I can share to recreate! It was also a project that required a lot of data to recreate the setup for.

petergeoghegan · 2025-09-23T04:35:35 1758602135

> In a previous use case, when using postgres as a WAL-like append only store, I noticed that indexes would get massive. Then, after a while, they'd magically shrink.

It's possible to recycle pages within indexes that have some churn (e.g., with workloads that use bulk range deletions). But it's not possible for indexes to shrink on their own, in a way that can be observed by monitoring the output of psql's "\di+" command. For that you'd need to REINDEX or run VACUUM FULL.

dangoodmanUT · 2025-09-23T17:14:02 1758647642

I may not be remembering fully, maybe the indexes never shrunk but the tables did in size.

Is there no way to automatically clean up indexes then?

petergeoghegan · 2025-09-23T17:41:44 1758649304

> I may not be remembering fully, maybe the indexes never shrunk but the tables did in size.

That only happens when it is possible to give back space to the OS filesystem using relation truncation in the first place -- which isn't all that common (it's generally only seen when there are bulk range deletions that leave lots of contiguous empty space at the end of a table/heap structure). But you said that this is an append-only workload.

This behavior can be disabled by setting the vacuum_truncate table storage parameter to "off". This is useful with workloads where relation truncation is disruptive (truncation needs to acquire a very heavyweight table lock).

> Is there no way to automatically clean up indexes then?

What I meant was that indexes do not support relation truncation. It follows that the amount of space used for an index (from the point of view of the OS) cannot ever go down, barring a REINDEX or a VACUUM FULL.

This does not mean that we cannot reuse space for previously freed/deleted pages (as long as we're reusing that space for the same index). Nor does it mean that "clean up" isn't possible in any general sense.

jashmatthews · 2025-09-23T05:25:04 1758605104

Does vacuum not release free pages at the end of an index file in the same way it does for the heap?

petergeoghegan · 2025-09-23T16:32:54 1758645174

No, it does not

petergeoghegan · 2025-07-30T16:59:26 1753894766

Recent versions of clangd show pahole-like struct size/alignment padding overhead annotations. I find this built-in support a lot more convenient than using pahole.

petergeoghegan · on June 23, 2024

"So far, so good. What's the problem here? The DDL statement can simply wait patiently until it's able to acquire its ACCESS EXCLUSIVE lock, right? The problem is that any other statements that require a lock on the users table are now queued behind this ALTER TABLE statement, including other SELECT statements that only require ACCESS SHARE locks."

This is the most important individual point that the blog post makes in my view. Lots of Postgres users are left with the wrong general idea about how things in this area work (e.g., the locking implications of autovacuum), just because they missed this one subtlety.

I'm not sure what can be done about that. It seems like Postgres could do a better job of highlighting when an interaction between two disparate transactions/sessions causes disruption.

Fire-Dragon-DoL · on June 23, 2024

Yes this is the keypoint. There is a famous blogpost on how to address this which boils down to acquiring a lock explicitly with a very short timeout, and only after acquiring the lock the alter statement is executed. Do this in a loop and essentially it prevents the disaster, but it might fail the migration if unable to take the lock (just repeat the migration)

Jcowell · on June 24, 2024

> the locking implications of autovacuum

I thought the only case this would be a worry is if the auto vacuum is to prevent wraparound

petergeoghegan · on April 17, 2024

> It is certainly possible that the plans are similar, and that improvements to the execution engine are being measured. The join order benchmark was designed to test optimizer quality.

I don't think that it's possible to test optimizer quality in isolation -- not really, not if it's to be in any way practical. Many (most?) individual improvements to the optimizer are hopelessly intertwined with executor enhancements. This is usually fairly obvious, because the same commit changes both the optimizer and the executor together. Sometimes it's much less obvious, though, because its more a case of the optimizer and executor coevolving.

It's probably still the case that a lot of the improvements seen here are pure executor enhancements, but I can't say that I'm very confident about that. (As the main person behind those B-Tree IO savings you might expect me to have more confidence than that, but I find it horribly difficult to tease these issues apart while keeping the discussion high level/relevant.)

petergeoghegan · on Dec 24, 2023

> Your code can no longer be compiled by a compiler that does not support these special C dialects.

What compiler might that be?

It's true that MSVC doesn't have an equivalent of -fno-strict-aliasing, but that's because it just doesn't apply optimizations that assume strict aliasing in the first place. Admittedly the picture around signed overflow is more complicated, but it's essentially the same story.

> Using these C dialects is a perfectly sensible thing to do, as long as you understand that this is what you are doing.

I suppose that they technically are C dialects, but it seem more than a bit absurd to put it like that -- at least to me. By that standard the Linux kernel isn't written in C. And Firefox isn't written in C++. Postgres also wouldn't count as according-to-hoyle C under your definition. Even though Postgres compiles when -fwrapv and -fno-strict-aliasing are removed, and still passes all tests.

The implication of what you're saying seems to be that all of these open source projects each independently decided to "go there own way". I find it far more natural to explain the situation as one of GCC diverging when it decided to make -fstrict-aliasing the default around 15 years ago.

nikic · on Dec 24, 2023

> It's true that MSVC doesn't have an equivalent of -fno-strict-aliasing, but that's because it just doesn't apply optimizations that assume strict aliasing in the first place. Admittedly the picture around signed overflow is more complicated, but it's essentially the same story.

Of course it's fine if the compiler doesn't implement this as an option, but rather as the default behavior -- as long as this is actually a documented guarantee, rather than "we just haven't implemented those optimizations yet, we might start exploiting this UB at any point in the future". I'm not familiar with what guarantees MSVC documents in this area.

> By that standard the Linux kernel isn't written in C.

I think that in a very real sense, it isn't. The kernel is written in standard C plus a few hundred GCC extensions, of which additional compiler options are but a small part. It took very extensive effort to make the kernel compile with clang (which already was mostly gcc-compatible, but nowhere near enough for the kernel's level of extension use).

> Even though Postgres compiles when -fwrapv and -fno-strict-aliasing are removed, and still passes all tests.

I think there is a lot of difference between "requires signed integer overflow to be well-defined because we explicitly rely on it" and "disable optimization based on signed integer overflow as a hardening measure, in case we got this wrong somewhere".

> The implication of what you're saying seems to be that all of these open source projects each independently decided to "go there own way".

You kind of make it sound like use of these options is common. To the best of my knowledge, this is not the case, and these projects are rare exceptions, not the rule. So, yes, they decided to go their own way (which is perfectly fine.)

petergeoghegan · on Oct 10, 2023

> Postgres has commercial backing and inertia, but it seems like we are lacking a pipeline of (proficient) C developers.

It's difficult to get into Postgres development, but that has little to do with C expertise. The hard part is having the right domain expertise. Building up general familiarity with the system just takes a long time.

petergeoghegan · on Sept 25, 2023

> insert...on conflict is a fine way to do upserts on older PostgreSQL versions. For modern ones (15+) there is a better alternative [SQL Standard: MERGE]

This is incorrect. To quote the Postgres MERGE docs:

"When MERGE is run concurrently with other commands that modify the target table, the usual transaction isolation rules apply; see Section 13.2 for an explanation on the behavior at each isolation level. You may also wish to consider using INSERT ... ON CONFLICT as an alternative statement which offers the ability to run an UPDATE if a concurrent INSERT occurs. There are a variety of differences and restrictions between the two statement types and they are not interchangeable."

MERGE can give duplicate violation errors when concurrent inserts take place (at the default READ COMMITTED isolation level). MERGE is no more capable of avoiding these sorts of race conditions than totally naively application code. In short, upserting is precisely what you're not supposed to use MERGE for -- ON CONFLICT is expressly designed for that use case.

kardianos · on Sept 25, 2023

I agree. In SQL Server, merge comes with a number of quirks that cause me more pain when I try to use it. The ONLY reason I use MERGE today, isn't to merge, but to include non-inserted columns in the OUTPUT clause; only the merge statement can do that. This can be useful when you want to output a lookup from a source to a newly inserted identity (auto-increment) column.

Generally, stay away from MERGE. At least in SQL Server.

singingfish · on Sept 25, 2023

I've done a lot of work on MERGE queries recently. For postgres I suspect that self updating CTEs that report back what actually happened are maybe a better idea.

chasil · on Sept 25, 2023

MERGE is standardized in ANSI SQL 2003, and updated in 2008:

https://en.wikipedia.org/wiki/Merge_(SQL)

No question, everyone should use standardized, cross-platform syntax if at all possible.

Maybe someday ANSI will standardize some kind of upsert syntax; until that time use upsert only if you are explicitly tying your SQL to your particular database.

roenxi · on Sept 25, 2023

> No question, everyone should use standardized, cross-platform syntax if at all possible.

There are big questionmarks around this statement. SQL isn't an interchange format. Standard table design goes a long way because data has lots of different consumers and needs frequently to be moved around between systems, standard SQL is nothing useful to aim for. Everyone already uses DB-specific drivers and syntax because it is better than the alternative.

- The standard is not generally available. Most of us will never learn what is in it.

- SQL standardises terrible syntax. The SQL standards committee has a unique view on programming where "function_name(a, b, c);" is some sort of weird alternative syntax. They haven't gotten as far as working out variables or function composition either [0].

- Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.

[0] I assume. Not going to pay for a copy of the standard to check.

MarkusWinand · on Sept 26, 2023

> The standard is not generally available. Most of us will never learn what is in it.

It also not aimed at users, but at implementors. Funny enough, they don't read the standard either ;) But more seriously: Some implementations are old and generally vendors prefer not changing the behavior of their product. When the standard comes later, the train has already departed. The most critical incompatibilities are in the elder features. The newer ones have a tendency the be more aligned with standard behavior (e.g. window functions are typically implemented just fine).

> They haven't gotten as far as working out variables or function composition either [0].

Part 2 SQL[0] is declarative and intentionally doesn't have variables. Part 4 SQL (aka "pl" SQL) does have variables. I personally consider Part 4 obsolete on systems that have sufficient modern Part 2 support.

> Tying SQL to your specific database is the best option for performance. Writing database-independent SQL is somethign of a fools errand, it is either trivial performance-insensitive enough that you should be using a code generator or complex enough to deserve DB-specific turning.

While this is certainly true for some cases there are also plenty of examples where the standard SQL is more concise and less error prone than product-specific alternatives. E.g. there COALESCE is the way to go rather than isnull, ifnull, nvl, or the like (typically limited to two arguments, sometimes strange treatment of different types).

There is a lot of *unnecessary* vendor-lock in in the field.

[0] https://modern-sql.com/standard/parts

roenxi · on Sept 26, 2023

> Part 2 SQL[0] is declarative and intentionally doesn't have variables...

The issue I have with that is that making a bad decision for a reason doesn't change the fundamental correctness of the decision.

SQL semantics clearly support variables, because they are available as WITH ... clauses and subqueries. So either these elements are a mistake and should be discouraged, or the language has variables and they should be made to read like variables in almost all other languages for consistency and readability.

This is literally a cosmetic issue but a lot of thought on syntax has happened since the 1970s and there seems to be an overwhelming on-the-ground consensus that SQL's syntax is not the way to lay code out. The semantics? Great. But there is nearly nothing from SQL's syntax choices that is present in any surviving programming language since. Many of the more popular languages explicitly reject SQL syntax decisions like Python (significant whitespace), "function(arg,arg,arg)" instead of constructs like "VERB arg,arg WITH TEXT,arg TEXT arg" (everything I've met bar lisp) for functions and not replicating broken English (everything).

Variables are a different thing because the declarative nature of SQL is a useful property. But standard SQL overall does such an appalling job of handling syntax I feel confident that they have gotten this part of the syntax wrong too.

Gotten a bit off topic, just happens to be a thing I feel strongly about.

epgui · on Sept 26, 2023

Lisp is in no way closer to SQL syntax than languages like python. On the contrary, lisp syntax is much more regular.

jdwyah · on Sept 26, 2023

Sigh, I never wanted to read the SQL standard before, but now that I know it’s behind a pay wall I want to read it.

epgui · on Sept 26, 2023

I don’t believe there is a SQL standard.

petergeoghegan · on Sept 25, 2023

What most users want is a statement that allows them to UPSERT, with reasonable guarantees around that never throwing a duplicate violation error, and never deadlocking (at least not in any novel way caused by implementation details). Something practical.

That's not the kind of thing that the standard takes a position on, for a variety of reasons. In general, it says very little about concurrency control/isolation. And what little is does say is all very abstract.

For whatever reason, a lot of people believe that MERGE promises something over and above what using multiple INSERT, UPDATE, and DELETE statements will promise. But that simply isn't true. The standard doesn't say anything that even vaguely suggests otherwise. Same is true of both the SQL Server MERGE docs, and the Oracle MERGE docs.

_a_a_a_ · on Sept 25, 2023

Before you recommend anything, read this about MSSQL's dangerously borked MERGE https://www.mssqltips.com/sqlservertip/3074/use-caution-with...

Anyone using MSSQL should read it.

petergeoghegan · on Aug 18, 2023

> No. This is what I call the "portable assembler"-understanding of undefined behavior and it is entirely false.

"C has been characterized (both admiringly and invidiously) as a portable assembly language" - Dennis Ritchie

The idea of C as a portable assembler is not without its problems, to be sure -- it is an oxymoron at worst, and a squishy idea at best. But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd. The Linux kernel's memory-barriers.txt famously starts out by saying:

"Some doubts may be resolved by referring to the formal memory consistency model and related documentation at tools/memory-model/. Nevertheless, even this memory model should be viewed as the collective opinion of its maintainers rather than as an infallible oracle."

Isn't that consistent with the general idea of a portable assembler?

> I agree that undefined behavior is a silly concept but that's the fault of the standard, not of compilers.

The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.

qsort · on Aug 18, 2023

> But the tendency of compiler people to refuse to take the idea seriously, even for a second, just seems odd.

It's not taken seriously because it shouldn't be taken seriously. It's a profoundly ignorant idea that's entirely delusional about reality. Architectures differ in ways that are much more profound than how parameters go on the stack or what arguments instructions take. As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

Any language that takes itself seriously is defined in terms of its abstract machine. The only alternative is the Perl way: "the interpreter is the specification", and I don't see how that's any better.

petergeoghegan · on Aug 18, 2023

> It's not taken seriously because it shouldn't be taken seriously

I really don't know what you're arguing against. I never questioned the general usefulness of an abstract machine. I merely pointed out that a large amount of important C code exists that is in tension with the idea that of an all important abstract machine. This is an empirical fact. Is it not?

You are free to interpret this body of C code as "not true ISO C", I suppose. Kind of like how the C standard is free to remove integer overflow checks in the presence of undefined behavior.

nickelpro · on Aug 18, 2023

> As a matter of fact the C standard bends over backwards in the attempt of not specifying a memory model.

I mean, C explicitly specifies a memory model and has since C11

grotorea · on Aug 18, 2023

I wonder what's the best solution here then. A different language that actually is portable assembly, or has less undefined behaviour or simpler semantics (e.g RIIR), or making -O0 behave as portable assembly?

_kst_ · on Aug 18, 2023

Step 1: Define just what "portable assembly" actually means.

An assembly program specifies a sequence of CPU instructions. You can't do that in a higher-level language.

Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow, or dereferencing a null pointer, or writing outside the bounds of an array object?

You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.

dale_glass · on Aug 18, 2023

> Perhaps you could define a C-like language with a more straightforward abstract machine. What would such a language say about the behavior of integer overflow

Whatever the CPU does. Eg, on x86, twos complement.

> or dereferencing a null pointer

Whatever the CPU does. Eg, on X86/Linux in userspace, it segfaults 100% predictably.

> or writing outside the bounds of an array object?

Whatever the CPU does. Eg, on X86/Linux, write to whatever is next in memory, or segfault.

> You could resolve some of those things by adding mandatory run-time checks, but then you have a language that's at a higher level than C.

No checks needed. Since we're talking about "portable assembly", we're talking about translating to assembly in the most direct manner possible. So dereferencing a NULL pointer literally reads from address 0x0.

dooglius · on Aug 18, 2023

> What would such a language say about the behavior of integer overflow

Two's complement (i.e. the result which is equivalent to the mathematical answer modulo 2^{width})

> dereferencing a null pointer

A load/store instruction to address zero.

> writing outside the bounds of an array object

A store instruction to the corresponding address. It's possible this could overwrite something important on the stack like a return address, in which case the compiler doesn't have to work around this (though if the compiler detects this statically, it should complain rather than treating it as unreachable)

mpweiher · on Aug 19, 2023

The reason not to define these things is exactly so C can be used as a high-level assembler, and the answer is always “whatever it is that the CPU naturally does”

"Committee did not want to force programmers into writing portably, to preclude the use of C as a “high-level assembler:”

https://www.open-std.org/JTC1/SC22/WG14/www/docs/n897.pdf

p10, line 39

"C code can be portable. "

line 30

mpweiher · on Aug 18, 2023

That's an interesting opinion.

But it has very little to do with the C programming language.

muldvarp · on Aug 19, 2023

> The idea of C as a portable assembler is not without its problems

The main problem is that C is not a "portable assembler". You mainly argue that it should be, but it simply isn't (and hasn't been for a long time if it ever was).

> The people that work on compilers have significant overlap with the people that work on the standard. They certainly seem to share the same culture.

Isn't that beside the point? If you want C to be a "portable assembler" you have to write a standard that specifies its behavior. The compilers will then follow.

petergeoghegan · on Aug 9, 2023

> The real problem here is the x86 architecture

Even if we could say for sure that x86 has been disproportionately affected by speculative execution bugs (which already seems dubious), that could easily be due to a kind of selection bias. Presumably security researchers as a group more or less focus on the most popular and relevant ISAs/microarchitectures.