But why? There is no GIL in C/C++/Rust/Zig/Whatever, just use threads.

ynik · on Oct 2, 2023

Because in our case, all threads would be running a mixture of Python and C++. Our core data structure is implemented in C++, large (often >10 GB), and can be shared across threads (it is thread-safe, and usage is mostly read-only). But we have lots of analysis algorithms accessing that data structure, most of them implemented in Python. We tried releasing the GIL for every tiny call into C++, but that approach can barely keep two cores busy due to constant fighting over the GIL. There's no good "inner loop" that could avoid touching the GIL within the loop body.

Rewriting most/all of the Python analyses in a different (GIL-free) language is a no-go: the analyses have accumulated over the years and now there are more than a thousand of them. It would consume all our development resources for the next ~5 years. In retrospect I can say that choosing Python for these was a major mistake, but it's one that cannot be fixed without a company-killing rewrite :(

We actually invested several months of developer time in allocating our core data structure in shared memory, allowing us to parallelize with multiprocessing. But there's still a whole bunch of ancillary data structures written in C++ that are not so easy to put in shared memory, so all analyses touching those are limited to a single process, which by Amdahl's law immediately started dominating our execution time.
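
To put rough numbers on the Amdahl's law point, here is a minimal sketch. The function name `amdahl_speedup` and the 90% parallel fraction are illustrative assumptions, not measurements from the system described above.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Upper bound on speedup when only part of the work can be parallelized.

    parallel_fraction: share of the total runtime that can be spread
                       across workers (between 0.0 and 1.0).
    workers:           number of processes or cores used for that share.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the runtime parallelizes across
    # processes, 10% is stuck in a single process.
    for n in (1, 2, 8, 64):
        print(f"{n:>3} workers: {amdahl_speedup(0.9, n):.2f}x")
```

With a 10% serial share the speedup can never reach 10x, no matter how many processes are added, which is why the single-process analyses end up dominating the total runtime.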

nutate · on Oct 2, 2023

Let's say you wanted to run some existing Python code and test it against some C/C++/Rust for accuracy of some sort (numerical, lexicographical, etc.). The old way, you would have to fire up multiple processes to do that (as in OS-level processes), but now you can have your multithreaded compiled code running in threads and your multi-GIL'd interpreted code all running in one process, and compare their results in the `main` of your C/C++/Rust. That's a contrived example, but the issue was that a single GIL isn't thread-safe in and of itself. So if you're using these compiled languages as a sort of Python runner, you couldn't multithread Python interpreter execution and guarantee that the code would work. Also, as the above comment stated, you could do hacks, but you'd double your memory allocation by needing both a Python and a C/C++/Rust representation for everything that went back and forth.