
xxs · 2026-03-27T15:53:08 1774626788

That's Java code, though... a bit weird, esp. `i % 8` (which is just `i & 7`). The compiler should be able to optimize it since `i` is guaranteed to be non-negative, but it's still awkward.
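Here is a quick check of why the non-negativity matters. It's a minimal sketch, and the class and method names are made up for illustration. For non-negative `i` the two expressions agree, but Java's `%` keeps the sign of the dividend, so for negative `i` they don't. That is why the compiler can only turn `% 8` into `& 7` when it can prove `i >= 0`.

```java
public class ModVsMask {
    // For non-negative i, i % 8 and i & 7 agree: 8 is a power of two,
    // and & 7 keeps only the low three bits.
    static int mod8(int i)  { return i % 8; }
    static int mask7(int i) { return i & 7; }

    public static void main(String[] args) {
        for (int i : new int[] {0, 5, 8, 13, Integer.MAX_VALUE}) {
            System.out.println(i + ": " + mod8(i) + " " + mask7(i));
        }
        // For negative i they differ, because Java's % keeps the sign of the dividend.
        System.out.println(mod8(-3));  // prints -3
        System.out.println(mask7(-3)); // prints 5
    }
}
```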

Java's CRC32 nowadays uses intrinsics and (128-bit) AVX instructions.
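For reference, the JDK class in question is `java.util.zip.CRC32`. Application code just calls it, and on HotSpot with a supported CPU the JIT swaps in the intrinsic. The sketch below shows minimal usage (the wrapper class is hypothetical), checked against the standard CRC-32 check value for `"123456789"`:

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

public class JdkCrc {
    // Plain call into java.util.zip.CRC32; any intrinsic is applied by the JVM, not by this code.
    static long crc(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue(); // unsigned 32-bit result held in a long
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", crc(check)); // prints cbf43926
    }
}
```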

xxs · 2026-03-27T15:51:17 1774626677

CRC32 can be written in a handful of lines of code, although it'd be better to use the vector instruction set (e.g. AVX) when available.
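For illustration, here is a minimal bitwise (table-free) CRC-32 in Java. It assumes the common reflected IEEE variant: polynomial `0xEDB88320`, with an initial value and final XOR of `0xFFFFFFFF`. It's a sketch of the "handful of lines" version and is much slower than a table-driven or SIMD/intrinsic implementation. The class name is made up.

```java
import java.nio.charset.StandardCharsets;

public class Crc32Bitwise {
    // Reflected CRC-32 (IEEE polynomial 0xEDB88320), processed one bit at a time.
    static int crc32(byte[] data) {
        int crc = 0xFFFFFFFF;                // initial value
        for (byte b : data) {
            crc ^= b & 0xFF;                 // fold in the next byte
            for (int k = 0; k < 8; k++) {
                // shift right; XOR in the polynomial if the bit shifted out was 1
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return ~crc;                         // final XOR
    }

    public static void main(String[] args) {
        byte[] check = "123456789".getBytes(StandardCharsets.US_ASCII);
        System.out.printf("%08x%n", crc32(check)); // prints cbf43926
    }
}
```

Precomputing the inner loop into a 256-entry table is the usual next step. Beyond that, you're into carry-less multiply / vector territory.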

xxs · 2026-03-19T09:10:02 1773911402

as a parser: keep only indexes into the original file (input); don't copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32 bits).

That would make parsing faster, and there would be very little in terms of a tree (JSON can't really contain full-blown graphs), but it's rather complicated, and it would require hashing to allow navigation.
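Below is a minimal Java sketch of the index-only half of that idea, under stated assumptions. It makes a single pass that records each token's kind plus its start/end offsets into the input array, copying nothing and parsing no numbers. The token encoding and all names are hypothetical. It assumes well-formed input, and the hashing needed for navigation is not shown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class IndexTape {
    // Hypothetical token kinds for this sketch.
    static final int STRING = 1, NUMBER = 2, LITERAL = 3, PUNCT = 4;

    // One pass over the input, recording (kind, start, end) offset triples.
    // Strings are neither copied nor unescaped; numbers are not parsed.
    // Assumes well-formed JSON; there is no validation.
    static int[] tokenize(byte[] in) {
        int[] tape = new int[16];
        int n = 0;
        int i = 0;
        while (i < in.length) {
            byte c = in[i];
            if (c == ' ' || c == '\n' || c == '\r' || c == '\t') { i++; continue; }
            int start = i;
            int kind;
            if (c == '"') {
                i++;                                               // opening quote
                while (in[i] != '"') i += (in[i] == '\\') ? 2 : 1; // jump over escapes
                i++;                                               // closing quote
                kind = STRING;
            } else if (c == '-' || (c >= '0' && c <= '9')) {
                while (i < in.length && "+-.eE0123456789".indexOf(in[i]) >= 0) i++;
                kind = NUMBER;
            } else if (c == 't' || c == 'f' || c == 'n') {
                while (i < in.length && in[i] >= 'a' && in[i] <= 'z') i++; // true/false/null
                kind = LITERAL;
            } else {
                i++;                                               // one of { } [ ] : ,
                kind = PUNCT;
            }
            if (n + 3 > tape.length) tape = Arrays.copyOf(tape, tape.length * 2);
            tape[n++] = kind;
            tape[n++] = start;
            tape[n++] = i;                                         // exclusive end
        }
        return Arrays.copyOf(tape, n);
    }

    // Materializes token #tok only on demand (raw bytes, quotes and escapes included).
    static String text(byte[] in, int[] tape, int tok) {
        int start = tape[3 * tok + 1], end = tape[3 * tok + 2];
        return new String(in, start, end - start, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] in = "{\"name\": \"a\\\"b\", \"n\": [1, true]}".getBytes(StandardCharsets.UTF_8);
        int[] tape = tokenize(in);
        for (int t = 0; t < tape.length / 3; t++) {
            System.out.println(tape[3 * t] + " " + text(in, tape, t));
        }
    }
}
```

A string or number only becomes a `String` (or gets parsed) when `text` is called for it, which is where the savings come from. Building something navigable on top would still need nesting information and the hashing mentioned above.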

creationix · 2026-03-19T17:45:32 1773942332

yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.

With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.

With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.
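The range-request idea from this comment can be sketched roughly as follows, assuming a hypothetical 64 KiB block size. The code maps a byte span onto the fixed-width blocks that cover it and produces one `Range` header value per block, so each block can be fetched once and cached by index. The class and constant are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class RangeBlocks {
    // Hypothetical cache block size (64 KiB).
    static final long BLOCK = 64 * 1024;

    // Range header values covering bytes [offset, offset + length),
    // one per fixed-width block, so each block can be cached independently.
    static List<String> rangeHeaders(long offset, long length) {
        if (offset < 0 || length <= 0) throw new IllegalArgumentException("empty or negative span");
        List<String> out = new ArrayList<>();
        long first = offset / BLOCK;
        long last = (offset + length - 1) / BLOCK;
        for (long b = first; b <= last; b++) {
            long start = b * BLOCK;
            out.add("bytes=" + start + "-" + (start + BLOCK - 1)); // HTTP byte ranges are inclusive
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(rangeHeaders(65530, 10)); // a span straddling two blocks
    }
}
```

Each value could be sent with `java.net.http.HttpRequest.newBuilder(uri).header("Range", value)`. A server that supports ranges answers with `206 Partial Content`, and the last block of the document may come back shorter than `BLOCK`.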

xxs · 2026-03-19T09:08:18 1773911298

what do you mean by little data? Most communication protocols are not one-off.

xxs · 2026-03-17T19:22:56 1773775376

That's a proper late 90s reference, props!

xxs · 2026-03-16T19:50:26 1773690626

What do you mean by whether Java returns memory to the OS? Which one: the Java heap, or the malloc/free done by the JVM itself?

cogman10 · 2026-03-16T20:01:43 1773691303

Java is pretty greedy with the memory it claims. Especially historically it was pretty hard to get the JVM to release memory back to the OS.

To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak.

k_roy · 2026-03-16T20:12:44 1773691964

> Especially historically it was pretty hard to get the JVM to release memory back to the OS.

This feels like a huge understatement. I still have some PTSD around when I did Java professionally between like 2005 and 2014.

The early part of that was particularly horrible.

xxs · 2026-03-16T20:10:12 1773691812

Java has a quite strict max heap setting; it's very uncommon to just let it allocate up to 25% of system memory (the default). It won't grow past that limit, though.

Barring bugs/native leaks, Java has very predictable memory allocation.
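You can see that limit on a given machine with a minimal sketch using `Runtime` (the class name is hypothetical). `maxMemory()` reports the ceiling the heap may grow to, which is the `-Xmx` value or the ergonomic default when that flag is unset. `totalMemory()` is what's currently committed.

```java
public class HeapLimits {
    // {max, committed, used} heap in bytes, as reported by java.lang.Runtime.
    static long[] snapshot() {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();         // -Xmx, or the ergonomic default (typically 25% of RAM)
        long committed = rt.totalMemory(); // heap currently committed from the OS
        long used = committed - rt.freeMemory();
        return new long[] {max, committed, used};
    }

    public static void main(String[] args) {
        long[] s = snapshot();
        System.out.printf("max=%d MiB committed=%d MiB used=%d MiB%n",
                s[0] >> 20, s[1] >> 20, s[2] >> 20);
    }
}
```

Running it with and without `-Xmx` shows the difference. The reported max may come out slightly below the `-Xmx` value, depending on the collector.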

NooneAtAll3 · 2026-03-17T09:10:34 1773738634

we aren't talking about allocation, tho

we are talking about DEallocation

xxs · 2026-03-17T12:12:53 1773749573

it's a reply to:

"To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak."

I left out the part that it's possible to make the JVM return heap memory after compaction, but usually it's not done: if something grew once, it's likely to do so again.

adgjlsfhk1 · 2026-03-16T20:45:48 1773693948

This only really ends up being a problem on Windows. On systems with proper virtual memory setups, the cost of unused memory is very low (since the OS can just page it out).

cogman10 · 2026-03-16T23:09:48 1773702588

Unfortunately, the JVM and collectors like the JVM's play really badly with virtual memory. (Actually, G1 might play better. Everything else does not.)

The issue is that through the standard course of a JVM application running, every allocated page will ultimately be touched. The JVM fills up new gen, runs a minor collection, moves old objects to old gen, and continues until old gen gets filled. When old gen is filled, a major collection is triggered and all the live objects get moved around in memory.

This natural action of the JVM means you'll see a sawtooth of used memory in a properly running JVM where the peak of the sawtooth occasionally hits the memory maximum, which in turn causes the used memory to plummet.

pjmlp · 2026-03-17T07:49:12 1773733752

Depends on which JVM; PTC and Aicas do alright with their real-time GCs for embedded deployments.

cogman10 · 2026-03-17T14:13:11 1773756791

I've never really used anything other than the OpenJDK and Azuls.

How do PTC and Aicas do GC? Is it ref-counted? I'm guessing they aren't doing moving collectors.

pjmlp · 2026-03-17T15:17:39 1773760659

They are real-time GCs, nothing to do with refcounting.

One of the founding members of Aicas is the author of the book "Hard Realtime Garbage Collection in Modern Object Oriented Programming Languages", which was written as part of his PhD.

snackbroken · 2026-03-16T22:50:36 1773701436

For video games it is pretty bad, because reading back a page from disk containing "freed" (from the application perspective, but not returned to the OS) junk you don't care about is significantly slower than the OS just handing you a fresh one. A 10-20ms delay is a noticeable stutter and even on an SSD that's only a handful of round-trips.

cogman10 · 2026-03-16T23:12:34 1773702754

Games today should be using ZGC.

There are a lot of bad tuning guides for Minecraft that should be completely ignored and thrown in the trash. The only GC setting you need for it is `-XX:+UseZGC`.

For example, a number of the Minecraft "golden" guides I've seen will suggest things like setting pause targets but also survivor space sizes. The thing is, the pause target is disabled when you start playing with survivor space sizes.
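To confirm which collector actually ended up running after fiddling with flags, the JVM exposes its collectors through the standard management API. Below is a minimal sketch (the class name is hypothetical) that lists the names of the registered `GarbageCollectorMXBean`s.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class WhichGc {
    // Names of the garbage collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(collectorNames());
    }
}
```

When started with `-XX:+UseZGC`, the printed names should mention ZGC. The exact bean names differ between JDK versions, so treat them as informational rather than something to parse.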

xxs · 2026-03-17T04:16:32 1773720992

Overall, if Java hits swap, it's a bad case. Windows is a special beast when it comes to "swapping", even if you don't truly need it. On Linux, all (server) services run with swap off.

pjmlp · 2026-03-17T07:48:23 1773733703

Not used Windows Server that much?

xxs · 2026-03-16T19:26:34 1773689194

A refactor doesn't just mean artificial puff-up jobs; it very likely means internal changes and reorganization (hence hundreds of hours).

There are not many engineers capable of working on memory allocators, so adding more burden with agentic stuff is unlikely to produce anything of value.

xxs · 2026-03-16T19:22:17 1773688937

A few months back, some of the services switched to jemalloc for the Java VM. It took months (of memory dumps and tracing syscalls) of blaming the JVM itself for getting killed by the OOM killer.

Initially the idea was diagnostics; instead, the problem disappeared on its own.

yxhuvud · 2026-03-16T20:27:44 1773692864

If you changed from glibc to jemalloc and that solved your issues, then you should blame glibc, not the JVM.

xxs · 2026-03-16T20:47:28 1773694048

Well, indeed - I thought that part was obvious reading it.

xxs · 2026-03-13T11:15:23 1773400523

that doesn't help either. 'Salt' is public and usually different/unique per entry/name.

If you mean using a "secret" prefix (i.e. a pepper), that would generate effectively globally unique names each time (and unpredictable ones too), but you can't change the pepper, and it's only a matter of time before it leaks.
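One way to read the pepper idea is sketched below in Java, under stated assumptions. The bucket name is derived as an HMAC-SHA256 of a logical name, keyed with a secret pepper. The prefix, the names and the truncation to 16 bytes are hypothetical, and `HexFormat` needs Java 17+. The sketch also shows the drawback raised in the comment: the name is only unpredictable while the pepper stays secret, and rotating the pepper changes every derived name.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.util.HexFormat;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class PepperedName {
    // Hypothetical scheme: prefix + "-" + first 16 bytes of HMAC-SHA256(pepper, logicalName).
    // Deterministic for a given pepper, unpredictable without it.
    static String bucketName(String prefix, String logicalName, byte[] pepper) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(pepper, "HmacSHA256"));
            byte[] tag = mac.doFinal(logicalName.getBytes(StandardCharsets.UTF_8));
            // 16 bytes -> 32 lowercase hex chars, which are valid in S3 bucket names
            return prefix + "-" + HexFormat.of().formatHex(tag, 0, 16);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] pepper = "not-a-real-secret".getBytes(StandardCharsets.UTF_8);
        System.out.println(bucketName("app", "logs", pepper));
    }
}
```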

lcnPylGDnU4H9OF · 2026-03-13T13:11:49 1773407509

If they can't make the bucket before you do then they are not "bucket squatting", and they can't do so for a salted and hashed bucket name without knowing the salt at runtime.

The public/private distinction seems moot here, too: the salt is a throwaway since you just need the bucket name.

Even if you do need to keep track of the salt, it should be safe for the attacker to know, at least with respect to this attack, because you already own the bucket which the attacker would otherwise hoard.

ethanrutherford · 2026-03-13T19:43:06 1773430986

The "squatting" part of "bucket squatting" is a bit of a misnomer here. The attack vector is actually in the opposite direction.

1. You set up an AWS bucket with some name (any name whatsoever).

2. You have code that reads and/or writes data to the bucket.

3. You delete the bucket at some later date, but miss some script/process somewhere that is still attempting to use the bucket. For the time being, that process lies around, silently failing to access the bucket.

4. The bucket name is recycled and someone else makes a bucket with the same name. Perhaps it's an accident, or perhaps an attacker became aware of the bucket name by some means, discovered that the name was available, and decided to "squat" it.

5. That overlooked script or service is happy to see the bucket it's been trying to access all this time is available again.

You now have something potentially writing out private data, or potentially reading data and performing actions as a result, that is talking to attacker-owned infrastructure.

nulltrace · 2026-03-13T23:08:59 1773443339

Seen this happen with Terraform. One team tears down a stack, bucket gets deleted, but another stack still has the name hardcoded in an output. Next CI run uploads artifacts to a bucket name that's now up for grabs. You only notice when deploys start failing. Or worse, succeeding against someone else's bucket.

tosti · 2026-03-13T13:12:58 1773407578

Random pepper. Or just, y'know, randomly generate the effing string. Can't be that hard.

xxs · 2026-03-16T11:40:28 1773661228

Of course, any UUIDv4 would do (or any random stuff in general). I suppose the idea was having a naming scheme, instead of sharing the paths explicitly (and having an internal mapping for them).
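The "just randomize it" option is about as small as it gets in Java. `UUID.randomUUID()` produces a version-4 (random) UUID, and its string form is lowercase hex with hyphens. That form fits S3's bucket naming rules: 3 to 63 characters, lowercase letters, digits, dots and hyphens, starting and ending with a letter or digit. This is a minimal sketch; the prefix and class name are hypothetical, and the prefix is assumed to already be a valid, lowercase name fragment.

```java
import java.util.UUID;

public class RandomBucketName {
    // prefix + "-" + random UUIDv4; the UUID alone is 36 characters.
    static String newName(String prefix) {
        String name = prefix + "-" + UUID.randomUUID(); // lowercase hex and hyphens
        if (name.length() > 63) throw new IllegalArgumentException("prefix too long: " + prefix);
        return name;
    }

    public static void main(String[] args) {
        System.out.println(newName("build-artifacts"));
    }
}
```

The trade-off is the one mentioned above: nothing can be derived from the name, so the mapping from logical name to bucket has to be stored and shared somewhere.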

xxs · 2026-03-12T09:32:05 1773307925

>For every "20 min max" take home assignment, there will be people who are willing to spend 4+ hours doing it to outshine candidates who have jobs, families and lives.

The ones we use have a clear scoring system and prepared inputs; all that matters is the generated output.