Hacker News | xxs's comments

That's java code, though... bit weird, esp. i % 8 (which is just i & 7). The compiler should be able to optimize it since 'i' is guaranteed to be non-negative, still awkward.
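A quick sketch of why the two forms are interchangeable only for non-negative values (the class and method names here are made up for illustration). In Java, `%` takes the sign of the dividend, so the compiler can only swap in the mask when it can prove `i >= 0`.

```java
final class ModVsMask {
    // Remainder: the result carries the sign of the dividend.
    static int mod8(int i) { return i % 8; }

    // Mask: keeps the low three bits, so the result is always in 0..7.
    static int mask8(int i) { return i & 7; }

    public static void main(String[] args) {
        // Identical for every non-negative input.
        for (int i = 0; i < 1_000; i++) {
            if (mod8(i) != mask8(i)) throw new AssertionError("mismatch at " + i);
        }
        // Different for negative input.
        System.out.println(mod8(-3));  // -3
        System.out.println(mask8(-3)); // 5
    }
}
```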

Java's CRC32 nowadays uses intrinsics and 128-bit AVX.


CRC32 can be written in a handful of lines of code, although it'd be better to use the vector instruction set (e.g. AVX) when available.
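For reference, a minimal bit-at-a-time sketch of the "handful of lines" version (reflected polynomial 0xEDB88320, the same CRC-32 that `java.util.zip.CRC32` computes). The class name is made up, and this is the slow textbook form, not the vectorized one.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32Sketch {
    static long crc32(byte[] data) {
        int crc = 0xFFFFFFFF;                       // initial value
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int k = 0; k < 8; k++) {
                // -(crc & 1) is all ones when the low bit is set, else zero.
                crc = (crc >>> 1) ^ (0xEDB88320 & -(crc & 1));
            }
        }
        return (~crc) & 0xFFFFFFFFL;                // final XOR, as unsigned 32-bit
    }

    public static void main(String[] args) {
        byte[] input = "123456789".getBytes(StandardCharsets.US_ASCII);
        CRC32 reference = new CRC32();
        reference.update(input);
        System.out.println(Long.toHexString(crc32(input)));            // cbf43926
        System.out.println(Long.toHexString(reference.getValue()));    // cbf43926
    }
}
```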

As a parser: keep only indexes into the original file (the input); don't copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32 bits).

That would make parsing faster, and there will be very little in terms of a tree (JSON can't really contain full-blown graphs). It's rather complicated, though, and it will require hashing to allow navigation.
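A rough sketch of the index-only idea, assuming well-formed input: one pass that records `[start, end)` offsets of string and number tokens and never copies or converts them. The class name is hypothetical, and the hashing/navigation layer mentioned above is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

final class JsonSpans {
    /** Returns {start, end} offset pairs into json; nothing is copied or parsed. */
    static List<int[]> scalarSpans(String json) {
        List<int[]> spans = new ArrayList<>();
        int i = 0, n = json.length();
        while (i < n) {
            char c = json.charAt(i);
            if (c == '"') {
                int start = ++i;                       // first char after the opening quote
                while (i < n && json.charAt(i) != '"') {
                    if (json.charAt(i) == '\\') i++;   // skip the escaped character
                    i++;
                }
                spans.add(new int[] {start, i});
                i++;                                   // step over the closing quote
            } else if (c == '-' || (c >= '0' && c <= '9')) {
                int start = i;
                while (i < n && "+-.eE0123456789".indexOf(json.charAt(i)) >= 0) i++;
                spans.add(new int[] {start, i});
            } else {
                i++;                                   // structure, whitespace, true/false/null
            }
        }
        return spans;
    }
}
```

A caller only materializes a value when it is actually needed, e.g. `json.substring(span[0], span[1])`.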


yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.

With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.

With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.
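The fixed-width block caching described for random-access formats could look roughly like this. `RangeFetcher` is a hypothetical interface; in practice it would wrap an HTTP request with a `Range: bytes=start-end` header, which is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

final class BlockCache {
    /** Hypothetical: returns the bytes [offset, offset + length) of the remote document. */
    interface RangeFetcher {
        byte[] fetch(long offset, int length);
    }

    private final RangeFetcher fetcher;
    private final int blockSize;
    private final Map<Long, byte[]> blocks = new HashMap<>();

    BlockCache(RangeFetcher fetcher, int blockSize) {
        this.fetcher = fetcher;
        this.blockSize = blockSize;
    }

    /** Reads one byte, fetching only the fixed-width block that contains it. */
    int byteAt(long pos) {
        long blockIndex = pos / blockSize;
        byte[] block = blocks.computeIfAbsent(blockIndex,
                idx -> fetcher.fetch(idx * blockSize, blockSize));
        return block[(int) (pos % blockSize)] & 0xFF;
    }
}
```

Reads that land in an already cached block cost nothing extra, and untouched parts of the document are never downloaded.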


what do you mean by little data, most communication protocols are not one off

That's a proper late 90s reference, props!

What do you mean - whether Java returns memory to the OS? Which one - the Java heap, or the malloc/free done by the JVM?

Java is pretty greedy with the memory it claims. Especially historically it was pretty hard to get the JVM to release memory back to the OS.

To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak.


> Especially historically it was pretty hard to get the JVM to release memory back to the OS.

This feels like a huge understatement. I still have some PTSD around when I did Java professionally between like 2005 and 2014.

The early part of that was particularly horrible.


Java has a quite strict max heap setting, it's very uncommon to let it allocate up to 25% of the system memory (the default). It won't grow past that point, though.
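To see the ceiling a given JVM is actually running with, the standard `Runtime` API is enough. A small sketch (the class name is made up); the numbers depend on `-Xmx` or the default for the machine.

```java
final class HeapLimits {
    /** Upper bound the heap may grow to (the -Xmx value or the default). */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    /** Heap currently reserved by the JVM. */
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    /** Portion of the committed heap occupied by objects, live or not yet collected. */
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mb = 1024 * 1024;
        System.out.println("max:       " + maxHeapBytes() / mb + " MB");
        System.out.println("committed: " + committedHeapBytes() / mb + " MB");
        System.out.println("used:      " + usedHeapBytes() / mb + " MB");
    }
}
```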

Barring bugs/native leaks, Java has very predictable memory allocation.


we aren't talking about allocation, tho

we are talking about DEallocation


it's a reply to:

"To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak."

I cut the part saying that it's possible to make the JVM return heap memory after compaction, but usually it's not done, i.e. if something grew once, it's likely to do it again.


This only really ends up being a problem on Windows. On systems with proper virtual memory setups, the cost of unused memory is very low (since the OS can just page it out).

Unfortunately, the JVM and collectors like the JVM's play really badly with virtual memory. (Actually, G1 might play better. Everything else does not.)

The issue is that through the standard course of a JVM application running, every allocated page will ultimately be touched. The JVM fills up new gen, runs a minor collection, moves old objects to old gen, and continues until old gen gets filled. When old gen is filled, a major collection is triggered and all the live objects get moved around in memory.

This natural action of the JVM means you'll see a sawtooth of used memory in a properly running JVM where the peak of the sawtooth occasionally hits the memory maximum, which in turn causes the used memory to plummet.


Depends on which JVM, PTC and Aicas do alright with their real time GCs for embedded deployment.

I've never really used anything other than the OpenJDK and Azuls.

How do PTC and Aicas do GC? Is it ref counted? I'm guessing they aren't doing moving collectors.


They are real time GCs, nothing to do with refcounting.

One of the founding members of Aicas is the author of "Hard Realtime Garbage Collection in Modern Object Oriented Programming Languages" book, which was done as part of his PhD.


For video games it is pretty bad, because reading back a page from disk containing "freed" (from the application perspective, but not returned to the OS) junk you don't care about is significantly slower than the OS just handing you a fresh one. A 10-20ms delay is a noticeable stutter and even on an SSD that's only a handful of round-trips.

Games today should be using ZGC.

There are a lot of bad tuning guides for Minecraft that should be completely ignored and thrown in the trash. The only GC setting you need for it is `-XX:+UseZGC`

For example, a number of the Minecraft golden guides I've seen will suggest things like setting pause targets but also survivor space sizes. The thing is, the pause target is disabled when you start playing with survivor space sizes.


Overall, if Java hits swap, it's a bad case. Windows is a special beast when it comes to 'swapping', even if you don't truly need it. On Linux, all (server) services run with swapoff.

Not used Windows Server that much?

Refactor doesn't mean just artificial puff-up jobs, it's very likely internal changes and reorganization (hence 100s of hours).

There are not many engineers capable of working on memory allocators, so adding more burden by agentic stuff is unlikely to produce anything of value.


A few months back, some of the services switched to jemalloc for the Java VM. It took months (of memory dumps and tracing syscalls) to blame the JVM itself for getting killed by the oom_killer.

Initially the idea was diagnostics; instead the problem disappeared on its own.


If you changed from glibc to jemalloc and that solved your issues, then you should blame glibc, not the JVM.

Well, indeed - I thought that part was obvious reading it.

That doesn't help either. 'Salt' is public and usually different/unique per entry/name.

If you mean to use a "secret" prefix (i.e. a pepper), then that would generate effectively globally unique names each time (and unpredictable ones too), but you can't change the pepper and it's only a matter of time before it leaks.


If they can't make the bucket before you do then they are not "bucket squatting", and they can't do so for a salted and hashed bucket name without knowing the salt at runtime.

The public/private distinction seems moot here, too: the salt is a throwaway since you just need the bucket name.

Even if you do need to keep track of the salt, it should be safe for the attacker to know, at least with respect to this attack, because you already own the bucket which the attacker would otherwise hoard.
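A sketch of what a salted and hashed bucket name could look like, assuming SHA-256 and a random 16-byte salt. The `app-` prefix, the 32-character truncation, and the class name are illustrative choices, not something from the thread; the only hard constraint relied on is that S3 bucket names are 3 to 63 characters of lowercase letters, digits, dots, and hyphens.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;

final class SaltedBucketName {
    /** Derives a name from salt + logical name; unguessable without the salt. */
    static String bucketName(String logicalName, byte[] salt) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            sha.update(salt);
            sha.update(logicalName.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : sha.digest()) {
                hex.append(String.format("%02x", b));
            }
            // Hypothetical prefix; truncated to stay well under the 63-char limit.
            return "app-" + hex.substring(0, 32);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is required on every Java platform", e);
        }
    }

    /** Fresh salt, generated when the bucket is created. */
    static byte[] randomSalt() {
        byte[] salt = new byte[16];
        new SecureRandom().nextBytes(salt);
        return salt;
    }
}
```

Note this only addresses someone guessing the name before creation. It does nothing for the delete-then-reuse scenario, where the name is already known.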


The "squatting" part of "bucket squatting" is a bit of a misnomer here. The attack vector is actually in the opposite direction.

1. You set up an aws bucket with some name (any name whatsoever).

2. You have code that reads and/or writes data to the bucket.

3. You delete the bucket at some later date, but miss some script/process somewhere that is still attempting to use the bucket. For the time being, that process lies around, silently failing to access the bucket.

4. The bucket name is recycled and someone else makes a bucket with the same name. Perhaps it's an accident, or perhaps it's because by some means an attacker became aware of the bucket name, discovered that the name was available, and decided to "squat" the name.

5. That overlooked script or service is happy to see the bucket it's been trying to access all this time is available again.

You now have something potentially writing out private data, or potentially reading data and performing actions as a result, that is talking to attacker-owned infrastructure.


Seen this happen with Terraform. One team tears down a stack, bucket gets deleted, but another stack still has the name hardcoded in an output. Next CI run uploads artifacts to a bucket name that's now up for grabs. You only notice when deploys start failing. Or worse, succeeding against someone else's bucket.


Random pepper. Or just, y'know, randomly generate the effing string. Can't be that hard.


Of course, any UUIDv4 would do it (or any random stuff in general). I suppose the idea was having a naming scheme, instead of sharing the paths explicitly (and having an internal mapping for them)
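The random-name-plus-internal-mapping variant is about as short as it gets. A sketch with made-up names (`RandomBucketNames`, the `app-` prefix); in a real setup the mapping would have to be persisted somewhere rather than held in memory.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class RandomBucketNames {
    // Logical name -> generated bucket name (in-memory only in this sketch).
    private final Map<String, String> byLogicalName = new ConcurrentHashMap<>();

    /** Returns the bucket for a logical name, generating a random one on first use. */
    String bucketFor(String logicalName) {
        return byLogicalName.computeIfAbsent(
                logicalName, key -> "app-" + UUID.randomUUID());
    }
}
```

`UUID.randomUUID()` produces a version 4 UUID in lowercase hex with hyphens, so the result is 40 characters and fits bucket naming rules.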

>For every "20 min max" take home assignment, there will be people who are willing to spend 4+ hours doing it to outshine candidates who have jobs, families and lives.

The ones we use have a clear scoring system and prepared inputs - all that matters is the generated output.

