That's java code, though... bit weird, esp. i % 8 (which is just i & 7). The compiler should be able to optimize it since 'i' is guaranteed to be non-negative, still awkward.
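A quick sketch of that equivalence, for anyone who wants to check it. The class and method names are made up for illustration; the point is only that the two expressions agree for non-negative `i` and diverge for negative values, which is why the compiler needs the non-negativity guarantee.

```java
final class ModVsMask {
    // Remainder by 8, as written in the original code.
    static int byMod(int i) {
        return i % 8;
    }

    // Bit mask with 7; equal to i % 8 only when i is non-negative.
    static int byMask(int i) {
        return i & 7;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 1024; i++) {
            if (byMod(i) != byMask(i)) {
                throw new AssertionError("mismatch at " + i);
            }
        }
        // Java's % keeps the sign of the dividend, so negative inputs differ.
        System.out.println(byMod(-1) + " vs " + byMask(-1));
    }
}
```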
Java CRC32 nowadays uses intrinsics and avx128 for crc32.
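For reference, a minimal sketch of the standard-library route. It only calls `java.util.zip.CRC32`; whether an intrinsic is actually used underneath depends on the JVM and CPU, so treat that part as an assumption. The `Crc32Demo` wrapper is just an illustrative name.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

final class Crc32Demo {
    // Computes the CRC-32 of a byte array using the JDK implementation.
    static long crc32(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        // "123456789" is the usual CRC check input; CRC-32 gives cbf43926.
        System.out.printf("%08x%n", crc32(data));
    }
}
```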
As a parser: keep only indexes into the original file (input); don't copy strings or parse numbers at all (unless the strings fit in the index width, e.g. 32 bits).
That would make parsing faster, and there would be very little in terms of a tree (JSON can't really contain full-blown graphs), but it's rather complicated, and it would require hashing to allow navigation.
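A very rough sketch of the index-only idea, limited to string tokens. Everything here (`StringSpans`, `scan`, `text`) is hypothetical and nowhere near a full parser: it records `[start, end)` offsets into the input and only builds a `String` when asked. It does not validate the JSON, unescape anything, or provide the hashing needed for navigation.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class StringSpans {
    // Returns [start, end) offsets of the contents of every string token,
    // without copying or unescaping any bytes.
    static List<int[]> scan(byte[] input) {
        List<int[]> spans = new ArrayList<>();
        int i = 0;
        while (i < input.length) {
            if (input[i] == '"') {
                int start = ++i;
                while (i < input.length && input[i] != '"') {
                    // A backslash escapes the next byte, so skip both.
                    i += (input[i] == '\\') ? 2 : 1;
                }
                spans.add(new int[] {start, Math.min(i, input.length)});
            }
            i++;
        }
        return spans;
    }

    // Materializes the raw (still escaped) text only on demand.
    static String text(byte[] input, int[] span) {
        return new String(input, span[0], span[1] - span[0], StandardCharsets.UTF_8);
    }
}
```

Usage would be something like `StringSpans.scan(bytes)` once, then `StringSpans.text(bytes, span)` only for the few values actually needed.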
yep. I built custom JSON parsers as a first solution. The problem is you can't get away from scanning at least half the document bytes on average.
With RX and other truly random-access formats you could even optimize to the point of not even fetching the whole document. You could grab chunks from a remote server using HTTP range requests and cache locally in fixed-width blocks.
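To make the fixed-width block idea concrete, here is a sketch of just the arithmetic: which blocks cover a byte span, and what `Range` header value fetches one block. The 64 KiB block size and all names are assumptions for illustration; the actual HTTP call and the local cache are left out.

```java
final class BlockRanges {
    // Hypothetical block width; any fixed size works the same way.
    static final int BLOCK_SIZE = 64 * 1024;

    // Index of the block containing the first byte of the span.
    static long firstBlock(long offset) {
        return offset / BLOCK_SIZE;
    }

    // Index of the block containing the last byte of [offset, offset + length).
    static long lastBlock(long offset, long length) {
        return (offset + length - 1) / BLOCK_SIZE;
    }

    // Range header value for one whole block; HTTP byte ranges are inclusive.
    static String rangeHeader(long block) {
        long start = block * BLOCK_SIZE;
        return "bytes=" + start + "-" + (start + BLOCK_SIZE - 1);
    }
}
```

A reader for the format would compute `firstBlock` and `lastBlock` for the bytes it needs, fetch only the blocks missing from the cache, and send each `rangeHeader` value in a `Range` request header.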
With JSON you must start at the front and read byte-by-byte till you find all the data you're looking for. Smart parsers can help a lot to reduce heap allocations, but you can't skip the state machine scan.
Java has a quite strict max heap setting; it's very uncommon to let it allocate up to 25% of the system memory (the default). It won't grow past that point, though.
Barring bugs/native leaks, Java has very predictable memory allocation.
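If you want to see the limit the JVM is actually running with, the standard `Runtime` API reports it. A small sketch (the `HeapLimits` class is an illustrative name; the numbers depend on your flags and machine):

```java
final class HeapLimits {
    // Upper bound the heap may grow to (what -Xmx or the default resolves to).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved by the JVM.
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Portion of the committed heap currently occupied by objects (live or not yet collected).
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max heap:       " + maxHeapBytes() / mib + " MiB");
        System.out.println("committed heap: " + committedHeapBytes() / mib + " MiB");
        System.out.println("used heap:      " + usedHeapBytes() / mib + " MiB");
    }
}
```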
"To an outsider, that looks like the JVM heap just steadily growing, which is easy to mistake for a memory leak."
I cut the part about it being possible to make the JVM return heap memory after compaction, but usually that's not done: if something grew once, it's likely to do it again.
This only really ends up being a problem on Windows. On systems with proper virtual memory setups, the cost of unused memory is very low (since the OS can just page it out).
Unfortunately, the JVM and its collectors play really badly with virtual memory. (Actually, G1 might play better. Everything else does not.)
The issue is that through the standard course of a JVM application running, every allocated page will ultimately be touched. The JVM fills up new gen, runs a minor collection, moves old objects to old gen, and continues until old gen gets filled. When old gen is filled, a major collection is triggered and all the live objects get moved around in memory.
This natural action of the JVM means you'll see a sawtooth of used memory in a properly running JVM where the peak of the sawtooth occasionally hits the memory maximum, which in turn causes the used memory to plummet.
They are real time GCs, nothing to do with refcounting.
One of the founding members of Aicas is the author of "Hard Realtime Garbage Collection in Modern Object Oriented Programming Languages" book, which was done as part of his PhD.
For video games it is pretty bad, because reading back a page from disk containing "freed" (from the application perspective, but not returned to the OS) junk you don't care about is significantly slower than the OS just handing you a fresh one. A 10-20ms delay is a noticeable stutter and even on an SSD that's only a handful of round-trips.
There are a lot of bad tuning guides for Minecraft that should be completely ignored and thrown in the trash. The only GC setting you need for it is `-XX:+UseZGC`.
For example, a number of the minecraft golden guides I've seen will suggest things like setting pause targets but also survivor space sizes. The thing is, the pause target is disabled when you start playing with survivor space sizes.
Overall, if Java hits swap, it's a bad case. Windows is a special beast when it comes to 'swapping', even if you don't truly need it. On Linux, all (server) services run with swapoff.
A few months back, some of the services switched to jemalloc for the Java VM. It took months (of memory dumps and tracing syscalls) to blame the JVM itself for getting killed by the oom_killer.
Initially the idea was diagnostics; instead, the problem disappeared on its own.
That doesn't help either. 'Salt' is public and usually different/unique per entry/name.
If you mean to use a "secret" prefix (i.e. pepper), then that would generate effectively globally unique names each time (and unpredictable too), but you can't change the pepper and it's only a matter of time before it leaks.
If they can't make the bucket before you do then they are not "bucket squatting", and they can't do so for a salted and hashed bucket name without knowing the salt at runtime.
The public/private distinction seems moot here, too: the salt is a throwaway since you just need the bucket name.
Even if you do need to keep track of the salt, it should be safe for the attacker to know, at least with respect to this attack, because you already own the bucket which the attacker would otherwise hoard.
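For concreteness, a sketch of what a salted and hashed bucket name could look like. This is one possible scheme under assumptions, not what anyone in the thread necessarily uses: the `data-` prefix, the `salt:name` layout, and truncating SHA-256 to 32 hex characters are all made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class SaltedBucketName {
    // Derives a name from a logical name and a salt (hypothetical scheme).
    static String bucketName(String logicalName, String salt) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            byte[] digest = sha.digest((salt + ":" + logicalName).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder("data-");
            // Keep the first 16 bytes (32 hex chars) to stay well under typical length limits.
            for (int i = 0; i < 16; i++) {
                hex.append(String.format("%02x", digest[i] & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

The same inputs always give the same name, so nothing extra needs storing beyond the salt. Without the salt, the name can't be predicted ahead of time.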
The "squatting" part of "bucket squatting" is a bit of a misnomer here. The attack vector is actually in the opposite direction.
1. You set up an aws bucket with some name (any name whatsoever).
2. You have code that reads and/or writes data to the bucket.
3. You delete the bucket at some later date, but miss some script/process somewhere that is still attempting to use the bucket. For the time being, that process lies around, silently failing to access the bucket.
4. The bucket name is recycled and someone else makes a bucket with the same name. Perhaps it's an accident, or perhaps an attacker became aware of the bucket name by some means, discovered that the name was available, and decided to "squat" on it.
5. That overlooked script or service is happy to see the bucket it's been trying to access all this time is available again.
You now have something potentially writing out private data, or potentially reading data and performing actions as a result, that is talking to attacker-owned infrastructure.
Seen this happen with Terraform. One team tears down a stack, bucket gets deleted, but another stack still has the name hardcoded in an output. Next CI run uploads artifacts to a bucket name that's now up for grabs. You only notice when deploys start failing. Or worse, succeeding against someone else's bucket.
Of course, any UUIDv4 would do it (or any random stuff in general). I suppose the idea was having a naming scheme, instead of sharing the paths explicitly (and having an internal mapping for them)
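A sketch of the random-name variant, using the JDK's `UUID.randomUUID()` (which produces a version 4 UUID). The prefix and the `RandomBucketName` class are hypothetical; you would still need the internal mapping mentioned above to remember which name belongs to what.

```java
import java.util.UUID;

final class RandomBucketName {
    // Builds a name like "logs-<uuid>": lowercase hex and hyphens only.
    static String newBucketName(String prefix) {
        return prefix + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        System.out.println(newBucketName("logs"));
    }
}
```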
>For every "20 min max" take home assignment, there will be people who are willing to spend 4+ hours doing it to outshine candidates who have jobs, families and lives.
The ones we use have a clear scoring system and prepared inputs; all that matters is the generated output.