
Yeah, that’s consistent. topK keeps the obvious tokens, but subtle context gets eroded over time rather than dropped all at once.


Fair point, the gap isn’t huge in that plot, and both degrade at low ratios. The difference is more in how they degrade: Top-K can have sharper, localized failures, while HAE tends to be a bit smoother. That doesn’t always show up strongly in average MSE.

That said, the gains are modest right now, this is still a research prototype exploring the tradeoff, and there’s clearly more work to be done.


Thanks, really appreciate the pointer. Will dig into it.


Haha, that’s a very fair reading :)

Yeah, the latency hit is definitely real. That said, most of what I’ve run so far is CPU-bound, which likely exaggerates it quite a bit, so I didn’t want to draw strong conclusions from that.

Would need proper GPU implementations to really understand where it lands.


I completely agree. Right now this is all on a synthetic setup to isolate the behavior and understand the reconstruction vs memory tradeoff. Real models will definitely behave differently.

I’ve started trying this out with actual models, but currently running things CPU-bound, so it’s pretty slow. Would ideally want to try this properly on GPU, but that gets expensive quickly.

So yeah, still very much a research prototype — but validating this on real models/data is definitely the next step.


That’s a great point and yeah, I’d agree SVD itself isn’t new at all.

On downsides: definitely a few. The biggest one is latency - SVD is fairly heavy, so even though it’s amortized (runs periodically, not per token), it still adds noticeable overhead. It’s also more complex than simple pruning, and I haven’t validated how well this holds on real downstream tasks yet.

This is very much a research prototype right now, more about exploring a different tradeoff space than something ready for production.


In this prototype, OLS + SVD isn’t per-token, it runs only when the recycle bin fills (amortized over multiple tokens).

That said, it’s still heavier than Top-K. I haven’t benchmarked end-to-end latency yet; this is mainly exploring the accuracy vs memory tradeoff.


I’ve been exploring KV cache optimization for LLM inference.

Most methods (Top-K, sliding window) prune tokens. This works on average, but fails selectively — a few tokens cause large errors when removed.

I tried reframing the problem as approximating the attention function: Attn(Q, K, V)

Prototype:

- entropy → identify weak tokens

- OLS → reconstruct their contribution

- SVD → compress them

Early results show lower error than Top-K at low memory, sometimes even lower memory overall.
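For concreteness, the SVD step alone could look something like this on synthetic data; the array names, shapes, and rank here are my own illustration, not the prototype’s actual code:

```python
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d = 64, 32
K = rng.normal(size=(n_tokens, d))  # keys for tokens flagged as "weak"

# Truncated SVD: store rank-r factors instead of the full matrix
r = 8
U, S, Vt = np.linalg.svd(K, full_matrices=False)
factors = (U[:, :r] * S[:r], Vt[:r])       # what actually gets kept
K_approx = factors[0] @ factors[1]         # reconstructed on demand

orig = n_tokens * d                        # floats stored uncompressed
comp = n_tokens * r + r * d                # floats stored for the factors
err = np.linalg.norm(K - K_approx) / np.linalg.norm(K)
print(orig, comp, round(err, 3))
```

The factors cost n·r + r·d floats instead of n·d, which is where the memory saving comes from at low rank.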

This is still a small research prototype, would appreciate feedback or pointers to related work.


Totally fair point — at the end of the day, it's all about getting the best model performance. I was mostly trying to highlight how, under the hood, a lot of modern HPO algos really boil down to smart scheduling decisions.


The total computational budget in Hyperband needs to be elaborated here. There is more to it than that.
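For reference, a sketch of how the per-bracket budget B = (s_max + 1) · R turns into (n_i, r_i) rungs, following the n and r formulas in the Hyperband paper (assumes R is a power of eta; the function name is mine):

```python
import math

def hyperband_schedule(R=81, eta=3):
    """Enumerate Hyperband brackets as lists of (n_configs, resource) rungs."""
    s_max = 0
    while eta ** (s_max + 1) <= R:          # s_max = floor(log_eta(R))
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        # Per-bracket budget B = (s_max + 1) * R, split across s + 1 rungs
        n = math.ceil((s_max + 1) * eta ** s / (s + 1))
        r = R // eta ** s                   # assumes R is a power of eta
        brackets.append([(n // eta ** i, r * eta ** i) for i in range(s + 1)])
    return brackets

sched = hyperband_schedule()
for rungs in sched:
    print(rungs)
```

With R=81 and eta=3 this reproduces the familiar five brackets, from (81 configs, 1 unit of resource) down to the single full-budget bracket (5 configs, 81 units).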


Pickle is still good for custom objects (JSON loses methods and also attribute order), graphs & circular refs (JSON breaks), and functions & lambdas (essential for ML & distributed systems), and it’s provided out of the box.
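Circular references are where the two really diverge. A minimal sketch:

```python
import json
import pickle

a = {"name": "a"}
a["self"] = a          # circular reference

try:
    json.dumps(a)      # raises ValueError on circular structures
    json_ok = True
except ValueError:
    json_ok = False

# pickle memoizes object identity, so the cycle survives a round trip
b = pickle.loads(pickle.dumps(a))
print(json_ok, b["self"] is b)
```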


We're contemplating protocols that don't evaluate or run code; that rules out serializing functions or lambdas (i.e., code).

Custom objects in Python don't have "order" unless they're using `__slots__` - in which case the application already knows what they are from its own class definition. Similarly, methods don't need to be serialized.

A general graph is isomorphic to a sequence of nodes plus a sequence of edge definitions. You only need your own lightweight protocol on top.
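A minimal sketch of that lightweight protocol, assuming sender-chosen node ids:

```python
import json

# A cyclic graph as plain data: node ids plus an edge list
nodes = {"n1": {"label": "start"}, "n2": {"label": "end"}}
edges = [("n1", "n2"), ("n2", "n1")]

wire = json.dumps({"nodes": nodes, "edges": edges})
decoded = json.loads(wire)

# Receiver rebuilds adjacency from the edge list
adj = {}
for src, dst in decoded["edges"]:
    adj.setdefault(src, []).append(dst)
print(adj)
```

No code is evaluated on either side; the cycle is just data.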


Because globals(), locals(), classes, and class instances are backed by dicts, and dicts are insertion ordered in CPython since 3.6 (and in the Python spec since 3.7), object attributes are effectively ordered in Python.

Object instances with __slots__ do not have a dict of attributes.

__slots__ attributes of Python classes are ordered, too.
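Both claims are easy to check (hypothetical classes):

```python
class Plain:
    def __init__(self):
        self.b = 2     # assigned first
        self.a = 1

class Slotted:
    __slots__ = ("b", "a")

p = Plain()
print(list(vars(p)))      # insertion order: ['b', 'a']
print(Slotted.__slots__)  # declaration order: ('b', 'a')
```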

(Sorting and order: Python 3 objects need to define __lt__ to be sortable; @functools.total_ordering fills in the remaining comparisons given __eq__ and one ordering method. https://docs.python.org/3/library/functools.html#functools.t... )

Are graphs isomorphic if their nodes and edges are in a different sequence?

  assert dict(a=1, b=2) == dict(b=2, a=1)

  from collections import OrderedDict as odict
  assert odict(a=1, b=2) != odict(b=2, a=1)
To cryptographically sign RDF in any format (XML, JSON, JSON-LD, RDFa), a canonicalization algorithm is applied to normalize the input data prior to hashing and cryptographically signing. Like Merkle hashes of tree branches, a cryptographic signature of a normalized graph is a substitute for more complete tests of isomorphism.

RDF Dataset Canonicalization algorithm: https://w3c-ccg.github.io/rdf-dataset-canonicalization/spec/...
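The same idea in miniature, using canonical JSON instead of RDF (sort_keys as the stand-in for a real canonicalization algorithm):

```python
import hashlib
import json

def canonical_digest(obj):
    # Canonical form first (sorted keys, fixed separators), then hash
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(blob.encode()).hexdigest()

d1 = canonical_digest({"a": 1, "b": 2})
d2 = canonical_digest({"b": 2, "a": 1})
print(d1 == d2)  # True: insertion order no longer matters
```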

Also, pickle stores the class name to unpickle data into as a (variously-dotted) str. If the version of the object class is not in the class name, pickle will unpickle data from appA.Pickleable into appB.Pickleable (or PickleableV1 into PickleableV2 objects, as long as PickleableV2=PickleableV1 is specified in the deserializer).

So, do methods need to be pickled? No, for security; yes, because otherwise the appB unpickled data is not isomorphic with the pickled appA.Pickleable class instances.

One solution: add a version attribute on each object, store it with every object, and discard it before testing equality by other attributes.

Another solution: include the source object version in the class name that gets stored with every pickled object instance, and try hard to make sure the dest object is the same.
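A minimal sketch of the version-attribute approach (hypothetical Record class; equality ignores the version field):

```python
import pickle

class Record:
    VERSION = 2

    def __init__(self, payload):
        self.version = Record.VERSION
        self.payload = payload

    def __eq__(self, other):
        # Version is metadata: compare payload only
        return isinstance(other, Record) and self.payload == other.payload

old = pickle.loads(pickle.dumps(Record("x")))
if old.version != Record.VERSION:
    pass  # migrate old.payload to the current schema here
print(old == Record("x"))
```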

