Memcached-Backed Content Infrastructure at Khan Academy

rubyn00bie · on May 23, 2017

I would've liked to have seen why they chose Memcached over something like Redis. While this is a nice overview, it feels pretty light on details, which at their apparent scale would be the really interesting part.

From what I can tell, they mostly moved to fetching multiple keys at once instead of individually. The internationalization bit was interesting as a point to consider, but it hardly seemed like a real innovation, more just a solid tip. Maybe I'm dense and missed something important...
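
For readers unfamiliar with the multi-key point, here is a minimal sketch of why batching matters. `CountingCache` is a hypothetical in-memory stand-in, not Khan Academy's code or a real memcached client; it only counts simulated round trips. The `get_multi` name follows the convention used by several Python memcached clients, but the signature here is an assumption made for illustration.

```python
from typing import Dict, Iterable, Optional


class CountingCache:
    """Hypothetical in-memory stand-in for a cache client; counts round trips."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}
        self.round_trips = 0

    def set(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        # One simulated network round trip per key.
        self.round_trips += 1
        return self._data.get(key)

    def get_multi(self, keys: Iterable[str]) -> Dict[str, bytes]:
        # One simulated round trip for the whole batch; misses are omitted.
        self.round_trips += 1
        return {k: self._data[k] for k in keys if k in self._data}


def fetch_one_by_one(cache: CountingCache, keys: Iterable[str]) -> Dict[str, bytes]:
    """Issue a separate get for every key."""
    result: Dict[str, bytes] = {}
    for key in keys:
        value = cache.get(key)
        if value is not None:
            result[key] = value
    return result


def fetch_batched(cache: CountingCache, keys: Iterable[str]) -> Dict[str, bytes]:
    """Issue a single multi-key get for all keys."""
    return cache.get_multi(list(keys))
```

Both functions return the same mapping; the batched version simply pays the per-request latency once instead of once per key, which is where the savings come from when a page needs many cached items.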

scurvy · on May 23, 2017

Memory fragmentation isn't an issue with memcached like it is for Redis. Given enough uptime and object churn, Redis will use all the memory on a server and trigger OOMs. This is because Redis counts only the memory used to store objects toward its "max memory" limit, not the memory lost to fragmentation. With enough object churn, boom.

Memcached uses its max memory configuration parameter to decide how much memory to allocate, and that's it. System memory usage never budges.

Basically it boils down to fragmentation vs slab allocator model, and I'll take slabs for stability.
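
A rough sketch of the slab idea, for illustration only: memory is carved into fixed chunk sizes that grow by a constant factor, and each item goes into the smallest chunk that fits. The function names are hypothetical, and the starting size of 96 bytes and 1024-byte cap are made-up example values; 1.25 is memcached's default growth factor, but real memcached also aligns chunk sizes and manages pages, which this ignores.

```python
from typing import List, Optional


def slab_chunk_sizes(min_chunk: int, max_chunk: int, growth: float) -> List[int]:
    """Build increasing chunk sizes, each roughly `growth` times the previous one."""
    sizes: List[int] = []
    size = min_chunk
    while size < max_chunk:
        sizes.append(size)
        # Guarantee progress even if the growth factor rounds down to no change.
        size = max(size + 1, int(size * growth))
    sizes.append(max_chunk)
    return sizes


def pick_chunk(sizes: List[int], item_size: int) -> Optional[int]:
    """Return the smallest chunk size that can hold the item, or None if too big."""
    for chunk in sizes:
        if item_size <= chunk:
            return chunk
    return None


if __name__ == "__main__":
    sizes = slab_chunk_sizes(min_chunk=96, max_chunk=1024, growth=1.25)
    item = 100
    chunk = pick_chunk(sizes, item)
    if chunk is not None:
        # The unused tail of the chunk is the price paid for predictable memory use.
        print(f"{item}-byte item -> {chunk}-byte chunk, {chunk - item} bytes unused")
```

The trade-off is visible in the last line: some bytes inside each chunk go unused, but because chunks are fixed-size and reused in place, churn does not fragment the heap and the process footprint stays within the configured limit.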

tyingq · on May 23, 2017

I don't think redis is available on AppEngine standard.

benkraft · on May 23, 2017

Yeah, that's the main reason in practice. We also just don't need any fancy features.

jedmeyers · on May 23, 2017

Would having managed Redis on Google Cloud somehow have influenced the decision?

benkraft · on May 23, 2017

Not necessarily -- unless it had clear perf benefits (and I don't know if it would). I don't think we have much use for the features, so no need to change unless there's a reason.

lukasm · on May 23, 2017

Do you have any recommendations for GAE resources (blog posts, GitHub, etc.)?

rubyn00bie · on May 23, 2017

Ah! That totally makes sense. Thank you.

pstrateman · on May 23, 2017

if you're strictly doing key/value caching memcached is faster

laumars · on May 23, 2017

I'd be interested in reading a source for this claim, because I did quite a bit of research on memcached vs redis a couple of years ago and the conclusion I came to was that redis's performance was at least comparable to memcached, and potentially even faster in the (then new) latest engine.

I've also found redis to be easier to manage than memcached and easier to scale too - which was one of the clinching points that sold that K/V DB to us.

Pragmatically, the performance of redis vs memcached is similar enough that very few people would need to decide based solely on throughput and read time. For example, our application would regularly see hundreds of thousands of concurrent users with several million keys stored, and yet our redis server was barely taxed at all.

rak00n · on May 24, 2017

Related: http://antirez.com/news/94

rubyn00bie · on May 23, 2017

It has been a while but is that with Redis configured not to store things on disk and configured properly for the machine?

(Again) It has been a while but I remember from my own tests Redis being faster. Granted I could've been exploiting some of redis's data types that allowed for my contradictory results (which is what you're saying).

I'm in no way saying memcache isn't awesome. It is great, dead simple, and works like a charm OOB. I used it in production for years with no real issues (except multi key performance).

OriginalPenguin · on May 23, 2017

Source?

malinens · on May 23, 2017

not only faster but much more stable

jdubs · on May 23, 2017

You should give mcrouter a good look. It has features like cache warming and failover, among others, that will save your ass.

chevman · on May 23, 2017

Why reinvent something that could be solved by an existing CMS, e.g. Alfresco, Adobe Experience Manager, etc.?

Curious if it is a financial decision, technical, or something else.

benkraft · on May 23, 2017

Tom (the original author) wrote about that a bit: http://www.arguingwithalgorithms.com/posts/14-01-03-content-.... In general I think we want to do a lot of custom things, so we'd need a very deep integration with an existing CMS, to the point where the costs would probably outweigh the benefits. For example, we have our own editor for articles and exercises (https://github.com/Khan/perseus), which does lots of fancy things, so having a prebuilt editor wouldn't be useful, and in fact would be something we'd have to replace. Controlling the whole system is just simpler for us.

rglullis · on May 23, 2017

OT, I know... and I don't want to pick on Khan Academy, but I noticed that the submitter is an account specifically created to submit content related to their own blog. If you look at the submission history, there are only a half-dozen links, all of them to KA's blog.

I know that there are no rules about promoting your own content, but how is this not considered a sock puppet account?

tezka · on May 23, 2017

The account is named ka-engineering, so there's no need to mine posting history. There are some world-class engineers working at KA (e.g. Craig Silverstein). I would take their blog posts over most of the Medium junk often posted here any day, with utmost gratefulness.

rglullis · on May 23, 2017

> I would take their blog posts

Me too. I am not talking about the content they are posting.

I am talking about this abuse of something that is supposed to be "community-curated". I just don't think it sets a good precedent to have every company using HN as some kind of PR channel.

Yesterday there were (pretty valid!) complaints about the "CEOs of Github's Elite Partners" showing up on the GH Marketplace thread. The complaints were valid because the comments from the CEOs were not actually trying to be informative; they were just trying to create marketing hype.

HN is much better when it's the place to find things we wouldn't find anywhere else. If it becomes standard practice for companies to submit every blog post to HN, how would HN differ from Twitter or Facebook? The whole voting part helps, but if the community starts to accept this asymmetry between content producers and consumers, it will be as bland as every other place.

tracker1 · on May 23, 2017

I think it depends on the specific source and content... This isn't like the typical advertorial tech blog posts that a lot of us see. Auth0 is the one that really sticks out to me in that regard, though it's far from the only one.

So long as the blogs are well written, informative, and have some technical depth, they have value even if they come from a single source. I could see similar blogs from any number of technical resources. MS bloggers have similar quality and show up here regularly. FB as well. It's a matter of quality, and the voting model tends to bear that out pretty well.

rglullis · on May 23, 2017

I don't know... You might be right that the voting part can take care of this. What I fear though is that the community will get used to this idea that there is no point in submitting anything that is outside of the mainstream, because the mainstream is already using HN, and any fringe topic will be drowned by the other voters.