One interesting thing: the raw dump linked from the list's README doesn't seem to include a couple of notable domains that the README itself mentions, like news.ycombinator.com or reddit.com. I may be mangling the dump or downloading it incorrectly.
EDIT: disclaimer, be responsible, audit how the dump is generated, etc etc etc
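For anyone who wants to double-check the missing-domain observation, here is a minimal sketch of one way to test it. It assumes the dump is plain text with one domain per line, which I haven't confirmed; `missing_domains` and the sample data are hypothetical and not taken from the gist.

```python
def missing_domains(dump_lines, expected):
    """Return the expected domains that never appear in dump_lines."""
    # Assumption: one domain per line. Normalize by trimming whitespace,
    # lowercasing, and dropping a trailing dot; blank lines are skipped.
    seen = {
        line.strip().lower().rstrip(".")
        for line in dump_lines
        if line.strip()
    }
    return [d for d in expected if d.lower() not in seen]


# Hypothetical sample standing in for the real dump.
sample = ["example.com\n", "Reddit.com\r\n", "\n"]
print(missing_domains(sample, ["news.ycombinator.com", "reddit.com"]))
```

This only does exact matches after normalization, so if the dump stores entries some other way (wildcards, full URLs, a different encoding), a domain could look missing when it isn't.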
https://gist.github.com/dikaiosune/0ca7829884b3b3f790418f0f1...
Improvements welcome.