
Damn. I bought rl.ai when I was a grad student but it's been lying fallow since I can't really blog about the stuff I'm working on right now. How does one go about selling their domain for millions of dollars?


rl.ai is probably not so valuable. Everyone now knows the word "AI," but only specialists know "RL"


What’s RL? “Real life” is the only thing that comes to mind.


I was betting on 'roguelike'


reinforcement learning


Ok, thanks.

Short names on these popular country TLDs are probably not as valuable as they might seem.

I used to own a NN.io domain name where NN was a common financial abbreviation. I thought it might be fairly valuable, but I ended up selling at auction for less than $3k. (That was before the crypto boom. Maybe it could have fetched more if I’d waited several years and sold in 2021. It seems like there was a short moment when even relatively crap .xyz domains sold for five figures.)


Exactly. That is why it is probably not valuable.


Which is a bit surprising, given that RL is the most likely path to the scary AI future everyone loves to blog about.


Doesn't surprise me. Maybe you've heard of low temperature heat pumps, but not the refrigerants that enabled that technology. You've certainly heard about solar panels but you probably don't know what metallurgical and manufacturing advances made them cheap enough for many people to buy.

Why should someone learn technical jargon when what they're interested in is what effects the technology will have on society?


> How does one go about selling their domain for millions of dollars?

You have a domain someone wants to buy for millions of dollars


Dan, Sedo, Afternic


According to others, that story is severely embellished:

https://savingjournalism.substack.com/p/i-talked-to-elon-mus...


I see a lot of really divergent results with these time series database benchmarking posts. Timescale's open source benchmark suite[0] is a great contribution towards making different software comparable, but it seems like the tasks/metrics heavily favor TimescaleDB.

This article has Clickhouse more-or-less spanking TimescaleDB, but the blog post it references[1] is basically the reverse. Are the use cases just that different?

-----

0. https://github.com/timescale/tsbs

1. https://blog.timescale.com/blog/what-is-clickhouse-how-does-...


As someone who has used both in production environments under various workloads, I can, without a doubt, tell you that Clickhouse spanks the crap out of TimescaleDB.

The only use case where TimescaleDB is more useful is mutating/deleting single rows, but even there, Clickhouse offers some workarounds at the expense of a little extra storage until a compaction is run, similar to VACUUM.

Clickhouse is to TimescaleDB what Nginx was to Apache.
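For anyone curious about the workaround mentioned above: ClickHouse handles deletes through asynchronous mutations, which rewrite the affected data parts in the background, so extra storage is used until the rewrite completes. A rough sketch (the table and column names here are invented):

```sql
-- Asynchronous mutation: rows are marked for deletion and parts are
-- rewritten in the background; storage is reclaimed once that finishes.
ALTER TABLE metrics DELETE WHERE sensor_id = 42;

-- Track the mutation's progress
SELECT command, is_done FROM system.mutations WHERE table = 'metrics';
```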


> I can, without a doubt, tell you that Clickhouse spanks the crap out of TimescaleDB.

Same. I'm ready to believe my experience is not representative, but I've rarely heard something different after talking to people who've seriously evaluated both.

> Clickhouse is to TimescaleDB what Nginx was to Apache.

Perfect comparison. Except I don't remember Apache cooking some tests to pretend they are faster than nginx, or astroturfing communities :)

Different tools serve different purposes, simple as that.

If TimescaleDB or Apache does the job for you, stick with them.

When you want to scale, increase performance, or just rewrite, choose the better option of the day.

In 2021, Clickhouse should be a recommended default, like nginx.


I think both Clickhouse and TimeScaleDB are great systems with different design goals and approaches. Specifically, I think Clickhouse is much better suited to "Event Logs" than "Metrics" storage (Clickhouse Inspired VM does well in this regard).

I would just encourage all vendors to be more humble in positioning their benchmarks. In my experience, production behavior rarely resembles benchmark results, for better or worse.


What is "Clickhouse Inspired VM"?


Hm. Not sure why my previous response is marked as dead; I guess VM is a swear word. It refers to V-I-C-T-O-R-I-A Metrics.



A good idea and very cleanly implemented. I imagine that there's a ton of other possible applications that don't require much modification to the code. Thanks for sharing!


Thank you so much, happy to hear! There's a more versatile version coming soon :)


None of these solutions are ideal, although Zenodo's better than most. As far as I can tell, they're all targeted more towards the final, authoritative release, so it seems you're still out of luck during the paper writing process. What if I'm just trying to share a dataset/pre-trained model with remote collaborators?

I ran into this when doing some OCR experiments[1], finding acquiring data and pre-trained models to be the most time-consuming part of the enterprise. This ended up adding enough additional hassle that I didn't manage to get anything really interesting going, although figuring out how to containerize other peoples' code was educational. Personally, I think I'll be relying on some combination of institutional repositories + torrents/IPFS for any large datasets/models I end up releasing in the future.

-----

1. https://github.com/rldotai/ocr-experiments


Who, whom?

Less tersely: this article is one in a long procession of journalists trying to exert control over tech. The opening example (Speech2Face, which they aver is transphobic) is inflammatory and utterly unrepresentative of the usual topics of AI conferences. The other references are far better, but the choice is revealing-- it's not so much an abstract concern about an unaccountable few exerting control from the shadows, but alarm that someone else might be muscling in on their territory.


I'm with you that a lot of mass media writing about AI is silly, but I think this article is not in that category. For example, it doesn't "aver [Speech2Face] is transphobic"; that's from a statement by Alex Hanna, and that statement is immediately followed by comments from other people questioning it. The piece in general is pretty even-keeled about giving space to criticisms and responses.

I think the article paints a good picture of the machine learning research community figuring out how to grapple with the growing number of people who want to probe its ethics, without portraying anybody as a villain.


This was great, actually. I don't program in Scala, but it was very interesting to hear about the difference between types as abstractions vs types as they are used.

For unfamiliar topics or when presented with uncommon insight, I believe rants, monologues, even diatribes are actually some of the best things to read.


Depends what you mean by "practical". Assuming you've done CV/ML stuff before, you could probably get something working pretty well over a weekend, and I think I could solve it completely with a bit more effort via 3D scanning + synthetic dataset generation... but unless you have cubic meters of lego to sort, doing it by hand would be faster, albeit less fun.


In the words of John von Neumann, there's a lot more that's known than is proved.

It's often frustratingly difficult to go from the known to the proven; still, just because something's not proved doesn't mean that scientists are ignoring it.


Also sometimes what we "know" is in fact wrong.


Thanks for sharing this.


It would be nice if citing repositories were easier-- either for generating a reference for my own code or acknowledging when I've used someone else's code in my research.

There's tons of math and physics blogs that contain useful results that the author wanted to make available but didn't manage to incorporate into a paper. I wonder if there'd be any interest in a sort of GitHub for proofs? It could even use git, since (assuming consistency) isn't math just a DAG anyways (and therefore isomorphic to a neural net, as are all things).


Traditionally that sort of stuff goes in tech reports, dissertations, or text books.

What's missing is the dissemination piece. Somehow people will absolutely refuse to take seriously the job of citing code they use, even when their main result is obtainable by "and then I ran something from scipy/numpy/pytorch/etc."


You can get a free DOI for, and archive, a tagged version of a Git repo with FigShare or Zenodo.

If you have repo2docker REES dependency scripts (requirements.txt, environment.yml, postBuild) in your repo, a BinderHub like https://mybinder.org can build and cache a container image and launch a (free) instance in a k8s cloud.
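As a concrete sketch of the REES piece: a minimal conda environment.yml at the repo root is enough for repo2docker/BinderHub to build an image. The package choices below are just placeholders:

```yaml
# environment.yml -- picked up automatically by repo2docker
name: paper-env
channels:
  - conda-forge
dependencies:
  - python=3.9
  - numpy
  - pip
  - pip:
      - some-pip-only-package  # hypothetical
```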

Journals haven't yet integrated with BinderHub.

Putting the suggested citation and DOI URI/URL in your README and cataloging citations in an e.g. wiki page may increase the crucial frequency of citation.
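For the README piece, a suggested-citation entry with a Zenodo-style software DOI looks roughly like this (every identifier below is invented):

```bibtex
@software{doe_mytool_2021,
  author    = {Doe, Jane},
  title     = {mytool: v1.0.0},
  year      = {2021},
  publisher = {Zenodo},
  doi       = {10.5281/zenodo.0000000},
  url       = {https://doi.org/10.5281/zenodo.0000000}
}
```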

A Linked Data format for presenting well-formed arguments with #StructuredPremises would help to realize the potential of the web as a graph of resources which may satisfy formal inclusion criteria for #LinkedMetaAnalyses.


The issue is that none of the citation count engines (Google scholar, scopus, Web of Science...) count citations on those DOIs. So for a researcher who needs to somehow demonstrate impact through citation counts, it does not really help unfortunately.


We could reason about sites that index https://schema.org/ScholarlyArticle according to our own and others' observations. Google Scholar, Semantic Scholar, and Meta all index Scholarly Articles: they copy the bibliographic metadata and the abstract for archival and scholarly purposes.
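As a concrete sketch, schema.org markup for an article is typically embedded as JSON-LD along these lines (all values are placeholders):

```json
{
  "@context": "https://schema.org",
  "@type": "ScholarlyArticle",
  "name": "Example Paper Title",
  "author": {"@type": "Person", "name": "Jane Doe"},
  "abstract": "Placeholder abstract.",
  "identifier": "https://doi.org/10.5281/zenodo.0000000"
}
```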

AFAIU, e.g. Zotero and Mendeley do not crawl and index articles or attempt to parse bibliographic citations from the astounding plethora of citation styles [citationstyles, citationstyles_stylerepo] into a citation graph suitable for representative metrics [zenodo_newmetrics].

bitcoin.org/bitcoin.pdf does not have a DOI, does not have an ORCID [orcid], and is not published in any journal but is indexed by e.g. Google Scholar; though there are apparently multiple records referring to a ScholarlyArticle with the same name and author. Something like "Hell's Angels" (1930)? No DOI, no ORCID, no parseable PDF structure: not indexed.

AFAIU, Google Scholar does not yet index ScholarlyArticle (or SoftwareApplication < CreativeWork) bibliographic metadata. GScholar indexes an older set of bibliographic metadata from HTML <meta> tags and also attempts to parse PDFs. [gscholar_inclusion]
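For reference, those older-style bibliographic <meta> tags look roughly like this (tag names follow the inclusion guidelines; the values are placeholders):

```html
<meta name="citation_title" content="Example Paper Title">
<meta name="citation_author" content="Doe, Jane">
<meta name="citation_publication_date" content="2021/01/01">
<meta name="citation_pdf_url" content="https://example.org/paper.pdf">
```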

Google Scholar is also not (yet?) integrated with Google Dataset Search (which indexes https://schema.org/Dataset metadata).

FigShare DOIs and Zenodo DOIs are DataCite DOIs [figshare_howtocite, zenodo_principles]; which apparently aren't (yet?) all indexed by Google Scholar [rescience_gscholar].

IIUC, all papers uploaded to https://arxiv.org are indexed by Google Scholar. In order for arxiv-vanity.org [arxiv_vanity] to render a mobile-ready, font-resizeable HTML5 version of a paper uploaded to arXiv, the LaTeX source must be uploaded. Arxiv hosts certain categories of ScholarlyArticles.

JOSS (Journal of Open Source Software) has managed to get articles indexed by Google Scholar [rescience_gscholar]. They publish their costs [joss_costs]: $275 Crossref membership, DOIs: $1/paper:

> Assuming a publication rate of 200 papers per year this works out at ~$4.75 per paper

[citationstyles]: https://citationstyles.org

[citationstyles_stylerepo]: https://github.com/citation-style-language/styles

[gscholar_inclusion]: https://scholar.google.com/intl/en/scholar/inclusion.html#in...

[figshare_howtocite]: https://knowledge.figshare.com/articles/item/how-to-share-ci...

[zenodo_principles]: https://about.zenodo.org/principles/

[zenodo_newmetrics]: https://www.frontiersin.org/articles/10.3389/frma.2017.00013...

[rescience_gscholar]: https://github.com/ReScience/ReScience/issues/38

[arxiv_vanity]: https://www.arxiv-vanity.com/

[joss_costs]: https://joss.theoj.org/about#costs

[orcid]: https://en.wikipedia.org/wiki/ORCID


Owing to the distributed nature of git, and the properties of the hashes it uses, it is probably enough to put a full commit id in a paper to securely reference a software project, regardless of its hosting platform.

We'd just need a dedicated search engine, and a way to automatically extract those from papers, to clone and archive repos.
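Extracting candidate commit ids from papers is the easy part; a minimal sketch (full SHA-1-length ids only, abbreviated ids ignored):

```python
import re

# A full SHA-1 commit id is 40 lowercase hex characters.
COMMIT_ID_RE = re.compile(r"\b[0-9a-f]{40}\b")

def extract_commit_ids(text):
    """Return all full-length commit ids found in the text."""
    return COMMIT_ID_RE.findall(text)

sample = ("Results were produced with our tool, pinned at "
          "commit 4b825dc642cb6eb9a060e54bf8d69288fbee4904.")
print(extract_commit_ids(sample))
# → ['4b825dc642cb6eb9a060e54bf8d69288fbee4904']
```

The hard part, as you say, is the crawling/archiving side, plus disambiguating which repository a bare hash belongs to.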


> the properties of the hashes [g]it uses

Git uses SHA-1 (a hardened variant since 2017) and is now working on per-repo migration to SHA-256 [0]. Lots of repos are presumably still on SHA-1 (and users on older versions of git).

As of 2020, chosen-prefix attacks against SHA-1 are now practical. [verbatim from 1] But I don't think second preimage attacks are practical yet.

Linus Torvalds argued in 2006 basically that it's irrelevant whether git's hash function is second preimage resistant. Selective quoting:

> remember that the git model is that you should primarily trust only your _own_ repository [2]

> [a malicious] collision is entirely a non-issue: you'll get a "bad" repository that is different from what the attacker intended, but since you'll never actually use his colliding object, it's _literally_ no different from the attacker just not having found a collision at all [2]

All that is just to say: git originally chose its hash function for the above-mentioned "git model", and thus didn't fully prioritize second preimage resistance. For your suggested search engine, depending on how the database is collected, you might not be able to trust "your own repository" (if it's crowdsourced, I could register another codebase with the same hash as Linux). A second-preimage-resistant hash function would be a requirement for the suggested use case.
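To make the "properties of the hashes" concrete: git doesn't hash raw file contents but a small header plus the content, so an object id can be reproduced without git itself. A sketch for the blob case:

```python
import hashlib

def git_blob_id(content: bytes, algo: str = "sha1") -> str:
    """Reproduce git's object id for a blob: hash of 'blob <size>\\0<content>'."""
    header = b"blob %d\x00" % len(content)
    return hashlib.new(algo, header + content).hexdigest()

# The well-known SHA-1 id of the empty blob:
print(git_blob_id(b""))
# → e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Under the SHA-256 transition the same content gets a different, longer id.
print(git_blob_id(b"", algo="sha256"))
```

This is exactly why a full commit id pins content only as strongly as the underlying hash resists second preimages.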

[0]: https://git-scm.com/docs/hash-function-transition/

[1]: https://en.wikipedia.org/wiki/SHA-1#cite_ref-8

[2]: https://marc.info/?l=git&m=115678778717621&w=2

