
Here's the IndexNow standard that Cloudflare Crawler Hints is using:

https://www.indexnow.org/

The idea is that you can push a notification to a search engine when content changes instead of waiting for the crawler to notice.

https://<searchengine>/indexnow?url=url-changed&key=your-key
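The template above shows the page URL unencoded. A real page URL can contain its own `?`, `&` or `=`, so it is probably safer to percent-encode it. Here is a minimal sketch that builds the GET notification URL; `build_ping_url` is a hypothetical helper name, and the host and key in the usage line are just the example values used elsewhere in this thread.

```python
from urllib.parse import urlencode


def build_ping_url(search_engine_host: str, changed_url: str, key: str) -> str:
    """Build a single-URL IndexNow GET notification (hypothetical helper)."""
    # urlencode percent-encodes the page URL (":" and "/" included), so any
    # query string inside it can't be confused with the outer one.
    query = urlencode({"url": changed_url, "key": key})
    return f"https://{search_engine_host}/indexnow?{query}"


# build_ping_url("api.indexnow.org", "https://www.example.com/product.html",
#                "af4c4e043c7d42afad6bdeeda948527d")
# → https://api.indexnow.org/indexnow?url=https%3A%2F%2Fwww.example.com%2Fproduct.html&key=af4c4e043c7d42afad6bdeeda948527d
```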

You can also submit more than one URL with a POST.
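As far as I can tell from the indexnow.org docs, the batch POST takes a JSON body with `host`, `key`, an optional `keyLocation` and a `urlList` array. Double-check the field names against the spec before relying on them. The sketch below only builds the request and doesn't send it. `build_batch_request` is a hypothetical name, and the "one host per batch" check is my own assumption based on the single `host` field.

```python
import json
from urllib.parse import urlsplit
from urllib.request import Request


def build_batch_request(search_engine_host, key, urls, key_location=None):
    """Build (but don't send) an IndexNow POST for several URLs of one site."""
    hosts = {urlsplit(u).hostname for u in urls}
    if len(hosts) != 1:
        # Assumption: a batch covers a single site, matching the single
        # "host" field in the JSON body.
        raise ValueError(f"expected URLs from one host, got {sorted(str(h) for h in hosts)}")
    body = {"host": hosts.pop(), "key": key, "urlList": list(urls)}
    if key_location is not None:
        body["keyLocation"] = key_location  # optional custom key file location
    return Request(
        f"https://{search_engine_host}/indexnow",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json; charset=utf-8"},
        method="POST",
    )
```

To actually submit it you'd pass the request to `urllib.request.urlopen`; keeping construction separate makes the payload easy to inspect first.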

You can notify Bing at https://www.bing.com/indexnow?url=url-changed&key=your-key

If you notify the IndexNow API endpoint it notifies Bing plus other search engines on your behalf:

https://api.indexnow.org/indexnow?url=url-changed&key=your-k...

This announcement is about how Cloudflare can now do this automatically for sites it hosts.

Some other hosts and CDNs support IndexNow, e.g. Akamai. See: https://blogs.bing.com/webmaster/october-2021/IndexNow-Insta...



Why would they not use the /.well-known/ prefix for the default IndexNow key location?

The default being at the root seems… stupid.


Not defending the standard, but I guess that since this is a shared secret, you don't want to put it at a well-known location. There's a (slight) attack vector if an attacker knows the secret, since they can "launch" a crawl against a site. Maybe they could get a crawler to access private URLs or something?

Another interesting feature I saw in the standard is that you can host keys in subdirectories too.

"the location of a key file determines the set of URLs that can be included with this key. A key file located at http://example.com/catalog/key12457EDd.txt can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/help/."

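The scoping rule in that quote can be read as a plain string-prefix check against the key file's directory. Here is a minimal sketch under that literal reading; the function names are hypothetical, and a real search engine may also normalise scheme/host case or handle trailing slashes differently.

```python
from urllib.parse import urlsplit


def key_file_scope(key_file_url: str) -> str:
    """Return the URL prefix a key file authorises (its containing directory)."""
    parts = urlsplit(key_file_url)
    # Keep everything up to and including the last "/" of the path.
    directory = parts.path.rsplit("/", 1)[0] + "/"
    return f"{parts.scheme}://{parts.netloc}{directory}"


def url_in_scope(url: str, key_file_url: str) -> bool:
    """Literal prefix check, as in "can include any URLs starting with ..."."""
    return url.startswith(key_file_scope(key_file_url))
```

With the quote's example, `http://example.com/catalog/key12457EDd.txt` scopes to `http://example.com/catalog/`, so `/catalog/...` pages pass and `/help/...` pages don't; a key file at the site root scopes to the whole host.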

This wouldn’t be in a place like “.well-known/secret-key”, the key would still be part of the path. It’s just a well-known prefix for exactly this kind of thing.


I was going to say that well-known is only for stable paths, where you want to avoid collisions, not for paths with random keys... but you're right:

https://www.rfc-editor.org/rfc/rfc8615#section-3

   Registrations MAY also contain additional information, such as the
   syntax of additional path components, query strings, and/or fragment
   identifiers to be appended to the well-known URI, or protocol-
   specific details (e.g., HTTP [RFC7231] method handling).

So it could be: /.well-known/index-now/<key>

IndexNow would need to change the semantics of how they handle directories, since a key currently authorises only URLs under the directory that holds the key file.

I also notice there is an option for changing the filename of the IndexNow key file, but there is less flexibility about the directory it's hosted in:

    https://<searchengine>/indexnow?url=http://www.example.com/product.html&key=af4c4e043c7d42afad6bdeeda948527d&keyLocation=http://www.example.com/myIndexNowKey63638.txt

This seems like a potential vulnerability: if an attacker knows the path of a text file that contains a known (hex?) string, it looks like they could use it as a key?
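Whether that attack works probably depends on how strictly the search engine compares the file at `keyLocation` with the submitted key, and I don't know what each engine actually does. This sketch shows the strict variant: the key must match the format I believe the IndexNow docs describe (8 to 128 characters of a-z, A-Z, 0-9 and "-", treat that as an assumption), and the whole file body, ignoring surrounding whitespace, must equal the key. Under that check, a file that merely *contains* the string somewhere would be rejected. Fetching the file is left out; `key_file_matches` is a hypothetical name.

```python
import re

# Assumed key format (check the spec): 8-128 chars of letters, digits, '-'.
KEY_PATTERN = re.compile(r"[A-Za-z0-9-]{8,128}")


def key_file_matches(key: str, key_file_body: str) -> bool:
    """Strict verification: the key file must consist of the key alone."""
    if not KEY_PATTERN.fullmatch(key):
        return False
    # Exact match after stripping whitespace (e.g. a trailing newline),
    # not a substring search, so arbitrary text files don't qualify.
    return key_file_body.strip() == key
```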


So it’s an alternative to sitemaps (documents that can list all the pages of a website along with their last-modification date-times), but with a push model instead of a pull model?

https://www.sitemaps.org/
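One way to bridge the two models: a site that already maintains a sitemap with `<lastmod>` values could derive the list of URLs to push from it. This is only an illustration, not part of either spec. The sketch parses a sitemap in the sitemaps.org namespace and returns the `<loc>` values modified after a cutoff; treating date-only `lastmod` values as UTC midnight is my own assumption, and entries without `lastmod` are skipped.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_lastmod(text: str) -> datetime:
    # Older Pythons' fromisoformat rejects a trailing "Z", so normalise it.
    dt = datetime.fromisoformat(text.strip().replace("Z", "+00:00"))
    # Date-only values parse as naive datetimes; assume UTC for them.
    return dt if dt.tzinfo else dt.replace(tzinfo=timezone.utc)


def changed_since(sitemap_xml: str, cutoff: datetime) -> list[str]:
    """Return sitemap URLs whose lastmod is later than cutoff (timezone-aware)."""
    root = ET.fromstring(sitemap_xml)
    changed = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod and parse_lastmod(lastmod) > cutoff:
            changed.append(loc.strip())
    return changed
```

The resulting list is what you'd push (e.g. as a POST `urlList`) instead of waiting for crawlers to re-fetch the sitemap.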


Yes, but of course only for the search engines that support it. E.g. Google is absent from the list, although it already has a push/ping feature for sitemaps.


I thought so too, but search engines supporting IndexNow are supposed to tell each other about new pages, and there's a bit more control since you can send just the URLs you want instead of an entire sitemap.


If it were widely used, it would be a win-win: more timely updates and less traffic from crawling pages that haven't changed. That's if it ever becomes a reliable signal of page changes.



