The selector is stored as a CSS path and matched against the fetched HTML on each check — so as long as the element's structure and nesting stay roughly the same, minor layout changes don't usually break it.
The fragile cases are sites that generate class names on every build (React/webpack/vite apps often do this) — those selectors will just stop working.
For semantic elements like price tags, availability text, or content blocks, selectors tend to be stable enough that it's not a real problem day-to-day. And if a filter stops matching entirely, the watch flags an error rather than silently giving you empty diffs.
Yeah, semantic anchors are definitely the right direction — [data-testid], aria-label, or text proximity tend to survive rebuilds much better than class paths.
The picker leans towards CSS selectors right now, but that's something I want to improve.
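To make the semantic-anchor idea concrete, here's a rough stdlib-only sketch (not Site Spy's actual code) of extracting an element by a stable attribute like `data-testid` instead of a generated class path. The anchor priority list and the sample HTML are my own illustration:

```python
from html.parser import HTMLParser


class AnchorFinder(HTMLParser):
    """Collect the text of the first element carrying a given attribute value.

    Depth tracking handles nested tags; void elements (<br>, <img>) inside a
    matched region are not handled in this sketch.
    """

    def __init__(self, attr, value):
        super().__init__()
        self.attr, self.value = attr, value
        self.capturing = False
        self.depth = 0
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.capturing:
            self.depth += 1
        elif dict(attrs).get(self.attr) == self.value:
            self.capturing = True
            self.depth = 1

    def handle_endtag(self, tag):
        if self.capturing:
            self.depth -= 1
            if self.depth == 0:
                self.capturing = False

    def handle_data(self, data):
        if self.capturing:
            self.text.append(data)


def find_by_anchor(html, anchors):
    """Try each (attribute, value) anchor in priority order; return first hit."""
    for attr, value in anchors:
        parser = AnchorFinder(attr, value)
        parser.feed(html)
        if parser.text:
            return "".join(parser.text).strip()
    return None


# A generated class name ("sc-x9y2") would churn on every build;
# the data-testid anchor survives rebuilds.
html = '<div class="sc-x9y2"><span data-testid="price">$19.99</span></div>'
print(find_by_anchor(html, [("data-testid", "price"), ("aria-label", "price")]))
```

The point of the priority list is graceful degradation: the stored selector only breaks once every anchor in the chain is gone, not the moment one class name changes.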
The harder problem is auth-gated content — Instagram feeds, dashboards, paywalled pages. Browser Steps handles it today (you can script login flows), but honestly I think the real fix is AI-assisted interaction: a small, cheap model that can find what you care about without needing a brittle selector at all. That's where I want to take this — less "maintain a CSS path", more "here's what I'm interested in, figure it out."
Yes! Exactly this direction — the hybrid fetch is already live: plain HTTP first, Chromium if the content looks off. LLM semantic targeting is the next step, but only triggered when a selector breaks, not on every check — too expensive otherwise.
Not stupid at all — the docs were missing an RSS page, which is on me. I've just added one: https://docs.sitespy.app/docs/dashboard/rss. RSS feeds are available per watch, per tag, or across all watches from the dashboard. Thanks for flagging it; this is exactly the kind of feedback that helps.
Fair point, and I should have been upfront about this earlier. The backend is a fork of changedetection.io. I've built on top of it — added the browser extension workflow, element picker, billing, auth, notifications, and other things — but the core detection engine comes from their project. That should have been clearly attributed from the start, and I'll add it to the docs and about page.
changedetection.io is a genuinely great project. What I'm trying to build on top of it is the browser-first UX layer and hosted product, plus an AI-assisted approach, so non-technical users can get value from it without self-hosting.
Yeah, that was a real bug — CSS transitions on the body were blocking the thread during theme switches. I pushed a fix for it earlier today. Should be smooth now, but let me know if you still see it
Site Spy keeps snapshot history, so you can revisit older versions of a page and inspect how it changed over time, not just get the latest alert. I’d describe it more as monitoring with retained history than as a dedicated public archive, but deeper archival integrations are definitely something I’ve thought about
German laws and bureaucracy pages are exactly the kind of thing where tracking one specific part of a page is much more useful than watching the whole page. And yeah, more control over check frequency makes a lot of sense if monthly checks are enough and rate limits are the main problem. I'd be curious: what kind of schedule would work best for you there?
Monthly is fine, but not monthly all at once, because I watch multiple pages on one website, and that triggers the rate limiting.
The ideal pipeline for me would be "notice a change in a specific part of a page, use a very small LLM to extract a value or answer a question, update a constant in a file and make a pull request".
I've been thinking about this pipeline for a long time because my work depends on it, but nothing like it seems to exist yet. I'll probably write my own, but I just can't find the time.
You can already work around the rate-limit issue today — there's a global minimum recheck interval in Settings that spreads checks out across time. Not per-site throttling yet, but it prevents one domain from getting hit too many times at once.
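The spreading itself is simple. A toy model of the idea (not Site Spy's actual scheduler; names are illustrative):

```python
from datetime import datetime, timedelta


def spread_checks(watch_ids, start, min_gap_s):
    """Assign each watch a slot so consecutive checks are at least min_gap_s
    apart: one global pacing knob rather than per-site throttling."""
    gap = timedelta(seconds=min_gap_s)
    return {wid: start + i * gap for i, wid in enumerate(watch_ids)}


# Three watches on the same site, five minutes apart instead of all at once.
slots = spread_checks(["law-a", "law-b", "law-c"], datetime(2025, 1, 1), 300)
```

With a global minimum interval, several watches on the same domain never fire simultaneously, which is usually enough to stay under a site's rate limit even without per-domain rules.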
The pipeline you described — detect a change, extract a value with a small LLM, open a PR — is pretty much exactly what the MCP server is designed for. Connect Site Spy to Claude or Cursor, and when a specific part of a page changes, the agent can handle the extraction and PR automatically. I don't think anyone has wired up that exact flow yet, but all the pieces exist.
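None of the names below are real APIs — purely a sketch of how the glue could look once the agent has the diff. The file path, constant name, and all four callables are hypothetical; in practice the agent (Claude or Cursor via MCP) would supply the real implementations:

```python
def change_to_pr(diff_text, ask_llm, update_constant, open_pr):
    """Hypothetical glue for the flow above: a watched section changed, a
    small model extracts the new value, a constants file is patched, and a
    pull request is opened. Every callable here is an injected assumption."""
    value = ask_llm(
        "Extract the updated fee amount from this diff, digits only:\n" + diff_text
    )
    branch = update_constant("config/constants.py", "FEE_EUR", value)  # hypothetical
    return open_pr(branch, title=f"Update FEE_EUR to {value}")


# Stubbed-out demo of the wiring; real handlers would call an LLM and the
# Git hosting API.
pr = change_to_pr(
    "- FEE_EUR = 25\n+ FEE_EUR = 30",
    lambda prompt: "30",
    lambda path, name, value: f"bump-{name}",
    lambda branch, title: {"branch": branch, "title": title},
)
```

The useful property is that the monitoring side only has to deliver a clean, scoped diff; everything downstream is ordinary automation.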
Thanks, that’s a really good question. Site Spy uses a real browser flow, so it generally handles JS-rendered pages much better than simple HTML-only polling tools. In practice, the trickier cases tend to be sites with aggressive anti-bot protection or messy login/session flows rather than JS itself. I’m trying to make those limitations clearer so people don’t just hit a vague failure and feel let down
Curious how you're thinking about getting around anti-bot protection. I scrape a lot and I've noticed many highly trafficked sites investing in anti-bot measures recently, with the rise of AI browsers and such. Still, cool idea, congrats on the launch.
I'm planning to add proxy rotation across different regions to help with geo-restricted content and rate limiting. Anti-bot is an arms race though — some sites just can't be monitored without solving a captcha, which isn't something I'm trying to do. Focused on making the common cases work well rather than promising to bypass everything.
Yep, urlwatch is a good one too. This category clearly has a strong self-hosted tradition. With Site Spy, what I’m trying to make much easier is the browser-first flow: pick the exact part of a page visually, then follow changes through diffs, history, RSS, and alerts with very little setup
That’s a completely fair concern. Services in this category do need to earn trust over time. I built the backend to handle a fair amount of traffic, so I’m not too worried about growth on that side. My goal is definitely to keep this running for the long term, not treat it like a one-off project