
IMO one of the killer use cases of GPT is reformatting information from any format X to any other format Y, and we're using this superpower in the relatively "boring" space of data extraction: https://kadoa.com can turn any website into an API.


I'm in the market for this. How do you get around scrape blockers? For example Target and Walmart are tricky to get the markup for even with services that specialize in this like scrapfly.

Do you guys have indie-friendly pricing? I don't have $500/month to spend but could do $20/month.


For $20 you can get over 10k requests from scraping fish: https://scrapingfish.com/buy


Lol $20 won't buy you a turkey sandwich these days


https://scrapfly.io/pricing

https://scrapestack.com/product

https://www.scrapingbee.com/#pricing

There are quite a few products in this segment around that price range.


2023 In an airport

2024 In a cafe

2025 In a posh supermarket

2026 In any supermarket


I tried this with a text list of public grilling spots in my city and it got it completely wrong, hallucinating with deceptively high confidence.

How do you prevent this?


We use LLMs to semantically understand the website and generate the scraper code for it, not for the actual data extraction (which would be too expensive anyway). We also have checks in place to verify that the extracted data actually exists on the website.
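The comment doesn't say how those checks work. Below is a minimal sketch of one possible check, not Kadoa's implementation: it assumes each extracted record is a flat dict of strings and that the raw HTML of the page is available, and it flags any field whose value doesn't appear in the page's visible text. The names `page_text` and `unverified_fields` are hypothetical.

```python
import re
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects visible text from an HTML document, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a script or style element.
        if not self._skip_depth:
            self.parts.append(data)


def _normalize(text: str) -> str:
    # Collapse whitespace and lowercase so markup line breaks don't cause misses.
    return re.sub(r"\s+", " ", text).strip().lower()


def page_text(html: str) -> str:
    collector = _TextCollector()
    collector.feed(html)
    collector.close()
    return _normalize(" ".join(collector.parts))


def unverified_fields(record: dict[str, str], html: str) -> list[str]:
    """Return the names of fields whose values do not occur in the page text."""
    text = page_text(html)
    return [name for name, value in record.items() if _normalize(value) not in text]


html = ("<html><body><h1>Riverside Park</h1>"
        "<p>Grill area  near the\n boathouse</p>"
        "<script>var x='Hidden';</script></body></html>")
record = {"name": "Riverside Park",
          "location": "Grill area near the boathouse",
          "hours": "8am-10pm"}
print(unverified_fields(record, html))  # ['hours']
```

A substring check like this only catches values that were invented outright; it won't catch a real value assigned to the wrong field, so it complements schema or position checks rather than replacing them.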


Did you ask for grilling spots, or did you paste the grilling spots from some database and ask it to summarize/reformat?

Parent commenter was talking about the latter. If you did the former, it will hallucinate like crazy, of course.


Watch out / warn users about possible mistakes. With long numbers I've seen it modify them slightly.
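One cheap guard against silently altered numbers, sketched here as an assumption rather than anything the commenters described: pull every digit run out of the model's output and flag those whose digit sequence never appears in the source. Separators like commas and dots are ignored so legitimate reformatting ("1,024.50" to "1024.50") passes. The heuristic will miss a changed decimal point position. The function name `altered_numbers` is hypothetical.

```python
import re

# A "number" is a run of digits, optionally with internal commas or dots,
# e.g. "1,234,567.89". Heuristic only: "1.5" and "15" compare as equal.
_NUMBER = re.compile(r"\d(?:[\d.,]*\d)?")


def _digits(s: str) -> str:
    # Strip everything except digits so formatting differences are ignored.
    return re.sub(r"\D", "", s)


def altered_numbers(source: str, output: str) -> list[str]:
    """Return numbers in `output` whose digit sequence is absent from `source`."""
    source_digits = {_digits(m.group()) for m in _NUMBER.finditer(source)}
    return [m.group() for m in _NUMBER.finditer(output)
            if _digits(m.group()) not in source_digits]


source = "Account 1234567890123, balance 1,024.50 USD"
output = '{"account": "1234567890128", "balance": "1024.50"}'
print(altered_numbers(source, output))  # ['1234567890128']
```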


love this! signing up



