Well, the API calls worked perfectly. The LLM didn’t misinterpret that. The data ...

pesus · 2025-03-28T04:39:11

How do you know a summary of a podcast you haven't listened to is accurate?

saaaaaam · 2025-03-28T06:25:04

Firstly, I am not summarising the podcast; I’m simply using Whisper to make a transcript.

But even if I was, because I do this multiple times a day and have been for quite some time, I know how to check for errors.

One part of that is a “fact check” built into the prompt. Another part is feeding the output of that prompt back into the API, together with a second prompt and the source material, and asking the model to verify that the output of the first prompt is accurate.
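The two-stage check described above (a fact-check instruction in the first prompt, then a second prompt that re-checks the first output against the source) can be sketched roughly like this. It is a minimal illustration under stated assumptions, not the actual setup from this post: the prompt wording, the `call_llm` callable, `run_with_verification`, and the VERIFIED/UNVERIFIED reply convention are all hypothetical, and `fake_llm` is a stub standing in for a real API call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical: any function that sends a prompt to a model and returns its reply.
LLMCall = Callable[[str], str]


@dataclass
class PipelineResult:
    first_output: str
    verification: str
    verified: bool


# Hypothetical prompt templates; the real prompts are not shared in the post.
FIRST_PROMPT = (
    "{task}\n\n"
    "Before answering, fact-check every claim against the source below "
    "and only state what the source supports.\n\n"
    "SOURCE:\n{source}"
)

VERIFY_PROMPT = (
    "Below is a source document and an output produced from it.\n"
    "Check whether every statement in the output is supported by the source.\n"
    "Reply with VERIFIED on the first line if it is accurate; otherwise reply "
    "with UNVERIFIED on the first line, followed by the unsupported statements.\n\n"
    "SOURCE:\n{source}\n\nOUTPUT:\n{output}"
)


def run_with_verification(task: str, source: str, call_llm: LLMCall) -> PipelineResult:
    # Stage 1: the task prompt with a built-in "fact check" instruction.
    first_output = call_llm(FIRST_PROMPT.format(task=task, source=source))
    # Stage 2: send the first output back, with the source, under a second prompt.
    verification = call_llm(VERIFY_PROMPT.format(source=source, output=first_output))
    # Conservative parse: anything other than an exact "VERIFIED" first line fails.
    lines = verification.strip().splitlines()
    verified = bool(lines) and lines[0].strip().upper() == "VERIFIED"
    return PipelineResult(first_output, verification, verified)


def fake_llm(prompt: str) -> str:
    # Stub in place of a real API call: approves any verification request.
    if prompt.startswith("Below is a source document"):
        return "VERIFIED"
    return "The guest founded the company in 2019."


result = run_with_verification(
    "Summarise the key facts.", "The guest founded the company in 2019.", fake_llm
)
print(result.verified)  # True
```

Keeping `call_llm` as a plain callable means the same two-stage structure works with whichever API client you actually use; a failed verification is a signal to review the output by hand rather than proof that it is wrong.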

However, the level of hallucination has dropped massively over time, and when you’re using LLMs all the time you quickly become attuned to what is likely to cause hallucinations and how to mitigate them.

I don’t mean this in an unpleasant way, but this question, and many of the other comments responding to my initial description of how I use LLMs, feel like the kind of thing people think when their experience of LLMs is slightly hand-wavy, having played with the free version of ChatGPT back in the day.

Claude 3.7 is far removed from ChatGPT at launch, and even now ChatGPT feels like a consumer-facing product while Claude 3.7 feels like a professional tool.

And when you couple that with detailed, tried-and-tested prompts sent via the API in a multistage process, it is incredibly powerful.