YouTube would be a great source of spoken language. But only a tiny portion of YouTube videos have subtitles, and automatic transcription doesn't yet feel good enough that you would want to use its output to train something else. That day will surely come, though.