There is literally no other way to apply machine learning to such a large problem. The system must have training data to learn from; as you expand the scope of your ML system, you must expand the scope of your training data.

Every large commercial ML system works this way. For years, Google has paid people to manually score and categorize search results pages to provide training data for ML web search systems.

It would have been better if companies had been up-front about this when talking about "artificial intelligence" in their voice products, so that it didn't come as a nasty surprise.

And in the long run, everyone will benefit from consistent regulations on ML data scoring, like HIPAA for health care (another industry where for-profit companies handle very sensitive customer data).