A better title for the article might have been,

“Why current deep learning is not general enough for AGI or some other families of structured problems.”

Viewed this way, it’s not a criticism of pragmatically using deep learning or experimenting with deep learning for some narrow tasks.

Rather, it expresses what aspects of current deep learning make it unsuitable for general transfer learning, hierarchical or causal inference, or many Bayesian techniques requiring greater use of priors.

Other comments have pointed out that there are plausible rebuttals in deep reinforcement learning and metalearning.

But the bigger point, to me, is to be clear that the article is not a criticism of deep learning engineering, i.e., applying deep learning to satisfy an explicit requirement, where the success criteria may have nothing at all to do with general intelligence, or with whether a given approach can span some sufficiently large class of general models.

However, even if you constrain your view to so-called “pragmatic” deep learning (deep learning for concrete tasks), there are still a lot of unanswered questions about why things work, and about whether an approach is learning semantic aspects of some true underlying structure (a latent variable space that captures the true data-generating process) or merely overfitting to particular populations of observation-space statistics.

This paper gives an example of exactly this issue for CNN image models [0]. I’d argue that this is the kind of criticism relevant to task-oriented practitioners, whereas the OP link is criticism more relevant to AGI research or the philosophy of statistics at large.

[0]: https://arxiv.org/pdf/1711.11561.pdf



Adversarially robust classifiers have interpretable gradients and feature representations [0]. The problem seems to be that standard networks pick up whatever statistics are available, including surface statistical regularities and noise. It can be mitigated, though; see the sketch after the reference below.

[0]: https://arxiv.org/abs/1805.12152
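
As a rough illustration of the mitigation, here is a minimal PGD-style adversarial training sketch in PyTorch. It is a generic version of the technique, not the linked paper’s exact setup; eps, alpha, and steps are illustrative placeholders.

  # Minimal PGD adversarial training sketch (PyTorch). Generic technique,
  # not the linked paper's exact setup; eps/alpha/steps are placeholders.
  import torch
  import torch.nn.functional as F

  def pgd_attack(model, x, y, eps=8/255, alpha=2/255, steps=10):
      # Start from a random point inside the eps-ball around x.
      delta = torch.empty_like(x).uniform_(-eps, eps).requires_grad_(True)
      for _ in range(steps):
          loss = F.cross_entropy(model(x + delta), y)
          loss.backward()
          # Ascend the loss along the gradient sign, then project back into
          # the eps-ball. (Clamping to the valid input range is omitted.)
          delta.data = (delta.data + alpha * delta.grad.sign()).clamp(-eps, eps)
          delta.grad.zero_()
      return (x + delta).detach()

  def train_step(model, optimizer, x, y):
      # Train on the worst-case perturbation instead of the clean input.
      x_adv = pgd_attack(model, x, y)
      optimizer.zero_grad()
      loss = F.cross_entropy(model(x_adv), y)
      loss.backward()
      optimizer.step()
      return loss.item()

Training on x_adv instead of x is what pushes the model away from relying on fragile surface statistics, at a cost in clean accuracy, which is roughly the trade-off the linked paper studies.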


Thanks for the link!

Note that adversarial robustness isn’t the only problem. The paper’s result, obtained merely by altering surface statistics with a Fourier-domain filter applied to the training data, already makes it hard to interpret the network’s internal representation as any kind of semantic representation of the underlying structure relevant to the task, without involving adversarial robustness at all.
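
For concreteness, here is a minimal NumPy sketch of the kind of radial Fourier-domain filtering the paper applies to training images; the function name and the cutoff value are illustrative choices of mine, not the paper’s.

  # Minimal NumPy sketch of a radial low-pass filter in the Fourier domain,
  # the kind of surface-statistics manipulation described above. The cutoff
  # is an arbitrary illustrative value.
  import numpy as np

  def radial_lowpass(img, cutoff=0.25):
      # img: 2D grayscale array. Zero out all frequencies beyond a radial
      # cutoff, given as a fraction of the half-width of the spectrum.
      f = np.fft.fftshift(np.fft.fft2(img))
      h, w = img.shape
      yy, xx = np.mgrid[:h, :w]
      r = np.hypot(yy - h / 2, xx - w / 2) / (min(h, w) / 2)
      f[r > cutoff] = 0.0
      return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

Filtering train and test sets with different cutoffs leaves the shapes a human relies on mostly intact while shifting the surface statistics, which is roughly the manipulation the paper uses to expose the gap.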



