Augmentation is used everywhere, yet most discussions of it stay surface-level (“flip, rotate, color jitter”).
In this article I tried to go deeper and explain:
• The *two regimes of augmentation*:
  – in-distribution augmentation (simulate real variation)
  – out-of-distribution augmentation (regularization)
• Why *unrealistic augmentations can actually improve generalization*
• How augmentation relates to the *manifold hypothesis*
• When and why *Test-Time Augmentation (TTA)* helps
• Common *failure modes* (label corruption, over-augmentation)
• How to design a *baseline augmentation policy that actually works*
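To make the TTA bullet concrete, here is a minimal sketch of the basic mechanic: run the model on several views of the same image and average the scores. It is not taken from the article. `toy_model`, `predict_tta`, and the 2x2 image are hypothetical stand-ins chosen so the arithmetic is easy to follow; a real pipeline would use an actual network and tensor library.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # toy grayscale image: rows of pixel values


def identity(img: Image) -> Image:
    """Return the image unchanged (the 'original' view)."""
    return img


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def predict_tta(
    model: Callable[[Image], List[float]],
    img: Image,
    transforms: Sequence[Callable[[Image], Image]],
) -> List[float]:
    """Average the model's class scores over several views of one image."""
    outputs = [model(t(img)) for t in transforms]
    n = len(outputs)
    return [sum(scores) / n for scores in zip(*outputs)]


def toy_model(img: Image) -> List[float]:
    """Hypothetical model: scores 'bright on the left' vs 'bright on the right'."""
    half = len(img[0]) // 2
    left = sum(sum(row[:half]) for row in img)
    right = sum(sum(row[half:]) for row in img)
    total = (left + right) or 1.0  # avoid dividing by zero on an all-black image
    return [left / total, right / total]


image = [[1.0, 0.0], [1.0, 0.0]]  # all brightness in the left column
plain = toy_model(image)  # confident 'left' prediction
scores = predict_tta(toy_model, image, [identity, hflip])
print(plain, scores)
```

In this toy task the label depends on left/right orientation, so the flipped view votes for the opposite class and the average lands on an uninformative tie. That is the TTA flavour of the label-corruption failure mode: averaging only helps when every view preserves the label.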
The guide is long but practical: it includes concrete pipelines, examples, and debugging strategies.
Would love feedback from people working on real CV systems.
Link: https://medium.com/data-science-collective/what-is-image-aug...