captures's comments

captures · 2026-01-16T20:16:35 1768594595

The classification is surprisingly simple - k-nearest neighbors on a 27-dimensional feature vector extracted from each drawing.

The features:

- Stroke count
- Point density across 6 horizontal and 6 vertical bands (where is the ink?)
- Direction histogram across 8 compass directions (which way are strokes going?)
- Aspect ratio and total stroke length
- First stroke start position, last stroke end position
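To make the 27 concrete: 1 (stroke count) + 12 (bands) + 8 (directions) + 2 (aspect ratio, length) + 4 (two x/y positions) = 27. Here is a rough sketch of how such a vector could be assembled. It is an illustration, not the actual code: the function name `extract_features`, the bounding-box normalization, the length-weighted direction bins, and the ordering of the features are all assumptions.

```python
import math

def extract_features(strokes, bands=6, directions=8):
    """Map a drawing (list of strokes, each a list of (x, y) points) to 27 numbers.

    Hypothetical sketch: normalization and weighting choices are assumptions.
    """
    points = [p for s in strokes for p in s]
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    min_x, min_y = min(xs), min(ys)
    width, height = max(xs) - min_x, max(ys) - min_y
    scale = max(width, height) or 1.0  # avoid dividing by zero for a single dot

    # Assumed preprocessing: shift and scale the drawing into the unit square.
    normed = [[((x - min_x) / scale, (y - min_y) / scale) for x, y in s]
              for s in strokes]
    flat = [p for s in normed for p in s]

    def band(v):
        return min(int(v * bands), bands - 1)

    # Fraction of points falling in each horizontal (by y) and vertical (by x) band.
    horizontal = [0.0] * bands
    vertical = [0.0] * bands
    for x, y in flat:
        horizontal[band(y)] += 1 / len(flat)
        vertical[band(x)] += 1 / len(flat)

    # Direction histogram: each segment votes for its nearest compass
    # direction, weighted by its length.
    hist = [0.0] * directions
    total_length = 0.0
    step = 2 * math.pi / directions
    for s in normed:
        for (x0, y0), (x1, y1) in zip(s, s[1:]):
            length = math.hypot(x1 - x0, y1 - y0)
            if length == 0:
                continue
            bin_index = round(math.atan2(y1 - y0, x1 - x0) / step) % directions
            hist[bin_index] += length
            total_length += length
    if total_length > 0:
        hist = [h / total_length for h in hist]

    aspect = width / height if height else 0.0
    start_x, start_y = normed[0][0]    # first stroke, first point
    end_x, end_y = normed[-1][-1]      # last stroke, last point

    return ([float(len(strokes))] + horizontal + vertical + hist
            + [aspect, total_length] + [start_x, start_y, end_x, end_y])
```

Calling `extract_features([[(0, 0), (10, 0)], [(10, 0), (10, 10)]])` on a two-stroke "corner" shape returns a list of 27 floats in the order listed above.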

The training set is ~64k hand-drawn samples from the original Detexify project. Each sample gets preprocessed and converted to this 27D vector. Classification is then just finding the k nearest training samples by Euclidean distance and returning the most common symbols among them.
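The lookup step fits in a few lines. A minimal sketch, assuming the training set is held in memory as `(feature_vector, symbol)` pairs; the function name `classify` and the values of `k` and `top` are placeholders, since the actual settings aren't stated here.

```python
import math
from collections import Counter

def classify(query, training, k=5, top=3):
    """Return up to `top` symbols, most frequent first, among the k nearest samples.

    training: list of (feature_vector, symbol) pairs (hypothetical layout).
    """
    # Brute-force search: sort every training sample by Euclidean distance.
    nearest = sorted(training, key=lambda item: math.dist(query, item[0]))[:k]
    votes = Counter(symbol for _, symbol in nearest)
    return [symbol for symbol, _ in votes.most_common(top)]
```

This version scans every training sample on each query, which keeps the sketch short. It doesn't use a spatial index or any distance weighting of the votes.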