
A few comments:

- My understanding is that a gammachirp is the established choice for an auditory filter bank--any reason you chose an elliptic filter instead?

- I didn't look too closely, but it seems like you are analyzing the output of the filter bank as real numbers. I highly recommend you convolve with a complex representation of the filter and keep all of the math in the complex domain until you collapse to loudness.

- I wouldn't bucket into discrete 100 Hz time slices; instead, just convolve the temporal masking function with the filter bank output at its full time resolution.

- You'll want some volume normalization step that reports the minimum Zimtohrli distance between A and B*x, where x is a free variable for volume. Otherwise, a perceptual codec that just tends to make things a bit quieter might get a bad score.

- For Fletcher-Munson, I assume you are just using a curve at a high-ish volume? If so, good :)

- Not sure how you are spacing filter bank center frequencies relative to ERB size, but I'd recommend oversampling by a factor of 2-3. (That is, a few filters per ERB).
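
To make the spacing point concrete, here is a rough sketch of laying out center frequencies at a fixed number of filters per ERB. It assumes the Glasberg and Moore ERB-number formula, and the function names and the 50 Hz to 16 kHz range are just for illustration, not anything taken from Zimtohrli:

```python
import math

def hz_to_erb_number(f_hz):
    # Glasberg & Moore ERB-number scale (assumed here, not taken from Zimtohrli).
    return 21.4 * math.log10(1.0 + 0.00437 * f_hz)

def erb_number_to_hz(e):
    # Inverse of hz_to_erb_number.
    return (10.0 ** (e / 21.4) - 1.0) / 0.00437

def center_frequencies(f_lo, f_hi, filters_per_erb=3):
    """Center frequencies spaced 1/filters_per_erb apart on the ERB-number scale."""
    e_lo = hz_to_erb_number(f_lo)
    e_hi = hz_to_erb_number(f_hi)
    n = int(math.floor((e_hi - e_lo) * filters_per_erb)) + 1
    return [erb_number_to_hz(e_lo + i / filters_per_erb) for i in range(n)]

# Hypothetical range; 3 filters per ERB is the upper end of the 2-3x suggestion.
cfs = center_frequencies(50.0, 16000.0, filters_per_erb=3)
```

With `filters_per_erb=1` the filters sit one ERB apart; 2 or 3 gives the overlap suggested above.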

Apologies if any of these are off base--I just took a quick look.
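
For the volume normalization point, something like the following is what I have in mind: a 1-D search over a gain applied to B, reporting the smallest distance found. This is only a sketch. `toy_distance` is a stand-in (plain mean squared error, not Zimtohrli), the +/-12 dB search range is arbitrary, and golden-section search assumes the distance is unimodal in gain, which may not hold for a real perceptual metric:

```python
import math

def toy_distance(a, b):
    # Stand-in metric for illustration only (mean squared error), NOT Zimtohrli.
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def min_distance_over_gain(a, b, distance=toy_distance,
                           lo_db=-12.0, hi_db=12.0, tol_db=0.01):
    """Golden-section search for the gain (in dB) on b that minimizes
    distance(a, gain * b). Assumes the cost is unimodal over the range."""
    def cost(gain_db):
        g = 10.0 ** (gain_db / 20.0)
        return distance(a, [g * y for y in b])

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = lo_db, hi_db
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = cost(c), cost(d)
    while hi - lo > tol_db:
        if fc < fd:
            # Minimum lies in [lo, d]; reuse c as the new upper probe.
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = cost(c)
        else:
            # Minimum lies in [c, hi]; reuse d as the new lower probe.
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = cost(d)
    best_db = (lo + hi) / 2.0
    return best_db, cost(best_db)

# Example: b is a copy of a at half amplitude (about 6 dB quieter).
a = [math.sin(2.0 * math.pi * 5.0 * n / 200.0) for n in range(200)]
b = [0.5 * x for x in a]
best_db, best_dist = min_distance_over_gain(a, b)
```

In the example, a codec that only halves the amplitude should end up with a distance near zero once the gain is searched, instead of being penalized for the level change.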



This man codecs.



