Two algorithms of wide and likely lasting use that could benefit
strongly from vectorisation come to mind: sample-rate
conversion and time-stretching (the frequency-domain kind, which
already benefits from FFTW's efficiency but does a lot of costly
arithmetic on top of that). There's excellent code for both out there,
but I think -- correct me if I'm wrong -- it's pretty much exclusively
scalar rather than vector/parallel code.