On Tue, 2009-08-11 at 22:15 +0200, Fons Adriaensen wrote:
..
An example: the santalucia reverb with period size = 256,
minimum partition size = 256.
There will be 3 partitions of 256 frames, 6 of 512, 6 of 2048
and 24 of 8192 frames.
Say periodsize = 128 instead. Then we would have thread A: 6 partitions
of 128, B: 6 of 512, C: 6 of 2K and finally D: 24 of 8K.
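
To make the periodsize = 128 layout concrete, here is a minimal sketch that lays the listed partitions end to end and reports where each thread's first partition starts. The assumption that partitions are simply concatenated in order of increasing size is mine, as are the names `layout` and `partition_offsets`; the actual convolver may arrange them differently.

```python
# Partition layout for periodsize = 128, as listed in the message above.
# Each entry is (thread label, number of partitions, partition size in frames).
layout = [("A", 6, 128), ("B", 6, 512), ("C", 6, 2048), ("D", 24, 8192)]

def partition_offsets(layout):
    """Return ([(thread, size, start offset), ...], total length in frames).

    Assumes partitions are concatenated in the order given (hypothetical).
    """
    out = []
    offset = 0
    for thread, count, size in layout:
        for _ in range(count):
            out.append((thread, size, offset))
            offset += size
    return out, offset

parts, total_frames = partition_offsets(layout)

# Offset (in frames) of the first partition handled by each thread.
first_start = {}
for thread, size, start in parts:
    first_start.setdefault(thread, start)

print(total_frames)
print(first_start)
```

Under that assumption the 42 partitions cover 212736 frames of impulse response, and thread D's first partition begins at frame 16128.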
The first will be done in jack's callback and the others at
lower priorities, resp. -1, -2, -3 relative to jack's thread (*).
Each of these threads is triggered into action with a period
equal to its partition size, and the work is expected to be
ready at the next trigger.
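
The trigger rule can be sketched as follows, using the periodsize = 128 thread sizes from the example. The alignment of all triggers at frame time 0 is an assumption of this sketch, and `triggered_at` and `callbacks_per_trigger` are illustrative names, not part of any real API.

```python
PERIOD = 128  # jack period size in frames, from the periodsize = 128 example

# Partition size handled by each thread (frames).
sizes = {"A": 128, "B": 512, "C": 2048, "D": 8192}

def triggered_at(frame_time):
    """Threads whose trigger falls on this frame time.

    Assumes every thread's triggers are aligned to multiples of its
    partition size, starting at frame 0 (hypothetical).
    """
    return [name for name, size in sizes.items() if frame_time % size == 0]

def callbacks_per_trigger(size):
    """Number of jack callbacks between two triggers of a thread."""
    return size // PERIOD

print(triggered_at(512))            # A and B
print(triggered_at(16384))          # all four threads
print(callbacks_per_trigger(8192))  # callbacks available to thread D
```

With these numbers thread D is triggered once every 64 callbacks, and its result is due 64 callbacks later.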
Say at time 16K.
Thread D would then have just gotten the last 128 samples it needs to
start doing the FFT on the data between 8K and 16K.
This need not be done until time 24K, when it OTOH /must/ have been
completed. The kernel* will be called 64 times between now and then, so
if D does 1/64th of its job now it can save its state and go to sleep.
Are you saying that in this specific case it is impossible to know how
much of an 8K FFT equals 1/64th?
[*] Here "kernel" means the CUDA-program running and not the filter
kernel nor the Linux kernel. It's confusing ...
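
As an illustration of what "1/64th of an 8K FFT" could mean, here is a rough CPU-side sketch: an iterative radix-2 FFT written as a generator that pauses after a fixed number of butterflies. It is not CUDA code and not taken from the thread. It assumes the butterfly count is an adequate proxy for time, which real hardware (cache behaviour, the multiply-accumulate and inverse FFT that follow) may not honour, so it does not settle the question either way. The name `sliced_fft` is hypothetical.

```python
import cmath

def sliced_fft(x, slices):
    """In-place radix-2 FFT of list x (length a power of two).

    Yields the running butterfly count after each slice, so a caller can
    do one slice per callback. Yields at most `slices` times.
    """
    n = len(x)
    # Bit-reversal permutation, done up front and not counted in the budget.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            x[i], x[j] = x[j], x[i]

    total = (n // 2) * (n.bit_length() - 1)  # butterflies over all stages
    budget = -(-total // slices)             # ceiling division
    done = 0
    length = 2
    while length <= n:
        w_len = cmath.exp(-2j * cmath.pi / length)
        for start in range(0, n, length):
            w = 1.0 + 0j
            for k in range(length // 2):
                u = x[start + k]
                v = x[start + k + length // 2] * w
                x[start + k] = u + v
                x[start + k + length // 2] = u - v
                w *= w_len
                done += 1
                if done % budget == 0 and done < total:
                    yield done               # pause here until the next callback
        length <<= 1
    yield done                               # final slice, transform complete

# One slice per jack callback: 8192 / 128 = 64 callbacks until the deadline.
data = [complex(i % 7, 0) for i in range(8192)]
steps = list(sliced_fft(data, 64))
print(len(steps), steps[0], steps[-1])
```

For an 8192-point transform this counts 4096 * 13 = 53248 butterflies, so each of the 64 slices covers 832 of them. Whether equal butterfly counts translate into equal execution time is exactly the point under discussion.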
That means that the thread that does the size 512 ones must
have priority over all the others. Except in some simple
cases it's almost impossible to divide the work done by e.g.
the slowest thread into equal parts. If that were possible we
wouldn't need multiple threads at all - the evenly divided
workload could be done in jack's callback without problem.
As things are, pre-emption seems to be the only solution.