On Tue, Aug 11, 2009 at 08:56:56PM +0200, Jens M Andreasen wrote:
> Say warp A (or "process" A) must do four smaller workloads while warp B
> is doing one bigger workload? The way to go would then be for warp B to
> call __syncthreads() when 25% of its work is done, thus assuring that
> warp A will be given all of the GPU until it has caught up at the end of
> its first workload and also calls __syncthreads(), which gives warp B the
> green light to continue. This under the assumption that warp A hasn't
> already done its part and is waiting for B to catch up.
> Repeat the procedure at 50% and 75%.
It's not that simple...
An example: the santalucia reverb with period size = 256,
minimum partition size = 256.
There will be 3 partitions of 256 frames, 6 of 512, 6 of 2048
and 24 of 8192 frames.
The first will be done in jack's callback and the others at
lower priorities, resp. -1,-2,-3 relative to jack's thread (*).
Each of these threads is triggered into action with a period
equal to its partition size, and the work is expected to be
ready at the next trigger.
That means that the thread that does the size 512 ones must
have priority over all the others. Except in some simple
cases it's almost impossible to divide the work done by e.g.
the slowest thread into equal parts. If that were possible we
wouldn't need multiple threads at all - the evenly divided
workload could be done in jack's callback without problem.
As things are, pre-emption seems to be the only solution.
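To make the timing in the example above concrete, here is a minimal sketch. It only does the arithmetic of the partition layout and the trigger periods; it is not how jconv itself is written. The names (LAYOUT, total_frames, periods_between_triggers, due) are made up for illustration, and it assumes that all threads are triggered in phase at cycle 0.

```python
# Partition layout from the example: (partition size in frames, count).
PERIOD = 256  # JACK period size in frames
LAYOUT = [(256, 3), (512, 6), (2048, 6), (8192, 24)]


def total_frames(layout):
    """Total length covered by all partitions, in frames."""
    return sum(size * count for size, count in layout)


def periods_between_triggers(size, period=PERIOD):
    """Number of JACK periods between two triggers of the thread
    handling partitions of `size` frames. The work started at one
    trigger is expected to be ready at the next one."""
    if size % period:
        raise ValueError("partition size must be a multiple of the period")
    return size // period


def due(layout, cycle, period=PERIOD):
    """Partition sizes whose thread is triggered at JACK cycle `cycle`
    (assumption: every thread is first triggered at cycle 0)."""
    return [size for size, _ in layout
            if cycle % periods_between_triggers(size, period) == 0]


if __name__ == "__main__":
    print("total frames:", total_frames(LAYOUT))
    for size, _ in LAYOUT:
        print(size, "-> every", periods_between_triggers(size), "period(s)")
    for cycle in (0, 1, 2, 8, 32):
        print("cycle", cycle, "triggers", due(LAYOUT, cycle))
```

At cycle 32 all four sizes are triggered together, but the size 512 work is due 2 periods later while the size 8192 work has 32 periods. That is the situation where the shorter-deadline thread has to be able to pre-empt the longer one.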
> I found a measure of ~1 sec for a 128K FFT on a PPro @200
> Would that be helpful for a guesstimate?
No. The largest FFT used by jconv is 16k, it doesn't pay to
increase that size. Depending on the configuration the FFTs
could dominate the workload (many short 1-to-1 convolutions)
or the MAC operations would take most of the time (long IRs
and/or a dense matrix). And then there's the cache size that
will have all sorts of complicated effects.
(*) From the next release that would be -1,-3,-5, i.e. one
less for each doubling of the size. This allows multiple
jconvs to be scheduled fairly even if they don't use the
same set of sizes. Other apps doing similar things should
use the same system if they are supposed to work together
well.
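A minimal sketch of the rule in the footnote, "one less for each doubling of the size". The function name priority_offset and the base_size parameter are made up for illustration. The footnote does not say what the doublings are counted from; this sketch assumes the size handled in jack's callback (256 in the example above), which reproduces the -1,-3,-5 sequence.

```python
def priority_offset(size, base_size=256):
    """Priority relative to jack's thread for a partition size.

    Assumption: one step lower for each doubling of `size` over
    `base_size`, where `base_size` is the size handled in jack's
    callback (offset 0).
    """
    ratio, rem = divmod(size, base_size)
    # Accept only base_size times a power of two.
    if rem or ratio < 1 or ratio & (ratio - 1):
        raise ValueError("size must be base_size times a power of two")
    return -(ratio.bit_length() - 1)  # -log2(ratio)


if __name__ == "__main__":
    # Two hypothetical instances using different sets of sizes.
    for sizes in ((512, 2048, 8192), (1024, 4096)):
        print({s: priority_offset(s) for s in sizes})
```

Because the offset depends only on the partition size, two instances that use different sets of sizes still end up on one common priority ladder, with shorter partitions always ahead of longer ones.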
--
FA
Io lo dico sempre: l'Italia è troppo stretta e lunga.