On Wed, Aug 12, 2009 at 09:59:50AM +0200, Jens M Andreasen wrote:
Say periodsize = 128 instead. Then we would have
thread A: 6 partitions
of 128, B: 6 of 512, C: 6 of 2K and finally D: 24 of 8K
Actually 7,6,6,24.
Say at time 16K.
Thread D would then have just gotten the last 128 samples it needs to
start doing the FFT on the data between 8K and 16K
This needs not be done untill time 24K when it OTOH /must/ have been
completed. The kernel* will be called 64 times between now and then, so
if D does 1/64th of its job now it can save its state and go to sleep.
Are you saying that it in the specific case is impossible to know how
much of an 8K FFT equals 1/64th?
It would be 16K FFT, and there would be a lot more than
just that.
This is an outline of the process() function for one
partition size:
proces()
{
for (loop over inputs)
{
if (input not active) continue
FFT
}
for (loop over outputs)
{
if (output not active) continue
for (loop over inputs)
{
if ((input,output) not active) continue
for (loop over partitions)
{
if (partition not empty)
{
MAC
}
}
}
IFFT
}
}
The loops over inputs and outputs and the continue condition
are actually inplemented by linked lists, but the code shown
is equivalent.
You could loop over the data structures, and make a list of
of all steps (FFT, MAC, IFFT) and their parameters. Then if
you know the relative cost of the three basic operations you
try to divide this in 64 equal parts.
There are some problems with this:
- The relative cost of FFT and MAC is not know with any
precision.
- There may be less than 64 steps.
- A single 16k FFT may already be too much, and most FFT
libraries don't allow you to divide it into smaller
pieces.
Again, if you *can* do the division, there is no need
to use multiple threads. you can just merge the lists
of steps for all partition sizes and execute this as
a static schedule using just function calls. This is
how jace worked.
Ciao,
--
FA
Io lo dico sempre: l'Italia รจ troppo stretta e lunga.