On Mon, May 11, 2009 at 11:21 AM, Jens M Andreasen
<jens.andreasen(a)comhem.se> wrote:
On Mon, 2009-05-11 at 10:23 -0400, Paul Davis wrote:
1) the question is not how to fit a single set of N samples into cache
memory. It's how to fit *all* the samples to be processed in a given
"cycle" into cache memory. Wasting 25% of cache memory for each buffer
isn't conducive to this.
If 96 frames are enough for stability (and say 64 isn't), then samples
96 - 127 in a 128 frame buffer are a waste of memory anyway and only add
to latency.
sometimes there is a tradeoff between latency and CPU cycles. live
recording often tilts towards fewer CPU cycles and more latency.
It may even be so that a set of shorter buffers that are only partially
aligned - but allocated as one contiguous area - may have a greater
chance of fitting into available cache, without trashing other important
data.
the point of making things cache aligned relates to SSE(2). the point
of making them fit in the cache relates to overall throughput. not
quite the same thing.
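
To make the two concerns concrete, here is a minimal C sketch, not code
from JACK or any other project. It assumes float samples, 16-byte
alignment for SSE loads and stores, and C11 aligned_alloc(); the helper
names (round_up, packed_bytes, padded_bytes, alloc_packed) are
hypothetical. It compares the footprint of 96-frame buffers packed into
one contiguous block against the same buffers padded to 128 frames, and
shows that every packed buffer still starts on a 16-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 16-byte alignment, as needed by aligned SSE loads/stores. */
#define SIMD_ALIGN ((size_t)16)

/* Round n up to the next multiple of align (align must be a power of two). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) & ~(align - 1);
}

/* Smallest power of two that is >= n. */
static size_t next_pow2(size_t n)
{
    size_t p = 1;
    while (p < n)
        p <<= 1;
    return p;
}

/* Bytes used when `count` buffers of `frames` floats are packed back to
 * back, padding each buffer only up to the next SIMD boundary. */
static size_t packed_bytes(size_t count, size_t frames)
{
    return count * round_up(frames * sizeof(float), SIMD_ALIGN);
}

/* Bytes used when each buffer is padded to a power-of-two frame count. */
static size_t padded_bytes(size_t count, size_t frames)
{
    return count * next_pow2(frames) * sizeof(float);
}

/* Allocate all buffers as one contiguous, SIMD-aligned block and store
 * the start of each buffer in bufs[0..count-1].
 * Returns the block (release it with free()) or NULL on failure. */
static float *alloc_packed(size_t count, size_t frames, float **bufs)
{
    size_t stride = round_up(frames * sizeof(float), SIMD_ALIGN);
    float *block;
    size_t i;

    assert(count > 0 && frames > 0);
    /* count * stride is a multiple of SIMD_ALIGN, as aligned_alloc requires. */
    block = aligned_alloc(SIMD_ALIGN, count * stride);
    if (block == NULL)
        return NULL;
    for (i = 0; i < count; i++)
        bufs[i] = block + i * (stride / sizeof(float));
    return block;
}
```

With 8 buffers of 96 frames, packed_bytes(8, 96) is 3072 bytes while
padded_bytes(8, 96) is 4096 bytes, which is the 25% per buffer mentioned
above. Alignment is a property of each buffer's start address, while
fitting in the cache depends on the total footprint of everything
touched in a cycle, so the two can be tuned separately.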