* Lee Revell <rlrevell(a)joe-job.com> wrote:
> jackd was running in the background in both cases. With 1024KB, there
> were massive XRUNS, and worse, occasionally the soundcard interrupt
> was completely lost for tens of milliseconds. This is what I would
> expect if huge SG lists are being built in hardirq context. With
> 16KB, jackd ran perfectly; the highest latency I saw was about 100
> usecs.
>
> Kernel is 2.6.8-rc2 + voluntary-preempt-I4. CPU is 600 MHz, 512MB RAM.
ok, i'll put in a tunable for the sg size.
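a minimal sketch of what such a tunable could look like - made-up names
(max_sg_entries, apply_sg_limit), not the actual patch; only module_param()
and the blk_queue_max_*_segments() helpers are existing interfaces:

#include <linux/module.h>
#include <linux/moduleparam.h>
#include <linux/blkdev.h>

/* hypothetical tunable: cap on SG entries per request. 256 entries of
 * 4KB pages is the 1024KB case above, 4 entries the 16KB case. */
static unsigned short max_sg_entries = 256;
module_param(max_sg_entries, ushort, 0644);
MODULE_PARM_DESC(max_sg_entries, "max scatter-gather segments per request");

/* queue setup code would then clamp the queue accordingly: */
static void apply_sg_limit(request_queue_t *q)
{
	blk_queue_max_phys_segments(q, max_sg_entries);
	blk_queue_max_hw_segments(q, max_sg_entries);
}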
Btw., it's not really the building of the SG list that is expensive,
it's the completion activity, since e.g. in the case of ext3 IO traffic
it goes over _every single_ sg entry with the following fat codepath:
 __end_that_request_first()
  bio_endio()
   end_bio_bh_io_sync()
    journal_end_buffer_io_sync()
     unlock_buffer()
      wake_up_buffer()
    bio_put()
     bio_destructor()
      mempool_free()
       mempool_free_slab()
        kmem_cache_free()
      mempool_free()
       mempool_free_slab()
        kmem_cache_free()
The buffer_head, the bio and bio->bi_io_vec all lie on different
cachelines and are very likely not to be cached anymore after long IO
latencies, so we eat at least 3 big cachemisses per sg entry, times 256.
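Schematically, the per-request work is something like the loop below - a
heavily simplified sketch, not the real __end_that_request_first(); for
ext3 buffer IO every bio carries a single bh, so a 1024KB request means
roughly 256 trips through it in hardirq context:

#include <linux/bio.h>
#include <linux/blkdev.h>
#include <linux/buffer_head.h>

/* heavily simplified sketch of the codepath above, illustration only */
static void end_request_sketch(struct request *rq)
{
	struct bio *bio, *next;

	for (bio = rq->bio; bio; bio = next) {
		/* three structures on three different (cold) cachelines: */
		struct buffer_head *bh = bio->bi_private;	/* bh  */

		next = bio->bi_next;				/* bio */
		/* bio->bi_io_vec is touched on this path too	   vec */

		/* end_bio_bh_io_sync() -> journal_end_buffer_io_sync(): */
		set_buffer_uptodate(bh);
		unlock_buffer(bh);		/* wake_up_buffer() */

		/* bio_put() -> bio_destructor() -> 2x mempool_free(): */
		bio_put(bio);
	}
}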
Jens, one solution would be to make BIO completion a softirq - like SCSI
does. That would make the latencies much easier to control.
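Roughly like this - a hypothetical sketch, not SCSI's actual code;
BIO_DONE_SOFTIRQ and the bio_done_* names are made up, the point is just
that the hardirq handler only queues the bio and the fat walk above runs
in softirq context:

#include <linux/interrupt.h>
#include <linux/bio.h>
#include <linux/percpu.h>

/* hypothetical per-CPU list of completed bios (bi_next is reused for
 * chaining here just to keep the sketch short) */
static DEFINE_PER_CPU(struct bio *, bio_done_list);

/* hardirq context: just chain the bio and raise the softirq */
static void bio_done_in_hardirq(struct bio *bio)
{
	unsigned long flags;

	local_irq_save(flags);
	bio->bi_next = __get_cpu_var(bio_done_list);
	__get_cpu_var(bio_done_list) = bio;
	raise_softirq_irqoff(BIO_DONE_SOFTIRQ);	/* made-up softirq nr */
	local_irq_restore(flags);
}

/* softirq context: run the expensive completions where the latency is
 * much easier to control (open_softirq() registration omitted) */
static void bio_done_softirq(struct softirq_action *h)
{
	struct bio *bio, *next;

	local_irq_disable();
	bio = __get_cpu_var(bio_done_list);
	__get_cpu_var(bio_done_list) = NULL;
	local_irq_enable();

	for (; bio; bio = next) {
		next = bio->bi_next;
		bio->bi_next = NULL;
		bio_endio(bio, bio->bi_size, 0);	/* the fat path from above */
	}
}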
Another thing would be to create a compound structure for the bio and
[typical sizes of] bio->bi_io_vec and free them as one entity; this
would get rid of one of the cachemisses. (There cannot be a 3-way
compound structure that includes the bh too, because the bh is freed
later on by ext3.)
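As a rough sketch of that compound idea - made-up names, not a proposed
patch, and the kmem_cache_create() setup of the cache is left out:

#include <linux/kernel.h>
#include <linux/bio.h>
#include <linux/slab.h>

/* hypothetical compound object: bio plus an inline vec for the typical
 * small sizes, allocated and freed as one slab object */
#define BIO_INLINE_VECS	4

struct bio_compound {
	struct bio	bio;
	struct bio_vec	vecs[BIO_INLINE_VECS];
};

static kmem_cache_t *bio_compound_cachep;	/* set up via kmem_cache_create() */

static struct bio *bio_alloc_compound(int gfp_mask)
{
	struct bio_compound *bc = kmem_cache_alloc(bio_compound_cachep, gfp_mask);

	if (!bc)
		return NULL;
	bio_init(&bc->bio);
	/* the vec now sits right next to the bio: one allocation,
	 * adjacent cachelines */
	bc->bio.bi_io_vec = bc->vecs;
	bc->bio.bi_max_vecs = BIO_INLINE_VECS;
	return &bc->bio;
}

/* freeing becomes a single kmem_cache_free() instead of two mempool
 * round trips for the bio and its vec */
static void bio_free_compound(struct bio *bio)
{
	kmem_cache_free(bio_compound_cachep,
			container_of(bio, struct bio_compound, bio));
}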
Ingo