non-st_blksize-sized blocks will be absolutely swamped by disk
latencies, cache latencies, scheduling latencies and file
decoding overhead.
your measurements would be so swamped with noise from other
factors that any differences would be statistically irrelevant.
If you could explain how any CPU load measurement ever devised by
mankind could be "swamped" (or in fact influenced at all) by a latency
factor, then yes, I might have missed your point... Load and latency
are orthogonal issues, aren't they?
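To make the distinction concrete, here's a tiny sketch (POSIX assumed;
sleep() stands in for a blocking, latency-bound read):

#include <stdio.h>
#include <time.h>
#include <unistd.h>

static double secs(struct timespec a, struct timespec b)
{
    return (b.tv_sec - a.tv_sec) + (b.tv_nsec - a.tv_nsec) / 1e9;
}

int main(void)
{
    struct timespec w0, w1, c0, c1;

    clock_gettime(CLOCK_MONOTONIC, &w0);            /* wall clock: latency */
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c0);   /* CPU time: load */

    sleep(1);   /* stand-in for a blocking, latency-bound read() */

    clock_gettime(CLOCK_MONOTONIC, &w1);
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &c1);

    /* Wall time grows by ~1 s; CPU time barely moves.  A latency
     * factor shows up in one number and not the other. */
    printf("wall: %.3f s, cpu: %.6f s\n", secs(w0, w1), secs(c0, c1));
    return 0;
}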
The decoding overhead -- I don't know. That *does* indeed compete for
CPU load. The balance depends on the CPU and on the OS, I guess.
Remember there are ARMs, Blackfins with ucLinux, OSes and pieces of
hardware yet to be invented, etc. The people who write OSes for these
platforms usually make choices based on what standards (like POSIX)
mandate, if they have an implementation choice. I have written a POSIX
layer for AD's VDK (a real-time kernel) in the past, and I've
certainly made my choices based on what the standards said.
So, when in doubt, I think one should heed the advice of standards and
common practice (like using fread and fwrite, not read and write), or
else address the issues that arise from breaking the rules (e.g.
provide a VIO layer, like you did, or cache non-block-sized reads in
userspace).
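For what it's worth, here is a minimal sketch of that last option --
caching non-block-sized reads in userspace by simply letting stdio do
it, with its buffer sized to st_blksize. POSIX assumed; the BUFSIZ
fallback and the odd 37-byte read size are my own choices for
illustration:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) { perror("open"); return 1; }

    struct stat st;
    size_t bufsize = BUFSIZ;            /* fallback if fstat fails */
    if (fstat(fd, &st) == 0 && st.st_blksize > 0)
        bufsize = (size_t)st.st_blksize;

    FILE *f = fdopen(fd, "r");
    if (!f) { perror("fdopen"); close(fd); return 1; }

    /* Size the stdio buffer to the filesystem's preferred block size,
     * so stdio refills it with block-aligned read()s underneath. */
    setvbuf(f, NULL, _IOFBF, bufsize);

    char chunk[37];                     /* deliberately not a block size */
    size_t total = 0, n;
    while ((n = fread(chunk, 1, sizeof chunk, f)) > 0)
        total += n;

    printf("read %zu bytes\n", total);
    fclose(f);
    return 0;
}

The application can then fread() in whatever odd sizes it likes, while
the kernel still sees st_blksize-sized reads.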
Admittedly, for PC users these are somewhat academic distinctions...
-- Dan