i thought this was pretty insightful. vst-plugins has had a little
debate over whether or not VST should provide an integer sample
format. actually, not much of a debate. one person suggested it,
everybody else jumped on him. this post, however, is the most
detailed explanation of why you should stop thinking of float
arithmetic as slower than integer ...
--;p
------- Forwarded Message
Date: Sun, 15 Sep 2002 14:21:52 -0700
From: Vesa Norilo <vnorilo(a)siba.fi>
To: <vst-plugins(a)lists.steinberg.net> (VST PlugIns)
Subject: [vst-plugins] Re: SDK buffers from host
Ah, good ol' Mike Abrash. But do yourself a favour and update your
knowledge: I'm pretty sure Mike has also done just that. BTW, I did
learn a huge amount from his Zen of Code Optimization, but almost
everything in it has been made irrelevant by superscalar processing
and vector engines.
> i think you are seriously and dramatically underestimating the
> incredible benefits provided by a standardized ...
> I haven't counted CPU instruction cycles recently. But last time
> I did, the difference between floating point operations and
> integer operations was from 15x to 100x in speed. I think,
> though, that with the last few generations of CPUs they have
> significantly narrowed the gap.
On AMD 3DNow, all floating point instructions operate on two packed
floats and take 2 cycles, and two of them can run in parallel, meaning
that a total of 4 floating point additions or multiplies can be
executed in 2 clock cycles. These also include sine, cosine, reciprocal
and square root approximations. Can you beat that with integer asm?
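For readers whose last look at floating point was x87 code, here is a minimal sketch of packed floating point from C. It uses SSE rather than 3DNow, since SSE intrinsics (`<xmmintrin.h>`) are what current x86 compilers support; the helper name `mix_add_sse` is hypothetical, and the sketch assumes an x86 target with SSE enabled (the default on x86-64). Each `_mm_mul_ps`/`_mm_add_ps` pair processes four samples at once.

```c
#include <stddef.h>
#include <xmmintrin.h> /* SSE intrinsics; x86/x86-64 only */

/* Hypothetical helper: dst[i] += src[i] * gain for n samples.
   The main loop handles four floats per SSE instruction. */
void mix_add_sse(float *dst, const float *src, size_t n, float gain)
{
    const __m128 g = _mm_set1_ps(gain);   /* gain copied into all 4 lanes */
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        __m128 s = _mm_loadu_ps(src + i); /* unaligned load of 4 samples */
        __m128 d = _mm_loadu_ps(dst + i);
        d = _mm_add_ps(d, _mm_mul_ps(s, g)); /* 4 multiplies, then 4 adds */
        _mm_storeu_ps(dst + i, d);
    }
    for (; i < n; ++i)                    /* scalar tail when n % 4 != 0 */
        dst[i] += src[i] * gain;
}
```

The scalar tail keeps the sketch correct for any buffer length; a real plug-in would more likely keep its buffers 16-byte aligned and use `_mm_load_ps` instead.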
> Again, in my very limited audio processing experience so far,
> I'm finding that I can apply many of the same asm techniques.
> And I know things are changing with new hardware; I guess it's
> possible that in the future floating point could equal or, I don't
> know how, but I suppose could even exceed int speeds.
Since I have an AMD processor, I'm fluent with 3DNow and not SSE. But
from what I know, SSE is very similar in functionality and power to
3DNow, and SSE2 offers even more capabilities.
To beat the vector engines with integers, you would have to go to MMX,
for two reasons: its saturating adds would eliminate the need for
overflow checking, and its SIMD nature could let it match or outdo
3DNow/SSE. But it will not, unless you are willing to drop to 16 bits
(so that MMX can do 4 operations per instruction).
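A saturating add clamps a result to the representable range instead of wrapping around, so an overloaded mix clips rather than flipping sign. MMX's `PADDSW` instruction does this for four signed 16-bit lanes at once. The scalar sketch below shows what each lane computes, and therefore how much explicit overflow checking the hardware saves; `add_sat16` and `mix_sat16` are hypothetical names used only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Add two signed 16-bit samples, clamping to [INT16_MIN, INT16_MAX]
   instead of wrapping. This is what one lane of MMX PADDSW computes. */
int16_t add_sat16(int16_t a, int16_t b)
{
    int32_t sum = (int32_t)a + (int32_t)b; /* widen so the sum cannot overflow */
    if (sum > INT16_MAX) sum = INT16_MAX;
    if (sum < INT16_MIN) sum = INT16_MIN;
    return (int16_t)sum;
}

/* Hypothetical helper: mix src into dst, clipping instead of wrapping. */
void mix_sat16(int16_t *dst, const int16_t *src, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = add_sat16(dst[i], src[i]);
}
```

Without saturation, 30000 + 10000 stored in 16 bits would wrap to -25536 on two's-complement hardware, which is heard as a loud click; those two comparisons per sample are exactly the overflow check that `PADDSW` makes unnecessary.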
> But when you get right down to the heart of the matter, the
> computer really is only doing bit manipulations anyway, floating
> point or int.
Which is why integer bit mask operations are still usable with floats,
making many neat tricks possible, such as really fast logarithms, which
are again impossible with ints unless you use lookup tables (which you
don't want to do for 32-bit resolution).
> The difference with IEEE floating point is the fact that the CPU has
> to decode the register to get at the exponent. And that is going to
> add extra clock cycles. 2 sign bits have ...
So, if an instruction that performs two floating point multiplies costs
2 cycles (and can run parallel with another such), each floating point
multiply will consume 0.5 cycles. I don't really see any "extra cycles"
here. Do you?
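The point that integer bit masks still work on floats can be illustrated with two classic tricks on IEEE 754 single-precision values: clearing the sign bit gives an absolute value, and reading the biased exponent and mantissa fields as integers gives a cheap log2 estimate with no lookup table. This is a minimal sketch, assuming a 32-bit IEEE 754 `float` and positive, normal inputs to the logarithm; `fast_abs` and `fast_log2` are hypothetical names, not functions from any VST SDK, and the approximation log2(1 + m) ≈ m used here is off by at most about 0.086.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float's bits as an integer (memcpy avoids aliasing UB). */
static uint32_t float_bits(float x)
{
    uint32_t b;
    memcpy(&b, &x, sizeof b);
    return b;
}

static float bits_float(uint32_t b)
{
    float x;
    memcpy(&x, &b, sizeof x);
    return x;
}

/* |x| by clearing bit 31, the IEEE 754 sign bit. */
float fast_abs(float x)
{
    return bits_float(float_bits(x) & 0x7FFFFFFFu);
}

/* Approximate log2(x) for positive, normal x.
   Bits 23..30 hold the exponent biased by 127; bits 0..22 hold the
   mantissa fraction m, with x = 2^e * (1 + m). Uses log2(1 + m) ~= m. */
float fast_log2(float x)
{
    uint32_t b = float_bits(x);
    int e = (int)((b >> 23) & 0xFFu) - 127;        /* unbiased exponent */
    float m = (float)(b & 0x7FFFFFu) / 8388608.0f; /* fraction in [0, 1) */
    return (float)e + m;
}
```

For example, `fast_log2(10.0f)` returns 3.25 against a true value of about 3.32, and exact powers of two such as 8.0f come out exact.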
> Still, the ideal speed approach is a flat bit array with no decoding
> and a big register, big enough to handle the largest number you will
> ever have to deal with.
That's very untrue. Consider how much memory speeds are limiting
processing speed. Processing 32-bit floats is faster than processing
64-bit ints largely for that reason: each sample moves half as many
bytes through the caches and the memory bus. The optimal speed
approach is not simple. It
has to do with optimizing memory for cost/benefit, optimizing cache and
processor for speed, and trying to strike a balance.
Consider floats as a hardware accelerated compression method that saves
memory bandwidth.
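The memory-bandwidth argument is simple arithmetic. The sketch below uses a hypothetical helper, not anything from an SDK, to compute how many bytes one second of multichannel audio occupies for a given sample size; it assumes the common 4-byte `float`. A 32-bit float stream is half the size of a 64-bit integer stream, so it needs half the memory traffic for the same audio.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: bytes needed to hold one second of audio. */
size_t bytes_per_second(size_t channels, size_t sample_rate,
                        size_t bytes_per_sample)
{
    return channels * sample_rate * bytes_per_sample;
}

/* Example: 8 channels at 96 kHz, stored as 32-bit float. */
size_t float_stream_bytes(void)
{
    return bytes_per_second(8, 96000, sizeof(float));
}

/* Same audio stored as 64-bit integers. */
size_t int64_stream_bytes(void)
{
    return bytes_per_second(8, 96000, sizeof(int64_t));
}
```

With a 4-byte float, that is 3,072,000 bytes per second for the float stream against 6,144,000 for `int64_t`.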
> What's wide enough for audio processing (64 bit? 128 bit?)?
> At some point the need for an exponent simply disappears.
And the need for a 4GHz bus appears.
> But you all are right. In reality, it seems my lack of experience
> has jumped up and bitten me in the ass. And I was only looking at
> things from the self-absorbed world of the plug-in I'm trying to
> build to do some specific things I personally would like to
> accomplish.
That's nice, but to recapitulate: don't talk about optimization before
you know several SIMD instruction sets by heart. Nowadays, that's
mandatory.
> For the above reasons, under optimized assembly language conditions,
> I just don't see how float speeds can exceed int speeds. But
> considering all of the other points you all have brought up, my
> suggestion seems to have been short-sighted.
If your most recent knowledge is of x87 floating point asm, I understand
this perfectly. However, a speed and assembly enthusiast should keep
himself updated.
Best regards,
Vesa
------- End of Forwarded Message