On Mon, Feb 08, 2010 at 10:01:03PM +0100, Emanuel Rumpf wrote:
Actually OpenCL already seems to be supported (on new
hardware only?)
by the big graphics vendors:
http://www.nvidia.com/object/cuda_opencl.html (there's a lot of documentation
here too!)
http://developer.amd.com/gpu/atistreamsdk/pages/default.aspx
http://www.khronos.org/opencl/
I am aware of Nvidia and ATI binary drivers supporting OpenCL. What I
do not see is a way to run it using Nouveau, the RadeonHD xorg
driver, the Intel xorg driver, or as SSE-optimized software only.
If OpenCL DSP code ran on my desktop using only the CPUs, faster than
a reasonable C implementation using primitives, then I'd say it is
ready for people to start using without holding out for open source
driver support. I don't think we are close enough yet to say that we
will be there in 6-12 months, though. If I am mistaken about the state
of the Khronos reference implementation or the clang implementation,
I'd love to see a demo prove me wrong.
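For concreteness, the "reasonable C implementation using primitives" baseline could be as simple as the loop below, written as plain host code (it compiles unchanged inside a .cu file). The gain-and-mix primitive and the name mix_gain_cpu are my own hypothetical illustration, not anything taken from the Khronos or clang work:

```cuda
#include <cstddef>

// Hypothetical CPU baseline for one DSP primitive: scale `in` by `gain`
// and mix it into `out` (out[i] += gain * in[i]). An OpenCL CPU backend
// would need to beat a loop like this to be worth using.
void mix_gain_cpu(float *out, const float *in, float gain, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i)
        out[i] += gain * in[i];   // simple loop an optimizing compiler may auto-vectorize (e.g. SSE)
}
```

A fair comparison would time this against the same operation run through OpenCL on the CPU, with identical buffer sizes.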
There's also the "Ct Technology" from
Intel (acquired from RapidMind),
which is claimed to be easier to code for and less hardware-dependent than OpenCL.
http://software.intel.com/en-us/data-parallel/
http://software.intel.com/sites/products/collateral/hpc/ct/ct_newsletter110…
I am passingly aware of them, but as long as there is not an open
language specification, and an open implementation, I don't think they
are a real answer either.
However, I
don't believe that the OpenCL language is as nice or as easy to
use as CUDA. Is there any work on an open source clone of CUDA?
Here is a link to a short intro. The code looks acceptable at first glance:
http://ati.amd.com/technology/streamcomputing/intro_opencl.html
Just because I don't like it as well does not mean I won't accept it as
a valid choice.
However, the examples there still look a lot more complicated than the
CUDA examples. The page you posted includes a fairly minimal OpenCL
example that is close to 40 lines long, without even showing the source
code of the OpenCL function that actually executes on the card. For
comparison, here is a minimal CUDA example:
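#include <cuda_runtime.h>

// Called from the host, runs on the device: adds 1 to each element.
__global__ void inc_gpu(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (idx < N)                                      // last block may overhang the data
        a[idx] = a[idx] + 1;
}

int main()
{
    const int N = 1000;
    const int blocksize = 256;
    int someLocalSource[N] = {0};                   // host-side data
    int *a;                                         // device pointer
    dim3 dimBlock(blocksize);
    dim3 dimGrid((N + blocksize - 1) / blocksize);  // same as ceil(N / (float)blocksize)
    cudaMalloc((void **) &a, N * sizeof(int));
    cudaMemcpy(a, someLocalSource, N * sizeof(int), cudaMemcpyHostToDevice);
    inc_gpu<<<dimGrid, dimBlock>>>(a, N);
    cudaMemcpy(someLocalSource, a, N * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(a);
    return 0;
}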
The __global__ qualifier says that the function is called from the host but
runs on the device. Calling it uses a modified call syntax, <<<dimGrid,
dimBlock>>>, to set up the arrangement of hardware resources used by the
call: the number of blocks to launch and the block size to carve the data
into. dim3 is a CUDA-defined datatype.
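A slightly fuller sketch of the launch side, assuming the same inc_gpu kernel. The helpers blocks_for and launch_inc are hypothetical names of mine; the integer round-up replaces ceil(N / (float)blocksize), and since a kernel launch itself returns no status, errors are fetched afterwards with cudaGetLastError and cudaDeviceSynchronize:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Number of blocks needed to cover n elements with `blocksize` threads each.
// Integer round-up: same result as ceil(n / (float)blocksize) without floats.
unsigned int blocks_for(unsigned int n, unsigned int blocksize)
{
    return (n + blocksize - 1) / blocksize;
}

__global__ void inc_gpu(int *a, int N)
{
    int idx = blockIdx.x * blockDim.x + threadIdx.x;
    if (idx < N)              // threads past the end of the data do nothing
        a[idx] = a[idx] + 1;
}

// Hypothetical helper: launch inc_gpu on device memory d_a and report errors.
bool launch_inc(int *d_a, int N, unsigned int blocksize)
{
    dim3 dimBlock(blocksize);                 // threads per block
    dim3 dimGrid(blocks_for(N, blocksize));   // blocks per grid
    inc_gpu<<<dimGrid, dimBlock>>>(d_a, N);
    cudaError_t err = cudaGetLastError();     // e.g. an invalid launch configuration
    if (err == cudaSuccess)
        err = cudaDeviceSynchronize();        // errors raised while the kernel ran
    if (err != cudaSuccess) {
        std::fprintf(stderr, "inc_gpu failed: %s\n", cudaGetErrorString(err));
        return false;
    }
    return true;
}
```

With blocksize 256, 1000 elements need 4 blocks and 1025 need 5; the `idx < N` guard is what makes the partially filled last block safe.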
Still, if OpenCL shows up in Mesa and can, at a minimum, run on the CPU
faster than a basic C implementation, then I would certainly choose it
over CUDA for openness.
I still wonder how cheap, low-MHz chips can
outperform my GHz PC system. The answer seems to be parallel computation,
which GPUs (or DSPs) support better than CPUs.
This also means that faster CPUs won't necessarily bring
the effect we are waiting for
(being able to play many software effects/instruments in acceptable time).
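To make the parallelism point concrete, here is a rough sketch of a per-sample DSP step written as a CUDA kernel, assuming each sample can be processed independently. mix_gain_gpu, the mix_gain wrapper, and the 64x256 launch size are hypothetical choices for illustration, not tuned values:

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical data-parallel version of out[i] += gain * in[i].
// Each thread handles one sample at a time; the grid-stride loop lets a
// fixed-size grid cover a buffer of any length. Throughput comes from many
// threads running concurrently, not from a high clock rate.
__global__ void mix_gain_gpu(float *out, const float *in, float gain, int n)
{
    int stride = blockDim.x * gridDim.x;   // total number of threads in the grid
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        out[i] += gain * in[i];
}

// Host wrapper: copy buffers to the device, run the kernel, copy the result back.
// Returns false on any CUDA error; error handling is deliberately minimal.
bool mix_gain(float *out, const float *in, float gain, int n)
{
    float *d_out = nullptr, *d_in = nullptr;
    std::size_t bytes = n * sizeof(float);
    bool ok = cudaMalloc(&d_out, bytes) == cudaSuccess &&
              cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMemcpy(d_out, out, bytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(d_in, in, bytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        mix_gain_gpu<<<64, 256>>>(d_out, d_in, gain, n);
        // The device-to-host copy waits for the kernel and reports its errors.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(out, d_out, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_out);   // cudaFree(nullptr) is a no-op
    cudaFree(d_in);
    return ok;
}
```

For small audio buffers the host-device copies can easily cost more than the arithmetic itself, which is one reason a fast CPU is still hard to beat for low-latency work.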
Well, I bet a lot of software isn't wringing as much performance out of
your PC as it could. When it comes to music, though, exactly which low-MHz
chips are beating a 2 GHz Core 2 Duo or equivalent AMD? Also, how
does the cost of said low-MHz device compare to the Core 2 Duo?
However, even in PCs, the trend is more cores, and either programmers
need to get better, or else we need better tools. Now, I certainly plan
to become a better programmer, but I still want better tools, and I
think it is unfair to expect everyone to get better.