Thursday, December 17, 2015

Re: [Discuss-gnuradio] [VOLK] GPU acceleration -> OpenCL integration?

Hey,

You are completely right, and that is exactly the point. The matrix in
this test is 1000x1000, and the OpenCL version only beats the generic
implementation above roughly 500x500 (just a rough estimate). Most use
cases in GNU Radio never reach problem sizes like that.
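
To make the break-even argument concrete, here is a minimal sketch of a
cost model. It assumes the generic kernel costs n^3 multiply-adds and
the offloaded kernel pays a fixed setup/transfer overhead on top of a
cheaper n^3 term. The function names and the three model constants are
hypothetical and chosen only so that the crossover lands near the rough
500x500 estimate; only the two timings are taken from the test output
quoted in this thread.

```python
def generic_time(n, per_op):
    """Modeled CPU time for an n x n matrix product: n^3 multiply-adds."""
    return per_op * n ** 3


def offload_time(n, per_op, overhead):
    """Modeled offload time: fixed setup/transfer overhead plus n^3 work."""
    return overhead + per_op * n ** 3


def break_even_size(cpu_per_op, gpu_per_op, overhead):
    """Smallest n for which the offloaded version is modeled as faster."""
    if gpu_per_op >= cpu_per_op:
        return None  # offloading never wins in this model
    n = 1
    while offload_time(n, gpu_per_op, overhead) >= generic_time(n, cpu_per_op):
        n += 1
    return n


# Timings quoted in this thread for the 1000x1000 puppet test.
generic_ms = 28482.0
opencl_ms = 13364.3
speedup_at_1000 = generic_ms / opencl_ms  # a bit more than 2x

# Hypothetical model constants (arbitrary units), for illustration only.
CPU_PER_OP = 1.0
GPU_PER_OP = 0.4
OVERHEAD = 7.5e7
crossover = break_even_size(CPU_PER_OP, GPU_PER_OP, OVERHEAD)
```

Below the crossover the fixed overhead dominates and the generic kernel
wins in this model; real numbers would have to be measured per machine.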

But if you want to promote VOLK outside the GNU Radio context, this
feature is quite unique. As far as I know, the SIMD support of OpenCL is
pretty bad (I am talking about the CPU frontend), so VOLK could combine
proper SIMD use with GPU acceleration.
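
One way such a combination could look is a size-based dispatcher: small
problems stay on the CPU path, large ones go to the offloaded kernel.
The sketch below is only an illustration under that assumption. It is
not VOLK's dispatch mechanism; `make_dispatcher`, `matmul_generic` and
the `threshold` value are hypothetical names, and the offloaded
implementation is passed in as a plain callable.

```python
def matmul_generic(a, b, n):
    """Plain triple-loop product of two n x n row-major matrices (flat lists)."""
    out = [0.0] * (n * n)
    for i in range(n):
        for k in range(n):
            aik = a[i * n + k]
            for j in range(n):
                out[i * n + j] += aik * b[k * n + j]
    return out


def make_dispatcher(cpu_impl, offload_impl=None, threshold=500):
    """Return a function that picks an implementation by matrix size.

    threshold is a hypothetical tuning value; 500 echoes the rough
    estimate from this thread and would need measuring per machine.
    """
    def dispatch(a, b, n):
        # Only offload when an accelerated kernel exists and the problem
        # is large enough to amortize its setup cost.
        if offload_impl is not None and n > threshold:
            return offload_impl(a, b, n)
        return cpu_impl(a, b, n)
    return dispatch


# Without an offload kernel, every call falls back to the CPU path.
multiply = make_dispatcher(matmul_generic)
result = multiply([1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0], 2)
```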

Nevertheless, I think there are some efficient encoder/decoder
algorithms for GPUs that could make use of such an integration.

Greetings
Stefan

On 12/17/2015 07:14 PM, Sylvain Munaut wrote:
> Hi,
>
>> RUN_VOLK_TESTS: volk_32f_x2_matrix_nxn_multiply_puppet_32f(1000000,10)
>> generic completed in 28482ms
>> a_opencl completed in 13364.3ms
>
> The question is how that number changes for smaller problem sizes,
> and what the average problem size encountered in a real environment
> would be.
>
> For SIMD optimization the result of "who's the fastest" doesn't vary
> too much with problem size, because SIMD kernels don't have much
> setup / teardown cost.
> For OpenCL I very much doubt that would be the case, especially if you
> end up with an app making a lot of "smallish" calls (and given the
> default buffer size of GR, I feel the calls to volk aren't processing
> millions of samples at a time in a single call).
>
>
> Cheers,
>
> Sylvain
>

_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
