Thanks Marcus! I do know the root cause of the poor performance in the OpenCL implementation, so maybe some background will help. (I've been working on the gr-clenabled GNURadio blocks [in pybombs now] and the OpenCL study I published a month or so ago for about 4 months.) OpenCL works well for massively parallel processing across many lower-throughput cores, on data sets where every element can be processed independently, for instance calculations such as a[i] = b[i] + c[i]. All of those calculations can be handled in parallel, and the lower performance of each core is offset by having tens or hundreds of them running at the same time, for a good throughput boost.
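To make the data-parallel case concrete, here is a minimal sketch in plain Python. It is not OpenCL and not the gr-clenabled code; `add_kernel` is a hypothetical name standing in for what a single work-item would do. Each call touches only index i, so the calls can run in any order (or all at once) and give the same answer.

```python
# Illustrative sketch only: plain Python standing in for an OpenCL kernel.
# add_kernel is a hypothetical name, not part of gr-clenabled or OpenCL.

def add_kernel(i, a, b, c):
    """Work done by one work-item: reads and writes index i only."""
    a[i] = b[i] + c[i]


b = [1.0, 2.0, 3.0, 4.0]
c = [10.0, 20.0, 30.0, 40.0]

# Run the work-items in forward order.
a_forward = [0.0] * len(b)
for i in range(len(b)):
    add_kernel(i, a_forward, b, c)

# Run the same work-items in reverse order.
a_reverse = [0.0] * len(b)
for i in reversed(range(len(b))):
    add_kernel(i, a_reverse, b, c)

# No element depends on any other, so execution order does not matter.
assert a_forward == a_reverse
```

Because no work-item reads another's output, a GPU is free to schedule them across as many cores as it has.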
For calculations such as a Costas Loop, where an error is calculated for each point and then used in the next calculation, you can't run the calculations in parallel; they have to be done in order to get the right results. You can switch OpenCL to a task-parallel mode with a work set size of 1, but because each GNURadio block just gets 1 thread, what it really amounts to is running the same function on a single lower-performance GPU core. In that case the single-core GPU performance is an order of magnitude worse than a general CPU core for the same task.
---------- Forwarded message ----------
From: Marcus Müller <marcus.mueller@ettus.com>
Date: Wed, Apr 26, 2017 at 7:31 AM
Subject: Re: [Discuss-gnuradio] OpenCL FPGA Recommendation?
To: discuss-gnuradio@gnu.org
Dear Ghost,
On 04/26/2017 01:01 PM, GhostOp14 wrote:
> I tested it as a single task in OpenCL on a GPU and the performance
> was horrible so I want to get the same algorithm running on an FPGA
> and see if the performance significantly improves.
Gut feeling: I wouldn't spend any money on an FPGA implementation before
I have understood why it worked so terribly on GPU, and have a good
reason why it should work better on FPGA. Frankly, I don't think you
realize how hard it is to properly optimize things for specific
architectures, and OpenCL on an FPGA will not be easier to "get right"
than OpenCL on a GPU.
>
> Given some high-bandwidth goals, I'm actually thinking either USB 3.0
> or PCIe would be the requirement. I was looking at the Opal Kelly
> line like the one they have based on the Xilinx Artix-7. I actually
> think the USB 3.0 interface if I can transfer runtime data to/from it
> at USB 3.0 speeds would be more portable (say laptop/desktop). I'm
> still new to FPGA's so any other thoughts are much appreciated. It
> looks like I may still have to work in Vivado and build the FPGA code
> but then I could interface with it from C++ and a GNURadio block?
Probably! Don't know the FPGA manufacturer's OpenCL tools and whether
they offer an easy-to-use interface to PC software.
>
> Am I on the right track?
Don't know – again, I'd recommend going into a much deeper analysis of
why things work badly on your CPU and GPU, and why an FPGA should make
that better.
Best regards,
Marcus
_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio