On Sat, Mar 28, 2015 at 5:32 PM, Andy Walls <andy@silverblocksystems.net> wrote:
When testing, I used 5 float streams running at over 150 Msps each, with 15 microsecond bursts of 50 MHz at about 10 microseconds apart. I used enough x points to see two bursts on the GUI. Normal trigger. (Free or auto trigger might be too taxing.)
-Regards
Andy
Andy, if you have a chance, can you check out this new branch:
It adds the fixes that we talked about. I just want to verify that things are still looking and behaving well for you.
The other new feature in this branch shows up if you go into the QT GUI Time Sink properties and set "Control Panel" to Yes. I wouldn't mind a quick bit of feedback there, either.
Tom
On March 28, 2015 8:06:08 PM EDT, Tom Rondeau <tom@trondeau.com> wrote:
On Sat, Mar 28, 2015 at 12:50 PM, Andy Walls <andy@silverblocksystems.net> wrote:
On Sat, 2015-03-28 at 14:45 -0400, Andy Walls wrote:
> Hi Tom:
>
>
> On Sat, 2015-03-28 at 11:12 -0700, Tom Rondeau wrote:
> > On Sat, Mar 28, 2015 at 11:00 AM, Andy Walls
> > <andy@silverblocksystems.net> wrote:
>
> > Can this memmove() be safely skipped
> >
> > https://github.com/gnuradio/gnuradio/blob/master/gr-qtgui/lib/time_sink_f_impl.cc#L627
> [snip]
> > The volk_32f_convert_64f_u_avx() call is unavoidable as Qwt
> > wants
> > doubles for plotting and not floats. But it might also be able
> > to be
> > deferred to the very end when the decision to plot is known
> > for sure.
> > (But that's more surgery than I care to take on at the
> > moment.)
>
>
> > But thinking about the volk convert function, that's both copying the
> > data from the input buffer into the internal buffer as well as
> > performing the conversion. We can't just hold data in the input since
> > we don't want to back up the data until we're ready to plot both with
> > timing and with a full enough buffer -- it's just sampling a section
> > at a time and drops everything in between.
>
> Right.
>
> > That part could be converted into a memcpy instead of the volk
> > convert. Then, when we're ready to plot, we call the volk convert that
> > also does the move from d_start to 0, so it combines those two
> > elements.
>
> Yeah, that's the surgery part. :) It would require adding a new set of
> buffers to hold float samples, and then converting them when a
> determination to plot was made.
>
> This also affects the memmove() of the tail for the trigger delay. It
> would operate on the new set of float buffers (vs the buffers holding
> doubles).
>
> > Thoughts on those proposals?
Your proposal for implementing memcpy() and deferring volk_*() to do the
conversion and "memmove" in one step is great! :)
I just implemented it, and the time_sink_f thread has gone from 41.5%
CPU down to 29.1% CPU in my tests. :) memcpy() now dominates the
thread, but that's to be expected.
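
For readers following along, here is a minimal sketch of the idea, not the actual time_sink_f_impl code. The struct DeferredBuffer and its members (staged, plot, fill, stage, convert_for_plot) are hypothetical names made up for illustration, and a plain loop stands in for the volk_32f_convert_64f call so the example needs only the standard library. It assumes a single channel and ignores locking and trigger handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for one channel of the sink's buffers.
// None of these names come from the real time_sink_f_impl class.
struct DeferredBuffer {
    std::vector<float> staged;  // raw float samples, filled by memcpy
    std::vector<double> plot;   // doubles handed to the plotting layer
    std::size_t fill = 0;       // number of valid samples in `staged`

    explicit DeferredBuffer(std::size_t capacity)
        : staged(capacity), plot(capacity) {}

    // Hot path (every work() call): copy floats in, no conversion yet.
    // Returns how many samples were actually stored.
    std::size_t stage(const float* in, std::size_t n) {
        const std::size_t room = staged.size() - fill;
        const std::size_t ncopy = n < room ? n : room;
        std::memcpy(staged.data() + fill, in, ncopy * sizeof(float));
        fill += ncopy;
        return ncopy;
    }

    // Plot path (only once we know we will plot): convert float -> double
    // reading from `start` and writing to index 0, so the conversion and
    // the shift to the front of the buffer happen in a single pass.
    // The loop is a placeholder for a vectorized convert kernel.
    std::size_t convert_for_plot(std::size_t start) {
        assert(start <= fill);
        const std::size_t n = fill - start;
        for (std::size_t i = 0; i < n; ++i)
            plot[i] = static_cast<double>(staged[start + i]);
        fill = 0;  // staging buffer is free for the next capture
        return n;
    }
};
```

The point of the split is that the cheap copy runs on every call, while the conversion runs only for the captures that actually get plotted, which would be consistent with memcpy taking over the top of the profile.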
With my initial hack:
> CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
> Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 100000
> samples % image name symbol name
> 78158 39.0737 libvolk.so.0.0.0 volk_32f_convert_64f_u_avx
> 22777 11.3870 no-vmlinux /no-vmlinux
> 13972 6.9851 libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_slope(float const*) const
> 7781 3.8900 libgnuradio-qtgui-3.7.7git.so.0.0.0 gr::qtgui::time_sink_f_impl::_test_trigger_norm(int, std::vector<void const*, std::allocator<void const*> >)
> 7236 3.6175 libpthread-2.18.so pthread_mutex_lock
> 6163 3.0811 libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
> 5942 2.9706 libpthread-2.18.so pthread_mutex_unlock
> 4947 2.4732 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
> 3826 1.9127 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
> 3555 1.7773 libstdc++.so.6.0.19 /usr/lib64/libstdc++.so.6.0.19
> 3206 1.6028 libc-2.18.so __memmove_ssse3_back
> [...]
With my implementation of your suggestion:
CPU: Intel Sandy Bridge microarchitecture, speed 3.5e+06 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Clock cycles when not halted) with a unit mask of 0x00 (No unit mask) count 90000
samples % image name symbol name
27595 35.6051 libc-2.18.so __memcpy_sse2_unaligned
12225 15.7736 no-vmlinux /no-vmlinux
4051 5.2269 libpthread-2.18.so pthread_mutex_lock
3739 4.8243 libgnuradio-runtime-3.7.7git.so.0.0.0 boost::detail::sp_counted_base::release()
3362 4.3379 libpthread-2.18.so pthread_mutex_unlock
2876 3.7108 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_executor::run_one_iteration()
2364 3.0502 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::block_detail::input(unsigned int)
2091 2.6980 libstdc++.so.6.0.19 /usr/lib64/libstdc++.so.6.0.19
1388 1.7909 libgnuradio-runtime-3.7.7git.so.0.0.0 gr::tpb_detail::notify_upstream(gr::block_detail*)
1138 1.4683 libc-2.18.so __memmove_ssse3_back
[...]
2 0.0026 libvolk.so.0.0.0 __volk_32f_convert_64f_d
[...]
1 0.0013 libvolk.so.0.0.0 volk_32f_convert_64f_a_avx
Regards,
Andy

Andy,
Excellent! I've got a few other minor patches for some things; I'll put this in there too and test on my end as well.
Tom