Thursday, July 23, 2015

[Discuss-gnuradio] Data (was: Re: Run graph/ scheduler overhead)

I copied out the dc_blocker_cc block from GNU Radio 3.7.8 and ran some performance
tests against it, which I've summarized in a table below.

I had to make some modifications to the original code, such as:

* I removed the make wrapper.
* I tested against different containers.
* Different containers have different access/management methods,
which meant some changes to the code body (I tried to be consistent).
* On input I passed a std::vector to work() rather than a complex*.
Although this changes the flavor of work(), every variant gets the
same change, so I figure the comparison still holds in relative terms.
* I only used long_form and deleted the short_form code, keeping
the key part of the original code.

The three containers are the original std::deque, then std::queue and
std::list. The results are interesting. I probably should have looked at
other containers such as std::vector, but that might require recoding.

I also compiled with and without -std=c++11 because, when I looked at
the container sources, I saw a bunch of #ifdefs for >= c++0x.

These are some of the problems with the original dc_block:

* Passing by value rather than by reference.
* No inlines.
* Missing const where const should be.

So in a second copy of dc_block I fixed those things. I found one case
(filter()) where it returns by value, and I left that one alone.
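A sketch of the by-value versus by-reference and inline changes, using hypothetical helper names (this is not the code from the post). For an 8-byte std::complex<float> the copy is cheap, so the by-reference form matters most for larger arguments such as the input std::vector handed to work(). Passing that by value would copy every element on each call.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> CPLX;

// "old" style: the argument is copied into the function.
CPLX scale_by_value(CPLX x, float k) { return x * k; }

// "opt" style: inline hint plus a const reference, so nothing is copied
// on the way in and the callee cannot modify the caller's sample.
inline CPLX scale_by_ref(const CPLX& x, float k) { return x * k; }

// work()-style loop: the input buffer is bound by const reference,
// so the std::vector itself is never copied per call.
inline void work_by_ref(const std::vector<CPLX>& in,
                        std::vector<CPLX>& out, float k) {
    assert(out.size() >= in.size());
    for (std::size_t i = 0; i < in.size(); ++i)
        out[i] = scale_by_ref(in[i], k);
}
```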

The table below summarizes the results. "Old" means my reasonable(?)
facsimile of the original dc_block. "+c11" means I added -std=c++11 to
the compile line. "Opt" is my optimized copy of the code where I added
references, inlines, etc. "Special" is "opt" but with different compile
options. All of the output is included at the end of this message.

The numbers you'll see for old/old+c11/etc. are the amount of time it took to
process /one/ sample. In "old+deque", for example (the first item), it
took 701 us to process a sample. One of the surprising results is how
badly std::list does: roughly 2.8x the deque time in the old build and
about 8x in the optimized one. Also, when looking at the assembly language for
filter() (copy below) I see realloc() calls. That's not surprising, and
probably bad for performance. (BTW, "CPLX" is: "typedef std::complex<float> CPLX;".)

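    inline const CPLX
    moving_averager_c_list::filter( const CPLX& x ) {

        d_out_d1 = d_out;
        d_delay_line.push_back(x);          // newest sample in
        d_out = d_delay_line.front();       // oldest sample out
        d_delay_line.pop_front();

        // recursive running sum; d_out_d2 holds the previous output
        CPLX y = x - d_out_d1 + d_out_d2;
        d_out_d2 = y;

        return (y / (float)(d_length));     // still returned by value
    }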
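One way to bring std::vector into the comparison is a fixed-size ring buffer. This technique is swapped in here and was not benchmarked in this thread. The delay line is sized once in the constructor, and each call overwrites the oldest slot in place, so filter() itself never allocates. The class names are hypothetical. The deque version also assumes that the delay line starts out holding length-1 zeros, because the constructor isn't shown above. Under that assumption, both classes produce the same output sample for sample.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <deque>
#include <vector>

typedef std::complex<float> CPLX;

// Deque version, same arithmetic as filter() above. Assumption: the delay
// line starts out holding length-1 zeros.
class moving_averager_deque {
public:
    explicit moving_averager_deque(int length)
        : d_length(length), d_out(0.0f, 0.0f), d_out_d1(0.0f, 0.0f),
          d_out_d2(0.0f, 0.0f), d_delay_line(length - 1, CPLX(0.0f, 0.0f)) {}

    CPLX filter(const CPLX& x) {
        d_out_d1 = d_out;
        d_delay_line.push_back(x);      // can allocate a new deque block
        d_out = d_delay_line.front();
        d_delay_line.pop_front();       // can free a deque block
        CPLX y = x - d_out_d1 + d_out_d2;
        d_out_d2 = y;
        return y / static_cast<float>(d_length);
    }

private:
    int d_length;
    CPLX d_out, d_out_d1, d_out_d2;
    std::deque<CPLX> d_delay_line;
};

// Ring-buffer version: the delay line is a std::vector sized once, and
// filter() overwrites the oldest slot, so no per-sample allocation occurs.
class moving_averager_ring {
public:
    explicit moving_averager_ring(int length)
        : d_length(length), d_out(0.0f, 0.0f), d_out_d1(0.0f, 0.0f),
          d_out_d2(0.0f, 0.0f), d_head(0) {
        assert(length >= 2);  // needs at least one delay slot
        d_delay_line.assign(length - 1, CPLX(0.0f, 0.0f));
    }

    CPLX filter(const CPLX& x) {
        d_out_d1 = d_out;
        d_out = d_delay_line[d_head];   // oldest sample leaves the line...
        d_delay_line[d_head] = x;       // ...and the newest takes its slot
        d_head = (d_head + 1) % d_delay_line.size();
        CPLX y = x - d_out_d1 + d_out_d2;
        d_out_d2 = y;
        return y / static_cast<float>(d_length);
    }

private:
    int d_length;
    CPLX d_out, d_out_d1, d_out_d2;
    std::vector<CPLX> d_delay_line;
    std::size_t d_head;
};
```

The price is an explicit head index and a modulo per sample, in exchange for never touching the allocator inside filter().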

The "size" numbers in the table are the text segment size reported by
"size a.out". The "block size" is simply sizeof(d_delay_line), which
is really sizeof(std::deque<CPLX>), for example.
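It helps to be clear about what the "block size" figure measures. sizeof() is a compile-time constant that covers only the container object's own bookkeeping, such as pointers, iterators and counters. The samples held on the heap are not counted, which is why the list figure is so small. A minimal sketch; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <deque>
#include <list>
#include <queue>

typedef std::complex<float> CPLX;

// sizeof() covers only the container object itself (pointers, iterators,
// counters). Samples live on the heap and are not counted, no matter how
// many the delay line holds.
struct delay_line_footprint {
    std::size_t deque_bytes;
    std::size_t queue_bytes;
    std::size_t list_bytes;
};

inline delay_line_footprint footprint() {
    delay_line_footprint f = { sizeof(std::deque<CPLX>),
                               sizeof(std::queue<CPLX>),
                               sizeof(std::list<CPLX>) };
    return f;
}
```

The exact values are implementation specific. The post's gcc 4.9.2 build reports 80/80/16, and other standard-library versions may report different numbers.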

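One other note: I compiled "special" with -Ofast, and it failed the data
integrity check ("Data error i=0" at the end of its output). -Ofast turns on
-ffast-math, which relaxes IEEE floating-point rules, so the results need not
match the reference exactly. Probably a bad option to use. :)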
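The integrity check itself isn't shown. If it compares outputs for exact equality, which is an assumption here, then a tolerance-based comparison is one way to separate harmless -ffast-math rounding differences from real errors. Below is a minimal sketch with a hypothetical first_mismatch() helper. Its default tolerance of 1e-5 is an arbitrary choice.

```cpp
#include <cassert>
#include <complex>
#include <cstddef>
#include <vector>

typedef std::complex<float> CPLX;

// Hypothetical integrity check: index of the first sample whose error
// against the reference exceeds rel_tol (relative to the sample's
// magnitude, or absolute for magnitudes below 1), or -1 if all match.
inline long first_mismatch(const std::vector<CPLX>& ref,
                           const std::vector<CPLX>& got,
                           float rel_tol = 1e-5f) {
    assert(ref.size() == got.size());
    for (std::size_t i = 0; i < ref.size(); ++i) {
        float err = std::abs(ref[i] - got[i]);
        float scale = std::abs(ref[i]);
        if (err > rel_tol * (scale > 1.0f ? scale : 1.0f))
            return static_cast<long>(i);
    }
    return -1;
}
```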


My OS: Ubuntu 15.04.
My compiler: gcc version 4.9.2 (Ubuntu 4.9.2-10ubuntu13)
My system: AMD FX(tm)-9590 Eight-Core Processor @ 4.7GHz

I'm happy to send copies of the test code (two files) for review if
someone wants to put them on the web. The three main code blocks are
pretty simple; here is the deque one:

    {
        dc_blocker_cc_deque dc( NUM_ELEM );

        std::cout << "deque:" << std::endl;

        t_start = gr::high_res_timer_now();
        for( int i = 0; i < NUM_LOOPS; ++i )
            for( int j = 0; j < NUM_COMPLEX; ++j )
                dc.work( data, dc_deque );
        timing( t_start, gr::high_res_timer_now(), NUM_LOOPS*NUM_COMPLEX );
    }


    #define NUM_LOOPS 5
    #define NUM_COMPLEX 10000
    #define NUM_ELEM 32
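Anyone reproducing the timing loop without linking GNU Radio could start from a sketch like the one below. It swaps std::chrono::steady_clock in for gr::high_res_timer_now(). The post's timing() helper isn't shown, so seconds_per_call() is a hypothetical stand-in. It runs the callable loops*count times and returns the mean time per call.

```cpp
#include <cassert>
#include <chrono>

// Hypothetical stand-in for the timed loop above, using std::chrono
// instead of GNU Radio's hi-res timer. Returns mean seconds per call.
template <typename Fn>
double seconds_per_call(Fn fn, int loops, int count) {
    assert(loops > 0 && count > 0);
    std::chrono::steady_clock::time_point t_start =
        std::chrono::steady_clock::now();
    for (int i = 0; i < loops; ++i)
        for (int j = 0; j < count; ++j)
            fn();
    std::chrono::steady_clock::time_point t_end =
        std::chrono::steady_clock::now();
    std::chrono::duration<double> elapsed = t_end - t_start;  // seconds
    return elapsed.count() / (static_cast<double>(loops) * count);
}
```

For example, `seconds_per_call([&]() { dc.work(data, dc_deque); }, NUM_LOOPS, NUM_COMPLEX)` would time the same call as the loop above. The callable needs an observable side effect, or the optimizer may remove the work being timed.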


Here's the summary table.

Time per sample in seconds ("t/ea" in the output below):

           old          old+c11      opt          opt+c11      special
  deque:   0.000701038  0.000705963  0.000235234  0.00023607   0.000234233
  queue:   0.00069784   0.000705617  0.00023619   0.00023222   0.000237184
  list:    0.00194583   0.00243208   0.00191296   0.00193926   0.00194809

Text segment size in bytes (from "size a.out"):

           old          old+c11      opt          opt+c11      special
  size:    28902        21712        29574        23080        23112

  text orig: 33821 26502

Block size in bytes (sizeof(d_delay_line)):

  deque:   80
  queue:   80
  list:    16
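The per-sample figures can be recomputed from the raw output below. Divide total_t by the clock rate (tps, 1000000000 ticks per second), then by the number of timed work() calls, NUM_LOOPS * NUM_COMPLEX = 5 * 10000 = 50000. For old+deque that is 35051914970 / 1e9 / 50000, about 0.000701038 s. Here is a small helper with a hypothetical name that does the same arithmetic:

```cpp
#include <cassert>
#include <cstdint>

// Mean seconds per timed call, as in the "t/ea" column: total clock ticks
// divided by ticks per second, then by the number of timed calls.
inline double per_item_seconds(std::int64_t total_ticks,
                               std::int64_t ticks_per_sec,
                               std::int64_t items) {
    assert(ticks_per_sec > 0 && items > 0);
    return static_cast<double>(total_ticks) / ticks_per_sec / items;
}
```

Similarly, per_item_seconds(97291349192, 1000000000, 50000) gives the old+list figure, about 0.00194583.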






Original facsimile (not c++11):

dennisg@Tori-Radio:~/dc_test$ c++ -O3 main.cc
dennisg@Tori-Radio:~/dc_test$ size a.out
text data bss dec hex filename
28902 856 280 30038 7556 a.out

dennisg@Tori-Radio:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8

dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 35051914970, sec_t: 35.0519, t/ea: 0.000701038

dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 34892023951, sec_t: 34.892, t/ea: 0.00069784

dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 97291349192, sec_t: 97.2913, t/ea: 0.00194583




Original facsimile (c++11):

dennisg@Tori-Radio:~/dc_test$ c++ -O3 -std=c++11 main.cc
dennisg@Tori-Radio:~/dc_test$ size a.out
text data bss dec hex filename
21712 848 280 22840 5938 a.out

dennisg@Tori-Radio:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8

dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 35298153446, sec_t: 35.2982, t/ea: 0.000705963

dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 35280849767, sec_t: 35.2808, t/ea: 0.000705617

dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 121603777765, sec_t: 121.604, t/ea: 0.00243208



Optimized code (not c++11):

dennisg@Tori-Radio:~/dc_test$ c++ -O3 -finline main_opt.cc
dennisg@Tori-Radio:~/dc_test$ size a.out
text data bss dec hex filename
29574 856 280 30710 77f6 a.out

dennisg@Tori-Radio:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8

dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11761720007, sec_t: 11.7617, t/ea: 0.000235234

dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11809516472, sec_t: 11.8095, t/ea: 0.00023619

dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 95647805916, sec_t: 95.6478, t/ea: 0.00191296


Optimized code (c++11):

dennisg@Tori-Radio:~/dc_test$ c++ -O3 -finline -std=c++11 main_opt.cc
dennisg@Tori-Radio:~/dc_test$ size a.out
text data bss dec hex filename
23080 848 280 24208 5e90 a.out


dennisg@Tori-Radio:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8

dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11803504003, sec_t: 11.8035, t/ea: 0.00023607

dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11610977298, sec_t: 11.611, t/ea: 0.00023222

dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 96962902014, sec_t: 96.9629, t/ea: 0.00193926


special (opt+c++11):

dennisg@Tori-Radio:~/dc_test$ c++ -Ofast -Wsign-compare -Wall -Wno-uninitialized -fvisibility=hidden -finline -std=c++11 main_opt.cc
dennisg@Tori-Radio:~/dc_test$ size a.out
text data bss dec hex filename
23112 856 280 24248 5eb8 a.out


dennisg@Tori-Radio:~/dc_test$ ./a.out
Building complex number data...
Done.
GNURadio hi-res clock tps: 1000000000
GNURadio sizeof(gr_complex): 8
GNURadio sizeof(CPLX): 8

dc_blocker_cc_deque: delay_line size=80
deque:
Done: total_t: 11711630308, sec_t: 11.7116, t/ea: 0.000234233

dc_blocker_cc_queue: delay_line size=80
queue:
Done: total_t: 11859205796, sec_t: 11.8592, t/ea: 0.000237184

dc_blocker_cc_list: delay_line size=16
list:
Done: total_t: 97404287524, sec_t: 97.4043, t/ea: 0.00194809

Data error i=0
















_______________________________________________
Discuss-gnuradio mailing list
Discuss-gnuradio@gnu.org
https://lists.gnu.org/mailman/listinfo/discuss-gnuradio
