Sunday, October 8, 2023

Re: Volk sqrt ARM performance

Hi Jeff,

There's a good chance that your compiler outsmarted you, i.e. parts of
your test were optimized out. I suggest using something like "benchmark"
for such tests. Also, make sure that the variables in your test cannot
be optimized out.
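
As a rough illustration of the second point, here is a minimal sketch of a timing helper whose results the compiler cannot discard. It is not the API of any benchmark library. The names `time_sqrt_loop`, `keep_alive` and `g_sink` are hypothetical, and the empty inline-asm barrier is a GCC/Clang-specific assumption. The helper runs one untimed warm-up pass, so one-time setup costs are not measured, and then averages over many timed passes, since a single pass over 16384 floats is close to the resolution of `gettimeofday`.

```cpp
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical sink: writing to a volatile object is an observable side
// effect, so the checksum (and the work that feeds it) must be computed.
static volatile float g_sink = 0.0f;

// Hypothetical helper: tells the compiler that the buffer behind `p` may be
// read or modified here. The asm form is GCC/Clang-specific (an assumption);
// other compilers fall back to the volatile sink alone.
inline void keep_alive(const void* p)
{
#if defined(__GNUC__) || defined(__clang__)
    asm volatile("" : : "r"(p) : "memory");
#else
    (void)p;
#endif
}

// Returns the average wall-clock seconds for one pass of sqrt over `in`.
double time_sqrt_loop(const std::vector<float>& in,
                      std::vector<float>& out,
                      int repeats)
{
    assert(repeats > 0 && !in.empty() && in.size() == out.size());
    using clock = std::chrono::steady_clock;

    // Untimed warm-up pass: caches and any one-time setup are paid here.
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = std::sqrt(in[i]);
    }
    keep_alive(out.data());

    float checksum = 0.0f;
    const auto start = clock::now();
    for (int r = 0; r < repeats; ++r) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = std::sqrt(in[i]);
        }
        keep_alive(out.data());  // results count as "used" after every pass
        checksum += out[static_cast<std::size_t>(r) % out.size()];
    }
    const auto stop = clock::now();

    g_sink = checksum;  // consume the results
    return std::chrono::duration<double>(stop - start).count() / repeats;
}
```

To time the Volk kernel the same way, the inner loop would be replaced by the `volk_32f_sqrt_32f` call from your program, keeping the warm-up pass and the repeat count identical for both variants. Building both with the same optimization level (for example `-O2`) keeps the comparison fair.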

Cheers
Johannes

On 08.10.23 00:22, Jeff R wrote:
> I modified a simple Volk sqrt program for an ARM1176JZ-S processor to
> test performance, and the results are puzzling. The following program
> prints:
>
>
> dur_VolkSqrt=(0.000000)0.001721 dur_CRTLSqrt=(0.000000)0.000318
>
>
> The following processor information is displayed. It appears as though
> NEON is supported.
>
>
> ~/volk-3.0.0/build# cpu_features/list_cpu_features
> arch            : aarch64
> implementer     :  65 (0x41)
> variant         :   0 (0x00)
> part            : 3336 (0xD08)
> revision        :   3 (0x03)
> flags           : asimd,cpuid,crc32,fp
>
>
> Why are the numbers so slow for Volk versus the CRTL? I may be missing
> something obvious. Thank you in advance.
>
>
> Here's the test program:
>
>
>
> // g++ -I /usr/local/include/volk volk_sqrt.cpp -o volk_sqrt -L /usr/local/lib64/ -lvolk
>
> // export LD_LIBRARY_PATH=/usr/local/lib64; ./volk_sqrt
>
>
> #include <stdio.h>
> #include <math.h>
> #include <volk.h>
> #include <limits.h>
> #include <time.h>
> #include <sys/time.h>
>
> double get_wall_time()
> {
>     struct timeval time;
>
>     if (gettimeofday(&time,NULL))
>     {
>         //  Handle error
>         return 0;
>     }
>     return (double)time.tv_sec + (double)time.tv_usec * .000001;
> }
>
> int main(int argc, char* args[])
> {
>     double walStop;
>     double walStart;
>     double dur_VolkSqrt;
>     double dur_CRTLSqrt;
>     int N = 1024*16;
>
>     unsigned int alignment = volk_get_alignment();
>     float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
>     float* out = (float*)volk_malloc(sizeof(float)*N, alignment);
>
>     for(unsigned int ii = 0; ii < N; ++ii)
>     {
>         in[ii] = (float)(ii*ii);
>     }
>
>     walStart = get_wall_time();
>     volk_32f_sqrt_32f_a(out, in, N);
>     //volk_32f_sqrt_32f(out, in, N);
>     walStop = get_wall_time();
>     dur_VolkSqrt = walStop - walStart;
>
>     walStart = get_wall_time();
>     for(unsigned int ii = 0; ii < N; ++ii)
>     {
>         out[ii] = sqrt(in[ii]);
>     }
>     walStop = get_wall_time();
>     dur_CRTLSqrt = walStop - walStart;
>
>     printf("dur_VolkSqrt=(%f)%f dur_CRTLSqrt=(%f)%f\n", dur_VolkSqrt/N,
> dur_VolkSqrt, dur_CRTLSqrt/N, dur_CRTLSqrt);
>     volk_free(in);
>     volk_free(out);
>     return 0;
> }
>
