Hi Jeff,
there's a good chance that your compiler outsmarted you, i.e. parts of
your test were optimized out. I suggest using something like "benchmark"
for tests. Also, make sure that the variables in your test cannot be
optimized out.
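
As a rough illustration of both points, here is a minimal sketch in plain C++ that times a kernel over several runs and consumes its output so the work cannot be discarded. The names best_time_seconds, sqrt_loop and g_sink are made up for this example and are not part of VOLK or of any benchmark library; sqrt_loop only stands in for the kernel under test. A dedicated benchmark library offers stronger guarantees than this hand-rolled guard.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cmath>
#include <cstddef>
#include <limits>

// Hypothetical sink: a volatile write is an observable side effect, so the
// values that feed it cannot be removed as dead code.
static volatile float g_sink = 0.0f;

// Hypothetical kernel under test: element-wise square root.
void sqrt_loop(float* out, const float* in, std::size_t n)
{
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = std::sqrt(in[i]);
    }
}

// Returns the shortest time in seconds taken by one call of `kernel`,
// measured over `repeats` timed runs after one untimed warm-up call.
template <typename Kernel>
double best_time_seconds(Kernel kernel, float* out, const float* in,
                         std::size_t n, int repeats)
{
    assert(repeats > 0 && n > 0);
    using clock = std::chrono::steady_clock;

    kernel(out, in, n); // warm-up, so one-time setup costs are not timed

    double best = std::numeric_limits<double>::infinity();
    for (int r = 0; r < repeats; ++r) {
        const auto start = clock::now();
        kernel(out, in, n);
        const auto stop = clock::now();

        // Consume every output element outside the timed region.
        float acc = 0.0f;
        for (std::size_t i = 0; i < n; ++i) {
            acc += out[i];
        }
        g_sink = acc;

        const double elapsed =
            std::chrono::duration<double>(stop - start).count();
        best = std::min(best, elapsed);
    }
    return best;
}
```

Calling best_time_seconds(sqrt_loop, out, in, N, 100) gives the shortest of 100 runs. To time the VOLK kernel the same way, a small wrapper with the same signature that calls volk_32f_sqrt_32f_a(out, in, N) could be passed in place of sqrt_loop.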
Cheers
Johannes
On 08.10.23 00:22, Jeff R wrote:
> I modified a simple Volk sqrt program for an ARM1176JZ-S processor to
> test performance, and the results are puzzling. The following program
> prints:
>
>
> dur_VolkSqrt=(0.000000)0.001721 dur_CRTLSqrt=(0.000000)0.000318
>
>
> The following processor information is displayed. It appears as though
> NEON is supported.
>
>
> ~/volk-3.0.0/build# cpu_features/list_cpu_features
> arch : aarch64
> implementer : 65 (0x41)
> variant : 0 (0x00)
> part : 3336 (0xD08)
> revision : 3 (0x03)
> flags : asimd,cpuid,crc32,fp
>
>
> Why are the numbers so slow for Volk versus the CRTL? I may be missing
> something obvious. Thank you in advance.
>
>
> Here's the test program:
>
>
>
> // g++ -I /usr/local/include/volk volk_sqrt.cpp -o volk_sqrt -L /usr/local/lib64/ -lvolk
> // export LD_LIBRARY_PATH=/usr/local/lib64; ./volk_sqrt
>
> #include <stdio.h>
> #include <math.h>
> #include <volk.h>
> #include <limits.h>
> #include <time.h>
> #include <sys/time.h>
>
> double get_wall_time()
> {
>     struct timeval time;
>
>     if (gettimeofday(&time,NULL))
>     {
>         // Handle error
>         return 0;
>     }
>     return (double)time.tv_sec + (double)time.tv_usec * .000001;
> }
>
> int main(int argc, char* args[])
> {
>     double walStop;
>     double walStart;
>     double dur_VolkSqrt;
>     double dur_CRTLSqrt;
>     int N = 1024*16;
>
>     unsigned int alignment = volk_get_alignment();
>     float* in = (float*)volk_malloc(sizeof(float)*N, alignment);
>     float* out = (float*)volk_malloc(sizeof(float)*N, alignment);
>
>     for(unsigned int ii = 0; ii < N; ++ii)
>     {
>         in[ii] = (float)(ii*ii);
>     }
>
>     walStart = get_wall_time();
>     volk_32f_sqrt_32f_a(out, in, N);
>     //volk_32f_sqrt_32f(out, in, N);
>     walStop = get_wall_time();
>     dur_VolkSqrt = walStop - walStart;
>
>     walStart = get_wall_time();
>     for(unsigned int ii = 0; ii < N; ++ii)
>     {
>         out[ii] = sqrt(in[ii]);
>     }
>     walStop = get_wall_time();
>     dur_CRTLSqrt = walStop - walStart;
>
>     printf("dur_VolkSqrt=(%f)%f dur_CRTLSqrt=(%f)%f\n", dur_VolkSqrt/N,
>            dur_VolkSqrt, dur_CRTLSqrt/N, dur_CRTLSqrt);
>     volk_free(in);
>     volk_free(out);
>     return 0;
> }
>