SIMD construct

The SIMD construct asks the compiler to vectorize a loop so that it can use the hardware's vector processing units to accelerate computation. For example, the following CPU supports AVX-512 instructions:

# lscpu
......
avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl ......
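
The same check can be made from inside a program. The sketch below assumes GCC or Clang on an x86 machine and relies on the compiler builtin __builtin_cpu_supports. The helper names cpu_has_avx512f and report_avx512f are hypothetical, and on other compilers or architectures the helper simply reports 0.

```c
#include <stdio.h>

/* Hypothetical helper (not part of the original example): returns 1 if the
   CPU reports the AVX-512 Foundation feature, 0 otherwise. */
int cpu_has_avx512f(void)
{
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();   /* initialise the compiler's CPU feature data */
    return __builtin_cpu_supports("avx512f") ? 1 : 0;
#else
    return 0;               /* assumption: treat other targets as unsupported */
#endif
}

/* Prints one line describing the result. */
void report_avx512f(void)
{
    printf("avx512f: %s\n", cpu_has_avx512f() ? "yes" : "no");
}
```

Calling report_avx512f() from main prints whether the avx512f flag shown in the lscpu output is visible to the running program.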

Consider the following program (parallel.c):

#include <stdio.h>
#include <omp.h>

#define ARRAY_SIZE    (10240)
float A[ARRAY_SIZE];
float B[ARRAY_SIZE];
float C[ARRAY_SIZE];

int main(void)
{
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        A[i] = i * 2.3;
        B[i] = i + 4.6;
    }

    double start = omp_get_wtime();
    for (int loop = 0; loop < 1000000; loop++)
    {
        for (int i = 0; i < ARRAY_SIZE; i++)
        {
            C[i] = A[i] * B[i];
        }
    }
    double end = omp_get_wtime();
    printf("Work consumed %f seconds\n", end - start);
    return 0;
}

Build and run it:

# gcc -O2 -fopenmp parallel.c
# ./a.out
Work consumed 4.805690 seconds

The nested loop below takes ~4.8 seconds to finish:

for (int loop = 0; loop < 1000000; loop++)
{
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        C[i] = A[i] * B[i];
    }
}

Change the above code as follows, adding the SIMD directive before the inner loop:

for (int loop = 0; loop < 1000000; loop++)
{
    #pragma omp simd
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        C[i] = A[i] * B[i];
    }
}

Build and run the program again:

# gcc -O2 -fopenmp parallel.c
# ./a.out
Work consumed 1.773028 seconds

This time the execution time is only ~1.8 seconds, about 2.7 times faster than before.
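
Vectorizing this loop should not change its result, because each C[i] depends only on A[i] and B[i]. As a sanity check, the sketch below runs the multiplication once without and once with the directive and counts the elements that differ. The function names multiply_scalar, multiply_simd, and count_mismatches are hypothetical and not part of the original program.

```c
#define ARRAY_SIZE    (10240)

/* Reference version: plain loop without any directive. */
void multiply_scalar(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}

/* Same loop with the SIMD construct. The directive asserts that the
   iterations are independent, so the caller must not pass an output
   array that overlaps an input array at a different offset. */
void multiply_simd(const float *a, const float *b, float *c, int n)
{
    #pragma omp simd
    for (int i = 0; i < n; i++)
    {
        c[i] = a[i] * b[i];
    }
}

/* Fills the inputs with the same pattern as the original program, runs
   both versions, and returns the number of elements that differ. */
int count_mismatches(void)
{
    static float a[ARRAY_SIZE], b[ARRAY_SIZE];
    static float c_scalar[ARRAY_SIZE], c_simd[ARRAY_SIZE];

    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        a[i] = i * 2.3f;
        b[i] = i + 4.6f;
    }

    multiply_scalar(a, b, c_scalar, ARRAY_SIZE);
    multiply_simd(a, b, c_simd, ARRAY_SIZE);

    int mismatches = 0;
    for (int i = 0; i < ARRAY_SIZE; i++)
    {
        if (c_scalar[i] != c_simd[i])
        {
            mismatches++;
        }
    }
    return mismatches;
}
```

A call to count_mismatches() from main is expected to return 0 for this element-wise multiplication. To see whether GCC actually vectorized a loop, its -fopt-info-vec option can be added to the build command. Note also that without an option such as -march=native, GCC targets the baseline x86-64 instruction set, so the vectorized loop likely uses SSE rather than AVX-512 instructions.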
