Schedule clause

The schedule clause is defined as follows:

schedule([modifier [, modifier]:]kind[, chunk_size])

The schedule clause determines how the iterations of a loop construct are distributed among threads. Check the following code:

#include <stdio.h>
#include <unistd.h>
#include <omp.h>

void thread_fun(int seconds)
{
    sleep(seconds);
    printf("Thread %d sleeps %d seconds\n", omp_get_thread_num(), seconds);
}

int main(void)
{        
    #pragma omp parallel for
    for (int i = 1; i <= 8; i++)
    {
        thread_fun(i);
    }

    return 0;
} 

Build and run it on a 2-core machine:

# gcc -fopenmp parallel.c
# ./a.out
Thread 0 sleeps 1 seconds
Thread 0 sleeps 2 seconds
Thread 1 sleeps 5 seconds
Thread 0 sleeps 3 seconds
Thread 0 sleeps 4 seconds
Thread 1 sleeps 6 seconds
Thread 1 sleeps 7 seconds
Thread 1 sleeps 8 seconds

We can see that the first 4 iterations are executed by the first thread while the last 4 iterations are executed by the second thread. Now modify the loop directive in the above code:

#pragma omp parallel for schedule(static)
for (int i = 1; i <= 8; i++)
{
    ......
}

Build and run it:

# gcc -fopenmp parallel.c
# ./a.out
Thread 0 sleeps 1 seconds
Thread 0 sleeps 2 seconds
Thread 1 sleeps 5 seconds
Thread 0 sleeps 3 seconds
Thread 0 sleeps 4 seconds
Thread 1 sleeps 6 seconds
Thread 1 sleeps 7 seconds
Thread 1 sleeps 8 seconds

The output is the same as without a schedule clause. When no schedule clause is given, the program uses the implementation-defined default schedule policy, which behaves like schedule(static) in this case.

Now we modify the schedule clause from static:

#pragma omp parallel for schedule(static)
for (int i = 1; i <= 8; i++)
{
    ......
}

to dynamic:

#pragma omp parallel for schedule(dynamic)
for (int i = 1; i <= 8; i++)
{
    ......
}

Build and run it:

# gcc -fopenmp parallel.c
# ./a.out
Thread 0 sleeps 1 seconds
Thread 1 sleeps 2 seconds
Thread 0 sleeps 3 seconds
Thread 1 sleeps 4 seconds
Thread 0 sleeps 5 seconds
Thread 1 sleeps 6 seconds
Thread 0 sleeps 7 seconds
Thread 1 sleeps 8 seconds

We can see the distribution policy differs from the static case: the first thread executes the first iteration and the second thread runs the second iteration. Whenever a thread finishes an iteration, it grabs the next available one. Because the distribution is determined at run time, synchronization is needed.

schedule(guided) is a variant of dynamic scheduling: threads still grab iterations at run time, but the chunks start large and shrink as the loop proceeds, which reduces scheduling overhead. Modify the code as follows:

#pragma omp parallel for schedule(guided)
for (int i = 1; i <= 8; i++)
{
    ......
}

Build and run it:

# gcc -fopenmp parallel.c
# ./a.out
Thread 0 sleeps 1 seconds
Thread 0 sleeps 2 seconds
Thread 1 sleeps 5 seconds
Thread 0 sleeps 3 seconds
Thread 0 sleeps 4 seconds
Thread 1 sleeps 6 seconds
Thread 0 sleeps 7 seconds
Thread 1 sleeps 8 seconds

We can see the behavior differs from schedule(dynamic), where the threads grabbed iterations in turn: here the first thread finished more iterations than the second one.

We can also specify chunk_size in the schedule clause. The default chunk_size for schedule(dynamic) and schedule(guided) is 1; for schedule(static) with no chunk_size, the iteration space is instead divided into approximately equal chunks, at most one per thread. Change the schedule to schedule(dynamic, 2), then build and run it:

# gcc -fopenmp parallel.c
# ./a.out
Thread 0 sleeps 1 seconds
Thread 1 sleeps 3 seconds
Thread 0 sleeps 2 seconds
Thread 1 sleeps 4 seconds
Thread 0 sleeps 5 seconds
Thread 1 sleeps 7 seconds
Thread 0 sleeps 6 seconds
Thread 1 sleeps 8 seconds

Comparing the above output with that of schedule(dynamic), we can see that this time every thread grabs 2 iterations at a time.

There are 2 more options for the schedule clause: schedule(auto) lets the compiler and/or runtime system determine the policy, and schedule(runtime) defers the scheduling decision until run time, e.g., to the OMP_SCHEDULE environment variable.
