Workqueuing Constructs

taskq Pragma

The taskq pragma specifies the environment within which the enclosed units of work (tasks) are to be executed. From among all the threads that encounter a taskq pragma, one is chosen to execute it initially. Conceptually, the taskq pragma causes an empty queue to be created by the chosen thread, and then the code inside the taskq block is executed single-threaded. All the other threads wait for work to be enqueued on the conceptual queue.

The task pragma specifies a unit of work, potentially executed by a different thread. When a task pragma is encountered lexically within a taskq block, the code inside the task block is conceptually enqueued on the queue associated with the taskq. The conceptual queue is disbanded once all work enqueued on it has finished and the end of the taskq block has been reached.
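As a minimal sketch of the queue model described above, the following C++ fragment enqueues two independent tasks from one taskq block. The names cube and compute_pair are hypothetical and exist only for illustration. The combined parallel taskq form (described later in this section) is used so that a team of threads exists to service the queue. A compiler that does not recognize the intel omp pragmas ignores them, in which case the code runs serially and produces the same result.

```cpp
// Hypothetical stand-in for an expensive, independent computation.
inline long cube(long x) { return x * x * x; }

// Computes two unrelated results; each task block is one unit of work.
void compute_pair(long& a, long& b) {
    #pragma intel omp parallel taskq
    {
        // The chosen thread runs this block single-threaded and
        // enqueues each task; any thread in the team may execute it.
        #pragma intel omp task
        { a = cube(3); }

        #pragma intel omp task
        { b = cube(4); }
    }   // implicit barrier: both tasks have finished at this point
}
```

The two tasks write to different objects, so no synchronization between them is needed.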

Control Structures

Many control structures exhibit the pattern of separated work iteration and work creation, and are naturally parallelized with the workqueuing model. Some common cases are:

while Loops

If the computation in each iteration of a while loop is independent, the entire loop becomes the environment for the taskq pragma, and the statements in the body of the while loop become the units of work to be specified with the task pragma. The conditional in the while loop and any modifications to the control variables are placed outside of the task blocks and executed sequentially to enforce the data dependencies on the control variables.
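The while loop pattern above might look like the following sketch, which walks a linked list. The Node type and the process and process_list functions are assumptions made for this example. The loop condition and the pointer advance stay outside the task block and run sequentially, while captureprivate(p) gives each task its own copy of the pointer as it was when the task was enqueued.

```cpp
#include <cstddef>

// Hypothetical list node used only for this illustration.
struct Node {
    int   value;
    int   result;
    Node* next;
};

// Hypothetical per-node work; it touches only its own node,
// so iterations are independent of one another.
inline void process(Node* n) { n->result = n->value * n->value; }

void process_list(Node* head) {
    Node* p = head;
    #pragma intel omp parallel taskq shared(p)
    {
        while (p != NULL) {            // control: executed sequentially
            #pragma intel omp task captureprivate(p)
            {
                process(p);            // unit of work: may run on any thread
            }
            p = p->next;               // control: executed sequentially
        }
    }
}
```

Without captureprivate(p), a task could observe a value of p that the sequential loop has already advanced.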

C++ Iterators

C++ Standard Template Library (STL) iterators are very much like the while loops just described: the operations on the data stored in an STL container are distinct from the act of iterating over that data. If the operations are data-independent, they can be done in parallel as long as the iteration over the work is sequential. This type of while loop parallelism is a generalization of the standard OpenMP* worksharing for loops. In a worksharing for loop, the loop increment operation is the iterator and the body of the loop is the unit of work. However, because the for loop iteration variable frequently has a closed-form solution, it can be computed in parallel and the sequential step avoided.
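The same idea applied to an STL container could be sketched as follows. The functions scale and scale_all are hypothetical, and std::list is chosen only because its iterators have no closed-form position, which is the case the paragraph above describes. The iterator is advanced sequentially in the taskq block, and each task receives a copy-constructed iterator through captureprivate.

```cpp
#include <list>

// Hypothetical element operation; independent for each element.
inline void scale(double& x) { x *= 2.0; }

void scale_all(std::list<double>& data) {
    std::list<double>::iterator it = data.begin();
    #pragma intel omp parallel taskq shared(it)
    {
        while (it != data.end()) {     // sequential iteration over the work
            #pragma intel omp task captureprivate(it)
            {
                scale(*it);            // independent operation on one element
            }
            ++it;                      // sequential step
        }
    }
}
```

The tasks only modify element values and never insert or erase, so the container's structure stays stable while the sequential iterator advances.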

Recursive Functions

Recursive functions can also be used to specify parallel iteration spaces. The mechanism is similar to specifying parallelism using the sections pragma, but is much more flexible because it allows arbitrary code to sit between the taskq and the task pragmas, and because it allows recursive nesting of the function to build a conceptual tree of taskq queues.

The recursive nesting of the taskq pragmas is a conceptual extension of OpenMP worksharing constructs to behave more like nested OpenMP parallel regions. Just like nested parallel regions, each nested workqueuing construct is a new instance and is encountered by exactly one thread. The major difference is that nested workqueuing constructs do not cause new threads or teams to be formed, but rather re-use the threads from the team. This permits multi-algorithmic parallelism in dynamic environments: the number of threads need not be committed at each level of parallelism, only at the top level. From that point on, if a large amount of work suddenly appears at an inner level, the idle threads from the outer level can assist in getting that work finished.

For example, it is very common in server environments to dedicate a thread to handle each incoming request, with a large number of threads awaiting incoming requests. The size of a particular request may not be obvious at the time the thread begins handling it. If the thread uses nested workqueuing constructs, and the scope of the request becomes large after the inner construct is started, the threads from the outer construct can easily migrate to the inner construct to help finish the request.
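A recursive use of nested taskq constructs might be sketched as a binary tree traversal. The TreeNode type and the visit and visit_tree functions are hypothetical. Each call to visit opens its own taskq, so the recursion builds the conceptual tree of queues described above, and only the top-level parallel taskq creates a team of threads.

```cpp
#include <cstddef>

// Hypothetical tree node used only for this illustration.
struct TreeNode {
    int       value;
    int       doubled;   // filled in by the traversal
    TreeNode* left;
    TreeNode* right;
};

// Each call opens a new taskq instance, encountered by the one
// thread that is executing the enclosing task.
void visit(TreeNode* t) {
    #pragma intel omp taskq
    {
        // Unit of work for this node.
        #pragma intel omp task captureprivate(t)
        { t->doubled = 2 * t->value; }

        // Ordinary code may sit between the taskq and task pragmas.
        if (t->left != NULL) {
            #pragma intel omp task captureprivate(t)
            { visit(t->left); }
        }
        if (t->right != NULL) {
            #pragma intel omp task captureprivate(t)
            { visit(t->right); }
        }
    }
}

// Entry point: the team of threads is committed only here.
void visit_tree(TreeNode* root) {
    if (root == NULL) return;
    #pragma intel omp parallel taskq
    {
        #pragma intel omp task
        { visit(root); }
    }
}
```

Every task writes only to its own node, so the tasks carry no dependences that would need extra synchronization.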

Since the workqueuing model is designed to preserve sequential semantics, synchronization is inherent in the semantics of the taskq block. There is an implicit team barrier at the completion of the taskq block for the threads that encountered the taskq construct, which ensures that all of the tasks specified inside the taskq block have finished execution. This taskq barrier enforces the sequential semantics of the original program. As with the OpenMP worksharing constructs, you are responsible for ensuring either that no dependencies exist or that dependencies are appropriately synchronized between the task blocks, or between code in a task block and code in the taskq block outside of the task blocks.

The syntax, semantics, and allowed clauses are designed to resemble OpenMP* worksharing constructs. Most of the clauses allowed on OpenMP worksharing constructs have a reasonable meaning when applied to the workqueuing pragmas.

taskq Construct

#pragma intel omp taskq [clause[[,]clause]...]

     structured-block

where clause can be any of the following:

private (variable-list)

The private clause creates a private, default-constructed version for each object in variable-list for the taskq. It also implies captureprivate on each enclosed task.  The original object referenced by each variable has an indeterminate value upon entry to the construct, must not be modified within the dynamic extent of the construct, and has an indeterminate value upon exit from the construct.

firstprivate (variable-list)

The firstprivate clause creates a private, copy-constructed version for each object in variable-list for the taskq. It also implies captureprivate on each enclosed task. The original object referenced by each variable must not be modified within the dynamic extent of the construct and has an indeterminate value upon exit from the construct.

lastprivate (variable-list)

The lastprivate clause creates a private, default-constructed version for each object in variable-list for the taskq. It also implies captureprivate on each enclosed task. The original object referenced by each variable has an indeterminate value upon entry to the construct, must not be modified within the dynamic extent of the construct, and is copy-assigned the value of the object from the last enclosed task after that task completes execution.

reduction (operator : variable-list)

The reduction clause performs a reduction with the given operator over each object in variable-list across the enclosed task constructs. Both operator and variable-list are defined as in the OpenMP specification.
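As an illustration of the reduction clause, the sketch below sums the values in a linked list. The Item type and the sum_list function are hypothetical, and the example assumes that reduction is accepted on the combined parallel taskq form and behaves as the description above suggests.

```cpp
#include <cstddef>

// Hypothetical list element used only for this illustration.
struct Item {
    int   value;
    Item* next;
};

int sum_list(Item* head) {
    int   sum = 0;
    Item* p   = head;
    #pragma intel omp parallel taskq shared(p) reduction(+: sum)
    {
        while (p != NULL) {
            #pragma intel omp task captureprivate(p)
            { sum += p->value; }       // contributions are combined with +
            p = p->next;               // sequential advance
        }
    }
    return sum;                        // read only after the taskq barrier
}
```

The result is read after the end of the taskq block, where the implicit barrier guarantees that every task has finished.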

ordered

The ordered clause causes ordered constructs inside enclosed task constructs to execute in the original sequential execution order. The taskq directive to which an ordered construct is bound must have an ordered clause present.

nowait

The nowait clause removes the implied barrier at the end of the taskq.  Threads may exit the taskq construct before completing all the task constructs queued within it.

task Construct

#pragma intel omp task [clause[[,]clause]...]

     structured-block

where clause can be any of the following:

private (variable-list)

The private clause creates a private, default-constructed version for each object in variable-list for the task. The original object referenced by the variable has an indeterminate value upon entry to the construct, must not be modified within the dynamic extent of the construct, and has an indeterminate value upon exit from the construct.

captureprivate (variable-list)

The captureprivate clause creates a private, copy-constructed version for each object in variable-list for the task at the time the task is enqueued. The original object referenced by each variable retains its value but must not be modified within the dynamic extent of the task construct.
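The two task clauses can be combined, as in the following sketch. The function fill_squares and the variable names are hypothetical. The loop counter is captured at enqueue time with captureprivate, while private gives each task its own uninitialized scratch variable.

```cpp
// Writes i*i into out[i] for 0 <= i < n; out must hold at least n ints.
void fill_squares(int* out, int n) {
    int i = 0;
    int scratch = 0;   // per the private clause, not meaningful after the construct
    #pragma intel omp parallel taskq shared(i)
    {
        while (i < n) {
            #pragma intel omp task captureprivate(i) private(scratch)
            {
                scratch = i * i;       // private copy: assigned before it is read
                out[i]  = scratch;     // each task writes a distinct element
            }
            ++i;                       // sequential control, outside the task
        }
    }
}
```

Because the private copy starts out default-constructed (uninitialized for an int), the task assigns scratch before using it.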

Combined parallel and taskq Construct

#pragma intel omp parallel taskq [clause[[,]clause]...]

     structured-block

where clause can be any clause accepted by the OpenMP parallel construct or by the taskq construct above. Clause descriptions are the same as for those constructs.
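A combined construct carrying clauses from the parallel construct might look like the sketch below. The function average_in_place and the threshold of 1000 elements are hypothetical choices for illustration. The if and shared clauses come from the OpenMP parallel construct, and the enclosed task uses captureprivate as described above.

```cpp
// Replaces x1[i] with the mean of x1[i] and x2[i] for 0 <= i < n.
void average_in_place(double* x1, const double* x2, int n) {
    int i = 0;
    // The if clause is assumed to behave as on an OpenMP parallel
    // construct: small inputs are handled by a single thread.
    #pragma intel omp parallel taskq if(n > 1000) shared(i)
    {
        for (i = 0; i < n; ++i) {      // sequential iteration
            #pragma intel omp task captureprivate(i)
            { x1[i] = (x1[i] + x2[i]) / 2.0; }
        }
    }
}
```

A loop this regular would usually be written as a worksharing for loop; it appears here only to keep the clause usage easy to read.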