OTF2 Knowledge: How to Match MPI Messages

OTF2 (part of Score-P) has a terrible reader interface. This post explains one small piece of the boilerplate you need to write in order to process MPI messages.

To process MPI send and receive events properly, you need to determine which send belongs to which receive. Together with the previous two posts (1, 2), this lets you generate a single list of messages for use in your tool.

Note that the following explanation also applies to OTF and likely to processing MPI send and receive events in general.

Formally, given:

    struct SentMessage {
        sender, receiver, time, communicator, length, tag }
    struct ReceivedMessage {
        receiver, sender, time, communicator, length, tag }

    list<SentMessage>     sentMessages
    list<ReceivedMessage> receivedMessages

where both the sent and the received messages are ordered according to when these operations were issued, we want to determine:

    struct Message { sender, receiver, time, duration, length }

    list<Message> messages

First, we transform the received messages into another data structure:

    struct Key { sender, receiver, communicator, tag }

    map<Key, queue<ReceivedMessage>> receivedMessagesMap
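
As a minimal sketch of this first step, the map could be built as follows. The concrete field types (`std::uint64_t`), the `operator<` on `Key`, and the function name `buildReceivedMessagesMap` are my assumptions for illustration; they are not prescribed by OTF2.

```cpp
#include <cstdint>
#include <map>
#include <queue>
#include <tuple>
#include <vector>

// Assumed concrete version of the ReceivedMessage struct from above.
struct ReceivedMessage {
    std::uint64_t receiver, sender, time, communicator, length, tag;
};

struct Key {
    std::uint64_t sender, receiver, communicator, tag;
    // std::map requires a strict weak ordering on its key type.
    bool operator<(const Key& other) const {
        return std::tie(sender, receiver, communicator, tag)
             < std::tie(other.sender, other.receiver, other.communicator, other.tag);
    }
};

using ReceivedMessagesMap = std::map<Key, std::queue<ReceivedMessage>>;

// Appends every receive to the queue of its key. Within one queue the
// receives keep the order they have in the input list.
ReceivedMessagesMap buildReceivedMessagesMap(
        const std::vector<ReceivedMessage>& receivedMessages) {
    ReceivedMessagesMap receivedMessagesMap;
    for (const ReceivedMessage& r : receivedMessages) {
        const Key key{r.sender, r.receiver, r.communicator, r.tag};
        receivedMessagesMap[key].push(r);
    }
    return receivedMessagesMap;
}
```

Because the input list is ordered by issue time, the front of each queue is always the oldest receive that has not been matched yet.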

Second, we iterate over all sent messages and index into the received messages map to find the matching receive for each one.
  • For each sent message
    • Use its sender, receiver, communicator and tag fields to index into receivedMessagesMap and retrieve the queue
    • Is there an item in the queue? Yes:
      • The first item in the queue is the receive that matches this send
      • Append the matched message to messages. Duration is the difference between sent and receive time.
      • Dequeue this item
      • If duration ≤ 0: Warning: A message should be received later than it was sent. Otherwise the bandwidth is undefined/infinite (duration = 0) or negative (duration < 0).
      • If send length > receive length: Warning: A sent message must not be larger than the receive expects.
    • No:
      • Warning: Missing receive
  • Print out a condensed version of the above warnings
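
The steps above can be sketched as one function. This is only an illustration under a few assumptions of mine: all fields are `std::uint64_t`, the duration is signed so that negative values survive, the matched message carries the send's length, and the `Warnings` counters and the name `matchMessages` are hypothetical helpers that stand in for the condensed warning output.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <queue>
#include <tuple>
#include <vector>

struct SentMessage     { std::uint64_t sender, receiver, time, communicator, length, tag; };
struct ReceivedMessage { std::uint64_t receiver, sender, time, communicator, length, tag; };

// duration is signed: unsynchronized clocks can make it zero or negative.
struct Message {
    std::uint64_t sender, receiver, time;
    std::int64_t  duration;
    std::uint64_t length;
};

struct Key {
    std::uint64_t sender, receiver, communicator, tag;
    bool operator<(const Key& other) const {
        return std::tie(sender, receiver, communicator, tag)
             < std::tie(other.sender, other.receiver, other.communicator, other.tag);
    }
};

// Hypothetical counters, one per kind of warning.
struct Warnings {
    std::size_t nonPositiveDuration = 0;
    std::size_t sendLongerThanReceive = 0;
    std::size_t missingReceive = 0;
};

std::vector<Message> matchMessages(const std::vector<SentMessage>& sentMessages,
                                   const std::vector<ReceivedMessage>& receivedMessages,
                                   Warnings& warnings) {
    // Step 1: one queue of receives per (sender, receiver, communicator, tag).
    std::map<Key, std::queue<ReceivedMessage>> receivedMessagesMap;
    for (const ReceivedMessage& r : receivedMessages)
        receivedMessagesMap[Key{r.sender, r.receiver, r.communicator, r.tag}].push(r);

    // Step 2: match every send with the oldest unmatched receive of its key.
    std::vector<Message> messages;
    for (const SentMessage& s : sentMessages) {
        auto it = receivedMessagesMap.find(Key{s.sender, s.receiver, s.communicator, s.tag});
        if (it == receivedMessagesMap.end() || it->second.empty()) {
            ++warnings.missingReceive;  // record it, but do not abort
            continue;
        }
        const ReceivedMessage r = it->second.front();
        it->second.pop();

        const std::int64_t duration =
            static_cast<std::int64_t>(r.time) - static_cast<std::int64_t>(s.time);
        messages.push_back({s.sender, s.receiver, s.time, duration, s.length});

        if (duration <= 0)       ++warnings.nonPositiveDuration;
        if (s.length > r.length) ++warnings.sendLongerThanReceive;
    }
    return messages;
}
```

After the call, the three counters in `Warnings` are enough to print a condensed summary instead of one line per offending message.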

We use one queue per combination of sender, receiver, communicator and tag, because a send belongs to a receive only if all four coincide.

Recording the warnings/errors makes sense, because MPI libraries are less strict than the standard. For example, if you send messages and never receive them and just exit the program, neither the MPI library nor Score-P might care and thus you have missing receives. I can confirm that this happens sometimes. Therefore, you should not disallow missing receives.
Also, be aware that durations will likely be zero or negative from time to time. Time depends on the clocks used. Clocks across different nodes/processes can yield values that imply zero or negative durations. Score-P does not sanitize time values. Thus, it is left to the tool developer to decide how to handle this situation.
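
One possible way to handle this, sketched here as an assumption rather than a recommendation, is to compute a bandwidth only for messages with a positive duration and to let the caller decide what to do with the rest. The function name `tryBandwidth` and the parameter `ticksPerSecond` (the timer resolution of the trace) are hypothetical names for this example.

```cpp
#include <cstdint>

// Same Message layout as above, with a signed duration in timer ticks.
struct Message {
    std::uint64_t sender, receiver, time;
    std::int64_t  duration;
    std::uint64_t length;
};

// Writes the bandwidth in bytes per second to bytesPerSecond and returns true.
// Returns false without touching the output if the duration is zero or
// negative, because the bandwidth would be infinite or negative.
bool tryBandwidth(const Message& m, double ticksPerSecond, double& bytesPerSecond) {
    if (m.duration <= 0)
        return false;
    const double seconds = static_cast<double>(m.duration) / ticksPerSecond;
    bytesPerSecond = static_cast<double>(m.length) / seconds;
    return true;
}
```

Skipping such messages is only one policy. Clamping the duration to a small positive value or correcting the timestamps beforehand are alternatives, and which one fits depends on the tool.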

With these three posts you can now properly process MPI messages, for example to compute correct bandwidths or to visualize them.

Happy coding!
