OTF2 (part of Score-P) has a terrible reader interface. This post explains one piece of the boilerplate you need to write in order to process MPI messages.
If your tool processes MPI messages, it is not sufficient to consider only MpiSend and MpiRecv records. Applications using MPI can also use non-blocking send and receive operations, and they can mix blocking and non-blocking operations. To handle MPI messages correctly, you need to consider all of the following records:
OTF2_CallbackCode handleMpiSend(OTF2_LocationRef sender,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint32_t receiver, OTF2_CommRef com, uint32_t tag,
    uint64_t length);

OTF2_CallbackCode handleMpiIsend(OTF2_LocationRef sender,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint32_t receiver, OTF2_CommRef com, uint32_t tag,
    uint64_t length, uint64_t requestId);

OTF2_CallbackCode handleMpiIsendComplete(OTF2_LocationRef sender,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint64_t requestId);

OTF2_CallbackCode handleMpiRecv(OTF2_LocationRef receiver,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint32_t sender, OTF2_CommRef com, uint32_t tag,
    uint64_t length);

OTF2_CallbackCode handleMpiIrecv(OTF2_LocationRef receiver,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint32_t sender, OTF2_CommRef com, uint32_t tag,
    uint64_t length, uint64_t requestId);

OTF2_CallbackCode handleMpiIrecvRequest(OTF2_LocationRef receiver,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint64_t requestId);

OTF2_CallbackCode handleMpiRequestCancelled(OTF2_LocationRef location,
    OTF2_TimeStamp time, void* userData, OTF2_AttributeList* a,
    uint64_t requestId);
This post explains how to translate non-blocking send and receive records into normal (blocking) sends and receives, so that your tool can subsequently process all types of messages in a homogeneous way.
In a previous post I explained a detail you need to know for this to work.
Explaining the Involved Records
- Send: Issued when MPI_Send is called
- Isend: Issued when MPI_Isend is called
- IsendComplete: Issued when MPI_Wait, MPI_Test or a similar function confirms that the Isend operation has completed
- Receive: Issued when MPI_Recv is called
- IreceiveRequest:
  - Issued when MPI_Irecv is called
  - Similar to the Isend record, except that it carries only the request ID: the sender, tag, communicator and length of the to-be-received message are not recorded yet. The sender and tag may not even be determined at this point, because the request can use the wildcards MPI_ANY_SOURCE and MPI_ANY_TAG
- Ireceive:
  - Issued when MPI_Wait, MPI_Test or a similar function confirms that the MPI_Irecv operation has completed
  - Similar to IsendComplete, except that it carries the parameters of the received message; for sends, these parameters are part of the Isend record, not of IsendComplete
- RequestCancelled: Signals that a request was cancelled. It can refer to an Isend, an IreceiveRequest, or possibly another request that was not recorded.
The naming scheme of these records is a bit confusing. A more consistent scheme would have been: Isend, IsendComplete, Ireceive (= IreceiveRequest here), IreceiveComplete (= Ireceive here).
Data Structures
Send/Receive
struct SentMessage { time, sender, receiver, communicator, length, tag }
struct ReceivedMessage { time, receiver, sender, communicator, length, tag }
Non-blocking
struct Isend { time, sender, receiver, communicator, length, tag, requestId, queue<SentMessage> blockedSends }
struct IreceiveRequest { requestId, queue<ReceivedMessage> blockedReceives }
map<process, queue<Isend>> isends
map<process, queue<IreceiveRequest>> ireceiveRequests
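For readers who want something compilable, here is one possible C++ rendering of these structures. It is a sketch, not the code behind this post: the type aliases assume OTF2's typedefs (OTF2_TimeStamp and OTF2_LocationRef as uint64_t, OTF2_CommRef as uint32_t) so that it builds without the OTF2 headers, "process" is taken to be the OTF2 location, and the Isend's message fields are grouped into a SentMessage. std::deque stands in for queue because the completion handlers must search and erase entries anywhere in a queue, which std::queue does not allow.

```cpp
#include <cstdint>
#include <deque>
#include <map>

// Assumed aliases for OTF2_TimeStamp, OTF2_LocationRef and OTF2_CommRef,
// so this sketch compiles without the OTF2 headers.
using TimeStamp = uint64_t;
using LocationRef = uint64_t;
using CommRef = uint32_t;

struct SentMessage {
    TimeStamp time;
    LocationRef sender;    // location that issued the send
    uint32_t receiver;     // as passed to the MpiSend/MpiIsend callback
    CommRef communicator;
    uint64_t length;
    uint32_t tag;
};

struct ReceivedMessage {
    TimeStamp time;
    LocationRef receiver;  // location that issued the receive
    uint32_t sender;       // as passed to the MpiRecv/MpiIrecv callback
    CommRef communicator;
    uint64_t length;
    uint32_t tag;
};

// A pending Isend: its parameters are known when MPI_Isend is called.
struct Isend {
    SentMessage message;
    uint64_t requestId;
    std::deque<SentMessage> blockedSends;
};

// A pending IreceiveRequest: only the request ID is known at this point.
struct IreceiveRequest {
    uint64_t requestId;
    std::deque<ReceivedMessage> blockedReceives;
};

// One queue of pending requests per process (location), oldest first.
std::map<LocationRef, std::deque<Isend>> isends;
std::map<LocationRef, std::deque<IreceiveRequest>> ireceiveRequests;
```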
The Algorithm
Send
- If isends[sender] is empty
  - Record the sent message
- Otherwise
  - Append this Send to the blockedSends of the latest entry in isends[sender]
Explanation:
To preserve the correct ordering of messages, previously issued Isends have to be processed before this Send. We can only process Isends that have completed, because they might still be cancelled. We therefore enqueue this Send until all previous Isends are processed.
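A minimal sketch of the Send step, using trimmed-down versions of the structures above. The output list sentMessages and the function name handleSend are hypothetical; in a real tool this logic would sit inside the MpiSend callback.

```cpp
#include <cstdint>
#include <deque>
#include <map>
#include <vector>

// Trimmed-down versions of the data structures above (hypothetical names).
struct SentMessage { uint64_t time; uint64_t sender; uint32_t receiver;
                     uint32_t communicator; uint64_t length; uint32_t tag; };
struct Isend { SentMessage message; uint64_t requestId;
               std::deque<SentMessage> blockedSends; };

std::map<uint64_t, std::deque<Isend>> isends;  // pending Isends per sender
std::vector<SentMessage> sentMessages;         // hypothetical output list

// Send: record immediately, unless an uncompleted Isend was issued earlier
// on the same process; in that case, wait behind the latest pending Isend.
void handleSend(const SentMessage& send) {
    std::deque<Isend>& pending = isends[send.sender];
    if (pending.empty()) {
        sentMessages.push_back(send);
    } else {
        pending.back().blockedSends.push_back(send);
    }
}
```

The lookup isends[send.sender] creates an empty queue for processes without pending Isends, which keeps the code short at the cost of a few empty map entries.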
Isend
- Append this Isend to isends[sender] with an empty queue
IsendComplete
- Search isends[sender] for the Isend with a matching requestId
- If it is the first entry in the queue
  - Record the sent message (time is the Isend's time)
  - For each blocked Send in the attached queue
    - Record the sent message
- Otherwise
  - Append the completed Isend's message to the blocked queue of the Isend directly before it
  - Append the completed Isend's blocked queue to that previous Isend's queue as well
- Remove this entry from isends[sender]
Explanation:
An Isend has completed. Therefore, we have a successfully sent message that we can record, unless there are previously issued, uncompleted Isends (just as when a Send happens). If we completed the earliest remaining Isend, we can now process all messages that it has blocked. If the completed Isend is preceded by other pending Isends, we enqueue its message and everything blocked by it to the previous Isend's queue, because we can only process these messages once that previous Isend has completed.
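The Isend and IsendComplete steps can be sketched the same way. Again, the types are trimmed down and the names (handleIsend, handleIsendComplete, sentMessages) are hypothetical. Returning false for an unknown request ID is a choice of this sketch; the algorithm above does not say what to do in that case.

```cpp
#include <cstdint>
#include <deque>
#include <iterator>
#include <map>
#include <vector>

// Trimmed-down versions of the data structures above (hypothetical names).
struct SentMessage { uint64_t time; uint64_t sender; uint32_t receiver;
                     uint32_t communicator; uint64_t length; uint32_t tag; };
struct Isend { SentMessage message; uint64_t requestId;
               std::deque<SentMessage> blockedSends; };

std::map<uint64_t, std::deque<Isend>> isends;  // pending Isends per sender
std::vector<SentMessage> sentMessages;         // hypothetical output list

// Isend: queue the request; its message keeps the Isend's timestamp.
void handleIsend(const SentMessage& msg, uint64_t requestId) {
    isends[msg.sender].push_back(Isend{msg, requestId, {}});
}

// IsendComplete: returns false if `sender` has no pending Isend `requestId`.
bool handleIsendComplete(uint64_t sender, uint64_t requestId) {
    std::deque<Isend>& pending = isends[sender];
    for (auto it = pending.begin(); it != pending.end(); ++it) {
        if (it->requestId != requestId) continue;
        if (it == pending.begin()) {
            // Earliest pending Isend: record it, then everything it blocked.
            sentMessages.push_back(it->message);
            sentMessages.insert(sentMessages.end(),
                                it->blockedSends.begin(), it->blockedSends.end());
        } else {
            // An earlier Isend is still open: move this message and its
            // blocked sends to the end of that Isend's queue.
            std::deque<SentMessage>& prev = std::prev(it)->blockedSends;
            prev.push_back(it->message);
            prev.insert(prev.end(), it->blockedSends.begin(), it->blockedSends.end());
        }
        pending.erase(it);
        return true;
    }
    return false;
}
```

If two Isends are posted and the second completes first, nothing is recorded until the first completes; then both messages are recorded in posting order, each with its Isend's timestamp.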
The receive records are handled similarly, with slight modifications, because for receives the message parameters become known only when the request completes rather than when it starts.
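Receive
- If ireceiveRequests[receiver] is empty
  - Record the received message
- Otherwise
  - Append this Receive to the blockedReceives of the latest entry in ireceiveRequests[receiver]
IreceiveRequest
- Append the request to ireceiveRequests[receiver] with an empty queue.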
Ireceive
- Search ireceiveRequests[receiver] for the request with a matching requestId
- If it is the first entry in the queue
  - Record the received message (time is the Ireceive's time)
  - For each blocked Receive in the attached queue
    - Record the received message
- Otherwise
  - Append the received message to the blocked queue of the IreceiveRequest directly before it
  - Append the completed request's blocked queue to that previous request's queue as well
- Remove this entry from ireceiveRequests[receiver]
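On the receive side, the main structural difference is where the message comes from: it arrives with the Ireceive record instead of being stored with the request. A sketch under the same assumptions (trimmed-down types; handleRecv, handleIrecvRequest, handleIrecv and receivedMessages are hypothetical names):

```cpp
#include <cstdint>
#include <deque>
#include <iterator>
#include <map>
#include <vector>

// Trimmed-down versions of the data structures above (hypothetical names).
struct ReceivedMessage { uint64_t time; uint64_t receiver; uint32_t sender;
                         uint32_t communicator; uint64_t length; uint32_t tag; };
struct IreceiveRequest { uint64_t requestId;
                         std::deque<ReceivedMessage> blockedReceives; };

std::map<uint64_t, std::deque<IreceiveRequest>> ireceiveRequests;  // per receiver
std::vector<ReceivedMessage> receivedMessages;  // hypothetical output list

// Receive: record now, or wait behind the latest pending IreceiveRequest.
void handleRecv(const ReceivedMessage& recv) {
    std::deque<IreceiveRequest>& pending = ireceiveRequests[recv.receiver];
    if (pending.empty()) {
        receivedMessages.push_back(recv);
    } else {
        pending.back().blockedReceives.push_back(recv);
    }
}

// IreceiveRequest: only the request ID is known at this point.
void handleIrecvRequest(uint64_t receiver, uint64_t requestId) {
    ireceiveRequests[receiver].push_back(IreceiveRequest{requestId, {}});
}

// Ireceive: `msg` holds the record's parameters and timestamp.
// Returns false if `msg.receiver` has no pending request `requestId`.
bool handleIrecv(const ReceivedMessage& msg, uint64_t requestId) {
    std::deque<IreceiveRequest>& pending = ireceiveRequests[msg.receiver];
    for (auto it = pending.begin(); it != pending.end(); ++it) {
        if (it->requestId != requestId) continue;
        if (it == pending.begin()) {
            // Earliest pending request: record it, then everything it blocked.
            receivedMessages.push_back(msg);
            receivedMessages.insert(receivedMessages.end(),
                                    it->blockedReceives.begin(), it->blockedReceives.end());
        } else {
            // An earlier request is still open: move everything to its queue.
            std::deque<ReceivedMessage>& prev = std::prev(it)->blockedReceives;
            prev.push_back(msg);
            prev.insert(prev.end(), it->blockedReceives.begin(), it->blockedReceives.end());
        }
        pending.erase(it);
        return true;
    }
    return false;
}
```

Note that a blocking Recv issued after a pending IreceiveRequest is recorded after that request's message, even if the Recv carries the smaller timestamp.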
RequestCancelled
- If it matches an Isend's requestId
  - Handle it like IsendComplete, but without recording the Isend's own message
- If it matches an IreceiveRequest's requestId
  - Handle it like Ireceive, but without recording a received message
- It should never match both, but it can match neither
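Cancellation reuses the completion logic without recording the cancelled request's own message; messages blocked behind it still have to be released or passed on. To stay short, the sketch below uses one simplified, hypothetical Message/Pending type for both sides; cancelIn and handleRequestCancelled are hypothetical names, and checking the Isends before the IreceiveRequests is arbitrary, since a request ID should never match both.

```cpp
#include <cstdint>
#include <deque>
#include <iterator>
#include <map>
#include <vector>

// Simplified, hypothetical types shared by both sides. For receives, the
// stored `message` stays unused until the Ireceive record arrives.
struct Message { uint64_t time; uint32_t peer; uint32_t tag; };
struct Pending { uint64_t requestId; Message message; std::deque<Message> blocked; };

std::map<uint64_t, std::deque<Pending>> isends;            // per sender
std::map<uint64_t, std::deque<Pending>> ireceiveRequests;  // per receiver
std::vector<Message> sentMessages, receivedMessages;       // output lists

// Removes request `id` from `pending` WITHOUT recording its own message.
// Its blocked messages are recorded if it was the earliest pending request;
// otherwise they move to the queue of the request directly before it.
bool cancelIn(std::deque<Pending>& pending, uint64_t id, std::vector<Message>& out) {
    for (auto it = pending.begin(); it != pending.end(); ++it) {
        if (it->requestId != id) continue;
        if (it == pending.begin()) {
            out.insert(out.end(), it->blocked.begin(), it->blocked.end());
        } else {
            std::deque<Message>& prev = std::prev(it)->blocked;
            prev.insert(prev.end(), it->blocked.begin(), it->blocked.end());
        }
        pending.erase(it);
        return true;
    }
    return false;
}

// RequestCancelled on `location`: try the Isends first, then the
// IreceiveRequests. Returns false if the ID matches neither.
bool handleRequestCancelled(uint64_t location, uint64_t requestId) {
    if (cancelIn(isends[location], requestId, sentMessages)) return true;
    return cancelIn(ireceiveRequests[location], requestId, receivedMessages);
}
```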
Wrap Up
A sent message has the timestamp of the initial Isend. A received message has the timestamp of the completed Ireceive. Thus, we take the earliest possible sending moment and the latest possible receiving moment, because the message is actually transmitted sometime during this interval. This is as accurate as it gets in OTF2. Consequently, the timestamps of IreceiveRequest and IsendComplete records are ignored. This also means that the algorithm does not necessarily record received messages in chronological order of their timestamps, whereas sent messages are always recorded in chronological order.
After applying the above algorithm, you have a list of sent and a list of received MPI messages. You don't yet know which send belongs to which receive. Determining this relationship is called message matching. I intend to explain how it works in a future blog post.
Once this is done, you have one list of messages with correct durations that can finally be used in your tool.
Happy Coding!
P.S.: I used OTF2 version 1.5.1