2015-05-29

OTF2 Knowledge: How to Map Local MPI Ranks to OTF2 Locations

OTF2 (part of the Score-P infrastructure) has a terrible reader interface. This post explains one small piece of the boilerplate you need to write in order to process MPI records.

For example, to process an MPI_Send record you use a handler with the following signature:

    OTF2_CallbackCode handleMpiSend(OTF2_LocationRef sender,
        OTF2_TimeStamp time, void* userData,
        OTF2_AttributeList* a, uint32_t receiver,
        OTF2_CommRef com, uint32_t tag, uint64_t length);


Notice that the data types of the sender and receiver processes are not the same: the sender is an OTF2_LocationRef, while the receiver is a rank local to the communicator com. This is because the event is recorded as-is during tracing, without trying to match the MPI-specific, communicator-dependent identifiers to communicator-independent OTF2 ones. Since Score-P/OTF2 does not have a post-processing step, the connection between these identifiers and the OTF2 locations is never established. It is left to the programmer to apply this mapping while reading the trace.

Required Handlers

    OTF2_CallbackCode handleOtf2DefGroup(void *userData,
        OTF2_GroupRef group, OTF2_StringRef name,
        OTF2_GroupType groupType, OTF2_Paradigm paradigm,
        OTF2_GroupFlag groupFlags, uint32_t memberCount,
        const uint64_t* members);

    OTF2_CallbackCode handleOtf2DefCommunicator(void *userData,
        OTF2_CommRef com, OTF2_StringRef name,
        OTF2_GroupRef group, OTF2_CommRef parent);

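For reference, these handlers are hooked into the global definition reader roughly as follows. This is only a minimal sketch: the anchor file name traces.otf2 is a placeholder, error checking is omitted, and the calls reflect the OTF2 1.5 API as far as I recall it.

    OTF2_Reader* reader = OTF2_Reader_Open("traces.otf2");
    OTF2_GlobalDefReader* defReader = OTF2_Reader_GetGlobalDefReader(reader);

    OTF2_GlobalDefReaderCallbacks* callbacks = OTF2_GlobalDefReaderCallbacks_New();
    OTF2_GlobalDefReaderCallbacks_SetGroupCallback(callbacks, handleOtf2DefGroup);
    OTF2_GlobalDefReaderCallbacks_SetCommCallback(callbacks, handleOtf2DefCommunicator);

    // The last argument becomes the userData pointer passed to every handler.
    OTF2_Reader_RegisterGlobalDefCallbacks(reader, defReader, callbacks, nullptr);
    OTF2_GlobalDefReaderCallbacks_Delete(callbacks);

    uint64_t definitionsRead = 0;
    OTF2_Reader_ReadAllGlobalDefinitions(reader, defReader, &definitionsRead);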

Data Structures

    map<OTF2_CommRef, OTF2_GroupRef> communicatorToGroup;
    map<OTF2_GroupRef, map<uint32_t /*local rank*/,
        uint64_t /*rank in comm world*/>> localRankToGlobalRank;
    OTF2_GroupRef mpiLocationGroup;
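
The handlers can reach these structures either as globals or, more cleanly, bundled into a struct whose address is passed as the userData pointer during callback registration. A hypothetical sketch of the latter (names are mine, not OTF2's):

    struct MappingData {
        map<OTF2_CommRef, OTF2_GroupRef> communicatorToGroup;
        map<OTF2_GroupRef, map<uint32_t, uint64_t>> localRankToGlobalRank;
        OTF2_GroupRef mpiLocationGroup;
    };

    // At the top of each handler:
    // MappingData* data = static_cast<MappingData*>(userData);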

The Algorithm

 

DefGroup

    // Store the group members indexed by their position (local rank) in the group.
    localRankToGlobalRank[group] = {};
    for (uint32_t i = 0; i < memberCount; i += 1) {
        localRankToGlobalRank[group][i] = members[i];
    }

    // The MPI COMM_LOCATIONS group maps comm-world ranks to OTF2 locations.
    if (groupType == OTF2_GROUP_TYPE_COMM_LOCATIONS &&
            paradigm == OTF2_PARADIGM_MPI) {
        mpiLocationGroup = group;
    }

DefCommunicator

    communicatorToGroup[com] = group;

Mapping Local MPI Ranks to OTF2 Locations


Input: a local MPI rank rank and the MPI communicator com it belongs to
Output: the OTF2 location of that rank
Procedure: First, map the local MPI rank to its rank in comm world. Next, map this rank to the OTF2 location using the magic group mpiLocationGroup:

    localRankToGlobalRank[mpiLocationGroup][
        localRankToGlobalRank[communicatorToGroup[com]][rank]]
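
Wrapped into a small helper (my own naming, not part of OTF2), the same lookup reads:

    // Resolve a rank that is local to communicator com to its OTF2 location.
    OTF2_LocationRef remoteLocation(OTF2_CommRef com, uint32_t rank) {
        uint64_t commWorldRank = localRankToGlobalRank[communicatorToGroup[com]][rank];
        return localRankToGlobalRank[mpiLocationGroup][commWorldRank];
    }

    // e.g. inside handleMpiSend:
    // OTF2_LocationRef receiverLocation = remoteLocation(com, receiver);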

Happy Coding!

PS: I used OTF2 version 1.5.1.

2015-05-09

Automatically flushing QTextStream

QTextStream is way more convenient for printing to the console than C++ streams are. Therefore, I usually replace cerr and cout with qerr and qout in projects using Qt:

Header file:
    extern QTextStream qerr;
    extern QTextStream qout;

Source file:
    QTextStream qerr(stderr, QIODevice::WriteOnly | QIODevice::Text);
    QTextStream qout(stdout, QIODevice::WriteOnly | QIODevice::Text);

Unfortunately, QTextStream buffers its output and does not flush automatically. This means it may print a message later than expected, or not at all if the program crashes. This is obviously undesirable for a cerr replacement.

One way to work around this is by intercepting all stream operator calls and adding a call to flush():

Header file:
    #include <QTextStream>
    #include <cstdio>
    #include <utility>

    class AutoFlushingQTextStream : public QTextStream {
    public:
        AutoFlushingQTextStream(FILE* f, QIODevice::OpenMode o)
                : QTextStream(f, o) {}

        template<typename T>
        AutoFlushingQTextStream& operator<<(T&& s) {
            // Forward to the base-class operator<<, then flush right away.
            static_cast<QTextStream&>(*this) << std::forward<T>(s);
            flush();
            return *this;
        }
    };

    extern AutoFlushingQTextStream qerr;
    extern QTextStream qout;

Source file:
    AutoFlushingQTextStream qerr(stderr, QIODevice::WriteOnly | QIODevice::Text);
    QTextStream qout(stdout, QIODevice::WriteOnly | QIODevice::Text);
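
With these definitions in place, qerr is used like any other QTextStream, but every chained << ends in a flush, so messages show up immediately even if the program aborts right afterwards. A trivial example (the values are made up):

    qerr << "something went wrong, code " << 42 << "\n";  // flushed after every <<
    qout << "normal output" << "\n";
    qout.flush();  // the plain QTextStream still needs an explicit flush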


Any tips for improving the code are very much appreciated.