
ELIMINATING THE LAUNCHER

This page is for users who want to launch each process of an application manually, start an application with their own launcher, or debug an application.

The Provided Launcher (mpd)

An MPICH application can be launched without the provided launcher. First we describe what the launcher does, and then we show how to launch an application without it. MPICH.NT passes startup information to the spawned processes through environment variables, so any launcher that sets the required variables can start an MPICH.NT application.

What the launcher does:

    1) Create the first process

Process zero acquires a port to listen on and then reports this port number back to the launcher through the temporary file named by MPICH_EXTRA.

    2) Create the rest of the processes

The launcher then creates the remaining processes and tells each of them, through the MPICH_ROOT environment variable, the host and port on which the first process is listening.
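The two steps above can be sketched from the launcher's side. The Python below is a minimal sketch, not the MPICH.NT launcher itself. It assumes that the file named by MPICH_EXTRA ends up containing the root's port as plain decimal text, which this page does not specify. The names `wait_for_root_port` and `root_address` are hypothetical. A real launcher would also need to guard against reading a partially written file.

```python
import time
from pathlib import Path


def wait_for_root_port(port_file: str, timeout: float = 30.0, poll: float = 0.1) -> int:
    """Poll the MPICH_EXTRA file until the root process has written its port.

    ASSUMPTION: the file holds the port as decimal text, e.g. "12345".
    The actual MPICH.NT file format is not documented on this page.
    """
    deadline = time.monotonic() + timeout
    path = Path(port_file)
    while time.monotonic() < deadline:
        try:
            text = path.read_text().strip()
        except FileNotFoundError:
            text = ""  # root process has not created the file yet
        if text.isdigit():
            return int(text)
        time.sleep(poll)
    raise TimeoutError(f"root process did not report a port in {port_file}")


def root_address(host: str, port: int) -> str:
    """Format the MPICH_ROOT value handed to ranks 1 .. N-1."""
    return f"{host}:{port}"
```

A launcher following this pattern would start rank 0 with MPICH_EXTRA set, call `wait_for_root_port`, and then start the other ranks with MPICH_ROOT set to `root_address(root_host, port)`.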

Here are the environment variables set by the launcher:

Required
MPICH_JOBID  A string, unique across all machines, used to name objects such as mutexes and shared memory queues. The provided launchers create this string by appending a number to the root hostname (e.g., fry14).
MPICH_IPROC  The rank of the current process.
MPICH_NPROC  The total number of processes.
MPICH_ROOT  The hostname of the root process and the port on which it is listening, separated by a colon: hostA:port or a.b.c.d:port.
MPICH_EXTRA  Valid only on the root process. The name of a temporary file the root process uses to report its port number to the launcher.
Conditional (used when processes on the same machine communicate through shared memory)
MPICH_SHM_LOW  The lowest rank the current process can reach through shared memory queues.
MPICH_SHM_HIGH  The highest rank the current process can reach through shared memory queues.
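To make the list concrete, here is a hypothetical Python helper, `rank_environment`, that assembles these variables for one process and checks the MPICH_ROOT format. It is a sketch under the assumption that every value is passed as a string in the process environment. It leaves out MPICH_EXTRA, which only matters when a launcher is waiting for the root to report its port.

```python
from typing import Dict, Optional, Tuple


def rank_environment(job_id: str, rank: int, nprocs: int, root: str,
                     shm_range: Optional[Tuple[int, int]] = None) -> Dict[str, str]:
    """Build the MPICH.NT startup variables for one process (hypothetical helper).

    root:      "host:port" of rank 0, used as MPICH_ROOT
    shm_range: optional (low, high) ranks reachable through shared memory
    """
    if not 0 <= rank < nprocs:
        raise ValueError("rank must be in [0, nprocs)")
    # rpartition keeps dotted addresses such as a.b.c.d:port intact
    host, sep, port = root.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("MPICH_ROOT must look like host:port")
    env = {
        "MPICH_JOBID": job_id,
        "MPICH_IPROC": str(rank),
        "MPICH_NPROC": str(nprocs),
        "MPICH_ROOT": root,
    }
    if shm_range is not None:
        low, high = shm_range
        if not 0 <= low <= high < nprocs:
            raise ValueError("shared memory range must lie within [0, nprocs)")
        env["MPICH_SHM_LOW"] = str(low)
        env["MPICH_SHM_HIGH"] = str(high)
    return env
```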

Without the Launcher

The key to eliminating the launcher is to remove the interaction with the first process. If you set MPICH_ROOT in the environment of the first process to its own host name and an available port number, the process listens on that port and does not attempt to write the port number to the file named by MPICH_EXTRA. Because the root's address is then known in advance, every process can be given the same MPICH_ROOT value and started independently.
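One way to choose the port, sketched in Python: on the root machine, ask the operating system for an unused TCP port by binding to port 0, then build the MPICH_ROOT value from it. `pick_mpich_root` is a hypothetical helper, not part of MPICH.NT. The port is released as soon as the helper returns, so another program could claim it before rank 0 starts listening; this approach only makes a collision unlikely.

```python
import socket
from typing import Optional


def pick_mpich_root(hostname: Optional[str] = None) -> str:
    """Return "host:port" naming a currently unused TCP port on this machine."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 asks the OS to choose a free port
        port = s.getsockname()[1]
    # The socket is closed here, so the port is only probably still free
    # when rank 0 later binds to it.
    host = hostname or socket.gethostname()
    return f"{host}:{port}"
```

Run it on the root machine and give the resulting string to every process as MPICH_ROOT.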

Here is an example.

I opened two command prompts on two separate machines, set the environment variables, and ran an application as shown in the table below:

Host         Fry                     Jazz
Environment  MPICH_JOBID=fry.123     MPICH_JOBID=fry.123
             MPICH_IPROC=0           MPICH_IPROC=1
             MPICH_NPROC=2           MPICH_NPROC=2
             MPICH_ROOT=fry:12345    MPICH_ROOT=fry:12345
Command      netpipe.exe             netpipe.exe

Here is the same example on a single machine, where the two processes communicate through shared memory:

Host         Fry                     Fry
Environment  MPICH_JOBID=fry.2000    MPICH_JOBID=fry.2000
             MPICH_IPROC=0           MPICH_IPROC=1
             MPICH_NPROC=2           MPICH_NPROC=2
             MPICH_ROOT=fry:12345    MPICH_ROOT=fry:12345
             MPICH_SHM_LOW=0         MPICH_SHM_LOW=0
             MPICH_SHM_HIGH=1        MPICH_SHM_HIGH=1
Command      netpipe.exe             netpipe.exe

Here is an example of four processes on two machines that mixes shared memory and socket communication:

Host         Fry                   Fry                   Jazz                  Jazz
Environment  MPICH_JOBID=fry.100   MPICH_JOBID=fry.100   MPICH_JOBID=fry.100   MPICH_JOBID=fry.100
             MPICH_IPROC=0         MPICH_IPROC=1         MPICH_IPROC=2         MPICH_IPROC=3
             MPICH_NPROC=4         MPICH_NPROC=4         MPICH_NPROC=4         MPICH_NPROC=4
             MPICH_ROOT=fry:12345  MPICH_ROOT=fry:12345  MPICH_ROOT=fry:12345  MPICH_ROOT=fry:12345
             MPICH_SHM_LOW=0       MPICH_SHM_LOW=0       MPICH_SHM_LOW=2       MPICH_SHM_LOW=2
             MPICH_SHM_HIGH=1      MPICH_SHM_HIGH=1      MPICH_SHM_HIGH=3      MPICH_SHM_HIGH=3
Command      mandel.exe            mandel.exe            mandel.exe            mandel.exe
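The shared memory ranges in these tables appear to follow a simple rule: ranks on the same machine are numbered consecutively, and each of them gets the lowest and highest rank on that machine. The hypothetical Python helper below derives every process's variables from a per-rank list of host names. It assumes that ranks sharing a machine are contiguous (as in all three examples), that host names compare case-insensitively, and that a process alone on its machine gets no MPICH_SHM_* variables, as in the first example.

```python
from typing import Dict, List


def layout_environments(job_id: str, hosts: List[str], root_port: int) -> List[Dict[str, str]]:
    """hosts[i] is the machine that runs rank i; hosts[0] is the root (hypothetical helper)."""
    nprocs = len(hosts)
    root = f"{hosts[0]}:{root_port}"
    keys = [h.lower() for h in hosts]  # assume host names are case-insensitive
    envs = []
    for rank, key in enumerate(keys):
        local = [r for r, k in enumerate(keys) if k == key]  # ranks on this machine
        env = {
            "MPICH_JOBID": job_id,
            "MPICH_IPROC": str(rank),
            "MPICH_NPROC": str(nprocs),
            "MPICH_ROOT": root,
        }
        if len(local) > 1:
            low, high = local[0], local[-1]
            # A LOW/HIGH pair can only describe a contiguous block of ranks.
            if high - low + 1 != len(local):
                raise ValueError(f"ranks on {hosts[rank]} must be contiguous")
            env["MPICH_SHM_LOW"] = str(low)
            env["MPICH_SHM_HIGH"] = str(high)
        envs.append(env)
    return envs
```

Called as `layout_environments("fry.100", ["fry", "fry", "jazz", "jazz"], 12345)`, it reproduces the four-process table above.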

These are the exact commands for the first example, typed at a command prompt on each machine:

On Fry

C:\Temp>set MPICH_JOBID=fry.123
C:\Temp>set MPICH_IPROC=0
C:\Temp>set MPICH_NPROC=2
C:\Temp>set MPICH_ROOT=fry:12345
C:\Temp>netpipe.exe

On Jazz

C:\Temp>set MPICH_JOBID=fry.123
C:\Temp>set MPICH_IPROC=1
C:\Temp>set MPICH_NPROC=2
C:\Temp>set MPICH_ROOT=fry:12345
C:\Temp>netpipe.exe
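The `set` commands work because a program started from a command prompt inherits that prompt's environment. A custom launcher can do the same thing programmatically. Below is a minimal Python sketch (the `launch_rank` name is hypothetical) that copies the current environment, adds the MPICH.NT variables, and starts the program with `subprocess.Popen`, leaving the caller's own environment unchanged.

```python
import os
import subprocess
from typing import Dict, List


def launch_rank(command: List[str], mpich_vars: Dict[str, str], **popen_kwargs) -> subprocess.Popen:
    """Start one process with the MPICH.NT variables added to the inherited environment."""
    env = os.environ.copy()  # the child still sees PATH and everything else
    env.update(mpich_vars)   # equivalent to the `set` commands above
    return subprocess.Popen(command, env=env, **popen_kwargs)
```

For example, on Jazz, `launch_rank(["netpipe.exe"], {"MPICH_JOBID": "fry.123", "MPICH_IPROC": "1", "MPICH_NPROC": "2", "MPICH_ROOT": "fry:12345"}).wait()` corresponds to the four `set` commands followed by `netpipe.exe`.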

Debugging

To debug an application with Microsoft Developer Studio, set up the environment variables as described above and then run "msdev myapp.exe" instead of "myapp.exe". This opens Developer Studio, where you can step through your program with the debugger commands.

Note: This works only with code built in the debug configuration. To step through the MPICH DLL code as well, you must download and compile the MPICH.NT source distribution.
