MPJ Express
 

MPJ Express: An implementation of MPI in Java

Table of Contents

  1. Introduction
  2. Getting started
  3. Writing and compiling MPJE programs
  4. Running MPJE programs with the MPJ Express runtime
  5. Running MPJE programs without the MPJ Express runtime (manually)
  6. The MPJ Express test suite
  7. Compiling MPJ Express source code and test suite
  8. Tested platforms
  9. Java-docs
  10. Contact and support
  11. Miscellaneous
    1. Turning debugging on and off
    2. Running daemons in console mode
    3. Running MPJ Express daemons on Solaris, PowerPC Linux, and PowerPC Mac OS X
    4. Changing protocol switch limit
    5. API changes between mpiJava-1.2.x and MPJ Express
    6. Acknowledgements
  12. Known issues and limitations
    1. Test cases hang on Solaris and Windows
    2. Takes a long time to bootstrap MPJ processes
    3. Intercomm.Merge(..) -- limited functionality
    4. Using MPI.PACK datatype -- tweaked
    5. MPI.PACK with buffered mode -- tweaked
    6. Cartcomm.Dims_Create(..) -- limited functionality
    7. Request.Cancel(..) -- not implemented
    8. Printing long lines (>500 characters) with runtime -- limitation
    9. Exception "Another mpjrun module is already running on this machine"
    10. Permission issues while using MPJ Express runtime on Windows
    11. Mixing remote loading and local loading may end up in ClassNotFoundException for $MPJ_HOME/src/mpi/MPI.java
  1. Introduction

    This software (MPJ Express, or MPJE) is a reference implementation of the MPI bindings defined for the Java language. The current version follows the mpiJava 1.2 API specification. We plan to add support for the MPJ API in a subsequent release. The difference between these two APIs lies in the naming schemes for classes and methods; the functionality provided to users is essentially the same for both.

    This release contains the source code and binaries of the MPJ Express library, as well as the runtime infrastructure. We have developed a test suite that imports various test cases from mpiJava; it also has a number of new test cases. This test suite checks the functionality of almost every MPI function. See the section "MPJ Express test suite" for further details. This software has been tested on various UNIX and Windows operating systems. See the section "Tested Platforms" for a list of tested platforms.

    There are two fundamental ways of running MPJE applications. The first, and recommended, way is to use the MPJ Express runtime infrastructure. The second way involves the 'manual' start-up of MPJE processes.

    The MPJ Express runtime infrastructure consists of daemons and the mpjrun module. The idea is that users of MPJ Express first start daemons on a number of compute-nodes, which in this document means the machines that execute MPJE processes. These can also be thought of as the compute-nodes of a cluster. Once the daemons are running on the compute-nodes, users can run the mpjrun module (using the mpjrun.sh or mpjrun.bat scripts) on the cluster's head-node. The mpjrun module contacts the daemons, starts the MPJE application, and transports output back to the head-node so that users can view the progress of their programs during execution. The MPJ Express runtime infrastructure is able to run the code as JAR or class files.

    The runtime infrastructure provides the notion of local loaders and remote loaders. A user may prefer to use local loaders if the compute-nodes and head-node have a shared file system, and the MPJ Express JAR files as well as the user application are available locally on the compute-nodes. On the other hand, remote loaders can be used in cases where there is no shared file system on the compute-nodes, and the MPJ Express JAR files and the user applications have to be fetched from the head-node.

    The second way, which is referred to in this document as 'manual', is to run the shell script 'runmpj.sh', which uses SSH to execute the code. This script is able to run JAR or class files, but it can only be used on UNIX-based operating systems. For Windows, running test cases and applications manually means starting each MPJE process by using the java command.

    The MPJ Express infrastructure does not deal with security in the current release. The MPJ Express daemons could be a security concern, as these are Java applications listening on a port to execute user code. It is therefore recommended that the daemons run behind a suitably configured firewall that only accepts connections from trusted machines. In a normal scenario, these daemons would be running on the compute-nodes of a cluster, which are not accessible to the outside world. Alternatively, it is also possible to start MPJE processes 'manually', which avoids the runtime daemons. In addition, each MPJE process starts at least one server socket, and is thus assumed to be running on a machine with a configured firewall. Most MPI implementations assume firewalls as the protection mechanism from the outside world.

  2. Getting started

    1. The pre-requisite for using MPJ Express is Java 1.5 (stable) or higher. Make sure that you use the stable version because there is a bug in Java 1.5 beta that affects MPJ Express. If you are interested in compiling the source code of MPJ Express, see the section "Compiling MPJ Express source code and test suite".

    2. Download MPJ Express and unpack it. This should create a folder named "mpj-v<version_number>".

    3. Set the MPJ_HOME and PATH environment variables.
      • Linux (assuming MPJ Express is in '/home/aamir/mpj')
               export MPJ_HOME=/home/aamir/mpj
               export PATH=$PATH:$MPJ_HOME/bin
        These lines may be added to ~/.bashrc
      • Windows (assuming MPJ Express is in 'c:\mpj')
        • Right-click My Computer->Properties->Advanced tab->Environment Variables and set the following system variables (user variables are not enough)
        • Set the value of variable MPJ_HOME to c:\mpj
        • Add c:\mpj\bin to the value of variable PATH
      • Windows with cygwin (assuming MPJ Express is in 'c:\mpj')
        • The recommended way is to set the variables as in Windows
        • If you want to set the variables in the cygwin shell
                    export MPJ_HOME="c:\\mpj"
                    export PATH=$PATH:"$MPJ_HOME\\bin"
          These lines may be added to ~/.bashrc

    4. Create a new working directory for MPJE programs. This document assumes that the name of this directory is mpj-user. The location of this directory does not affect the execution of the code. This directory will hold the user's MPJE programs, the machines file, and the configuration file (for manual execution).

    5. Start the daemons
      • cd mpj-user
      • Write a machines file that states one machine name or IP address on each line. Save this file as 'machines' in the mpj-user directory. More details on the format of the machines file can be found in the section "Running MPJE programs with the MPJ Express runtime".
      • Installing and starting daemons
        • Linux: mpjboot machines
        • Windows: on each machine listed in machines file:
          • Run $MPJ_HOME/bin/installmpjd-windows.bat
          • Go to Control Panel->Administrative Tools->Services->MPJ Daemon and start the service. It is important to start the daemon as a user process instead of a SYSTEM process. The section "Running MPJE programs with the MPJ Express runtime" explains how this can be done.
        • To test if the daemons have started on the compute-nodes
          • For Linux Only: Each daemon produces a MPJ-Daemon<machine_name>.pid file in $MPJ_HOME/bin directory.
          • Each daemon produces a log file named daemon-<machine_name>.log in $MPJ_HOME/logs directory.

    6. Running test cases
      • cd mpj-user
      • Linux: mpjrun.sh -np 2 -jar $MPJ_HOME/lib/test.jar
      • Windows: mpjrun.bat -np 2 -jar %MPJ_HOME%/lib/test.jar
      • You may view sample output.

    7. Running your first MPJE application
      • Write an MPJE program, and save it as World.java. This document assumes that you have a 'machines' file in the mpj-user directory.
      • cd mpj-user
      • Compile
        • Linux: javac -cp .:$MPJ_HOME/lib/mpj.jar World.java
        • Windows: javac -cp .;%MPJ_HOME%/lib/mpj.jar World.java
      • Execute
        • Linux: mpjrun.sh -np 2 World
        • Windows: mpjrun.bat -np 2 World
      • You may also make a JAR file 'hello.jar' that contains World.class (see section "Writing and compiling MPJE programs" for details) and execute it
        • Linux: mpjrun.sh -np 2 -jar hello.jar
        • Windows: mpjrun.bat -np 2 -jar hello.jar

Writing and compiling MPJE programs

  • cd mpj-user
  • Write your MPJE program. For example:
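       import mpi.*;

       public class World {
         public static void main(String[] args) throws Exception {
           // Initialise MPJ Express; this must precede any other MPI call
           MPI.Init(args);
           // Rank of this process and the total number of processes
           int me = MPI.COMM_WORLD.Rank();
           int size = MPI.COMM_WORLD.Size();
           System.out.println("Hi from <" + me + ">");
           // Shut down MPJ Express before the program exits
           MPI.Finalize();
         }
       }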
  • Save this program as World.java.
  • Compile the code
    • Linux: javac -cp .:$MPJ_HOME/lib/mpj.jar World.java
    • Windows: javac -cp .;%MPJ_HOME%/lib/mpj.jar World.java
    You may put mpj.jar in your CLASSPATH environment variable. If mpj.jar is in the CLASSPATH variable, the -cp switch is not required in the above commands.
  • Create a JAR file (Optional)
    • Bundle up the class file into a JAR file. Either copy mpj.jar to the current directory or put the absolute path of the JAR file in the manifest file's Class-Path attribute. This document assumes that you have copied mpj.jar to the current directory. Copying mpj.jar is only required when you are running JAR files without the runtime and you do not have a relative path to 'mpj.jar' in the manifest file.
      • Write the following lines in a file called 'manifest'
                Manifest-Version: 1.0
                Main-Class: World
                Class-Path: mpj.jar
      • jar -cfm hello.jar manifest World.class
      • To view the contents of the JAR file: jar -tf hello.jar
    • One word of caution: when using JAR files, the -cp switch, the -classpath switch, and the system CLASSPATH variable are ignored. It is therefore very important to specify the dependencies correctly in the Class-Path attribute of the manifest file. This attribute is required only if you are running JAR files manually; the runtime takes care of adding the MPJ Express classes to the CLASSPATH itself.
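
The three-line manifest described above can also be produced programmatically. The following is a minimal sketch using the standard java.util.jar API; it is not part of MPJ Express, and the class name ManifestSketch and its helper methods are hypothetical names chosen for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.jar.Attributes;
import java.util.jar.Manifest;

// Hypothetical helper, not part of MPJ Express.
final class ManifestSketch {
    /** Builds a manifest with the same three attributes as the 'manifest' file above. */
    static Manifest build(String mainClass, String classPath) {
        Manifest manifest = new Manifest();
        Attributes attrs = manifest.getMainAttributes();
        // Manifest-Version must be present, otherwise write() emits no attributes.
        attrs.put(Attributes.Name.MANIFEST_VERSION, "1.0");
        attrs.put(Attributes.Name.MAIN_CLASS, mainClass);
        attrs.put(Attributes.Name.CLASS_PATH, classPath);
        return manifest;
    }

    /** Renders the manifest as text, as it would be stored in the JAR file. */
    static String render(Manifest manifest) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            manifest.write(out);
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Same values as the hand-written manifest: World as entry point, mpj.jar as dependency.
        System.out.print(render(build("World", "mpj.jar")));
    }
}
```

Writing the manifest by hand and passing it to the jar tool, as shown above, remains the simpler route for a single class; a sketch like this is mainly useful when JAR files are assembled by a build program.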

Running MPJE programs with MPJ Express runtime

One of the challenging aspects of a Java messaging system is creating a portable mechanism for bootstrapping MPJE processes across various platforms. If the compute-nodes are running a UNIX-based OS, it is possible to remotely execute commands using RSH/SSH, but if the compute-nodes are running Windows, these utilities are not available. The MPJ Express runtime provides a unified way of starting MPJE processes on compute-nodes irrespective of the operating system they are using. The runtime system consists of two modules. The 'daemon' module runs on compute-nodes and listens for requests to start MPJE processes. The daemon is simply a Java application listening on an IP port, which starts a new JVM every time there is a request to run an MPJE process. The 'mpjrun' module acts as a client to the daemon module. This module is started on, for example, the cluster head-node, and will contact the daemons and return standard output for the user to view.

With Java, it is possible to run applications using class files, or class files bundled as a JAR file. The MPJ Express runtime allows the execution of MPJE applications both as JAR files and class files. With MPJ Express, users may load MPJE JARs and classes either remotely or locally on the compute-nodes. With the remote loader, all classes (application and MPJ Express code) are loaded from the head-node. This is useful when there is no shared file system and the code is constantly being modified at the head-node. With the local loader, all classes (application and MPJ Express code) are loaded from the compute-node. This might be useful if there is a shared file system. As all classes are loaded locally, this might provide better performance in comparison to the remote loader. The default loader used in the MPJ Express runtime infrastructure is the remote loader.

The 'mpjrun' module provides the -jar switch to execute JAR files; no switch is required to execute class files. Users can select local loading with the -localloader switch. The -wdir switch can be used to run the code in the appropriate directory on the remote node. When running JAR files using -localloader, users should put the JAR in the CLASSPATH using the -cp switch.

MPJ Express uses the Java Service Wrapper Project software to install daemons as a native OS service. This essentially means that some platform-specific code is used to achieve this. Currently, MPJ Express distributes only Linux- and Windows-specific native code, but if you are interested in running MPJ Express daemons on other platforms like AIX, FreeBSD, HP-UX, HP-UX64, IRIX, MacOS, etc., then you can download the platform-specific code from the Java Service Wrapper Project. Some PATH variables in the scripts for these platforms will have to be changed. Feel free to contact us if you need any help regarding this. The rest of this section explains how to install, start, stop, and uninstall MPJ Express daemons on Linux and Windows. In addition, it also shows how to run your MPJE programs using the mpjrun module on these platforms.

  • cd mpj-user
  • This document assumes mpj-user as the working directory for a user. The name mpj-user itself has no significance. We assume that the user will create a machines file in this directory. In addition, we assume that user's MPJE program (World.class or hello.jar) will be present in this directory when mpjrun script is invoked.
  • Write a machines file. This file is used by scripts like mpjboot, mpjhalt, mpjrun.bat and mpjrun.sh to find out which machines to contact. The 'machines' file format is explained in this subsection.
    • The 'machines' file is simply a file stating the machine names, IP addresses, or aliases of the nodes where you wish to execute MPJE processes. This file is also used by mpjboot and mpjhalt to start and stop daemons on Linux machines. Suppose you want to run one process each on 'machine1' and 'machine2'; then your machines file would be as follows
              machine1
              machine2
      Note that in the real-world, 'machine1' and 'machine2' would be fully-qualified names, IP addresses, or aliases of your machines.
    • If you are executing mpjrun in a directory called mpj-user, a command like mpjrun.sh -np 2 World assumes that a 'machines' file is present in this directory. If your list of machines is in a file with another name (let us say 'mymachines.txt') or in another directory, you can use the -machinesfile switch to point mpjrun to it. To point mpjrun to mymachines.txt, the exact command would be mpjrun.sh -machinesfile mymachines.txt -np 2 World. This is also applicable to mpjrun.bat.
    • Multiple processes may be run on a machine. mpjrun first checks how many processes the user has requested. If the user has requested two processes, it tries to read the first two entries in the machines file. If there is only one entry in the machines file, mpjrun starts both processes on that entry. Thus, it is not necessary to list a machine twice in the 'machines' file, although the script should still work if machine names are repeated. If you want to run two processes on localhost, the 'machines' file would look like this
              localhost
  • Installing, starting, stopping, and uninstalling MPJ Express daemons
    • On Linux:
      • Starting daemons on a set of compute-nodes
        • mpjboot machines
        • This should work if $MPJ_HOME/bin has been successfully added to the $PATH variable. This script will SSH into each of the machines listed in the machines file, change directory to $MPJ_HOME/bin, and execute the mpjdaemon start command to start the daemon.
          • You will be asked for a password on remote machines if ssh-agent has not been configured to allow login without asking for a password/pass-phrase. The script will still work if you do not have password-less SSH access to the compute-nodes.
          • $MPJ_HOME variable should be available on remote nodes. This may be achieved by putting export statements in ~/.bashrc file of the remote node.
        • Making sure that the daemon is running
          • Linux Only: Each daemon produces a MPJ-Daemon<machine_name>.pid file in $MPJ_HOME/bin directory.
          • Each daemon produces a log file named daemon-<machine_name>.log in $MPJ_HOME/logs directory.
        • You may optionally run the daemon as a service
          • (Only as root) Copy mpjdaemon script to /etc/init.d directory and add it to default runtime level. On Gentoo GNU/Linux, this is,
            • rc-update add mpjdaemon default
          • It is also possible to run the daemon as a non-root user.
      • Shutting down MPJ Express daemons on a set of compute-nodes
        • mpjhalt machines
    • On Windows:
      • Installing MPJ Express daemons
        • Click/run %MPJ_HOME%/bin/installmpjd-windows.bat
      • Starting daemons
        • Go to Control Panel->Administrative Tools->Services->MPJ Daemon and start the service. It is important to start the daemon as a user process (preferably the currently logged-in user) instead of a SYSTEM process. To start the daemon as a user process, go to Control Panel->Administrative Tools->Services, right-click the MPJ Daemon service, click Properties, click the "Log On" tab, select "This account" for the option "Log on as:", enter the user name and password of this account, and start the service.
        • Making sure that the daemon is running
          • Linux Only: Each daemon produces a MPJ-Daemon<machine_name>.pid file in $MPJ_HOME/bin directory.
          • Each daemon produces a log file named daemon-<machine_name>.log in $MPJ_HOME/logs directory.
      • Stopping daemons
        • Go to Control Panel->Administrative Tools->Services->MPJ Daemon and stop the service.
      • Uninstalling daemons
        • Click/run %MPJ_HOME%/bin/uninstallmpjd-windows.bat to uninstall the daemon. This will have to be repeated manually for each machine running the daemon.
  • Configuring MPJ Express daemons using the configuration file (Optional)
    • The configuration file $MPJ_HOME/conf/wrapper.conf can be used to configure MPJ Express daemons. It is important to note that any options specified in this file only affect MPJ Express daemons, not user applications, because the daemons and the user applications run in different JVMs. To provide options to user processes, specify JVM arguments or application arguments to the mpjrun.sh or mpjrun.bat script. For a complete list of options that can be used in wrapper.conf, see the Java Service Wrapper Project documentation.
  • Running your MPJE program.
    • Running class files
      • Linux: mpjrun.sh -np 2 World
      • Windows: mpjrun.bat -np 2 World
    • Running JAR files
      • Linux: mpjrun.sh -np 2 -jar hello.jar
      • Windows: mpjrun.bat -np 2 -jar hello.jar
    • Passing arguments to the JVM running MPJE program
      • The mpjrun.bat and mpjrun.sh scripts accept all JVM arguments and forward these to the JVMs running MPJE processes on the compute-nodes. For instance, to pass -Xms512M and -Djava.library.path=/tmp as two arguments to the JVMs running the World program, the exact command would be
        • For Linux: mpjrun.sh -np 2 -Xms512M -Djava.library.path=/tmp World
        • For Windows: mpjrun.bat -np 2 -Xms512M -Djava.library.path=c:/tmp World
    • Passing arguments to MPJE application.
      • Any arguments after "-jar <jarname>" or "classname" are treated as application arguments by the mpjrun.sh and mpjrun.bat scripts. MPI.Init(String[] args) returns a String array that contains the user-specified arguments. If the user has specified two arguments, apparg1 and apparg2, then MPI.Init(..) returns an array of length 2 with apparg1 at index 0 and apparg2 at index 1.
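
The handling of the 'machines' file described earlier in this section can be illustrated with a short sketch. It assumes that mpjrun walks the entries in order and wraps around when more processes are requested than there are entries; this agrees with the two examples given above, but the exact placement policy for other cases is an assumption, and MachinesSketch is a hypothetical class that is not part of MPJ Express.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration, not the actual mpjrun implementation.
final class MachinesSketch {
    /** Keeps the non-empty lines of a machines file, one host per line. */
    static List<String> parseMachines(List<String> lines) {
        List<String> hosts = new ArrayList<>();
        for (String line : lines) {
            String host = line.trim();
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
        return hosts;
    }

    /**
     * Returns the host chosen for each rank 0..np-1.
     * Assumed policy: take entries in order and wrap around when np exceeds the list.
     */
    static List<String> assign(List<String> hosts, int np) {
        if (hosts.isEmpty()) {
            throw new IllegalArgumentException("machines file has no entries");
        }
        List<String> placement = new ArrayList<>();
        for (int rank = 0; rank < np; rank++) {
            placement.add(hosts.get(rank % hosts.size()));
        }
        return placement;
    }

    public static void main(String[] args) {
        // Two entries, two processes: one process per machine.
        System.out.println(assign(parseMachines(Arrays.asList("machine1", "machine2")), 2));
        // One entry, two processes: both processes run on localhost.
        System.out.println(assign(parseMachines(Arrays.asList("localhost")), 2));
    }
}
```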

Running MPJE programs without MPJ Express runtime (manually)

We do not recommend starting programs manually as a normal procedure. This section documents the procedure for manual start-up, mainly to allow developers the flexibility to create their own initiation mechanisms for MPJE programs. The runmpj.sh script can be considered one example of such a mechanism.
  • cd mpj-user
  • This document assumes mpj-user as the working directory for users. The name mpj-user itself has no significance. We assume that users will create a configuration file in this directory. In addition, we assume that the user's MPJE program (World.class or hello.jar) will be present in this directory at the time of execution of MPJE processes.
  • Write a configuration file called 'mpj.conf' as follows.
    • A typical configuration file that would be used to start two MPJE processes is as follows. Note that the names 'machine1' and 'machine2' would be replaced by the aliases, fully-qualified names, or IP addresses of the machines where you want to start MPJE processes
           # Number of processes
           2
           # Protocol switch limit
           131072
           # Entry in the form of machinename@port@rank
           machine1@20000@0
           machine2@20000@1 
    • The lines starting with '#' are comments. The first entry, which is a number ('2' above), represents the total number of processes. The second entry, which is again a number ('131072' above), is the protocol switch limit. At this message size, MPJ Express changes its communication protocol from eager-send to rendezvous. These are followed by one entry for each MPJE process, in the form machinename(OR)IP@PORT_NUMBER@RANK. Using these entries, users of MPJ Express can control where each MPJE process runs, which server port it uses, and what the rank of each process is. The rank specified here should exactly match the rank argument provided when manually starting MPJE processes (using the java command). When users run their code using mpjrun, this file is generated programmatically.
    • Sample configuration files can be found in the $MPJ_HOME/conf directory. If you wish to start MPJ processes on localhost, see the $MPJ_HOME/conf/local2.conf file.
    • Each MPJ process uses two ports. Thus, do not use consecutive ports if you are trying to execute multiple MPJE processes on the same node. A sample file for running two MPJE processes on the same machine would be
           # Number of processes
           2
           # Protocol switch limit
           131072
           # Entry in the form of machinename@port@rank
           localhost@20000@0
           localhost@20002@1
  • Running your MPJE program.
    • The script runmpj.sh requires password-less SSH access to the machines listed in the configuration file. This script will not work if your machines are not set up so that no password/passphrase is required at login. This is the only script in this software that requires password-less access. An alternative to using runmpj.sh is the manual start-up (using the java command directly -- see directions below)
    • Running class files
      • Linux: runmpj.sh mpj.conf World
        • Alternatively, the directions for Windows should also work.
      • Windows and Linux:
        • For all the machines listed in mpj.conf, log in to each Windows or Linux machine, change directory to %MPJ_HOME% (Windows) or $MPJ_HOME (Linux), and type,
          • Linux: java -cp .:$MPJ_HOME/lib/mpj.jar World <rank> mpj.conf niodev
          • Windows: java -cp .;%MPJ_HOME%/lib/mpj.jar World <rank> mpj.conf niodev
          • The <rank> argument should be 0 for process 0 and 1 for process 1. This should match what is written in the configuration file (mpj.conf). Check the entry format in the configuration file to be sure of the rank.
    • Running JAR files
      • Linux: runmpj.sh mpj.conf hello.jar
        • Alternatively, the directions for Windows should work
      • Windows and Linux:
        • For all the machines listed in mpj.conf, log in to each Windows or Linux machine, and type,
          • Linux: java -jar hello.jar <rank> mpj.conf niodev
          • Windows: java -jar hello.jar <rank> mpj.conf niodev
          • The <rank> argument should be 0 for process 0 and 1 for process 1. This should match what is written in the configuration file (mpj.conf). Check the entry format in the configuration file to be sure of the rank.
    • Passing arguments to the JVM running MPJE program
      • Edit $MPJ_HOME/bin/runmpj.sh shell script to pass the arguments to the JVM.
    • Passing arguments to MPJE application.
      • Edit $MPJ_HOME/bin/runmpj.sh shell script to pass the arguments to the application. MPI.Init(String[] args) returns a String array that contains user specified arguments. If the user has specified two arguments: apparg1 and apparg2, then MPI.Init(..) returns an array which has length 2, apparg1 at index 0, and apparg2 at index 1.
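
The 'mpj.conf' format described earlier in this section is simple enough to check before starting processes by hand. The following is a minimal sketch of a reader for that format. MpjConfSketch is a hypothetical class and not the parser used inside MPJ Express, and the port check assumes that the two ports used by a process are the configured port and the one directly after it, which the advice against consecutive ports suggests but does not state.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reader for the mpj.conf layout; not the MPJ Express implementation.
final class MpjConfSketch {
    /** One "machinename@port@rank" entry. */
    static final class Entry {
        final String host;
        final int port;
        final int rank;

        Entry(String host, int port, int rank) {
            this.host = host;
            this.port = port;
            this.rank = rank;
        }
    }

    /** Parsed contents of a configuration file. */
    static final class Conf {
        final int processes;
        final int protocolSwitchLimit;
        final List<Entry> entries;

        Conf(int processes, int protocolSwitchLimit, List<Entry> entries) {
            this.processes = processes;
            this.protocolSwitchLimit = protocolSwitchLimit;
            this.entries = entries;
        }
    }

    static Conf parse(List<String> lines) {
        // Drop blank lines and '#' comments, keep everything else in order.
        List<String> data = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (!line.isEmpty() && !line.startsWith("#")) {
                data.add(line);
            }
        }
        if (data.size() < 2) {
            throw new IllegalArgumentException("expected process count and protocol switch limit");
        }
        int processes = Integer.parseInt(data.get(0));
        int limit = Integer.parseInt(data.get(1));
        List<Entry> entries = new ArrayList<>();
        for (String line : data.subList(2, data.size())) {
            String[] parts = line.split("@");
            if (parts.length != 3) {
                throw new IllegalArgumentException("bad entry: " + line);
            }
            entries.add(new Entry(parts[0], Integer.parseInt(parts[1]), Integer.parseInt(parts[2])));
        }
        if (entries.size() != processes) {
            throw new IllegalArgumentException("process count does not match number of entries");
        }
        return new Conf(processes, limit, entries);
    }

    /** Assumption: a process occupies its port and port + 1, so same-host ports must differ by at least 2. */
    static boolean hasPortClash(Conf conf) {
        for (int i = 0; i < conf.entries.size(); i++) {
            for (int j = i + 1; j < conf.entries.size(); j++) {
                Entry a = conf.entries.get(i);
                Entry b = conf.entries.get(j);
                if (a.host.equals(b.host) && Math.abs(a.port - b.port) < 2) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Conf conf = parse(Arrays.asList(
                "# Number of processes", "2",
                "# Protocol switch limit", "131072",
                "# Entry in the form of machinename@port@rank",
                "localhost@20000@0", "localhost@20002@1"));
        System.out.println(conf.processes + " processes, limit " + conf.protocolSwitchLimit
                + ", port clash: " + hasPortClash(conf));
    }
}
```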

MPJ Express test suite

MPJ Express contains a comprehensive test suite to test the functionality of almost every MPI function. This test suite consists mainly of mpiJava test cases, MPJ JGF benchmarks, and MPJ micro-benchmarks. The mpiJava test cases were originally developed by IBM and later translated to Java. As this software follows the API of mpiJava, these test cases can be used with little modification. The MPJ JGF benchmarks are developed and maintained by EPCC at the University of Edinburgh. MPJ Express redistributes these benchmarks as part of its test suite. The original copyrights and license remain intact, as can be seen in the source files of these benchmarks in $MPJ_HOME/test/jgf_mpj_benchmarks. MPJ Express also redistributes micro-benchmarks developed by Guillermo Taboada.

The suite is located in the $MPJ_HOME/test directory. The test cases have been changed from their original versions in order to automate testing. TestSuite.java is the main class that calls each of the test cases present in this directory. The build.xml file present in the test directory compiles all test cases and places test.jar into the lib directory. By default, the JGF MPJ benchmarks and MPJ micro-benchmarks are disabled. Edit $MPJ_HOME/test/TestSuite.java to uncomment these tests and execute them. Note that after changing TestSuite.java, you will have to recompile the test suite by executing 'ant' in the test directory.

  • cd mpj-user
  • This document assumes that mpj-user is the working directory for MPJ Express users; the name mpj-user itself has no significance. For this section, mpj-user should contain 'mpj.conf', which is the configuration file required if you are running the code without the runtime (manually). If you are using the runtime, then this directory should contain the 'machines' file, which contains a list of machines where MPJ Express daemons are running.
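  • Running test cases with MPJ Express runtime
    • Linux: mpjrun.sh -np 2 -jar $MPJ_HOME/lib/test.jar
    • Windows: mpjrun.bat -np 2 -jar %MPJ_HOME%/lib/test.jar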
  • Running test cases without the runtime (manually)
  • Start the tests
    • Linux: runmpj.sh mpj.conf $MPJ_HOME/lib/test.jar
      • 'runmpj.sh' requires password-less SSH access to the machines in the configuration file.
      • Alternatively, the directions for Windows should work.
    • Windows and Linux:
      • For all the machines listed in mpj.conf, log in to each Windows or Linux machine and type,
        • Linux: java -jar $MPJ_HOME/lib/test.jar <rank> mpj.conf niodev
        • Windows: java -jar %MPJ_HOME%/lib/test.jar <rank> mpj.conf niodev
        • The <rank> argument should be 0 for process 0 and 1 for process 1. This should match what is written in the configuration file (mpj.conf). Check the entry format in the configuration file to be sure of the rank.
  • You may view the sample output of test cases at http://dsg.port.ac.uk/projects/mpj/docs/res/t-<VERSION>.txt For version 0.26, this would translate to: http://dsg.port.ac.uk/projects/mpj/docs/res/t-0.26.txt

Compiling MPJ Express source code and test suite

  • Pre-requisites (For compiling and running the code)
    • Java 1.5 (stable) or higher
      • Verify this by executing,
        • 'java -version' (should be stable 1.5 or higher)
          • MPJ Express has been developed and tested using Java 1.5, but it is possible to compile the code with '-source release' and '-target release'.
        • 'javac' (should see usage information)
    • Apache ant 1.6.2 or higher
      • Verify this by executing 'ant'. This command should display usage information.
    • Perl (Optional)
      • MPJ Express needs Perl for compiling source code because some of the Java code is generated from Perl templates. The build file will generate Java files from Perl templates if it detects perl on the machine. It is a good idea to install Perl if you want to do some development with MPJ Express.
      • Perl for Windows can be downloaded here
      • build.xml points to the Perl executable. You may need to edit the property perl.executable to reference the Perl executable.
  • Compiling MPJ Express source code
    • From the $MPJ_HOME directory, execute ant
      • Produces mpj.jar, daemon.jar, and starter.jar in lib directory.
  • Compiling MPJ Express test-code
    • cd test
    • ant
      • This produces test.jar in lib directory.

Tested Platforms

  • Gentoo GNU/Linux (kernel 2.6.10)
  • Debian GNU/Linux 'Sarge' (kernel 2.4.30)
  • SuSE 9.0 GNU/Linux (kernel 2.4.21)
  • Red Hat Fedora Core 4 GNU/Linux (kernel 2.6.12)
  • Red Hat Linux 7.3 GNU/Linux (kernel 2.4.19-openmosix)
  • Windows XP (Service Pack 2)
  • Windows XP with cygwin (Service Pack 2)

Java-docs

Java-docs can be seen in $MPJ_HOME/doc/javadocs

Contact and support

  • To support users of this software, we have set up a mailing list. Users are encouraged to subscribe and share their experiences of using MPJ Express. In case of any problems, please make sure that you have read the documentation (including this README). If your questions still remain unanswered, feel free to post to the list.
  • Alternatively, the users can contact us directly by email.

Miscellaneous

  1. Turning debugging on and off
    • MPJ Express uses log4j for logging purposes. By default, the logging is turned off. To turn logging on or off, users can edit certain Java files. This can be achieved in two ways.
      • Less efficient way,
        • Edit $MPJ_HOME/src/mpi/MPI.java, and uncomment this line to turn logging off: //rep.setThreshold((Level) Level.OFF ) ; The setThreshold(..) method of the LoggerRepository associated with the rootLogger sets the threshold level for logging. If this level is Level.OFF, then all logging is dropped by the rootLogger. By default, the level is set to Level.ALL
        • Recompile the code
      • More efficient way,
        • Edit src/mpi/MPI.java and change value of static boolean DEBUG flag to false
        • Recompile the code
        • This approach is preferred for benchmarking MPJ
    • The runtime infrastructure also uses log4j. By default, the logging for the runtime is turned on. To turn it off, the users can edit src/runtime/starter/MPJRun.java and src/runtime/daemon/MPJDaemon.java and change the value of DEBUG flag.
    • MPJ Express runtime daemons can be debugged as follows:
      • Edit $MPJ_HOME/conf/wrapper.conf file.
      • Change the value of wrapper.logfile.loglevel from "NONE" to "DEBUG".
      • The output of mpjboot, mpjhalt, and other daemon activities can now be seen in the $MPJ_HOME/logs/wrapper.log file. This information is useful for diagnosing and fixing daemon errors.
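
The difference between the two approaches can be illustrated with a small stand-alone sketch. It is not MPJ Express code: it uses java.util.logging as a stand-in for log4j, and the class DebugGuardSketch, its DEBUG flag, and the messagesBuilt counter are hypothetical names introduced only for this illustration. The point is that a guard on a static flag short-circuits before the logger is consulted or the log message is built.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

class DebugGuardSketch {
    // Hypothetical stand-in for the static DEBUG flag described above.
    static boolean DEBUG = false;
    static final Logger logger = Logger.getLogger("sketch");
    // Counts how often a log message was actually constructed.
    static int messagesBuilt = 0;

    static String buildMessage(int rank) {
        messagesBuilt++;
        return " progress report from rank " + rank;
    }

    static void step(int rank) {
        // Same shape as the guards used around logger.debug(..) calls:
        // when DEBUG is false, neither the logger check nor the message
        // construction is executed.
        if (DEBUG && logger.isLoggable(Level.FINE)) {
            logger.fine(buildMessage(rank));
        }
    }

    public static void main(String[] args) {
        logger.setLevel(Level.ALL);
        step(0);                      // DEBUG is false: nothing is built
        DEBUG = true;
        step(1);                      // flag on, threshold open: message is built
        logger.setLevel(Level.OFF);   // threshold-style switch-off
        step(2);                      // guard still runs, logger says no
        System.out.println("messages built: " + messagesBuilt);
    }
}
```

With the flag set to false the logging code costs a single boolean test, which is presumably why that approach is the one recommended for benchmarking.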

  2. Changing protocol switch limit
    • MPJ Express uses two communication protocols: 'eager-send', which is used for transferring small messages, and the rendezvous protocol, which is used for transferring large messages. The default protocol switch limit is 128 KBytes. It can be changed prior to execution in one of the following ways, depending on whether you run the processes manually or use the runtime.
      • Running MPJE applications manually (without using the runtime): Users may edit the configuration file (e.g. $MPJ_HOME/conf/mpj2.conf) to change the protocol switch limit. See the comments in this configuration file. The second entry, which is 131072 if you have not changed it, is the protocol switch limit
      • Running MPJE applications with the runtime: Use the -psl <val> switch to change the protocol switch limit
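
The role of the protocol switch limit can be pictured as a plain size comparison. The following is a minimal sketch under stated assumptions, not the MPJ Express implementation: the class ProtocolSwitchSketch and its members are hypothetical, and treating a message of exactly the limit as 'eager' is an assumption, since the text above does not say which side of the boundary the limit itself falls on.

```java
class ProtocolSwitchSketch {
    // Default protocol switch limit from the text: 128 KBytes = 131072 bytes.
    static final int DEFAULT_PSL = 128 * 1024;

    enum Protocol { EAGER, RENDEZVOUS }

    // Assumption: messages up to and including the limit go out eagerly,
    // anything larger uses the rendezvous protocol.
    static Protocol choose(int messageBytes, int protocolSwitchLimit) {
        return messageBytes <= protocolSwitchLimit
                ? Protocol.EAGER
                : Protocol.RENDEZVOUS;
    }

    public static void main(String[] args) {
        System.out.println(choose(1024, DEFAULT_PSL));
        System.out.println(choose(1024 * 1024, DEFAULT_PSL));
        // Raising the limit (as -psl <val> would) moves larger messages
        // onto the eager path.
        System.out.println(choose(1024 * 1024, 2 * 1024 * 1024));
    }
}
```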

  3. Running daemons in console mode
    • For debugging purposes, it is sometimes useful to run the daemons in console mode. This can be done as follows:
      1. cd $MPJ_HOME/bin
      2. On UNIX systems, execute ./mpjdaemon_linux_x86_32 console . Here we are starting the daemon on a 32-bit x86 processor. Choose the appropriate script for your machine.
      3. On Windows, execute cd %MPJ_HOME%/bin ; wrapper.exe -c ../conf/wrapper.conf

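  4. Running MPJ Express daemons on Solaris, PowerPC Linux, and PowerPC Mac OS X
    • With the default settings, attempting to start MPJ Express daemons on UltraSPARC Solaris, PowerPC (PPC) Linux, or PPC Mac OS X results in an error like the one shown here. To fix it, follow the steps listed for your platform.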
 
     mpjboot machines 
     Starting mpjd... 
     ./mpjdaemon_linux_x86_32: line 1: ./daemon_linux_x86_32: cannot execute binary file 
  
  • Solaris
     
          a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
          b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
          c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_solaris_sparc_64 start;"
          d. cd $MPJ_HOME/lib
          e. cp libwrapper.so_solaris_sparc_64 libwrapper.so
         
  • PPC64 Linux
     
          a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
          b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
          c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_ppc_64 start;" 
          d. cd $MPJ_HOME/lib
          e. cp libwrapper.so_linux_ppc_64 libwrapper.so
         
  • PPC32 Mac OS X
     
          a. Edit $MPJ_HOME/bin/mpjboot and $MPJ_HOME/bin/mpjhalt
          b. Comment out the line ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_linux_x86_32 start;"
          c. Uncomment the line #ssh $host "cd $MPJ_HOME/bin;./mpjdaemon_macosx_ppc_32 start;"
          d. cd $MPJ_HOME/lib
          e. cp libwrapper.jnilib_macosx_ppc_32 libwrapper.jnilib
         

  5. API changes between mpiJava-1.2.x and MPJ Express
    • To see the API differences between mpiJava-1.2.x and MPJ Express, read $MPJ_HOME/doc/APICHANGES.txt

  6. Acknowledgements
    • We would like to thank:
      • Hong Ong for his input to the initial design of niodev in particular and the software in general,
      • Guillermo Taboada for alpha testing of the software,
      • Mohsan Jameel for testing of the software.

Known issues and limitations

  • There is a known problem on Windows and Solaris, understood only to some extent, that causes MPJ processes to hang. It is normally observed when MPJ test cases hang: they neither complete nor throw any error message.

    We partially understand the problem, but if you encounter it, we would appreciate some more debugging information. It can be obtained as follows. Edit $MPJ_HOME/src/xdev/niodev/NIODevice.java, go to line 3673, and uncomment the line "ioe1.printStackTrace() ;". Line 3673 applies to MPJ Express release 0.27 and may change in future releases. The surrounding code looks like this:

     
    
          catch (Exception ioe1) {
            if(mpi.MPI.DEBUG && logger.isDebugEnabled() )  {
              logger.debug(" error in selector thread " + ioe1.getMessage());
            }
            //ioe1.printStackTrace() ;
          } //end catch(Exception e) ...
    
          if(mpi.MPI.DEBUG && logger.isDebugEnabled()) {
            logger.debug(" last statement in selector thread");
          }
    
        } //end run()
    
      }; //end selectorThread which is an inner class
    
     


    Once the code has been recompiled and the test cases are executed again, stack traces are printed periodically. Most of them are socket-closed exceptions, which are normal. If the code now hangs, the latest stack trace that is not about a closed socket is probably the cause of the hang. We kindly ask users to email us this output so that we can fix the problem. A stack trace that leaves MPJ Express hanging on Solaris is as follows:

             java.nio.channels.CancelledKeyException
               at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
               at sun.nio.ch.SelectionKeyImpl.readyOps(SelectionKeyImpl.java:69)
               at java.nio.channels.SelectionKey.isAcceptable(SelectionKey.java:342)
               at xdev.niodev.NIODevice$2.run(NIODevice.java:3330)
               at java.lang.Thread.run(Thread.java:595)
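
The trace above shows SelectionKey.isAcceptable() throwing because the key had already been cancelled. A general Java NIO idiom for this situation is to check key.isValid() first and to treat a CancelledKeyException as "not ready". The sketch below only illustrates that idiom; it is not the MPJ Express code and not a confirmed fix for the hang, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;

class SelectorKeyGuardSketch {
    // Returns true only if the key is still valid and ready to accept.
    static boolean isAcceptableSafely(SelectionKey key) {
        try {
            return key.isValid() && key.isAcceptable();
        } catch (CancelledKeyException e) {
            // Another thread cancelled the key between the two checks.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.configureBlocking(false);
            // Port 0 lets the operating system pick any free port.
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            SelectionKey key = server.register(selector, SelectionKey.OP_ACCEPT);

            key.cancel();
            // Calling key.isAcceptable() directly here would throw
            // CancelledKeyException; the guarded call reports false instead.
            System.out.println(isAcceptableSafely(key));
        }
    }
}
```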
          



  • Some users have noticed that it takes a long time to bootstrap MPJ Express processes. For example,

    user@machine:~/mpj-user> mpjrun.sh -np 6 -jar $MPJ_HOME/lib/test.jar
    16:15:43.400 EVENT  Starting Jetty/4.2.23
    16:15:43.415 EVENT  Started HttpContext[/]
    16:15:43.419 EVENT  Started SocketListener on 0.0.0.0:15000
    16:15:43.419 EVENT  Started org.mortbay.http.HttpServer@23ac23ac
    16:15:43.420 EVENT  Starting Jetty/4.2.23
    16:15:43.420 EVENT  Started HttpContext[/]
    16:15:43.421 EVENT  Started SocketListener on 0.0.0.0:15001
    16:15:43.421 EVENT  Started org.mortbay.http.HttpServer@50265026
    
    [ pause for a minute or two ]
    
    Starting process <0> on Starting process <1> on 
    [ pause for a minute or two ]
    
    Starting process <2> on Starting process <3> on 
    [ pause for a minute or two ]
    
    Starting process <4> on Starting process <5> on 
    [ job starts ]
    

    Thanks to Andy Botting, one of the users who identified this problem. It is perhaps related to name resolution, and we are currently working on a fix.


  • Intercomm.Merge(..) is implemented with limited functionality. The processes in both the local group and the remote group *have* to specify the 'high' argument. Also, the value specified by the local-group processes must be the opposite of the value specified by the remote-group processes.

  • Any message sent with MPI.PACK can only be received by using MPI.PACK as the datatype. Later, MPI.Unpack(..) can be used to unpack different datatypes.

  • Using the 'buffered' mode of send with MPI.PACK as the datatype does not actually use the buffer specified by the MPI.Buffer_attach(..) method.

  • Cartcomm.Dims_Create(..) is implemented with limited functionality. According to the MPI specification, non-zero elements of the 'dims' array argument are not modified by this method. In this release of MPJE, all elements of the 'dims' array are modified, regardless of whether they are zero or non-zero.
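
For comparison, the behaviour that the MPI specification describes (non-zero entries of 'dims' are kept, zero entries are filled in so that the product equals the number of processes) can be sketched as follows. This is a stand-alone illustration, not the MPJE implementation, and the class and method names are hypothetical. The balancing step, which hands prime factors (largest first) to the currently smallest free dimension, is a simple heuristic chosen for this sketch; it is not guaranteed to produce the most balanced grid in every case.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DimsCreateSketch {
    // Returns a copy of dims in which zero entries are replaced so that
    // the product of all entries equals nnodes. Non-zero entries are kept.
    static int[] dimsCreate(int nnodes, int[] dims) {
        int fixedProduct = 1;
        int free = 0;
        for (int d : dims) {
            if (d < 0) throw new IllegalArgumentException("negative dimension");
            if (d == 0) free++; else fixedProduct *= d;
        }
        if (nnodes <= 0 || nnodes % fixedProduct != 0) {
            throw new IllegalArgumentException("nnodes is not a multiple of the fixed dimensions");
        }
        int remaining = nnodes / fixedProduct;
        if (free == 0) {
            if (remaining != 1) throw new IllegalArgumentException("dims do not multiply to nnodes");
            return dims.clone();
        }
        // Prime factors of the remaining process count, in ascending order.
        List<Integer> primes = new ArrayList<>();
        int n = remaining;
        for (int p = 2; (long) p * p <= n; p++) {
            while (n % p == 0) { primes.add(p); n /= p; }
        }
        if (n > 1) primes.add(n);
        // Give each prime, largest first, to the currently smallest free dimension.
        int[] factors = new int[free];
        Arrays.fill(factors, 1);
        for (int i = primes.size() - 1; i >= 0; i--) {
            int smallest = 0;
            for (int j = 1; j < free; j++) {
                if (factors[j] < factors[smallest]) smallest = j;
            }
            factors[smallest] *= primes.get(i);
        }
        // Fill the zero slots in non-increasing order.
        Arrays.sort(factors);
        int[] result = dims.clone();
        int next = free - 1;
        for (int i = 0; i < result.length; i++) {
            if (result[i] == 0) result[i] = factors[next--];
        }
        return result;
    }

    public static void main(String[] args) {
        // The fixed middle entry (3) is left untouched: prints [2, 3, 1]
        System.out.println(Arrays.toString(dimsCreate(6, new int[] {0, 3, 0})));
    }
}
```

Unlike Cartcomm.Dims_Create(..), which works on the caller's array, this sketch returns a new array so that the input can be compared with the result.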

  • Request.Cancel(..) is not implemented in this release.

  • MPJ applications run with the runtime should not print more than 500 characters on one line, as some users may do with System.out.print(..). This is not a serious limitation, because the same output can be produced in shorter pieces: for example, five System.out.println(..) calls of 100 characters each instead of one System.out.print(..) call of 500 characters.
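
One way to stay under the limit is to split long output into shorter lines before printing. The helper below is a hypothetical sketch, not part of MPJ Express; the class name, method names, and the chunk size passed in are choices made for this illustration. It splits at fixed character counts, so it may break a line in the middle of a word.

```java
import java.util.ArrayList;
import java.util.List;

class LineChunkerSketch {
    // Splits text into consecutive pieces of at most maxChars characters.
    static List<String> chunk(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> pieces = new ArrayList<>();
        for (int start = 0; start < text.length(); start += maxChars) {
            int end = Math.min(text.length(), start + maxChars);
            pieces.add(text.substring(start, end));
        }
        return pieces;
    }

    // Prints each piece on its own line with System.out.println(..).
    static void printChunked(String text, int maxChars) {
        for (String piece : chunk(text, maxChars)) {
            System.out.println(piece);
        }
    }

    public static void main(String[] args) {
        StringBuilder longLine = new StringBuilder();
        for (int i = 0; i < 300; i++) {
            longLine.append(i).append(' ');
        }
        // 100 characters per line keeps every line well below 500.
        printChunked(longLine.toString(), 100);
    }
}
```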

  • Some users may see the exception "Another mpjrun module is already running on this machine" while trying to start the mpjrun module, typically when running the mpjrun.bat script. The error arises when the mpjrun module cannot contact the daemon and tries to clean up the resources it holds. In doing so, it tries to delete a file named 'mpjdev.conf' using the File.deleteOnExit() method. This method appears not to work on Windows, possibly because of permission issues, and the leftover file then appears to cause the following exception:
          Exception in thread "main" java.lang.RuntimeException: Another mpjrun
          module is already running on this machine
            at runtime.starter.MPJRun.<init>(MPJRun.java:135)
            at runtime.starter.MPJRun.main(MPJRun.java:925)
      
    This issue can be resolved by deleting the mpjdev.conf file. The file is located in the directory that contains your main class or JAR file. For example, if you are trying to run "-jar ../lib/test.jar", the file is in the ../lib directory.
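
The manual clean-up can also be scripted. The following is a minimal sketch, assuming that no other mpjrun module is genuinely running; the class and method names are hypothetical and not part of MPJ Express. Given the path of the JAR file (or the directory of the main class), it removes a leftover mpjdev.conf from that directory.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

class StaleConfCleanupSketch {
    // Deletes mpjdev.conf next to the given JAR file, or inside the given
    // directory. Returns true if a file was actually removed.
    static boolean removeStaleConf(Path jarOrClassDir) {
        Path dir = Files.isDirectory(jarOrClassDir)
                ? jarOrClassDir
                : jarOrClassDir.toAbsolutePath().getParent();
        try {
            return Files.deleteIfExists(dir.resolve("mpjdev.conf"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default mirrors the "-jar ../lib/test.jar" example from the text.
        Path target = Paths.get(args.length > 0 ? args[0] : "../lib/test.jar");
        System.out.println(removeStaleConf(target)
                ? "removed stale mpjdev.conf"
                : "no mpjdev.conf found");
    }
}
```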

  • Permission issues while using MPJ Express runtime with Windows
    • Problem: Users may run into issues when starting daemons on Windows. When MPJE processes are started manually, they are owned by the user who started them, and so are the log files they produce. The daemon, on the other hand, is installed as a SYSTEM service. While starting, the daemon may therefore be unable to write to its log file, because the logs directory is owned by the user, whereas the daemon can only write to this directory if it is globally accessible. Even once the daemon has started, MPJE processes may be unable to write their process log files: these files are owned by the user, but they now have to be globally accessible, because MPJE processes started by the daemon are also SYSTEM processes. The problem therefore arises when users switch from running their code manually to using the runtime, or possibly vice versa.
    • Solution:
      • This can be avoided by starting the MPJ Express daemon as a user process instead of a SYSTEM process. To do so, go to "Control Panel -> Administrative Tools -> Services", right-click the MPJ Daemon service, click Properties, and open the "Log On" tab. For the option "Log on as:", select "This account", enter the user name and password of this account, and restart the service. It should now start as a user process. To verify that it is running as a user process, open the process manager by pressing "ctrl-alt-delete" and look for the processes "wrapper.exe" and "java.exe". The UserName shown should be the user name of this account instead of SYSTEM. Other Java programs running on the machine may show up as additional java.exe entries in the process list. In that case, "wrapper.exe" is the only process that unambiguously represents the MPJ Express daemon.
      • Delete all log files before the first execution
      • Execute the following in Cygwin
        • chmod a+w $MPJ_HOME/logs
        • chmod a+x $MPJ_HOME/lib/*.dll
        • chmod a+w $MPJ_HOME/logs/wrapper.log
          • If wrapper.log is present

  • Mixing local loading and remote loading may result in a ClassNotFoundException for the $MPJ_HOME/src/mpi/MPI.java class. This specific class is the one shown in the exception stack trace because it is the entry point to the MPJE classes. The error can occur when your MPJE application is in your working directory, CLASSPATH contains ".", and you are using remote loading. Under the hood, the MPJE daemon loads the application with the local loader (because the working directory contains the application), and the application then tries to load the MPJE classes through a URL. Because the application was loaded with the local loader (the default), it cannot load MPJE classes from a URL. To avoid this error, do not keep your applications in the working directory when using the remote loader. If your program reads a file, it may be a good idea to keep this file separate from your application classes, or to copy it to a tmp directory and specify this tmp directory as the working directory using the -wdir switch.