Monday, February 4, 2008

MPI parallelization of MCNPX 2.5.0 on AMD64

So finally we have MCNPX-2.5.0 running with MPI parallelization. This post just writes down some of the problems and their solutions (before we forget what the point was).

The target platform was a CentOS 4 system on AMD64-based SMP machines. The build was always done in a directory outside the original source tree.

Compiling the code without MPI:
Using the Intel 10.1 compiler collection:
Linked f90 to ifort and cc to icc. Configure was called as:
/usr/local/src/mcnp/v250/configure --prefix=/share/apps/mcnpx-2.5.0 \
--with-FC=ifort \
--with-CC=icc \
--host=i686-pc-linux

It is necessary to use the host specification as the x86_64 platform is not recognised otherwise.
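The linking step above could look roughly like this. The link directory and the link_compiler helper are hypothetical (not part of MCNPX), and the sketch assumes that ifort and icc are already in PATH; the original setup may well have placed the links somewhere else.

```shell
# link_compiler NAME TARGET DIR: make DIR/NAME a symlink to TARGET found in PATH.
link_compiler() {
    name=$1; target=$2; dir=$3
    path=$(command -v "$target") || {
        echo "warning: $target not found in PATH" >&2
        return 1
    }
    mkdir -p "$dir" && ln -sf "$path" "$dir/$name"
}

# Hypothetical directory for the links; it must come first in PATH
# so that configure and make pick up f90 and cc from there.
LINKDIR="${LINKDIR:-$HOME/compiler-links}"
link_compiler f90 ifort "$LINKDIR" || true
link_compiler cc  icc   "$LINKDIR" || true
PATH="$LINKDIR:$PATH"; export PATH
```

Keeping the links in a separate directory avoids touching the system-wide f90 and cc, which makes it easier to switch between the compiler sets described here.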
The Intel compilers are also the ones used for the MPI build later on. To get the tests running, go to the Test directory in the source tree and run:

ln -s Test.intel.linux.ifc.icc Test.intel.linux.ifort.icc

This is also necessary for the other compilers.

Using the Sun Studio 12 compiler for Fortran 90 and gcc 3.4.6:
The code compiled OK; it was only necessary to link f90 and cc to the Sun compilers (I linked f77 as well, but that is probably not necessary).

Using gfortran 4.1.2 and gcc 3.4.6:
There are some issues with GNU gfortran: after the changes to the code described in http://mcnpx.lanl.gov/opendocs/installation/Intel-GCC_Linux_64.txt the code compiled, but it crashed with a segmentation fault already during make tests. I did not explore the issue further. I linked f90 -> gfortran before make, but maybe that was not necessary.

Called configure as:

.../v250/configure --prefix=/usr/local/mcnpx \
--with-FC=f90 \
--with-CC=cc \
--host=i686-pc-linux \
--with-FFLAGS="-DUNIX=1 -DLINUX=1 -DG95=1" \
--with-CFLAGS="-DUNIX=1 -DLINUX=1"

Adding MPI:
This part I tried only with the Intel compilers. I took MPICH2 1.0.6, compiled it from source, and installed it in a cluster-wide shared directory.

Run configure as:

/usr/local/src/mcnp/v250/configure --prefix=/share/apps/mcnpx-2.5.0-mpich2-64bit \
--with-FC=mpif90 \
--with-CC=mpicc \
--with-FFLAGS="-i8" \
--with-NOCHEAP \
--host=i686-pc-linux \
--with-MPILIB="-L/share/apps/mpich2-1.0.6/lib -lmpich"

Unfortunately, make broke with an error saying something about EOF while looking for '"'. There were -lrt" strings (note the stray trailing quote) in the Makefile.h files in the src subdirectories. We corrected this by running:

find ./ -name Makefile.h -exec ~/do.sh {} \;

where do.sh was:

#!/bin/bash
# Remove the stray quote after -lrt in one Makefile.h, keeping a backup copy.
CO="$1"
cp "$CO" "${CO}.old"
sed 's/lrt"/lrt/g' <"${CO}.old" >"$CO"
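With GNU sed (an assumption, although CentOS ships it), the find call and the helper script could probably be folded into one command, because the -i.old option edits each file in place and keeps a backup with that suffix. A minimal sketch; fix_makefiles is a hypothetical name, not something from the MCNPX distribution:

```shell
# fix_makefiles DIR: strip the stray quote after -lrt in every Makefile.h
# below DIR. Assumes GNU sed; each original is kept as Makefile.h.old.
fix_makefiles() {
    find "$1" -name Makefile.h -exec sed -i.old 's/lrt"/lrt/g' {} +
}

# Afterwards, list any Makefile.h that still contains the broken string
# (no output means everything was fixed):
#   grep -rl --include=Makefile.h 'lrt"' .
```

It would be called as fix_makefiles . from the build directory.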

After this the code compiled OK; however, when we tried to use Tomas Vrba's large lattice file in an MPI environment, mcnpx failed with "Segmentation fault". The code ran with smaller input files.

We did not find any notes about such a problem. Finally we realized that the system has the stack size limit set to 10 MB in bash; setting it to unlimited (ulimit -s unlimited) solved the problem.
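The stack limit fix can be wrapped so that it is not forgotten before a run. This is only a sketch: run_with_big_stack is a hypothetical helper, and it assumes that the hard limit allows raising the soft limit. Resource limits are inherited by child processes, so on a multi-node run the raised limit presumably has to be in effect wherever the MPI processes are actually started, not just in the login shell.

```shell
# run_with_big_stack CMD [ARGS...]: raise the soft stack limit inside a
# subshell, then replace the subshell with CMD so that CMD inherits it.
run_with_big_stack() {
    (
        ulimit -s unlimited 2>/dev/null ||
            echo "warning: could not raise the stack size limit" >&2
        exec "$@"
    )
}

# Show the current soft limit (in kB, or "unlimited") and the limit
# that a child started through the wrapper would see.
ulimit -s
run_with_big_stack sh -c 'ulimit -s'
```

The subshell keeps the change local, so the interactive shell keeps its original limit; the usual MPI launch command would simply be passed as the arguments.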