Compiling, Linking and Running
- I get a "segmentation fault" error during my run. What happened?
- I get an "undefined reference to" error during linking. What can I do?
- I get an "error while loading shared libraries". What can I do?
- BLACS linking with mpicc: undefined reference to `mpi_init_'?
- How can I submit a parallel job using both MPI and OpenMP?
- Which ACML library should I use?
I get a "segmentation fault" error during my run. What happened?
A segmentation fault occurs when your program tries to access memory outside the address space assigned to it.
This can be caused by (sorted by frequency):
- using a wrong array index: for example, accessing element 1000 of a 100-element array, or computing an index incorrectly;
- using a pointer with an invalid address: for example, an uninitialized pointer, or a pointer returned by a failed dynamic allocation;
- in C programs, passing arguments of the wrong type or number to printf/scanf and similar functions;
- too aggressive compiler optimization: aggressive optimization enables tricks that are valid under certain assumptions, but can cause problems if those assumptions are not satisfied;
- stack overflow: your program declares data structures that are too large for the stack.
Suggested Solutions:
- compile your program using the -O0 -g compiler options;
- also add compiler options that activate array bounds checking (-check bounds for ifort, -Mbounds for pgf90; see the man pages);
- run your program with a simple test case;
- use TotalView to debug your program in an interactive session.
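The compiler options above can be collected in a small bash helper. This is a minimal sketch: `debug_flags` is a hypothetical name, and the gfortran entry (`-fcheck=bounds`) is an addition not mentioned in this FAQ, so check your compiler's man page before relying on it.

```bash
#!/bin/bash
# debug_flags is a hypothetical helper, not a system command.
# It prints debug options for a Fortran compiler: no optimization (-O0),
# debugging symbols (-g) and the compiler's array bounds-checking flag.
debug_flags() {
    local base="-O0 -g"
    case "$1" in
        ifort)    printf '%s\n' "$base -check bounds" ;;
        pgf90)    printf '%s\n' "$base -Mbounds" ;;
        gfortran) printf '%s\n' "$base -fcheck=bounds" ;;  # GNU Fortran (added here)
        *)        printf '%s\n' "$base" ;;                  # unknown compiler: no bounds flag
    esac
}

# Example use (word splitting of $(...) is intended here):
#   ifort $(debug_flags ifort) -o myprog myprog.f90
```

Remember to rebuild without these options once the bug is found, since -O0 and bounds checking slow the program down.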
I get an "undefined reference to" error during linking. What can I do?
An ‘undefined reference to symbol_name’ is a typical linking error (linking is the last phase of building an executable, after compiling). It appears when you have not provided all the needed object files (.o files) or libraries.
Suggested Solutions:
- from the name of the undefined symbol, work out which library or object file should define it;
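- check that your link command line contains the correct path and library name: specify -L/path/to/your/lib/dir -llib_name for each needed library (with -llib_name the linker looks for liblib_name.so or liblib_name.a in the -L directories and in the default system directories).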
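The two suggestions above can be scripted. The sketch below assumes GNU binutils `nm` is available; `find_symbol` and `link_flags` are hypothetical helper names, not standard commands. The first searches libraries for the file that defines a missing symbol, the second turns a library file path into the `-L`/`-l` pair for the link line.

```bash
#!/bin/bash
# find_symbol and link_flags are hypothetical helper names.

# Search libraries or object files for a *definition* of a symbol.
# GNU nm: -A prefixes each line with the file name, --defined-only
# hides undefined (U) entries. Stripped shared libraries may need -D.
find_symbol() {
    local sym="$1"; shift
    nm -A --defined-only "$@" 2>/dev/null | grep -w -- "$sym"
}

# Turn a library file path into the matching -L/-l linker options:
# /path/to/dir/libNAME.a or /path/to/dir/libNAME.so -> -L/path/to/dir -lNAME
link_flags() {
    local path="$1"
    local dir="${path%/*}"     # directory part
    local name="${path##*/}"   # file name, e.g. libfoo.so
    name="${name#lib}"         # drop the "lib" prefix
    name="${name%.a}"          # drop a static-archive suffix
    name="${name%.so}"         # drop a shared-library suffix
    printf '%s\n' "-L$dir -l$name"
}

# Example use:
#   find_symbol dgemm_ /path/to/your/lib/dir/*.a
#   link_flags /path/to/your/lib/dir/liblib_name.so   # -L/path/to/your/lib/dir -llib_name
```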
I get an "error while loading shared libraries". What can I do?
When you run a program and get the following error:
error while loading shared libraries: libfoo.so: cannot open shared object file: No such file or directory
this means that your program was built with dynamic linking, that is, the needed libraries are loaded at run time, so the system must find out where to look for them. The LD_LIBRARY_PATH environment variable contains a list of directories, separated by “:”, that the system searches for dynamic libraries.
Suggested Solutions:
- load all the modules you used to build your program;
- before running the program, run ldd your_executable_name: it lists all the dynamic libraries your executable needs at run time. Check that all of them are correctly found (missing ones are reported as "not found");
- set the LD_LIBRARY_PATH environment variable properly, adding the needed directories to the list.
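A minimal bash sketch of the last two suggestions, assuming a bash shell and the standard `ldd` tool; `prepend_ld_path` and `missing_libs` are hypothetical helper names.

```bash
#!/bin/bash
# prepend_ld_path and missing_libs are hypothetical helper names.

# Add a directory in front of LD_LIBRARY_PATH. ${VAR:+...} inserts the ":"
# separator only when the variable is already set and non-empty, because an
# empty entry in the list would make the loader also search the current directory.
prepend_ld_path() {
    export LD_LIBRARY_PATH="$1${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}

# Show only the libraries that ldd reports as "not found".
missing_libs() {
    ldd "$1" | grep 'not found'
}

# Example use:
#   prepend_ld_path /path/to/your/lib/dir
#   missing_libs ./your_executable_name   # no output: all libraries found
```

Prepending (rather than appending) makes your directory take precedence over other copies of the same library found later in the list.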
BLACS linking with mpicc: undefined reference to `mpi_init_'?
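When linking a C program against the BLACS libraries (for example, when using ScaLAPACK) with the Open MPI mpicc compiler wrapper, you should also add the Open MPI Fortran libraries to your link command line: -lmpi_f90 -lmpi_f77. The BLACS call the Fortran MPI interface (hence the reference to mpi_init_), which mpicc does not link by default.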
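As an illustration, the sketch below prints such a link line instead of running it. `blacs_link_cmd` is a hypothetical helper, and `-lscalapack -lblacs` are assumed library names (BLACS installations often use different names, so adapt them to your system); only `-lmpi_f90 -lmpi_f77` come from this FAQ.

```bash
#!/bin/bash
# blacs_link_cmd is a hypothetical helper; -lscalapack and -lblacs are
# assumed library names, replace them with the ones on your system.
# It prints (does not run) an mpicc link line that also adds the Open MPI
# Fortran libraries -lmpi_f90 -lmpi_f77, which provide symbols such as mpi_init_.
blacs_link_cmd() {
    local exe="$1"; shift   # name of the executable to create
    # "$*" joins the remaining arguments (object files) with spaces
    printf '%s\n' "mpicc -o $exe $* -lscalapack -lblacs -lmpi_f90 -lmpi_f77"
}

# Example use:
#   blacs_link_cmd myprog main.o solver.o
```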
How can I submit a parallel job using both MPI and OpenMP?
Suppose you want to run 4 MPI processes, each parallelized with OpenMP and running 8 threads. Your PBS job script should:
- ask for 32 processors, distributed over 4 nodes:
#PBS -l nodes=4:ppn=8
- load one of the OpenMPI modules
- set the number of OpenMP threads for each MPI process:
  - with bash: export OMP_NUM_THREADS=8
  - with tcsh: setenv OMP_NUM_THREADS 8
- invoke mpiexec with the following switches:
  - -n 4: to set the number of MPI processes to run;
  - --bynode: to allocate the MPI processes on a per-node basis (round-robin across nodes);
  - -x OMP_NUM_THREADS: to export the variable OMP_NUM_THREADS to the environment of the MPI processes.
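The resource request in the list above follows a simple rule: with one MPI process per node, ask for as many nodes as MPI processes and as many processors per node as OpenMP threads. The bash sketch below encodes that rule under these assumptions (one processor per thread, `--bynode` placement); `pbs_hybrid_request` and its `cores_per_node` check are hypothetical, not part of PBS or Open MPI.

```bash
#!/bin/bash
# pbs_hybrid_request is a hypothetical helper. It assumes one MPI process
# per node (as with mpiexec --bynode) and one processor per OpenMP thread,
# and prints the matching PBS request, thread setting and launch line.
pbs_hybrid_request() {
    local nprocs="$1" nthreads="$2" cores_per_node="$3" exe="$4"
    if (( nthreads > cores_per_node )); then
        echo "error: $nthreads threads do not fit on a $cores_per_node-core node" >&2
        return 1
    fi
    # one node per MPI process, one processor per thread on each node
    printf '#PBS -l nodes=%d:ppn=%d\n' "$nprocs" "$nthreads"
    printf 'export OMP_NUM_THREADS=%d\n' "$nthreads"
    printf 'mpiexec -x OMP_NUM_THREADS --bynode -n %d %s\n' "$nprocs" "$exe"
}

# Example use: 4 MPI processes x 8 threads on 8-core nodes (32 processors)
#   pbs_hybrid_request 4 8 8 your_executable
```

With the arguments of the example it prints the same three directives and commands used in the job script below.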
For example:

```bash
#!/bin/bash
# 4 nodes, 8 processors per node (32 in total)
#PBS -l nodes=4:ppn=8
module load openmpi/1.2.5/64/pgi-8.0-1
# each MPI process runs 8 OpenMP threads
export OMP_NUM_THREADS=8
# 4 MPI processes, one per node, with OMP_NUM_THREADS exported to them
mpiexec -x OMP_NUM_THREADS --bynode -n 4 your_executable
```

Please note that when --bynode is used, the -n option cannot be omitted. For further information about the command-line options of mpiexec, please look at the mpiexec man page.

Which ACML library should I use?
ACML libraries come in two different flavours. Choose between them carefully:
- acml: standard 64-bit ACML. This is the right choice for most users and works well in almost all cases; try it first;
- acml_mp: 64-bit ACML with multi-threaded support. Some ACML routines can run in multi-threaded mode, using the number of threads defined by the OMP_NUM_THREADS environment variable. Use it only if you are running a serial program on a single node, or if you know what you are doing. To use it, simply append the suffix “_mp” to the $ACMLDIR environment variable set by the acml module, i.e. use ${ACMLDIR}_mp:
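- Link the static library:
  gfortran -fopenmp BETA.f ${ACMLDIR}_mp/lib/libacml_mp.a
  No -l option is needed, since you pass the full path of the library archive, and no shared libraries will be used. Put the archive after the source file: the GNU linker only extracts from an archive the symbols needed by the files that precede it.
- Use the shared libraries:
  gfortran -fopenmp -L${ACMLDIR}_mp/lib -Wl,-rpath,${ACMLDIR}_mp/lib BETA.f -lacml_mp
  The -rpath option stores the library directory in the executable, so the right shared libraries are found at run time without setting LD_LIBRARY_PATH.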
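The two link commands above can be wrapped in a small bash function, for instance to reuse them in a job script. This is a sketch assuming `ACMLDIR` has been set by the acml module; `acml_mp_cmd` is a hypothetical name, and it only prints the command so you can check it before running it.

```bash
#!/bin/bash
# acml_mp_cmd is a hypothetical helper. It assumes ACMLDIR has been set by
# the acml module and prints (does not run) the gfortran command that links
# a Fortran source file against the multi-threaded ACML in ${ACMLDIR}_mp.
# static: full path to the archive, placed after the source file.
# shared: -L for the link step, -Wl,-rpath for the run-time library search.
acml_mp_cmd() {
    local mode="$1" src="$2"
    local libdir="${ACMLDIR}_mp/lib"
    case "$mode" in
        static) printf '%s\n' "gfortran -fopenmp $src $libdir/libacml_mp.a" ;;
        shared) printf '%s\n' "gfortran -fopenmp -L$libdir -Wl,-rpath,$libdir $src -lacml_mp" ;;
        *)
            echo "usage: acml_mp_cmd static|shared source.f" >&2
            return 1 ;;
    esac
}
```

For example, with the module loaded, `acml_mp_cmd shared BETA.f` prints the shared-library command shown above; `eval "$(acml_mp_cmd shared BETA.f)"` would then run it.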