## Numerical Libraries for Serial and Parallel Solving

### BLAS/LAPACK

BLAS and LAPACK are low-level libraries that support linear algebra operations. See netlib for definitions and the "reference" BLAS. Multiple versions are installed, all of which should give the same arithmetic answers. Most users will want GotoBLAS or Intel MKL BLAS. **The reference BLAS is very slow.** **The MKL and GotoBLAS implementations are much faster and should be used for production work** (for example, the SIESTA code is eight times faster in total runtime with GotoBLAS than with the reference BLAS).

##### GotoBLAS with LAPACK

GotoBLAS2 is for academic use and comes from the University of Texas TACC. For most applications, it is the highest-performance BLAS available. It also includes the most commonly used LAPACK functions. We have specific versions of GotoBLAS compiled for each processor in use at AHPCC. The last and best GotoBLAS version is GotoBLAS2-1.09. Since the GotoBLAS project is now finished, there will be no future updates for new processors, though the current library can be recompiled for new processors that are similar to current models.

If you use dynamic linking, you can select the proper library at runtime. To do so:

module load gotoblas2

The link option will be shown in environment variables $GOTO_T and $GOTO_C. You can pass the string $GOTO_T to a makefile. The following variables are set automatically by the module:

GOTO_T=-L/share/apps/gotoblas2/dynamic/E5430 -lgoto2_gfortran -pthread

GOTO_C=-lgfortran

The controlling runtime environment variable is $GOTO_NUM_THREADS. Set it in your .bashrc to the number of threads, usually 1 if you are also using MPI:

GOTO_NUM_THREADS=1

**If your application uses GotoBLAS and runs an MPI process on every core, setting GOTO_NUM_THREADS larger than one will usually result in drastically slower performance.**
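The reasoning behind this warning is that MPI processes times BLAS threads should not exceed the number of cores. A minimal sketch of that arithmetic, assuming hypothetical variables CORES_PER_NODE and RANKS_PER_NODE (neither is set by the gotoblas2 module, and the values are only illustrative):

```shell
# Hypothetical values for illustration; adjust to the node and job layout.
CORES_PER_NODE=8      # cores on one node (assumed)
RANKS_PER_NODE=8      # MPI processes started on each node (assumed)

# Give each MPI process an equal share of the cores, never less than 1 thread.
THREADS=$((CORES_PER_NODE / RANKS_PER_NODE))
if [ "$THREADS" -lt 1 ]; then
    THREADS=1
fi

# export makes the setting visible to programs started from this shell.
export GOTO_NUM_THREADS=$THREADS
echo "GOTO_NUM_THREADS=$GOTO_NUM_THREADS"
```

With one MPI process per core the result is 1, which matches the recommendation above. Note the use of export: a plain assignment in .bashrc would not be passed on to the programs you run.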

Use $GOTO_T to link Fortran programs, and both $GOTO_T and $GOTO_C to link C programs:

gcc callblas.c $GOTO_T $GOTO_C; ifort callblas.f $GOTO_T

There are specific builds of GotoBLAS for each Fortran compiler, so if you switch compilers with module, also unload and reload gotoblas2.

The default module, gotoblas2 or gotoblas2/1.09, provides dynamic (.so) libraries, which are linked at run time, so the correct library for the processor the program is running on will be selected. The module "gotoblas2/1.09-static" instead provides the static GotoBLAS2 (.a) libraries, which are included in your executable. With the static libraries, the executable is optimized for the processor type you compile on. The processor types are strings derived from the /proc/cpuinfo entry. Currently available are:

3.20GHz 6174 E5430 E5520 X7560

Star compute nodes and stargate have E5430 processors, GPU nodes have E5520 processors, and the high-memory node has AMD 6174 processors. You can change the target node by pasting the $GOTO_T variable and changing the subdirectory E5430 to one of the above values.
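Editing the link string by hand is error-prone, so the substitution can also be scripted. The following is only a sketch: GOTO_T is assigned here with the value listed earlier so that the snippet runs without the module, and TARGET and GOTO_T_TARGET are hypothetical names chosen for illustration.

```shell
# Value the gotoblas2 module sets on a Star compute node (copied from above so
# the sketch runs on its own; in a real session the module provides $GOTO_T).
GOTO_T="-L/share/apps/gotoblas2/dynamic/E5430 -lgoto2_gfortran -pthread"

# Hypothetical helper variable: the processor type to link for.
TARGET=E5520

# Swap only the processor subdirectory, leaving the other link flags intact.
GOTO_T_TARGET=$(printf '%s\n' "$GOTO_T" | sed "s|/dynamic/[^ ]*|/dynamic/$TARGET|")

echo "$GOTO_T_TARGET"
```

The resulting string can be used in place of $GOTO_T on the compile line, for example when building on a Star node for the GPU nodes.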

An unthreaded GotoBLAS can be made available by recompiling; please contact support if you need this option.

##### Intel MKL BLAS/LAPACK/Solvers

The Intel Math Kernel Library is commercially licensed software. It includes a number of functions besides LAPACK and BLAS, such as sparse solvers. It normally gives very good performance on Intel processors, though usually a little slower than GotoBLAS. It is updated regularly for new Intel processors. It runs on AMD processors such as the high-memory node, but normally does not give good performance there. The logic for different processors and compilers is internal, so there is a single set of link libraries. To use:

module load mkl

The module sets directories in $LIBRARY_PATH and $LD_LIBRARY_PATH for linking. The most common link libraries for LAPACK and BLAS are set in variables $MKL_T (threaded) and $MKL_S (unthreaded). With MPI at one process per core, you will probably want the unthreaded libraries. The following variables are set by the module automatically:

```
MKL_T=-lmkl_lapack -lmkl_intel_lp64 -lmkl_intel_thread -liomp5 -lmkl_core -lpthread -lm
MKL_S=-lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm
LIBRARY_PATH=/share/apps/intel/11.1.064/lib/intel64:/share/apps/intel/11.1.064/mkl/lib/em64t
```

To compile directly, use either $MKL_T or $MKL_S:

ifort myblas.f $MKL_T ; gcc callblas.c $MKL_S

To use MKL in a makefile, extract the path containing mkl from $LIBRARY_PATH, pass it with "-L", and add either the threaded or unthreaded libraries as shown by the environment variable $MKL_T or $MKL_S:

-L /share/apps/intel/11.1.064/mkl/lib/em64t -lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm
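One possible way to build that link line without copying paths by hand is sketched below. LIB_PATH and MKL_LIBS are stand-ins holding the values listed earlier so the snippet runs without the module; MKL_DIR and MKL_LINK are hypothetical names, not variables set by the mkl module.

```shell
# Sample values copied from the module output above so the sketch runs on its
# own; in a real session use "$LIBRARY_PATH" and "$MKL_S" from the mkl module.
LIB_PATH="/share/apps/intel/11.1.064/lib/intel64:/share/apps/intel/11.1.064/mkl/lib/em64t"
MKL_LIBS="-lmkl_lapack -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lm"

# Split the colon-separated path and keep the first entry that contains /mkl/.
MKL_DIR=$(printf '%s\n' "$LIB_PATH" | tr ':' '\n' | grep '/mkl/' | head -n 1)

# Hypothetical variable holding the complete link line for a makefile.
MKL_LINK="-L$MKL_DIR $MKL_LIBS"
echo "$MKL_LINK"
```

The printed string could then be pasted into the makefile or passed in on the make command line. If the installed MKL version changes, the path follows $LIBRARY_PATH automatically.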

$MKL_T and $MKL_S are given for the common use case of LAPACK and/or BLAS with Intel 64. Other libraries are available in MKL but will link differently; see Intel's Linking_Examples.

### FFTW

FFTW (Fastest Fourier Transform in the West) is a software library for computing discrete Fourier transforms.

### MUMPS

MUMPS (MUltifrontal Massively Parallel sparse direct Solver) is a software application for the solution of large sparse systems of linear algebraic equations on distributed memory parallel computers.

### PETSc

PETSc (Portable, Extensible Toolkit for Scientific Computation) is a suite of data structures and routines for the parallel scalable solution of scientific applications modeled by partial differential equations. PETSc employs the message passing interface (MPI).

### ScaLAPACK

ScaLAPACK (Scalable LAPACK) is a library that includes a subset of Linear Algebra PACKage (LAPACK) routines redesigned for distributed memory MIMD parallel computers.