Profiling
-------------------------

One goal of the second exercise is to make the program run as fast as possible. To measure the total execution time of your program, you can prepend `time` on the command line:

.. code-block:: bash

   $ time mpirun -n 2 ./numsim_parallel lid_driven_cavity.txt
   ...
   real 0m23,020s
   user 0m45,108s
   sys 0m0,465s

("real" is the actual wall-clock time of the program execution.)

Alternatively, you can measure the runtime yourself inside the program, e.g. by using `MPI_Wtime `_.

It is useful to know which parts of a program take most of the computational time, because these are the parts worth optimizing. This optimization approach is generally independent of the parallel execution of the program.

Using Gprof
============

In the following we cover how to use the tool `gprof` for this purpose. `Gprof` collects statistics while your program is running and presents them in the form of a text file afterwards.
In order to use it, you have to compile all files with `-pg -g` and link with `-pg`. In CMake this can be done by adding a PROFILE option like this:

.. code-block:: cmake
   :linenos:

   option(PROFILE "Add flags to profile the program with gprof." OFF)
   if(PROFILE)
     SET(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pg -g")
     SET(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -pg -g")
   endif()

Then you can configure inside the `build` directory with this option turned on:

.. code-block:: bash

   $ cmake -DPROFILE=ON ..

After `make install` you get the new executable. It will run slower by a factor of roughly 2, but it will write runtime statistics to a binary file `gmon.out`.
To turn this file into a readable report, run gprof with your program as argument:

.. code-block:: bash

   $ gprof numsim_parallel > out1.txt
   $ gprof -l numsim_parallel > out2.txt

This produces two text files with results. `out1.txt` contains runtime information at the level of function calls. `out2.txt` takes much longer to create and contains runtime information at the level of source code lines. The following are example results.
Head of file "out1.txt":

.. code-block:: bash

   Flat profile:

   Each sample counts as 0.01 seconds.
     %   cumulative   self              self     total
    time   seconds   seconds    calls   s/call   s/call  name
    15.99      0.43     0.43 320326804     0.00     0.00  std::__array_traits::_S_ref(int const (&) [2], unsigned long)
    15.99      0.86     0.43 320139373     0.00     0.00  std::array::operator[](unsigned long) const
    15.61      1.28     0.42  67447502     0.00     0.00  Array2D::operator()(int, int)
     8.18      1.50     0.22       384     0.00     0.01  SORParallel::solve()

Head of file "out2.txt":

.. code-block:: bash

   Flat profile:

   Each sample counts as 0.01 seconds.
     %   cumulative   self              self     total
    time   seconds   seconds    calls  ns/call  ns/call  name
     8.55      0.23     0.23                             std::array::operator[](unsigned long) const (array:190 @ 20ec4)
     8.18      0.45     0.22                             std::__array_traits::_S_ref(int const (&) [2], unsigned long) (array:56 @ 15e0d)
     7.81      0.66     0.21 320326804     0.66     0.66  std::__array_traits::_S_ref(int const (&) [2], unsigned long) (array:55 @ 15df7)
     7.44      0.86     0.20 320139373     0.62     0.62  std::array::operator[](unsigned long) const (array:189 @ 20eae)
     4.83      0.99     0.13                             Array2D::operator()(int, int) (array2d.cpp:36 @ 2fc0a)

The first column gives the percentage of the total runtime that was spent in the respective function or source code line.
The example presented here shows that the program spends most of its time in the "[]" operator of `std::array`. Furthermore, the SOR solver itself accounts for about 8% of the total runtime.
If you scroll down the files, you will find more context on these function calls. With this information you could then further optimize the highlighted code.

More advanced optimization
=============================

If you want to understand and optimize your program even better, you could:

* add `more efficient compilation options `_ to GCC,
* improve `vectorization `_ (e.g. it is allowed to use `#pragma omp simd `_),
* evaluate your progress using `perf `_ or `inspecting the assembly `_,
* improve the numerics, e.g. with more efficient termination criteria in the solver, a CG method, etc.,
* implement communication hiding.

These advanced topics are not required for the submission. However, since we will evaluate the total runtime of your program, experts and enthusiasts in these fields will be rewarded.
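
As an illustration of the vectorization hint from the list above, the following is a minimal sketch of how `#pragma omp simd` can be placed in front of a simple loop. The function `axpy` and its parameters are hypothetical and not part of the exercise code. The sketch assumes a loop whose iterations are independent of each other. GCC only honors the pragma when compiling with `-fopenmp` or `-fopenmp-simd`; without these flags it is ignored and the loop still computes the same result.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of the exercise code):
// computes y[i] += a * x[i] for all entries of x and y.
void axpy(double a, const std::vector<double> &x, std::vector<double> &y)
{
  assert(x.size() == y.size());

  // Work on raw pointers so that the loop body is a plain array access.
  const double *xp = x.data();
  double *yp = y.data();
  const std::size_t n = x.size();

  // Hint to the compiler that the iterations are independent and may be
  // executed with SIMD instructions. Only honored with -fopenmp or
  // -fopenmp-simd, otherwise the pragma is ignored.
  #pragma omp simd
  for (std::size_t i = 0; i < n; i++)
  {
    yp[i] += a * xp[i];
  }
}
```

Whether such a hint actually pays off depends on the loop and the compiler flags, so it is worth checking the effect with the profiling approach described above.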