STAT and ATP – NERSC Documentation
STAT and ATP
Note
The tool may fail to work properly unless the cray-cti module is loaded. Until the system loads it automatically, please load that module as well.
STAT (the Stack Trace Analysis Tool) is a highly scalable, lightweight tool that gathers and merges stack traces from all of the processes of a parallel application. The results are then presented graphically as a call tree showing the location that each process is executing.
STAT is useful for debugging an application that hangs: the collected backtraces quickly show where in the code each process is currently executing, which hints at where to look for a more detailed analysis.
It supports only distributed-memory parallel programming models, such as MPI, Coarray Fortran, and UPC (Unified Parallel C).
One way to collect backtraces under Slurm is explained below.
- Start an interactive batch job and launch the application in the background. Keep the process ID (PID) that the shell prints:

      $ salloc -N 1 -t 30:00 [...other]
      ...
      $ srun -n 4 [...other] ./jacobi_mpi &
      [1] 95298

  You can also see the PID by running the ps command:

      $ ps
      95018 ... 00:00:00 ...
      95298 ... 00:00:00 ...
      95302 ... 00:00:00 ...
      95325 ... 00:00:00 ...

- Load the stat module on Cori, or the cray-stat module on Perlmutter:

      module load stat         # Cori
      module load cray-stat    # Perlmutter
- Run the stat-cl command on this process. You may want to use the -i flag to gather source line numbers, too:

      $ stat-cl -i 95298
      STAT ... 2016-11-30-07:33:37
      Attaching ... (null):95298 ...

  stat-cl takes several backtrace samples after attaching to the running processes. The result file is created in the stat_results subdirectory under the current working directory. That subdirectory contains another subdirectory, named after your parallel application's executable, which holds the merged stack trace file in DOT format.

- Then run the GUI command stat-view (or STATview) on the file above to visualize the stack backtrace information in the generated *.dot files:

      $ stat-view stat_results/<subdirectory>/<file>.dot

  Note that if you are running on Cori KNL nodes, you have to go to a login node for this step (after loading the stat module there). Otherwise, fonts will not be displayed correctly.
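The first three steps can be strung together in a small helper. This is a minimal sketch under stated assumptions, not a NERSC-provided script: the names `launch_and_sample`, `STAT_CL`, and `STAT_DELAY` are hypothetical, and the delay before sampling is only a guess at how long the application needs to reach the hang.

```shell
#!/bin/bash
# Hypothetical helper (not part of STAT): run a launcher command in the
# background, keep its PID, wait a while, then attach stat-cl to it.
STAT_CL="${STAT_CL:-stat-cl}"      # assumed override, handy for dry runs
STAT_DELAY="${STAT_DELAY:-60}"     # assumed seconds to wait before sampling

launch_and_sample() {
    "$@" &                         # e.g. srun -n 4 ./jacobi_mpi
    launcher_pid=$!                # PID of the backgrounded launcher
    echo "launcher PID: $launcher_pid"
    sleep "$STAT_DELAY"
    "$STAT_CL" -i "$launcher_pid"  # -i also gathers source line numbers
}
```

Inside an interactive allocation with the STAT module loaded, it could be called as `launch_and_sample srun -n 4 ./jacobi_mpi`. The application keeps running in the background afterwards, so further samples can be taken with the printed PID.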
The call tree diagram for this example reveals that rank 0 is in the init_fields routine (line 172 of jacobi_mpi.f90), rank 3 is in the set_bc routine (line 214 of the same source file), and the other ranks (1 and 2) are in the MPI_Sendrecv function. If this pattern persists, the code is hanging at these locations. With this information, you may want to use a full-fledged parallel debugger such as DDT or TotalView to find out why your code is stuck in these places.
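When several samples have been taken, it helps to know which merged file is the most recent before opening the viewer. The following is a small sketch, assuming only the stat_results directory layout described earlier; `list_dot_files` is a hypothetical name, not a STAT command.

```shell
#!/bin/bash
# Hypothetical helper: print the .dot files found under a results
# directory, newest first. The default directory name is the one
# stat-cl is described as creating in the current working directory.
list_dot_files() {
    results_dir="${1:-stat_results}"
    if [ ! -d "$results_dir" ]; then
        echo "no such directory: $results_dir" >&2
        return 1
    fi
    # ls -t sorts by modification time, newest first
    find "$results_dir" -type f -name '*.dot' -exec ls -t {} +
}
```

For example, `stat-view "$(list_dot_files | head -n 1)"` would open the newest file. The sketch does not handle file names containing newlines.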
Note
ATP may also fail to work properly unless the cray-cti module is loaded. Until the system loads it automatically, please load that module as well.
Another useful tool in the same vein is ATP (Abnormal Termination Processing), developed by Cray. ATP gathers stack backtraces when the code crashes, by running STAT before it exits.
The atp module is loaded by default on Cori but not on Perlmutter. Ensure that the target application is built with debug symbols (usually -g) after loading the module. Note also that, when the module is loaded, applications built with the Cray or GNU compilers are automatically linked against the ATP signal handler.
To enable ATP at runtime so that it generates stack backtrace information upon a failure, set the following environment variable before the srun command in your batch script:

    setenv ATP_ENABLED 1    # for csh/tcsh
    export ATP_ENABLED=1    # for bash/sh/ksh
Intel Fortran and GNU Fortran have their own abnormal termination handling enabled by default. If you want ATP processing instead for Fortran code built with the Intel compiler, set the FOR_IGNORE_EXCEPTIONS environment variable:

    setenv FOR_IGNORE_EXCEPTIONS true    # for csh/tcsh
    export FOR_IGNORE_EXCEPTIONS=true    # for bash/sh/ksh
If your Fortran code is built with the GNU compiler, you need to link with the -fno-backtrace option.
When the atp module is loaded, no core file is generated. However, you can get core dumps (core.atp.<apid>.<rank>) if you set coredumpsize to unlimited:

    unlimit coredumpsize    # for csh/tcsh
    ulimit -c unlimited     # for bash/sh/ksh
Even if Linux core dumping is enabled, ATP-specific core dumping can be disabled by setting the environment variable ATP_MAX_CORES to 0.
More information can be found in the man page: type man intro_atp or, simply, man atp.
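The runtime settings above can be gathered in one place in a bash batch script. The following is a sketch only: the function name `atp_setup` and its option words are hypothetical, while the environment variables and the ulimit call are the ones described in the text.

```shell
#!/bin/bash
# Hypothetical helper collecting the ATP runtime settings described above.
# Usage: atp_setup [intel-fortran] [cores|nocores]
atp_setup() {
    export ATP_ENABLED=1
    for opt in "$@"; do
        case "$opt" in
            intel-fortran)
                # Let ATP handle failures instead of the Intel Fortran runtime
                export FOR_IGNORE_EXCEPTIONS=true ;;
            cores)
                # Allow core.atp.<apid>.<rank> files to be written
                ulimit -c unlimited 2>/dev/null ||
                    echo "warning: could not raise the core size limit" >&2 ;;
            nocores)
                # Disable ATP-specific core dumping
                export ATP_MAX_CORES=0 ;;
            *)
                echo "unknown option: $opt" >&2
                return 2 ;;
        esac
    done
}
```

The function has to be called in the batch script itself, before srun, so that the exported variables reach the application.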
The following tests ATP with an example code available in the ATP distribution package on Cori:

    $ cp $ATP_HOME/demos/testMPIApp.c .
    $ cc -o testMPIApp testMPIApp.c

    $ cat runit
    #!/bin/bash
    #SBATCH -N 1
    #SBATCH -t 5:00
    #SBATCH -q debug

    export ATP_ENABLED=1
    srun -n 8 ./testMPIApp 1 4

    $ sbatch runit
    Submitted batch job 3044170

    $ cat slurm-3044170.out
    [snip]
    testApp: (31929) starting up...
    testApp: (31932) starting up...
    testApp: (31930) starting up...
    testApp: (31933) starting up...
    Application 3044170 is crashing. ATP analysis proceeding...

    ATP Stack walkback for Rank 3 starting:
      [email protected]:122
      [email protected]:285
      [email protected]:76
      [email protected]:37
    ATP Stack walkback for Rank 3 done
    Process died with signal 4: 'Illegal instruction'
    View application merged backtrace tree with: STATview atpMergedBT.dot
    You may need to: module load stat

    srun: error: nid00009: tasks 0-3: Killed
    srun: Terminating job step 3044170.0
    srun: Force Terminated job step 3044170.0
    [snip]
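Job output files can be long, so it can be handy to pull out just the walkback section. The sketch below relies only on the "ATP Stack walkback for Rank N starting:" and "... done" marker lines shown in the log above; `atp_walkback` is a hypothetical helper name, and the marker text may differ between ATP versions.

```shell
#!/bin/bash
# Hypothetical helper: print the ATP stack walkback block(s) from a job
# output file, from each "starting:" marker through the matching "done".
atp_walkback() {
    sed -n '/ATP Stack walkback for Rank [0-9]* starting:/,/ATP Stack walkback for Rank [0-9]* done/p' "$1"
}
```

For the job above, this would be run as `atp_walkback slurm-3044170.out`.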
ATP creates merged stack backtrace files in DOT format: atpMergedBT.dot (with function-level aggregation) and atpMergedBT_line.dot (with line-level aggregation). The latter shows source line numbers. To view the collected backtraces, load the stat module on Cori or cray-stat on Perlmutter, and run stat-view:

    module load stat    # 'module load cray-stat' on Perlmutter
    stat-view atpMergedBT.dot
ATP can also be a useful tool for debugging a hung application. You can force ATP to generate backtraces for a hung application by killing it. For this to work, the batch script for the job in question must already contain the necessary preparation, such as setting the ATP_ENABLED environment variable.
    $ sacct -j 3169879    # find job step id for the application - it's 3169879.0
           JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
    ------------ ---------- ---------- ---------- ---------- ---------- --------
    3169879           runit        knl      mpccc        544    RUNNING      0:0
    3169879.ext+     extern                 mpccc        544    RUNNING      0:0
    3169879.0    jacobi_mp+                 mpccc          4    RUNNING      0:0
    3169879.1    cti_dlaun+                 mpccc          2    RUNNING      0:0

    $ scancel -s ABRT 3169879.0    # Kill the application

    $ cat slurm-3169879.out
    Application 3169879 is crashing. ATP analysis proceeding...

    ATP Stack walkback for Rank 0 starting:
      [email protected]:122
      [email protected]:285
      main@0x40a59d
      MAIN__@jacobi_mpi.f90:174
    ATP Stack walkback for Rank 0 done
    Process died with signal 6: 'Aborted'
    View application merged backtrace tree with: STATview atpMergedBT.dot
    You may need to: module load stat
    [snip]

    $ stat-view atpMergedBT_line.dot
The example above uses SIGABRT to kill the application. ATP accepts other signals as well; see the atp man page for details.
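Picking the step ID out of the sacct listing by eye can be replaced with a small filter. This sketch assumes sacct is run with its `--parsable2`, `--noheader`, and `--format=JobID,JobName,State` options, which should give untruncated, pipe-separated fields; `running_steps` is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: read pipe-separated "JobID|JobName|State" lines
# on stdin and print the numbered job steps (for example 3169879.0)
# that are still RUNNING. The job itself and the extern step are skipped.
running_steps() {
    awk -F'|' '$1 ~ /\.[0-9]+$/ && $3 == "RUNNING" { print $1, $2 }'
}
```

A possible use is `sacct -j 3169879 --parsable2 --noheader --format=JobID,JobName,State | running_steps`, followed by `scancel -s ABRT` on the step that runs the application. Helper steps such as the cti_dlaunch one in the listing above are printed too when they are running, so the job name column still needs a look.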
If you cannot run your application interactively, because the job requests a large number of nodes or takes a long time to reach the problematic area, the interactive approach above is not practical. In that case, you can submit a non-interactive batch job in which a signal is sent shortly before the job is due to end, and a signal handler cancels the application. The following example job script sends the SIGUSR1 signal (a user-defined signal) 300 seconds before the job ends (#SBATCH --signal=B:USR1@300); by that point the application is presumed to be hung. The script uses the trap command to register the cancel_srun function as the handler for that signal. When invoked, the function cancels the application, which triggers ATP to generate debug info. Note that, in this example, the srun process is canceled using the Slurm job step ID ${SLURM_JOB_ID}.0, which refers to the first srun ("0") of the job. If your job script contains multiple srun commands and you want to target a particular job step, use the corresponding step ID.
    #!/bin/bash
    #SBATCH -N 2
    #SBATCH -C knl
    #SBATCH -t 30
    #SBATCH -q regular
    #SBATCH --signal=B:USR1@300

    module load cray-cti

    export ATP_ENABLED=1
    export FOR_IGNORE_EXCEPTIONS=true    # Fortran code built with the Intel compiler

    # Signal handler: abort the first job step so that ATP produces backtraces
    cancel_srun()
    {
      echo SLURM_JOB_ID=$SLURM_JOB_ID
      scancel -s ABRT ${SLURM_JOB_ID}.0
    }

    # Run the application in the background so the shell can handle the signal
    srun -n 8 --cpu-bind=cores ./jacobi_mpi &

    trap cancel_srun SIGUSR1

    wait
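The reason the application is started in the background and followed by wait is that a shell runs a trap handler only once the current foreground command returns, whereas a trapped signal interrupts the wait builtin. The stand-alone demo below illustrates this without Slurm. It is a sketch under stated assumptions: `sleep 30` stands in for the srun step, a delayed `kill -USR1` stands in for the signal Slurm would deliver, and all variable and function names are hypothetical.

```shell
#!/bin/bash
# Stand-alone demo of the trap-and-wait pattern; no Slurm required.
handled=0
self_pid="${BASHPID:-$$}"        # PID of this shell (BASHPID is bash-only)

cancel_child() {
    handled=1
    echo "caught USR1, sending ABRT to child $child_pid"
    kill -ABRT "$child_pid"
}
trap cancel_child USR1

sleep 30 &                       # stand-in for: srun ... ./jacobi_mpi &
child_pid=$!

( sleep 1; kill -USR1 "$self_pid" ) &   # stand-in for the scheduler's signal

# A trapped signal interrupts wait; the handler runs as wait returns.
first_status=0
wait "$child_pid" || first_status=$?

# Wait again to collect the child's exit status after the abort.
child_status=0
wait "$child_pid" 2>/dev/null || child_status=$?

echo "handled=$handled first_status=$first_status child_status=$child_status"
```

If the stand-in command were run in the foreground instead, the handler would not run until that command had finished on its own, which defeats the purpose for a hung application.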