Customize tracing event behavior
-e expression is the most complicated option which you use a qualifying expression to denote which events to trace or how to trace them. The format of expression is:
[qualifier=][!][?]value1[,[?]value2]...
Qualifier can be trace (or t), abbrev(or a), verbose (or v), raw (or x), signal/signals (or s), read/reads (or r), write/writes (or w), fault, inject, or kvm. If no qualifier is specified, trace is the used. E.g., -e open is equal to -e trace=open or -e t=open.
The value part can contain one or more values, and it depends on the qualifier. E.g., -e t=open,write means only output open and write system calls. ! is used to filter out the specified system call. E.g.:
# strace -e trace=\!write ls
This won't trace write system call (The ! need to be escaped in bash shell). ? is used to suppress the error. Compare the following outputs:
# strace -e trace=op ls
strace: invalid system call 'op'
# strace -e trace=?op ls
project
+++ exited with 0 +++
"@64", "@32", or "@x32" suffixes can be used to specify system calls only for the 64-bit, 32-bit, or 32-on-64-bit respectively.
The value can also be all or none which mean following all system calls or nothing:
# strace -e trace=all ls
execve("/usr/bin/ls", ["ls"], 0x7ffd5ae4ff10 /* 20 vars */) = 0
brk(NULL) = 0x55aae08a1000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcdecdc3e0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=98317, ...}) = 0
......
# strace -e trace=none ls
project
+++ exited with 0 +++
Values for trace qualifier
Below table describes the valid values for trace qualifier (Most are just copied from strace manual):
| Value | Meaning |
|---|---|
| set | Trace the system calls in set. E.g., "strace -e write ls". |
| /regex | Trace all the system calls which match regular expression. E.g., "strace -e /wr* ls". |
| %file | Trace all the system calls which take a file name as an argument. It incules open, stat, etc. |
| %process | Trace all the system calls which involve process management.Such as fork, wait, etc. |
| %network(or %net) | Trace all the network related system calls. |
| %signal | Trace all signal related system calls. |
| %ipc | Trace all IPC related system calls. |
| %desc | Trace all file descriptor related system calls. |
| %memory | Trace all memory mapping related system calls. |
| %stat | Trace stat syscall variants. |
| %lstat | Trace lstat syscall variants. |
| %fstat | Trace fstat and fstatat syscall variants. |
| %%stat | Trace syscalls used for requesting file status (stat, lstat, fstat, fstatat, statx, and their variants). |
| %statfs | Trace statfs, statfs64, statvfs, osf_statfs, and osf_statfs64 system calls. The same effect can be achieved with -e trace=/^(.*_)?statv?fs regular expression. |
| %fstatfs | Trace fstatfs, fstatfs64, fstatvfs, osf_fstatfs, and osf_fstatfs64 system calls. The same effect can be achieved with -e trace=/fstatv?fs regular expression. |
| %%statfs | Trace syscalls related to file system statistics (statfs-like, fstatfs-like, and ustat). The same effect can be achieved with -e trace=/statv?fs|fsstat|ustat regular expression. |
| %pure | Trace syscalls that always succeed and have no arguments. Currently, this list includes arc_gettls, getdtablesize, getegid, getegid32, geteuid, geteuid32, getgid, getgid32, getpagesize, getpgrp, getpid, getppid, get_thread_area (on architectures other than x86), gettid, get_tls, getuid, getuid32, getxgid, getxpid, getxuid, kern_features, and metag_get_tls. |
To know detailed information about which category a specific system call belongs to, you can refer sysent.h, sysent_shorthand_defs.h, and syscallent.h for your machine (E.g., X86_64 file is here).
abbrev qualifier
abbrev qualifier is used to omit printing each member of large structures. The default value is all. The following example shows how to check environment variables when executing execve system call:
# strace -e trace=execve -e abbrev=none ls
execve("/usr/bin/ls", ["ls"], ["SHELL=/bin/bash", "PWD=/root", "LOGNAME=root", "XDG_SESSION_TYPE=tty", "HOME=/root", "LANG=C", "SSH_CONNECTION=10.218.195.134 65"..., "XDG_SESSION_CLASS=user", "TERM=xterm", "USER=root", "SHLVL=1", "XDG_SESSION_ID=19", "XDG_RUNTIME_DIR=/run/user/0", "SSH_CLIENT=10.218.195.134 65285 "..., "PATH=/root/.cargo/bin:/usr/local"..., "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "HG=/usr/bin/hg", "MAIL=/var/spool/mail/root", "SSH_TTY=/dev/pts/0", "_=/usr/bin/strace"]) = 0
+++ exited with 0 +++
Actually, strace has -v option which just leverages abbrev qualifier (code is here):
......
case 'v':
qualify("abbrev=none");
break;
......
verbose qualifier
verbose qualifier is used to instruct whether deference structures of system calls or not. The default value is all. E.g., if you don't want to deference structures of execve, you can do this:
# strace -e trace=execve -e verbose=execve ls
execve("/usr/bin/ls", ["ls"], 0x7fffc8641bd0 /* 20 vars */) = 0
+++ exited with 0 +++
raw qualifier
raw qualifier is used to print undecoded arguments in hexadecimal format. You can compare the output of access without and with raw qualifier:
# strace -e access ls
access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
+++ exited with 0 +++
# strace -e access -e raw=access ls
access(0x7f11091183a0, 0x4) = -1 ENOENT (No such file or directory)
+++ exited with 0 +++
signal qualifier
By default, strace will capture all signals. signal qualifier can be used to customize which signals should be traced. Check the following code:
# cat dead_loop.c
#include <unistd.h>
int main(void)
{
while (1)
{
sleep(1);
}
return 0;
}
Build and run it in one terminal:
# gcc dead_loop.c -o dead_loop
# ./dead_loop
Use strace to track it in another terminal:
# strace -p `pidof dead_loop`
strace: Process 18085 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd8492c6f0) = 0
......
Press Ctrl+C in first terminal to kill dead_loop process, you will find SIGINT related information is printed out in second terminal:
# strace -p `pidof dead_loop`
strace: Process 18085 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd8492c6f0) = 0
......
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=825212664}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ killed by SIGINT +++
If you ignore SIGINT, the SIGINT won't be showed:
# strace -p `pidof dead_loop` -e signal=\!int
strace: Process 18091 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=361293365}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
read and write qualifiers
read/write qualifiers are used to perform a full hexadecimal and ASCII dump of all the data read/write from file descriptors. Check following example:
# strace -e read=3 ls
execve("/usr/bin/ls", ["ls"], 0x7ffd0fd7d750 /* 21 vars */) = 0
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \0\0\0\0\0\0"..., 832) = 832
| 00000 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00 .ELF............ |
| 00010 03 00 3e 00 01 00 00 00 20 20 00 00 00 00 00 00 ..>..... ...... |
| 00020 40 00 00 00 00 00 00 00 38 52 00 00 00 00 00 00 @.......8R...... |
| 00030 00 00 00 00 40 00 38 00 09 00 40 00 16 00 15 00 [email protected]...@..... |
| 00040 01 00 00 00 04 00 00 00 00 00 00 00 00 00 00 00 ................ |
| 00050 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ |
| 00060 b0 14 00 00 00 00 00 00 b0 14 00 00 00 00 00 00 ................ |
......
inject qualifier
The definition of inject qualifier is like this:
-e inject=set[:error=errno|:retval=value][:signal=sig][:syscall=syscall][:delay_enter=usecs][:delay_exit=usecs][:when=expr]
It is used to tamper a specific set of system calls.
:error=errno can set the wrong return value of a system call. E.g.:
# strace -e inject=read:error=1 ls
......
read(3, 0x7ffca47eb418, 832) = -1 EPERM (Operation not permitted) (INJECTED)
close(3) = 0
......
:retval=value does the reverse thing: set the correct return value:
# strace -e inject=arch_prctl:retval=0 ls
...... = 0x561afd1f6000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffdcba37700) = 0 (INJECTED)
Please notice :error=errno and :retval=value are mutually exclusive.
:signal=sig sets which signal will be delivered when entering a system call:
# strace -e inject=read:signal=SIGSEGV ls
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \0\0\0\0\0\0"..., 832) = 832
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)
:delay_enter=usecs/:delay_exit=usecs are used to set how many microseconds are delayed before entering/exiting system calls:
# strace -e inject=read:delay_enter=10:delay_exit=10 ls
:syscall=syscall is used to inject the specified system call, only "pure" system call is allowed now (please refer Values for trace qualifier section).
:when=expr is used to control the frequency of injection. By default every invocation will be injected. The format of the expression is one of the following:
| Frequency | Meaning |
|---|---|
| first | For every syscall from the set, perform an injection for the syscall invocation number first only. |
| first+ | For every syscall from the set, perform injections for the syscall invocation number first and all subsequent invocations. |
| first+step | For every syscall from the set, perform injections for syscall invocations number first, first+step, first+step+step, and so on. |
For example:
# strace -e inject=read:error=1:when=2 ls
execve("/usr/bin/ls", ["ls"], 0x7fff64d98c70 /* 21 vars */) = 0
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0 \0\0\0\0\0\0"..., 832) = 832
......
read(3, 0x7ffec8c113a8, 832) = -1 EPERM (Operation not permitted) (INJECTED)
......
The inject takes effect when calling read in the second time.
There are two more tracing events: fault=set[:error=errno][:when=expr] is similar as inject, and kvm=vcpu is used to print exit reason of kvm vcpu (requires kernel version 4.16 or later).