Customize tracing event behavior

-e expression is the most complicated option: it takes a qualifying expression that determines which events to trace and how to trace them. The format of the expression is:

[qualifier=][!][?]value1[,[?]value2]...

The qualifier can be trace (or t), abbrev (or a), verbose (or v), raw (or x), signal/signals (or s), read/reads (or r), write/writes (or w), fault, inject, or kvm. If no qualifier is specified, trace is used. E.g., -e open is equivalent to -e trace=open or -e t=open.

The value part contains one or more values whose meaning depends on the qualifier. E.g., -e t=open,write outputs only the open and write system calls. A leading ! negates the set, so the listed system calls are filtered out. E.g.:

# strace -e trace=\!write ls

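This traces every system call except write (the ! must be escaped in an interactive bash shell, where it otherwise triggers history expansion). A ? before a value suppresses the error that strace reports when no system call matches that value. Compare the following outputs: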

# strace -e trace=op ls
strace: invalid system call 'op'
# strace -e trace=?op ls
project
+++ exited with 0 +++

"@64", "@32", or "@x32" suffixes can be used to specify system calls only for the 64-bit, 32-bit, or 32-on-64-bit respectively.

The value can also be all or none, which mean tracing every system call or none of them:

# strace -e trace=all ls
execve("/usr/bin/ls", ["ls"], 0x7ffd5ae4ff10 /* 20 vars */) = 0
brk(NULL)                               = 0x55aae08a1000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffcdecdc3e0) = -1 EINVAL (Invalid argument)
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
openat(AT_FDCWD, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=98317, ...}) = 0
......
# strace -e trace=none ls
project
+++ exited with 0 +++
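
The expression format described above can be sketched as a small shell helper. This is only an illustration: trace_expr is a hypothetical function, not part of strace, and it assumes the negation applies to the whole comma-separated set. Building the string in a script also sidesteps the interactive history expansion of !.

```shell
#!/bin/sh
# Hypothetical helper (not part of strace): compose "[qualifier=][!]value1[,value2]..."
# Usage: trace_expr QUALIFIER NEGATE VALUE...
#   NEGATE is "yes" to prefix the set with "!", anything else leaves it as-is.
trace_expr() {
    qualifier=$1
    negate=$2
    shift 2

    # Join the remaining arguments with commas.
    values=
    for v in "$@"; do
        values="${values:+$values,}$v"
    done

    prefix=
    if [ "$negate" = yes ]; then
        prefix='!'
    fi
    printf '%s=%s%s\n' "$qualifier" "$prefix" "$values"
}

trace_expr trace no open write    # prints trace=open,write
trace_expr trace yes write        # prints trace=!write
trace_expr trace no '?op' open    # prints trace=?op,open

# Possible use with strace (not executed here):
#   strace -e "$(trace_expr trace yes write)" ls
```

Quoting the command substitution keeps the expression as a single argument to -e, so neither ! nor ? is reinterpreted by the shell.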

Values for trace qualifier

The table below describes the valid values for the trace qualifier (most are copied from the strace manual):

Value Meaning
set Trace the system calls in set. E.g., "strace -e write ls".
/regex Trace all the system calls whose names match the regular expression. E.g., "strace -e /wr* ls".
%file Trace all the system calls which take a file name as an argument. It includes open, stat, etc.
%process Trace all the system calls which involve process management, such as fork, wait, etc.
%network (or %net) Trace all the network related system calls.
%signal Trace all signal related system calls.
%ipc Trace all IPC related system calls.
%desc Trace all file descriptor related system calls.
%memory Trace all memory mapping related system calls.
%stat Trace stat syscall variants.
%lstat Trace lstat syscall variants.
%fstat Trace fstat and fstatat syscall variants.
%%stat Trace syscalls used for requesting file status (stat, lstat, fstat, fstatat, statx, and their variants).
%statfs Trace statfs, statfs64, statvfs, osf_statfs, and osf_statfs64 system calls. The same effect can be achieved with -e trace=/^(.*_)?statv?fs regular expression.
%fstatfs Trace fstatfs, fstatfs64, fstatvfs, osf_fstatfs, and osf_fstatfs64 system calls. The same effect can be achieved with -e trace=/fstatv?fs regular expression.
%%statfs Trace syscalls related to file system statistics (statfs-like, fstatfs-like, and ustat). The same effect can be achieved with -e trace=/statv?fs|fsstat|ustat regular expression.
%pure Trace syscalls that always succeed and have no arguments. Currently, this list includes arc_gettls, getdtablesize, getegid, getegid32, geteuid, geteuid32, getgid, getgid32, getpagesize, getpgrp, getpid, getppid, get_thread_area (on architectures other than x86), gettid, get_tls, getuid, getuid32, getxgid, getxpid, getxuid, kern_features, and metag_get_tls.

To find out which category a specific system call belongs to, refer to sysent.h, sysent_shorthand_defs.h, and the syscallent.h for your machine (e.g., the x86_64 file is here).

abbrev qualifier

The abbrev qualifier abbreviates the output by omitting the members of large structures. The default value is all. The following example sets abbrev=none to show the environment variables passed to the execve system call:

# strace -e trace=execve -e abbrev=none ls
execve("/usr/bin/ls", ["ls"], ["SHELL=/bin/bash", "PWD=/root", "LOGNAME=root", "XDG_SESSION_TYPE=tty", "HOME=/root", "LANG=C", "SSH_CONNECTION=10.218.195.134 65"..., "XDG_SESSION_CLASS=user", "TERM=xterm", "USER=root", "SHLVL=1", "XDG_SESSION_ID=19", "XDG_RUNTIME_DIR=/run/user/0", "SSH_CLIENT=10.218.195.134 65285 "..., "PATH=/root/.cargo/bin:/usr/local"..., "DBUS_SESSION_BUS_ADDRESS=unix:pa"..., "HG=/usr/bin/hg", "MAIL=/var/spool/mail/root", "SSH_TTY=/dev/pts/0", "_=/usr/bin/strace"]) = 0
+++ exited with 0 +++

In fact, the -v option of strace is just a shortcut for the abbrev qualifier (code is here):

    ......
    case 'v':
        qualify("abbrev=none");
        break;
    ......

verbose qualifier

The verbose qualifier selects the set of system calls whose structures are dereferenced. The default value is all. E.g., to dereference structures only for execve, you can do this:

# strace -e trace=execve -e verbose=execve ls
execve("/usr/bin/ls", ["ls"], 0x7fffc8641bd0 /* 20 vars */) = 0
+++ exited with 0 +++

raw qualifier

The raw qualifier prints the arguments of the specified system calls undecoded, in hexadecimal format. Compare the output of access without and with the raw qualifier:

# strace -e access  ls
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
+++ exited with 0 +++
# strace -e access -e raw=access ls
access(0x7f11091183a0, 0x4)             = -1 ENOENT (No such file or directory)
+++ exited with 0 +++

signal qualifier

By default, strace traces all signals. The signal qualifier selects which signals are traced. Consider the following program:

# cat dead_loop.c
#include <unistd.h>

int main(void)
{
        while (1)
        {
                sleep(1);
        }
        return 0;
}

Build and run it in one terminal:

# gcc dead_loop.c -o dead_loop
# ./dead_loop

Use strace to track it in another terminal:

# strace -p `pidof dead_loop`
strace: Process 18085 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd8492c6f0) = 0
......

Press Ctrl+C in the first terminal to kill the dead_loop process; the SIGINT information is printed in the second terminal:

# strace -p `pidof dead_loop`
strace: Process 18085 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffd8492c6f0) = 0
......
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=825212664}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGINT {si_signo=SIGINT, si_code=SI_KERNEL} ---
+++ killed by SIGINT +++

If you exclude SIGINT from the traced signals, the SIGINT line is no longer shown:

# strace -p `pidof dead_loop` -e signal=\!int
strace: Process 18091 attached
restart_syscall(<... resuming interrupted nanosleep ...>) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, 0x7ffc92b62710) = 0
nanosleep({tv_sec=1, tv_nsec=0}, {tv_sec=0, tv_nsec=361293365}) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)

read and write qualifiers

The read and write qualifiers perform a full hexadecimal and ASCII dump of all data read from or written to the specified file descriptors. Check the following example:

# strace -e read=3 ls
execve("/usr/bin/ls", ["ls"], 0x7ffd0fd7d750 /* 21 vars */) = 0
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0  \0\0\0\0\0\0"..., 832) = 832
 | 00000  7f 45 4c 46 02 01 01 00  00 00 00 00 00 00 00 00  .ELF............ |
 | 00010  03 00 3e 00 01 00 00 00  20 20 00 00 00 00 00 00  ..>.....  ...... |
 | 00020  40 00 00 00 00 00 00 00  38 52 00 00 00 00 00 00  @.......8R...... |
 | 00030  00 00 00 00 40 00 38 00  09 00 40 00 16 00 15 00  ....@.8...@..... |
 | 00040  01 00 00 00 04 00 00 00  00 00 00 00 00 00 00 00  ................ |
 | 00050  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  ................ |
 | 00060  b0 14 00 00 00 00 00 00  b0 14 00 00 00 00 00 00  ................ |
......

inject qualifier

The definition of inject qualifier is like this:

-e inject=set[:error=errno|:retval=value][:signal=sig][:syscall=syscall][:delay_enter=usecs][:delay_exit=usecs][:when=expr]

It is used to tamper with a specific set of system calls.

:error=errno makes the system call fail with the specified error code. E.g.:

# strace -e inject=read:error=1 ls
......
read(3, 0x7ffca47eb418, 832)            = -1 EPERM (Operation not permitted) (INJECTED)
close(3)                                = 0
......

:retval=value does the reverse: it makes the system call report success with the specified return value:

# strace -e inject=arch_prctl:retval=0 ls
......                              = 0x561afd1f6000
arch_prctl(0x3001 /* ARCH_??? */, 0x7ffdcba37700) = 0 (INJECTED)

Please note that :error=errno and :retval=value are mutually exclusive.
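
As a rough sketch of how these options combine, the following shell function assembles an inject expression and refuses to put error= and retval= in the same one. inject_expr is a hypothetical helper written for this illustration; strace performs its own validation, and this sketch does not check the option names or values.

```shell
#!/bin/sh
# Hypothetical helper (not part of strace): build "inject=SET[:opt=value]..."
# Usage: inject_expr SET [OPTION=VALUE]...
inject_expr() {
    set_name=$1
    shift

    has_error=no
    has_retval=no
    result="inject=$set_name"
    for opt in "$@"; do
        case $opt in
            error=*)  has_error=yes ;;
            retval=*) has_retval=yes ;;
        esac
        result="$result:$opt"
    done

    # :error= and :retval= are mutually exclusive.
    if [ "$has_error" = yes ] && [ "$has_retval" = yes ]; then
        echo "inject_expr: error= and retval= are mutually exclusive" >&2
        return 1
    fi
    printf '%s\n' "$result"
}

inject_expr read error=1          # prints inject=read:error=1
inject_expr arch_prctl retval=0   # prints inject=arch_prctl:retval=0

# Possible use with strace (not executed here):
#   strace -e "$(inject_expr read error=1)" ls
```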

:signal=sig sets which signal will be delivered when entering a system call:

# strace -e inject=read:signal=SIGSEGV ls
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0  \0\0\0\0\0\0"..., 832) = 832
--- SIGSEGV {si_signo=SIGSEGV, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGSEGV (core dumped) +++
Segmentation fault (core dumped)

:delay_enter=usecs/:delay_exit=usecs are used to set how many microseconds are delayed before entering/exiting system calls:

# strace -e inject=read:delay_enter=10:delay_exit=10 ls

:syscall=syscall is used to inject the specified system call in place of the original one. Currently only "pure" system calls can be injected (please refer to the %pure entry in the Values for trace qualifier section).

:when=expr is used to control the frequency of injection. By default every invocation will be injected. The format of the expression is one of the following:

Frequency Meaning
first For every syscall from the set, perform an injection for the syscall invocation number first only.
first+ For every syscall from the set, perform injections for the syscall invocation number first and all subsequent invocations.
first+step For every syscall from the set, perform injections for syscall invocations number first, first+step, first+step+step, and so on.

For example:

# strace -e inject=read:error=1:when=2 ls
execve("/usr/bin/ls", ["ls"], 0x7fff64d98c70 /* 21 vars */) = 0
......
read(3, "\177ELF\2\1\1\0\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0  \0\0\0\0\0\0"..., 832) = 832
......
read(3, 0x7ffec8c113a8, 832)            = -1 EPERM (Operation not permitted) (INJECTED)
......

The injection takes effect on the second invocation of read.
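
To make the three when= forms concrete, the sketch below lists which invocation numbers would be tampered with, up to a chosen limit. when_hits is a hypothetical helper for illustration only; it merely does the arithmetic from the table above and does not run strace.

```shell
#!/bin/sh
# Hypothetical helper (not part of strace): list the invocation numbers in
# 1..LIMIT selected by a when= expression of the form "first", "first+",
# or "first+step".
# Usage: when_hits WHEN LIMIT
when_hits() {
    when=$1
    limit=$2

    case $when in
        *+*)
            first=${when%%+*}     # text before the "+"
            step=${when#*+}       # text after the "+", empty for "first+"
            if [ -z "$step" ]; then
                step=1            # "first+" means every invocation from first on
            fi
            ;;
        *)
            first=$when
            step=0                # plain "first": a single invocation
            ;;
    esac

    hits=
    n=$first
    while [ "$n" -le "$limit" ]; do
        hits="${hits:+$hits }$n"
        if [ "$step" -eq 0 ]; then
            break
        fi
        n=$((n + step))
    done
    printf '%s\n' "$hits"
}

when_hits 2 10      # prints 2
when_hits 2+ 5      # prints 2 3 4 5
when_hits 2+3 10    # prints 2 5 8
```

With when=2, as in the example above, only the second read is affected, which matches the first line of output here.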

There are two more qualifiers: fault=set[:error=errno][:when=expr] is similar to inject but only injects errors, and kvm=vcpu prints the exit reason of a KVM vcpu (requires kernel version 4.16 or later).
