ptrace and NPTL, the missing manpage

== Signals ==

A signal sent to a ptrace'd process or thread causes only the thread that receives it to stop and report to the attached process (that is, the tracer). Use tgkill to target a signal (for example, SIGSTOP) at a particular thread. If you use kill, the signal could be delivered to another thread in the same process.

Note that SIGSTOP behaves differently when a process is being traced. Usually, a SIGSTOP sent to any thread in a thread group stops all threads in the thread group. When a thread is traced, however, a SIGSTOP affects only the receiving thread (and any other threads in the thread group that are not traced).

SIGKILL behaves as it does for non-traced processes. It affects all threads in the process and terminates them without the stop notification (WIFSTOPPED, with the signal available from WSTOPSIG) that other signals generate. However, if PTRACE_O_TRACEEXIT is set, the attached process will still receive PTRACE_EVENT_EXIT events before receiving WIFSIGNALED events.

See "Following thread death" for a caveat regarding signal delivery to zombie threads.

== Waiting on threads ==

Cloned threads in ptrace'd processes are treated similarly to cloned threads in your own process. Thus, you must use the __WALL option in order to receive notifications from threads created by the child process. Similarly, the __WCLONE option waits only on notifications from threads created by the child process and *not* on notifications from the initial child thread. Even when waiting on a specific thread's PID using waitpid or similar, __WALL or __WCLONE is necessary; otherwise waitpid fails with ECHILD.

== Attaching to existing threads ==

libthread_db (which gdb uses) attaches to existing threads by pulling the pthread data structures out of the traced process. A much easier way is to traverse the /proc/PID/task directory, though it's unclear how the semantics of these two approaches differ.
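The directory traversal mentioned above can be sketched as follows. This is a minimal illustration, not a complete attach routine: the helper name list_task_ids and its fixed-capacity output array are assumptions made for the example, and it only collects thread IDs without attaching to anything.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Hypothetical helper: collect up to cap thread IDs from /proc/<pid>/task.
 * Returns the number of IDs stored, or -1 if the directory cannot be
 * opened (for example, because the process no longer exists). */
static int list_task_ids(pid_t pid, pid_t *out, size_t cap)
{
    char path[64];
    DIR *dir;
    struct dirent *ent;
    size_t n = 0;

    assert(out != NULL);
    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (n < cap && (ent = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(ent->d_name, &end, 10);

        /* Skip "." and ".."; the remaining entries are decimal thread IDs. */
        if (end == ent->d_name || *end != '\0')
            continue;
        out[n++] = (pid_t)tid;
    }
    closedir(dir);
    return (int)n;
}
```

A caller would pass the target PID and then try PTRACE_ATTACH on each returned ID; the listing is only a snapshot of the threads that existed at the time it was read.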
Unfortunately, if the main thread has exited (but the overall process has not), it sticks around as a zombie. This zombie will appear in the /proc/PID/task directory, but trying to attach to it will yield EPERM. In this case, the third field of the /proc/PID/task/PID/stat file will be "Z". Attempting to open the stat file is also a convenient way to detect races between listing the task directory and the thread exiting. Incidentally, gdb will simply fail to attach to a process whose main thread is a zombie.

Because new threads may be created while the debugger is in the process of attaching to existing threads, the debugger must repeatedly re-list the task directory until it has attached to (and thus stopped) every thread listed.

In order to follow new threads created by existing threads, PTRACE_O_TRACECLONE must be set on each thread attached to.

== Following new threads ==

With the child process stopped, use PTRACE_SETOPTIONS to set the PTRACE_O_TRACECLONE option. This option is per-thread, and thus must be set on each existing thread individually.

When an existing thread with PTRACE_O_TRACECLONE set spawns a new thread, the existing thread will stop with a wait status for which status >> 8 equals (SIGTRAP | (PTRACE_EVENT_CLONE << 8)), and the PID of the new thread can be retrieved with PTRACE_GETEVENTMSG on the creating thread. At this time, the new thread will exist, but will initially be stopped with a SIGSTOP. The new thread will automatically be traced and will inherit the PTRACE_O_TRACECLONE option from its parent. The attached process should wait on the new thread to receive the SIGSTOP notification. When using waitpid(-1, ...), don't rely on the parent thread reporting its SIGTRAP before the new child thread reports its SIGSTOP; the two notifications can arrive in either order.

Without PTRACE_O_TRACECLONE, newly cloned threads will not be ptrace'd.
As a result, signals received by new threads will be handled in the usual way, which may affect the parent and in turn appear to the attached process, but attributed to the parent (possibly in unexpected ways).

== Following thread death ==

If any thread with the PTRACE_O_TRACEEXIT option set exits (either by returning or by calling pthread_exit), the tracing process will receive an immediate PTRACE_EVENT_EXIT. At this point, the thread will still exist. The exit status, encoded as for wait, can be queried using PTRACE_GETEVENTMSG on the exiting thread's PID. The thread should be continued so it can actually exit, after which its wait behavior is the same as for a thread without the PTRACE_O_TRACEEXIT option.

If a non-main thread exits (either by returning or by calling pthread_exit), its corresponding kernel-level process (the thread's own PID) will also exit, producing a WIFEXITED event (after the thread is continued from a possible PTRACE_EVENT_EXIT event). It is *not* necessary for another thread to call pthread_join on it for this to happen.

If the main thread exits by returning, then all threads will exit, first generating a PTRACE_EVENT_EXIT event for each thread if appropriate, then producing a WIFEXITED event for each thread.

If the main thread exits using pthread_exit, then it enters a non-waitable zombie state. It will still produce an immediate PTRACE_EVENT_EXIT event, but the WIFEXITED event will be delayed until the entire process exits. This state exists so that shells don't think the process is done until all of the threads have exited.

Unfortunately, signals cannot be delivered to non-waitable zombies. Most notably, SIGSTOP cannot be delivered; as a result, when you broadcast SIGSTOP to all of the threads, you must not wait for non-waitable zombies to stop. Furthermore, any ptrace command on a non-waitable zombie, including PTRACE_DETACH, will return ESRCH.
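The wait statuses discussed in the sections above can be told apart with the standard wait macros plus the event number stored in the upper bits of the status. The following is a rough sketch under stated assumptions: the enum wait_kind and the helpers classify_wait_status and get_event_message are hypothetical names invented for this example, and only the clone and exit events covered in this document are distinguished.

```c
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical classification of a status returned by waitpid. */
enum wait_kind {
    WK_EXITED,       /* WIFEXITED: the thread is gone */
    WK_KILLED,       /* WIFSIGNALED: terminated by a signal such as SIGKILL */
    WK_EVENT_CLONE,  /* stopped with SIGTRAP | (PTRACE_EVENT_CLONE << 8) */
    WK_EVENT_EXIT,   /* stopped with SIGTRAP | (PTRACE_EVENT_EXIT << 8) */
    WK_SIGNAL_STOP,  /* any other stop; WSTOPSIG gives the signal */
    WK_OTHER
};

enum wait_kind classify_wait_status(int status)
{
    if (WIFEXITED(status))
        return WK_EXITED;
    if (WIFSIGNALED(status))
        return WK_KILLED;
    if (WIFSTOPPED(status)) {
        /* For ptrace event stops the event number sits above the signal. */
        int event = (status >> 16) & 0xffff;

        if (WSTOPSIG(status) == SIGTRAP && event == PTRACE_EVENT_CLONE)
            return WK_EVENT_CLONE;
        if (WSTOPSIG(status) == SIGTRAP && event == PTRACE_EVENT_EXIT)
            return WK_EVENT_EXIT;
        return WK_SIGNAL_STOP;
    }
    return WK_OTHER;
}

/* Fetch the event message for a thread stopped at an event: the new
 * thread's ID after a clone event, or the exit status after an exit
 * event. Returns 0 on success, or -1 with errno set by ptrace. */
int get_event_message(pid_t tid, unsigned long *msg)
{
    assert(msg != NULL);
    return ptrace(PTRACE_GETEVENTMSG, tid, NULL, msg) == -1 ? -1 : 0;
}
```

A wait loop might call classify_wait_status on every status from waitpid(-1, &status, __WALL), call get_event_message for the two event kinds, and then continue the stopped thread.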

== Multi-threaded debuggers ==

If the debugger itself is multi-threaded, ptrace calls must come from the same thread that originally attached to the traced thread. The kernel simply compares the PID of the caller of ptrace against the tracer PID of the process passed to ptrace. Because each debugger thread has a different PID, calling ptrace from a different thread might as well be calling it from a different process, and the kernel will return ESRCH.

The wait calls, on the other hand, do not have this restriction. Any debugger thread can wait on any thread in the traced process.
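
One way to guard against calling ptrace from the wrong debugger thread is to record the kernel thread ID of the attaching thread and check it before each request. The sketch below assumes that approach; struct tracee and the helpers current_tid, tracee_claim, and tracee_owned_by_caller are hypothetical names for illustration, and the check itself makes no ptrace calls.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical bookkeeping for one traced thread. */
struct tracee {
    pid_t tid;        /* thread ID of the traced thread */
    pid_t owner_tid;  /* kernel thread ID of the debugger thread that attached */
};

/* Kernel thread ID of the calling thread (the "PID" the kernel compares). */
static pid_t current_tid(void)
{
    return (pid_t)syscall(SYS_gettid);
}

/* Record that the calling debugger thread is the one attaching to tid.
 * Intended to be called from the thread that issues PTRACE_ATTACH. */
static void tracee_claim(struct tracee *t, pid_t tid)
{
    assert(t != NULL);
    t->tid = tid;
    t->owner_tid = current_tid();
}

/* Returns 1 if the calling thread attached to this tracee and may
 * therefore issue ptrace requests for it, 0 otherwise. */
static int tracee_owned_by_caller(const struct tracee *t)
{
    assert(t != NULL);
    return t->owner_tid == current_tid();
}
```

A debugger could check tracee_owned_by_caller before every ptrace request and hand the request to the owning thread when the check fails, rather than letting the kernel report ESRCH.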