Defunct Process Problem Determination


Contents

About this document
Problem determination
What to do if the PPID is 1
What to do if the PPID is not 1
What to do if further problem determination is needed

About this document

This document explains how defunct processes occur, how to determine their cause, and what information to provide to software vendors if further problem determination is necessary. It applies to AIX versions 3.1 through 4.x.

The AIX Operating System implements a hierarchy of processes in that each process has a parent process. A process which terminates may notify the parent process, and the parent process may collect status information for that child. When the parent process fails to collect the status information in a timely manner, a defunct process may remain.

Because a defunct process has already terminated, killing it has no effect. A defunct process uses no CPU or disk resources, and only the small amount of memory needed to store its exit status and resource usage information.


Problem determination

The parent of a child process is responsible for reaping (removing) the child process when it terminates. To find the parent process, run the ps -ef command. The PPID column gives the process ID of the parent.
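As a minimal sketch of this step, the following shell function filters ps -ef output for defunct entries and prints each PID with its PPID. The function name list_defunct is hypothetical. It assumes the usual ps -ef column order (UID, PID, PPID, ...) and that terminated entries carry a <defunct> marker in the command column.

```shell
# list_defunct (hypothetical helper): read `ps -ef` output on stdin and
# print "PID PPID" for every line marked <defunct>.
# Assumption: PID is the second column and PPID is the third.
list_defunct() {
    awk '/<defunct>/ { print $2, $3 }'
}

# Typical use:
#   ps -ef | list_defunct
```

A PPID of 1 in the second column of the result points to init; any other value identifies the process that should have reaped the child.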


What to do if the PPID is 1

If the PPID for the defunct process is 1, the parent is the init process. The init process is the ancestor of all processes on the system. Under normal conditions the init process reaps all defunct processes with a PPID of 1.

If the parent process ID is 1, determine if the defunct process has been terminated for more than a few minutes. (To do this, wait a few minutes and then see if the process is still there.) It is acceptable for defunct processes owned by init to exist for one or two minutes, particularly on heavily loaded systems. Defunct processes are frequently created by complex shell scripts as a result of the way the command shells are designed. This continual stream of defunct processes is normal and does not indicate a problem.

A problem may occur when init has not completed processing /etc/inittab and is waiting on a specific entry (most commonly an /etc/rc script) to terminate. During this time, init ignores all other terminated child processes and waits for the specific child process to end. The most common indication that a hung /etc/inittab task is causing the problem is that the number of defunct processes owned by init grows without bound. In AIX 4.1.3 and later versions, init has been enhanced to handle defunct processes even while processing inittab wait entries. This function is also available in AIX 3.2.5 through the installation of APAR IX50480.

To determine if init has finished processing /etc/inittab, find the entries in /etc/inittab where the third field is wait or once. For example:

   qdaemon:2:wait:/bin/startsrc -sqdaemon
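The selection of entries can be sketched with awk, splitting each inittab line on colons. The function name list_wait_entries is hypothetical, and the sketch assumes that a line beginning with a colon is a comment and can be skipped.

```shell
# list_wait_entries (hypothetical helper): print "identifier action" for
# each inittab entry whose third field is wait or once.
# $1 is the file to read; it defaults to /etc/inittab.
# Assumption: lines that start with ":" are comments (empty first field).
list_wait_entries() {
    awk -F: '$1 != "" && ($3 == "wait" || $3 == "once") { print $1, $3 }' \
        "${1:-/etc/inittab}"
}

# Typical use:
#   list_wait_entries
```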

For 4.x entries:

Match the first field (qdaemon in the preceding example) with the sixth field of the who -d output. Each wait entry in the inittab file should have an entry in the who -d output; if it does not, the process did not finish. In the following sample line, the id=qdaemon field shows that qdaemon has finished, so it is not the problem process.

   .    .    Sep 20 10:01    old       6536 id=qdaemon term=0 exit=0

Look for a process that was started from inittab but never appeared in the who -d output. If an entry with wait as its action has no corresponding line in the who -d output, that process is hung and must exit or be stopped before init resumes reaping terminated child processes. If the system is at AIX Version 4.1.3 or later, however, terminated child processes continue to be reaped.
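The comparison between inittab and the who -d output can be sketched as follows. The function name unfinished_wait_entries is hypothetical. The sketch assumes that the who -d output has been saved to a file and that each finished entry appears there as an id=<identifier> token carrying the full inittab identifier, as in the sample line above.

```shell
# unfinished_wait_entries INITTAB WHO_OUTPUT (hypothetical helper)
# Print the identifier of every wait entry in INITTAB that has no
# id=<identifier> token in WHO_OUTPUT (a saved copy of `who -d` output).
unfinished_wait_entries() {
    awk -F: -v who="$2" '
        BEGIN {
            # Remember every id=<name> token found in the who -d output.
            while ((getline line < who) > 0) {
                n = split(line, tok, " ")
                for (i = 1; i <= n; i++)
                    if (tok[i] ~ /^id=/) seen[substr(tok[i], 4)] = 1
            }
            close(who)
        }
        # Skip comment lines (empty first field); report unseen wait entries.
        $1 != "" && $3 == "wait" && !($1 in seen) { print $1 }
    ' "$1"
}

# Typical use:
#   who -d > /tmp/who.out
#   unfinished_wait_entries /etc/inittab /tmp/who.out
```

Any identifier this prints is a candidate for the hung entry and should be checked by hand against the running processes.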

NOTE: If the /etc/inittab file has an entry like the following, remove it.

   install_assist:2:wait:/usr/lib/lpd/pio/etc/pioinit > /dev/null 2>&1

This entry should have been removed by Installation Assistant after the initial installation. If it remains, the process hangs and causes defunct processes to accumulate.

NOTE: If lslpp -l dce.client.core.rte shows that this fileset is installed, make sure that it is at level 2.2.0.7 or higher. If DCE is not being used, uninstall it. If DCE is being used, install APAR IY03565, which delivers dce.client.core.rte.2.2.0.7. DCE has been seen to tie up the init process so that it cannot clean up defunct processes. After installing the APAR, you must reboot.


What to do if the PPID is not 1

If the parent process ID is other than 1, the process referenced by that PID is responsible for reaping child processes that have terminated.

One cause of unreaped child processes is shell pipelines. The shell implements a pipeline in a way that gives its commands a parent-child relationship. Many commands do not expect to have child processes, because they create none of their own, and so are unprepared to handle a child process that has terminated. This is particularly visible when a pipeline mixes short-lived and long-lived processes: a short-lived process ends, but its long-lived parent never collects its status. The defunct process remains for as long as the parent process is running.

Any program that creates child processes should accept responsibility for those processes when they end, and should be coded to handle child processes that have terminated. When init must instead process orphaned defunct processes, the load on the system increases, because init must perform additional work to determine whether the terminated process was a child of init. Programs that do not reap their child processes should be reported to their respective software vendors as having a potential defect.
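For a shell script, the equivalent of this responsibility is the wait built-in, which collects the exit status of a background child; a compiled program would use the wait or waitpid system calls instead. The following is only an illustrative sketch: the function name run_and_reap is hypothetical, and it is not taken from any vendor's code.

```shell
# run_and_reap (hypothetical helper): start each argument as a background
# command, then collect every child's exit status with the wait built-in
# so that no child is left unreaped. Prints the number of children that
# exited with a nonzero status.
run_and_reap() {
    pids=""
    for cmd in "$@"; do
        sh -c "$cmd" &
        pids="$pids $!"        # remember the child's process ID
    done
    failed=0
    for pid in $pids; do
        # wait returns the child's exit status once it has terminated
        wait "$pid" || failed=$((failed + 1))
    done
    echo "$failed"
}

# Typical use:
#   run_and_reap 'sleep 1' 'exit 3'
```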


What to do if further problem determination is needed

If you determine that a process is defunct and the process should reasonably have been reaped by the parent, the following information will probably be required by the software vendor of the program in order to analyze the cause of the problem:




[ Doc Ref: 90605201014632     Publish Date: Nov. 03, 2000     4FAX Ref: 1771 ]