Diagnosing Full File Systems


Contents

About this document
    Related documentation
Resolving full filesystems
Determining what filesystems are full
Determining where space is allocated within each filesystem
Determining files that are using space in a filesystem
Resolving space taken by open files that have been deleted
Recommended fixes

About this document

This document discusses how to resolve issues with local JFS filesystems that have run out of available space.

Information in this document applies to AIX Versions 4.x.

Related documentation

For more in-depth coverage of this subject, the following IBM publications are recommended:

The product documentation library is also available:
http://www.rs6000.ibm.com/resource/


Resolving full filesystems

To resolve an out-of-filesystem-space issue, complete these steps:

  1. Determine what filesystems are full.
  2. Determine where space is allocated within each full filesystem.
  3. Take the required steps to resolve the out-of-space condition.

Once the above steps have been completed, the situation should be resolved, or the reason for the problem should be understood.

If these steps do not resolve the issue, filesystem corruption MAY be involved. Unmount the filesystem and run a full fsck against it to verify that no corruption problems exist.


Determining what filesystems are full

The df command is used to get filesystem status information. The relevant field to consider is %Used.

           %Used = percentage of total filesystem space currently allocated

Example:

   Filesystem   1024-blocks      Free %Used    Iused %Iused Mounted on
   /dev/hd4           12288        68   99%     1823    23% /
   /dev/hd2          409600     20436   96%    16181    16% /usr
   /dev/hd9var         8192      6088   26%      163     8% /var
   /dev/hd3           12288     11340    8%       87     3% /tmp
   /dev/hd1           57344     13872   76%     1459    11% /home

In this example, almost all of the space in the root filesystem / is allocated (99% used).
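The check described above can be sketched as a small filter. This is only an illustration: the function name full_filesystems and the default limit of 90 are assumptions, and the field positions ($4 for %Used, $7 for the mount point) assume the df output layout shown in the example.

```shell
# full_filesystems [LIMIT]
# Reads df output in the layout shown above on standard input and prints
# the mount point and %Used of every filesystem at or above LIMIT percent.
# LIMIT defaults to 90 (an arbitrary value chosen for this sketch).
full_filesystems() {
    awk -v limit="${1:-90}" '
        NR > 1 {                      # skip the header line
            used = $4                 # %Used column, for example "99%"
            sub(/%/, "", used)        # strip the percent sign
            if (used + 0 >= limit + 0)
                print $7, $4          # mount point and %Used
        }'
}
```

For example, the output of df could be piped into full_filesystems 95 to list only filesystems that are at least 95% allocated.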

At this point, we have determined free-space problems on the / filesystem. The next issue is to determine what kind of space problem exists.

Keep in mind that mounts are hierarchical and that a later mount hides whatever was already in place beneath its mount point. For example, if the mount entry for /myfilesystem/mydata is immediately followed by the mount entry for /myfilesystem, then /myfilesystem is mounted over the directory that held the /myfilesystem/mydata mount point, and the /myfilesystem/mydata filesystem and any data that resides there can no longer be reached through /myfilesystem.


Determining where space is allocated within each filesystem

There are two commands generally used to determine how and where space is allocated in a filesystem: df and du.

df determines the space used in a filesystem from the space that is currently unallocated. For instance, if you have a filesystem that consists of 8192 512-byte blocks, and 4096 of those blocks are currently not allocated to anything, then the total space being used in the filesystem is 4096 512-byte blocks.

            Allocated Storage = Total Storage - Unallocated Storage
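As a minimal sketch of this subtraction, the following function takes a total and a free figure in the same units and prints the allocated amount. The name allocated_blocks is an assumption made for this illustration.

```shell
# allocated_blocks TOTAL FREE
# Prints TOTAL - FREE. Both arguments must use the same unit
# (for example 512-byte blocks or 1024-byte blocks).
allocated_blocks() {
    echo $(( $1 - $2 ))
}
```

With the figures from the paragraph above, allocated_blocks 8192 4096 prints 4096.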

df is inherently the most reliable command to report filesystem usage, because df reports information based on the filesystem as a whole.

du is a file-oriented command. It reports the space allocated to a specified file or directory. du must have a destination parameter, and is not isolated to a filesystem. For instance, running du / would give allocation information for all files in /. This would include all files in the / filesystem and any other filesystem mounted under /, such as /tmp, /var, and /usr. You could use the -x option of du to keep the operations within the filesystem, but there are cases where the results of using this option may be incomplete.

du will only report space taken by files. It will not report space taken by filesystem metadata, such as inodes, inode maps, or disk maps. Inode maps, disk maps, and other areas reserved for filesystem use take up a negligible portion of the filesystem space, but the areas reserved for inodes can be substantial, and their size is ultimately based on the NBPI (Number of Bytes Per Inode) chosen when the filesystem was created. Each inode uses 128 bytes of filesystem space, so the amount of space taken for inode use will be the percentage defined below:

           (128 / NBPI) * 100

By default, a filesystem will use an NBPI of 4096, so the general overhead for a filesystem will be about 3%.
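The percentage formula above can be evaluated for any NBPI value with a short function. This is a sketch only; the name inode_overhead_pct is an assumption, and awk is used simply because shell arithmetic is integer-only.

```shell
# inode_overhead_pct NBPI
# Prints (128 / NBPI) * 100, the percentage of filesystem space
# reserved for inodes, to three decimal places.
inode_overhead_pct() {
    awk -v nbpi="$1" 'BEGIN { printf "%.3f\n", 128 / nbpi * 100 }'
}
```

A smaller NBPI means more inodes and therefore a larger share of the filesystem reserved for them.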

To determine what the NBPI is for the filesystem in question, issue the lsfs command with the -q option on the mount point of the filesystem.

Example:
# lsfs -q /
Name      Nodename   Mount Pt    VFS   Size    Options   Auto  Accounting
/dev/hd4  --         /           jfs   81920   --        yes   no 
(lv size: 24576, fs size: 24576, frag size: 4096, nbpi: 4096, compress: no, bf: false, ag: 8)
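If only the NBPI value is needed, it can be picked out of the lsfs -q output with sed. The function name nbpi_from_lsfs is an assumption for this sketch, and the pattern assumes the "nbpi: N" wording shown in the example above.

```shell
# nbpi_from_lsfs
# Reads "lsfs -q" output on standard input and prints the number
# that follows "nbpi: ".
nbpi_from_lsfs() {
    sed -n 's/.*nbpi: \([0-9][0-9]*\).*/\1/p'
}
```

For example, the output of lsfs -q / could be piped into nbpi_from_lsfs.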

du will only show allocated information about files it can reference. There are two cases where du may not show information about allocated storage.

  1. The file is hidden because a filesystem or file has been mounted on top of this entry. If you had a file that was stored in /bobby, and then mounted a filesystem on top of /bobby, then du would no longer see what was in the directory /bobby. It would only see the information in the filesystem that was mounted over /bobby.
  2. The file is open by other applications, and the file has been removed. In this case, the storage for that file will remain allocated until all references to that file have been closed. Without a filesystem entry, du will not show allocated space for that file, though df will show this space taken from the filesystem as a whole.

Determining files that are using space in a filesystem

To address the situation presented in case 1 in the previous section, mount the primary mount point of the desired filesystem on a secondary mount point. This has the effect of negating any filesystem mounted under the primary mount point.

Example:

   mount / /mnt

In the above example, we mounted the / filesystem over /mnt. The effect is that if we go into /mnt, we see all the information about the / filesystem and no other filesystem mounted under /. If we run cd /mnt/tmp, then we are actually in the directory /tmp in the / filesystem, and not in the /tmp filesystem. Also, if we run du -sk /mnt, the result should closely match the space that the df command reports as used for /. If it does not, then this indicates that case 2 may be occurring (see the next section). For now, we will proceed with case 1.

We can now investigate disk usage accurately. First, go into the root directory of this filesystem.

   cd /mnt

Run the du command to get an accurate accounting of the space that can be seen for all accessible files in this filesystem.

# du -sk /mnt
11778   /mnt

This will report the accounted space taken for files in kilobytes. If you add the overhead of the filesystem described above, this figure should closely match that given by the df command.

Example:
# df -vk /
Filesystem  1024-blocks   Used  Free  %Used  Iused  Ifree  %Iused  Mounted on
/dev/hd4          12288  12220    68    99%   1823   1249     23%  /

The overhead in this case will be (128/4096) * 12288K = 384K

The total space that can be accounted for is 11778K + 384K = 12162K versus the reported space taken from df of 12220K. The two figures agree closely, which indicates that nearly all of the allocated space is accounted for by files somewhere in the filesystem. If the difference between the two is large, then case 2 is more likely to be occurring; the section "Resolving space taken by open files that have been deleted" addresses that situation. Otherwise, continue with the following steps.
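The comparison described above can be sketched as a single function that subtracts both the du figure and the estimated inode overhead from the df figure. The name unaccounted_kb and its argument order are assumptions for this illustration; the overhead estimate uses the 128-bytes-per-inode rule given earlier.

```shell
# unaccounted_kb DF_USED_KB DU_KB FS_SIZE_KB NBPI
# Prints the space (in KB) that df reports as used but that neither
# du nor the estimated inode overhead accounts for.
unaccounted_kb() {
    df_used=$1
    du_kb=$2
    size_kb=$3
    nbpi=$4
    overhead=$(( size_kb * 128 / nbpi ))   # space reserved for inodes
    echo $(( df_used - du_kb - overhead ))
}
```

A result that is small relative to the size of the filesystem suggests that visible files explain the usage. A large result points toward space held by files that du cannot see.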

Run the following command on the new mount point of this filesystem to get a sorted disk usage of the filesystem's root directory.

   ls -A . | while read name; do du -sk "$name"; done | sort -nr

Example:

   # ls -A . | while read name; do du -sk "$name"; done | sort -nr
   2168    etc
   192     lpp
   168     sbin
    40     dev
    28     export
    12     smit.log
     4     var
     4     usr
     4     tmp
     4     tftpboot
     4     src
     4     smit.script
     4     mnt
     4     .sh_history
     4     .profile
     0     unix
     0     u
     0     lib
     0     bootrec
     0     bin

This command sorts disk usage for all files in the current directory by size, in decreasing order. If the file we suspect happens to be a directory, we can then change into that directory, and re-run the preceding command to determine what is taking up space within that directory. Continue these steps until you find the desired file or files, at which point you can take appropriate actions.
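Because the same command is re-run at each level of the search, it can be convenient to wrap it in a function that takes the directory as an argument. This is a sketch only: the name usage_by_entry is an assumption, and it runs in a subshell so that the caller's current directory is not changed.

```shell
# usage_by_entry DIR
# Prints the disk usage in KB of every entry directly under DIR,
# largest first, using the same pipeline as the command above.
usage_by_entry() {
    (
        cd "$1" || exit 1
        ls -A . | while read -r name; do
            du -sk "$name"
        done | sort -nr
    )
}
```

For example, usage_by_entry /mnt could be followed by usage_by_entry /mnt/etc if etc turns out to be the largest entry.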


Resolving space taken by open files that have been deleted

In case 2, there are files within the filesystems that are opened by applications but have been removed from the filesystem tree. This behavior is documented in the unlink() system call as follows.

When all links to a file are removed and no process has the file open, all resources associated with the file are reclaimed, and the file is no longer accessible. If one or more processes have the file open when the last link is removed, the directory entry disappears. However, the removal of the file contents is postponed until all references to the file are closed.
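The behavior described above can be observed from a shell with a scratch file. The following is a hypothetical demonstration, not part of any AIX tooling: the function name show_unlinked_open_file and the file name held.log are assumptions, and the argument should be a scratch directory you can write to.

```shell
# show_unlinked_open_file DIR
# Opens a file in DIR, removes its only link while it is still open,
# and reports whether the name is still visible in the directory.
show_unlinked_open_file() {
    dir=$1
    exec 3>"$dir/held.log"                  # open the file on descriptor 3
    echo "written before removal" >&3
    rm "$dir/held.log"                      # remove the last link
    echo "written after removal" >&3        # the open descriptor still works
    if [ -e "$dir/held.log" ]; then
        echo "visible"
    else
        echo "not visible"
    fi
    exec 3>&-                               # closing releases the storage
}
```

While descriptor 3 remains open, the storage stays allocated even though no directory entry refers to it, which is why du cannot report it.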

You can use the fuser command with the -dV flag on the full path to the device on which the filesystem resides. This will display files that have been removed but are still open. It will also report the inode number and size of such files. Using the process ID returned for these files, you can instruct the source application to close these files, or you can exit the application. Once this has occurred, and fuser no longer shows this deleted file, the space will be returned to the filesystem for general use.

NOTE: Using the flags given for fuser requires enhancements for fuser to be installed for the appropriate release of AIX, as listed in the following table. You may need other fixes in addition to these to reliably perform the operations of this document. Please refer to the list at the end of this document.

   APAR     Description               AIX Level
   ----     -----------               ---------
   IX78943  ENHANCEMENTS TO FUSER     4.1
   IX78941  ENHANCEMENTS TO FUSER     4.2
   IX78523  ENHANCEMENTS TO FUSER     4.3

If the filesystem had a shared library that was deleted and the process that used the library is no longer active, the library will still be open on the loader list. fuser will not detect these situations, but they can be remedied by running the slibclean command. This will flush any shared libraries from the loader list that are no longer active, and if they were deleted, the space will then be reclaimed.


Recommended fixes

   APAR     Description                                       AIX Level
   ----     -----------                                       ---------
   IX78066  FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP          4.3
   IX78873  MALLOC FAILED ERRORS FROM FUSER                   4.3
   IX76061  DEFRAGFS AND FSCK INCORRECTLY REPORT BAD BIT MAP  4.2
   IX77541  FSCK SHOULD PATCH UP ALLOCATIONS IN WMAP          4.2
   IX86678  FUSER NOT FINDING PIDS OF MPX BASE DEVICES        4.3
   IY04972  Reduce severity of disk inode corruption          4.3
   IY09173  FSCK DOES NOT CORRECT FILE CORRUPTION WHICH       4.3
            IT FINDS.



[ Doc Ref: 90605202614656     Publish Date: Sept. 28, 2000     4FAX Ref: 9768 ]