Managing System Dump Devices


Contents

About this document
    Related documentation
Managing system dump devices
Determining proper size for dump device
Setting a tape drive as a dump device
Extended options in AIX 4.x
Dumping to a mirrored logical volume
Remote dumps over a network
How to create a dedicated dump device

About this document

This document discusses how to manage storage devices used by AIX to store a system dump in the event of a catastrophic operating system software failure.

Its intent is to help the system administrator ensure that a system dump will be complete and usable for troubleshooting purposes.

This document applies to AIX versions 3.2 and 4.x.

Related documentation

For more in-depth coverage of this subject, refer to the IBM product documentation library:
http://www.rs6000.ibm.com/resource/aix_resource/Pubs/index.html


Managing system dump devices

When an unexpected system halt occurs, the system dump facility automatically copies selected areas of kernel data to the primary dump device. These areas include kernel segment 0 as well as other areas registered in the Master Dump Table by kernel modules or kernel extensions.

There are two dump devices, a primary and a secondary. To view information about the current dump devices, enter:

sysdumpdev -l 

Example:

# sysdumpdev -l 
primary              /dev/hd7 
secondary            /dev/sysdumpnull 

In this example, the primary dump device is the logical volume hd7.

When the operating system is installed, the primary dump device is automatically configured.

In AIX 3.2, the default primary dump device is /dev/hd7. This is a logical volume dedicated for system dumps.

In AIX 4.x, the default dump device is /dev/hd6. This is the primary paging space logical volume.

In both AIX 3.2 and 4.x the default secondary dump device is /dev/sysdumpnull. This is a null device and any dump written to this device is lost.


Determining proper size for dump device

The default dump device created for system use may NOT be large enough for a complete dump. To determine how large the dump device is, first identify the primary dump device with sysdumpdev -l, as described in the previous section. If the dump device is not currently set to a tape drive, it should be a logical volume. To retrieve information about this logical volume, enter:

lslv <LOGICAL VOLUME NAME> 

Example:

lslv hd7 

This command will return a screen of information. Obtain the values for LPs and PP SIZE. Multiply these two values to get the size of the dump device in megabytes.
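
The multiplication can also be scripted. The following is a minimal sketch, not an IBM-supplied tool: lv_size_mb is a hypothetical helper that reads lslv output on standard input, and it assumes the LPs: and PP SIZE: labels appear as they do in the sample lslv output shown in this document.

```shell
#!/bin/sh
# lv_size_mb: hypothetical helper (not an AIX command).
# Reads lslv output on stdin and prints LPs * PP SIZE in megabytes.
# Assumes the "LPs:" and "PP SIZE:" labels match the lslv samples
# in this document.
lv_size_mb() {
    awk '
        /^LPs:/ { lps = $2 }
        /PP SIZE:/ {
            rest = $0
            sub(/.*PP SIZE:[ \t]*/, "", rest)   # drop text before the value
            split(rest, f, " ")
            pp = f[1]                            # size in megabytes
        }
        END {
            if (lps == "" || pp == "") exit 1    # fields not found
            print lps * pp
        }
    '
}
```

On an AIX system this could be called as: lslv hd7 | lv_size_mb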

Next, determine how large the dump device for your machine should be.

To view an estimate of how large the dump device should be, enter:

sysdumpdev -e 

Example:

# sysdumpdev -e 
Estimated dump size in bytes: 4526080 

NOTE: This value reflects what the currently running system would require. It can change with the activity of the machine, so it is best to run this command when the machine is under its heaviest work load.

The command returns a value in bytes. The primary dump device should be at least as large as the value returned. In this case, the dump space must hold 4526080 bytes, which is slightly more than 4 megabytes. A typical system has a physical partition size of 4 megabytes for rootvg, and the dump device can only be increased in multiples of this size. A dump space of 4 megabytes would not be large enough to hold this dump, so the next size up is 8 megabytes.
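
The rounding can be expressed as a short calculation. This is a sketch under stated assumptions: pps_needed and dump_mb_needed are hypothetical helper names, and one megabyte is taken to be 1048576 bytes.

```shell
#!/bin/sh
# pps_needed: hypothetical helper (not an AIX command).
# $1 = estimated dump size in bytes (from sysdumpdev -e)
# $2 = physical partition size in megabytes
# Prints the smallest whole number of partitions that holds the dump.
# Assumes 1 megabyte = 1048576 bytes.
pps_needed() {
    pp_bytes=$(( $2 * 1024 * 1024 ))
    echo $(( ($1 + pp_bytes - 1) / pp_bytes ))   # round up
}

# dump_mb_needed: the same result expressed in megabytes.
dump_mb_needed() {
    echo $(( $(pps_needed "$1" "$2") * $2 ))
}
```

For the example above, pps_needed 4526080 4 prints 2 and dump_mb_needed 4526080 4 prints 8.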

At AIX levels prior to 3.2.4, this command option may not be available. If this is the case, a general rule of thumb is to make the dump device 1/4 of the size of your total RAM. To obtain the size of your total RAM, enter:

bootinfo -r 
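
The rule of thumb can be worked out as follows. This is a sketch only: quarter_ram_mb is a hypothetical helper, and it assumes that bootinfo -r reports real memory in kilobytes.

```shell
#!/bin/sh
# quarter_ram_mb: hypothetical helper (not an AIX command).
# $1 = real memory in kilobytes (the unit bootinfo -r is assumed to report)
# Prints one quarter of that amount in megabytes, rounded up.
quarter_ram_mb() {
    echo $(( ($1 + 4095) / 4096 ))   # KB / 4 / 1024, rounded up
}
```

For example, a machine reporting 131072 (128 megabytes of RAM) gives quarter_ram_mb 131072, which prints 32.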

If the dump device is a standard dump logical volume, such as hd7, then use the command extendlv to increase its size. If it is the primary paging space hd6, use the command chps.


Setting a tape drive as a dump device

If you do not have sufficient space on the system to store a dump, use a tape drive as the dump device. To accomplish this, put a blank tape in the desired tape drive and enter:

sysdumpdev -Pp /dev/rmt# 

In this case, rmt# refers to the specific tape drive you want to use (for example, rmt0, rmt1, or rmt2).

Be aware that the tape drive will not be usable by any other application until you re-assign the dump device to another location.


Extended options in AIX 4.x

AIX 4.x adds three attributes that are not available in AIX 3.2. The sysdumpdev -l command displays them.

Example:

# sysdumpdev -l 
primary              /dev/hd6 
secondary            /dev/sysdumpnull 
copy directory       /var/adm/ras 
forced copy flag     TRUE 
always allow dump    TRUE 

The copy directory entry specifies a filesystem in the rootvg volume group to which the dump will be copied upon reboot after a system dump. This only applies if the primary dump device is the primary paging space (hd6).

The forced copy flag entry specifies whether the system will prompt you to copy the dump to external media if there is not enough space in the specified filesystem. If this is set to FALSE and the system cannot copy the dump to the filesystem, it will discard the contents of the dump.

The always allow dump flag is a security measure. If this is set to FALSE, the only way to force a system dump is to turn the service key to Service and then press Reset. It also prevents forcing a dump of any kind on machines with no service key, such as all PCI-based machines.

If the primary dump device is the primary paging device, the only way it can copy the dump to the filesystem save area is if there is enough free space in that filesystem. The free space in the filesystem can be determined with the df command. If the free space in that filesystem is not at least as large as the space required for the dump (sysdumpdev -e), then either increase the size of that filesystem to have enough free space, remove files in that filesystem until enough free space is available, or move the save area to another filesystem with the required space. The latter can be accomplished with the sysdumpdev command. This filesystem must be in the rootvg volume group.
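
The free-space comparison can be sketched as a small check. copy_dir_has_room is a hypothetical helper, not part of AIX. It assumes you have already read the free space in kilobytes from df output for the copy directory (for example, from the Free column of df -k, if your AIX level supports that flag) and the estimate in bytes from sysdumpdev -e.

```shell
#!/bin/sh
# copy_dir_has_room: hypothetical helper (not an AIX command).
# $1 = free space in the copy directory's filesystem, in kilobytes
# $2 = estimated dump size in bytes (from sysdumpdev -e)
# Returns 0 (success) if the dump would fit, non-zero otherwise.
copy_dir_has_room() {
    need_kb=$(( ($2 + 1023) / 1024 ))   # bytes -> KB, rounded up
    [ "$1" -ge "$need_kb" ]
}
```

Example use: copy_dir_has_room 8192 4526080 && echo "copy directory is large enough"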


Dumping to a mirrored logical volume

AIX does not support dumping to a mirrored logical volume, because the dump is written to only one copy of the logical volume. In other words, only one of the mirrors will contain the dump. Since the logical volume is not handled as a mirrored logical volume during the dump, the newly written data (the dump) is not synchronized with the other mirrors. Thus, when crash tries to read the dump, it can obtain data from both mirrors, only one of which actually contains the dump. That is, crash sees good dump data mixed with garbage data, and will not read the dump.

Splitting the logical volume into one logical volume per copy of the original leaves one of them containing a good dump. This can be accomplished with the splitlvcopy command. The procedure for splitting the logical volume is:

Run lslv to get the LV IDENTIFIER:
# lslv hd7 
LOGICAL VOLUME: hd7                 VOLUME GROUP: rootvg 
LV IDENTIFIER:  0000335216021417.12 PERMISSION:   read/write 
VG STATE:       active/complete     LV STATE:     opened/syncd 
TYPE:           dump                WRITE VERIFY: off 
MAX LPs:        128                 PP SIZE:      4 megabyte(s) 
COPIES:         2                   SCHED POLICY: parallel 
LPs:            4                   PPs:          8 
STALE PPs:      0                   BB POLICY:    relocatable 
INTER-POLICY:   minimum             RELOCATABLE:  yes 
INTRA-POLICY:   middle              UPPER BOUND:  32 
MOUNT POINT:    N/A                 LABEL:        None 
MIRROR WRITE CONSISTENCY: on 
EACH LP COPY ON A SEPARATE PV ?: no 

Notice that there are two copies. This means that there is one mirror. Use the splitlvcopy command to split the logical volume, hd7 in this case, into two logical volumes.

# splitlvcopy 0000335216021417.12 1 

A message similar to the following may appear:

splitlvcopy: WARNING! The logical volume being split, hd7, 
        is open. Splitting an open logical volume may cause 
        data loss or corruption and is not supported by IBM. 
        IBM will not be held responsible for data loss or 
        corruption caused by splitting an open logical 
        volume. Do you wish to continue? y(es) n(o)? lv02 

Enter y. The command will complete and show the name of the new logical volume it created, for example, lv02. At this point, hd7 contains one copy of the original hd7, and lv02 contains the other. This is exactly what we need.

If lslv had shown three copies, lv02 would contain two copies of the original hd7.

Run crash on /dev/hd7 first. If crash does not give error messages, hd7 holds the copy that contains the dump. If the dump is unusable and the original hd7 had two copies, so that lv02 now has only one, run crash on /dev/lv02. If the original hd7 had three copies, lv02 now has two. In that case, run lslv lv02 to get its LV IDENTIFIER, then run splitlvcopy <LVID> 1 to split lv02 into one logical volume per copy.
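
The step of reading the LV IDENTIFIER and COPIES values from lslv and forming the splitlvcopy command can be sketched as below. split_cmd is a hypothetical helper, not an AIX command. It only prints the command for review and assumes the lslv field labels shown in the sample output above.

```shell
#!/bin/sh
# split_cmd: hypothetical helper (not an AIX command).
# Reads lslv output on stdin. If COPIES is 2 or more, prints the
# splitlvcopy command used in this document; it does not run it.
split_cmd() {
    awk '
        /^LV IDENTIFIER:/ { id = $3 }
        /^COPIES:/        { copies = $2 }
        END {
            if (id == "" || copies < 2) exit 1   # nothing to split
            print "splitlvcopy " id " 1"
        }
    '
}
```

On an AIX system this could be called as: lslv hd7 | split_cmd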

This may not work for dumps taken to mirrored paging space, because the pager may have already overwritten the dump.


Remote dumps over a network

Currently, the system dump does not handle ARP requests received from the server, or the gateway used, during the dump. If an ARP request is received while taking a dump, this causes the dump to hang. If your system takes a system dump and hangs on 0c7, this is likely the problem. At this point, power the system off and reboot.

To avoid this problem, create a permanent ARP entry for the client (the dumping machine) on the server or gateway. The machine that needs the permanent ARP entry is the machine on the same local network or ring as the client. This can be thought of as the logical server, since, if it is not the real server, the dump data must pass through it to get to the real server.

NOTE: "Real server" refers to the machine designated in the remote dump specification on the client.

Run the following steps on the real server to establish a permanent ARP entry on the server or gateway machine.

  1. Ensure an ARP entry exists by pinging the client. Example:
        ping myclient.xyz.com 
    
  2. Use arp -a to see the ARP table. Example:
     # arp -a 
    

     myclient.xyz.com (128.3.56.9) at 10:0:5a:9:e:7d [token ring]
     myserver.xyz.com (128.3.56.20) at 10:0:5a:8f:12:bf [token ring]
    
  3. Now use the arp command to make the dumping client's entry permanent. Example:
         # arp -s 802.5 myclient.xyz.com 10:0:5a:9:e:7d 
    

The 802.5 refers to a token-ring network. Valid network types are listed in the arp command documentation in the product documentation library, and are currently ether (802.3), fddi, and 802.5.
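
Steps 2 and 3 can be combined in a short sketch that looks up the client's hardware address in the ARP table and prints the matching arp -s command. arp_pin_cmd is a hypothetical helper, not an AIX command. It assumes each arp -a entry is on a single line in the form "host (address) at hardware-address [type]".

```shell
#!/bin/sh
# arp_pin_cmd: hypothetical helper (not an AIX command).
# $1 = client host name as it appears in arp -a output
# $2 = network type for arp -s (for example 802.5)
# Reads arp -a output on stdin, one entry per line, and prints the
# arp -s command for that client; it does not run it.
arp_pin_cmd() {
    awk -v host="$1" -v type="$2" '
        $1 == host && $3 == "at" {
            print "arp -s " type " " host " " $4
            found = 1
        }
        END { exit !found }   # non-zero status if the host was not listed
    '
}
```

On an AIX system this could be called as: arp -a | arp_pin_cmd myclient.xyz.com 802.5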

NOTE: If the dump hangs and the client must be rebooted, the partial dump on the server may still be useful.


How to create a dedicated dump device

  1. View an estimate of the dump size. Enter:
    sysdumpdev -e
    

    You should see information similar to the following:

    0453-041 Estimated dump size in bytes: 25103360
    
  2. View the PP size. Enter:
    lsvg rootvg
    

    You should see information similar to the following:

    VOLUME GROUP:   rootvg          VG IDENTIFIER:  0000003173650c77
    VG STATE:       active          PP SIZE:        4 megabyte(s)
    VG PERMISSION:  read/write      TOTAL PPs:      479 (1916 megabytes)
    MAX LVs:        256             FREE PPs:       258 (1032 megabytes)
    LVs:            11              USED PPs:       221 (884 megabytes)
    OPEN LVs:       10              QUORUM:         2
    TOTAL PVs:      1               VG DESCRIPTORS: 2
    STALE PVs:      0               STALE PPs:      0
    ACTIVE PVs:     1               AUTO ON:        yes
    
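  3. Determine the necessary number of PPs (physical partitions). Divide the estimated size (sysdumpdev -e) by the PP size and round up to a whole number to estimate the number of PPs that the dump logical volume should have. In this example, 25103360 bytes divided by 4 megabytes is just under 6, so at least 6 PPs are needed; the mklv command in step 5 allocates 7.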

  4. Determine where you have free PPs. Enter:
    lsvg -p rootvg
    

    You should see information similar to the following:

    rootvg:
    PV_NAME           PV STATE    TOTAL PPs   FREE PPs    FREE DISTRIBUTION
    hdisk1             active       479         258       78..02..00..82..96
    hdisk2             active       159          0        00..00..00..00..00
    hdisk3             active        75          8        00..00..00..00..08
    

    NOTE: You should use the hdisk with the highest number of free PPs (in this example hdisk1).

  5. Create an LV. Enter:
    mklv -y dumplv -t sysdump rootvg 7 hdisk1
    
  6. Set the LV as the dump device. Enter:
    sysdumpdev -Pp /dev/dumplv
    

    You should see information similar to the following:

    primary              /dev/dumplv
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    FALSE
    
  7. Change always allow dump to TRUE. Enter:
    sysdumpdev -K
    
  8. Verify that the flag has been changed. Enter:
    sysdumpdev -l
    

    You should see information similar to the following:

    primary              /dev/dumplv
    secondary            /dev/sysdumpnull
    copy directory       /var/adm/ras
    forced copy flag     TRUE
    always allow dump    TRUE
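
Steps 4 and 5 of the procedure above can be sketched as a script that picks the disk with the most free PPs and prints the mklv command for review. mklv_cmd is a hypothetical helper, not an AIX command. It assumes lsvg -p output in the layout shown in step 4, with the FREE PPs value in the fourth column of each disk row, and it reuses the dumplv name and rootvg volume group from step 5.

```shell
#!/bin/sh
# mklv_cmd: hypothetical helper (not an AIX command).
# $1 = number of physical partitions for the dump logical volume
# Reads lsvg -p output on stdin, picks the disk with the most free
# PPs, and prints the mklv command from step 5; it does not run it.
mklv_cmd() {
    awk -v pps="$1" '
        # disk rows have a numeric FREE PPs value in column 4;
        # the title and header rows do not
        $4 ~ /^[0-9]+$/ && $4 + 0 > best { best = $4 + 0; disk = $1 }
        END {
            if (disk == "" || best < pps + 0) exit 1   # no disk has room
            print "mklv -y dumplv -t sysdump rootvg " pps " " disk
        }
    '
}
```

On an AIX system this could be called as: lsvg -p rootvg | mklv_cmd 7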
    



[ Doc Ref: 90605210214768     Publish Date: Oct. 19, 2000     4FAX Ref: 6221 ]