Basic Fault Location Methods for RS/6000 Minicomputers (Part 2)

xiaoxiao2021-03-05 83

4 Fault Location of the 7133-D40 SSA Disk Cabinet

When the SSA disk cabinet fails, the corresponding SRN (service request number) is shown on the liquid crystal display on the front panel of the cabinet and the yellow indicator light flashes. An error message is also recorded in the AIX error log, for example DISK_ERR1, DISK_ERR4 or SSA_ARRAY_ERROR. Record the code after the problem occurs and call the IBM service hotline.

Five Software Fault Location Methods

Software faults are complex and varied. The handling methods for several common cases are given below.

1) A file system is full

Check whether any file system is full. Pay particular attention to /, /var and /tmp, which should not exceed 90%. A full file system can prevent the system from working properly, especially when it is one of the basic AIX file systems. For example, if / (the root file system) is full, users may be unable to log in. Check with df -k.

    # df -k                (view the basic AIX file systems)
    Filesystem    1024-blocks    Free  %Used  Iused  %Iused  Mounted on
    /dev/hd4            24576    1452    95%   2599     22%  /
    /dev/hd2           614400   28068    96%  22967     15%  /usr
    /dev/hd9var          8192    4540    45%    649     32%  /var
    /dev/hd3           167936  157968     6%     89      1%  /tmp
    /dev/hd1            16384    5332    68%   1402     35%  /home

Apart from the /usr file system, file systems should not be too full, generally no more than 80%.

Processing method 1: Delete junk files

    # du -sk * | sort -rn | head

This finds the largest subdirectories of the current directory. Descend into them level by level to find where most of the space is used. (Take care to distinguish directories that are mount points of other file systems from directories that are ordinary subdirectories of this file system.) Then delete the junk files to release the space. Sometimes the space is not released immediately after a file is deleted. This happens because the deleted file is still held open by a program, and the space is released only when that program closes the file. Sometimes it is necessary to restart the system.
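The threshold check described above can be scripted. The following is only a minimal sketch: the function name full_filesystems is hypothetical, and it assumes the AIX df -k column layout shown in the listing (%Used in the fourth field, mount point in the last field).

```shell
# full_filesystems THRESHOLD
# Reads `df -k` output on stdin and prints "mount_point percent" for every
# file system whose %Used is strictly greater than THRESHOLD.
# Assumption: AIX column layout, i.e. %Used is field 4 and the mount point
# is the last field. The function name is hypothetical.
full_filesystems() {
    awk -v limit="$1" '
        NR > 1 {
            used = $4
            sub(/%/, "", used)      # strip the trailing percent sign
            # skip rows such as /proc where the column holds "-"
            if (used ~ /^[0-9]+$/ && used + 0 > limit + 0)
                print $NF, used
        }'
}
```

On a live system it might be called as `df -k | full_filesystems 90`; with the listing above that would report only the root file system.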
Processing method 2: Increase the file system size

    # smitty chjfs

A file system can be enlarged at any time, provided that the volume group (VG) still has free space.

2) Check the integrity of the file system

    # umount filesystem_name
    # fsck -y filesystem_name

Note: the file system must be unmounted before it is checked and repaired, otherwise the consequences are unpredictable.

3) View the volume group information (lsvg -l vg_name): check whether any logical volume is in the "stale" state. If so, use the syncvg command to repair the stale logical volume.

4) Check paging space usage (lsps -s): check whether usage exceeds 70%. If it does, use chps -s X pgname to add X PPs to an existing paging space, or use mkps -a -n -s X myvg to add a new paging space of X PPs on myvg.

5) Memory leaks

A memory leak means that the system or an application process does not release memory it no longer needs, so the amount of available memory gradually decreases. If available memory drops to a minimum, the system or application can no longer fork new processes and the system hangs. Usually we can use the ps and sar commands to view memory and CPU usage and to follow how that usage develops over time.

(a) ps

    # ps gv | head -n 1; ps gv | egrep -v "RSS" | sort +6b -7 -n -r | head -n 5
      PID    TTY STAT  TIME PGIN  SIZE   RSS   LIM  TSIZ   TRS %CPU %MEM COMMAND
    15674 pts/11 A     0:01    0 36108 36172 32768     5    24  0.6 24.0 ./tctestp
    22742 pts/11 A     0:00    0 20748 20812 32768     5    24  0.0 14.0 ./backups
    10256  pts/1 A     0:00    0 15628 15692 32768     5    24  0.0 11.0 [...]
    [...]      - A     2:13    5    64  6448    xx     0  6392  0.0  4.0 kproc
     1806      - A     0:20    0    16  6408    xx     0  6392  0.0  4.0 kproc

    SIZE  Virtual size (in paging space) of the process, in kilobytes.
    RSS   Real-memory (resident set) size of the process, in kilobytes.

The basic pattern of memory and CPU usage can be observed by comparing the output taken at different times. If the memory used by a process keeps growing, that process may have a memory leak.

(b) sar

The sar command can also show CPU usage, although its statistics are not very accurate. A typical invocation is:

    # sar -P ALL 2 10
    09:29:37  cpu  %usr  %sys  %wio  %idle
    09:29:39    0     0     0     4     95
                1     1     0     4     95
                -     0     0     4     95
    09:29:41    0     0     2     6     92
                1     3     4     2     91
                -     2     3     4     92
    09:29:43    0     3     1     2     94
                1     2     2     2     95
                -     2     1     2     94
    09:29:45    0     2 7 90
                1     4     5     6     86
                -     3     3     6     88
    09:29:47    0     1     1     2     96
                1     1     2     2     96
                -     1     1     2     96
    09:29:49    0     0 0 0 0 0
                1     0     1     0     99
                -     0     0     0    100
    09:29:51    0     2     0     0     98
                1     0     1     0     98
                -     1     0     0     98
    09:29:53    0     7     1     6     86
                1     2     2     5     90
                -     5     2     5     88
    09:29:55    0     4     5    56     35
                1    12     2    55     32
                -     8     4    55     33
    09:29:57    0    16     8    14     64
                1    15     9    11     65
                -     15 12 64
    Average     0     3     2    10     85
                1     4     3     8     85
                -     4     2     9     85

This samples every 2 seconds, collects 10 results in total, and then prints the average. At present, if a memory leak is found, the best remedy is to restart the system.
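Comparing ps output taken at different times can be partly automated. The sketch below is an illustration only: the function name rss_growth and the snapshot file names are hypothetical, and it assumes the ps gv column layout shown above (PID in field 1, RSS in field 7, command in the last field).

```shell
# rss_growth OLD_SNAPSHOT NEW_SNAPSHOT
# Compares two saved `ps gv` listings and prints
# "pid old_rss new_rss command" for every PID whose RSS grew.
# Assumption: PID is field 1, RSS is field 7, command is the last field.
rss_growth() {
    awk '
        FNR == 1 { next }                        # skip the header of each file
        NR == FNR { old[$1] = $7; next }         # first file: remember RSS per PID
        ($1 in old) && ($7 + 0 > old[$1] + 0) {  # second file: report growth
            print $1, old[$1], $7, $NF
        }' "$1" "$2"
}
```

A possible use is `ps gv > /tmp/ps.1`, wait some minutes, `ps gv > /tmp/ps.2`, then `rss_growth /tmp/ps.1 /tmp/ps.2`. A process that appears in every comparison over a long period is a candidate for closer inspection; growth in a single comparison does not by itself prove a leak.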

HACMP software is generally very stable, but once a problem does occur, diagnostic and recovery skills become important. You need to identify the problem quickly and then use your understanding of HACMP to restore it to normal operation. In general, troubleshooting in an HACMP environment involves:

. Recognizing that a problem exists.
. Determining the source of the problem.
. Solving the problem.

One Recognizing the problem

You can learn that a cluster environment has a problem in the following ways:

. End users complain that they cannot access the application.
. HACMP messages appear on the console.

1. Application services cannot be accessed

End-user complaints usually indicate that the cluster has a problem: users cannot run the application normally or cannot log in to the system. We must collect detailed information to determine where the problem lies. Is there an error message? If possible, have the user repeat the steps to confirm the error. You can also try to reproduce it on your own system. Bear in mind that an unavailable user application does not necessarily mean that HACMP has a problem. The problem may lie in the application itself or in its start or stop scripts, so the application itself must be part of the fault analysis.

2. HACMP messages appear on the console

When HACMP starts, stops, or encounters an error, HACMP messages appear on the console and are also written to the corresponding log files.

Two Determining the problem

When an error occurs we should try to find its cause, but we are often misled by surface symptoms. The following steps help us obtain more detailed information.

1. Save the log files (/tmp/hacmp.out and /tmp/cm.log).
2. Carefully check the log files generated by HACMP. They can provide the first clues.
3. Check whether each part of HACMP is working normally.
4. Turn on the HACMP trace tools to generate more detailed information.
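Step 1 above (saving the logs before they are overwritten) could be done with a small helper like the following. This is only a sketch under stated assumptions: the function name save_hacmp_logs and the destination directory are hypothetical, and the default file list is simply the two logs named in step 1.

```shell
# save_hacmp_logs DEST_DIR [FILE...]
# Copies each existing log file into DEST_DIR with a timestamp suffix, so a
# later HACMP restart cannot overwrite the evidence.
# With no FILE arguments it falls back to the two logs named in the text.
# The function name and DEST_DIR are hypothetical.
save_hacmp_logs() {
    dest=$1
    shift
    [ "$#" -gt 0 ] || set -- /tmp/hacmp.out /tmp/cm.log
    stamp=$(date +%Y%m%d%H%M%S)
    mkdir -p "$dest" || return 1
    for f in "$@"; do
        if [ -f "$f" ]; then
            # -p keeps the original modification time of the log
            cp -p "$f" "$dest/$(basename "$f").$stamp" || return 1
        fi
    done
}
```

For example, `save_hacmp_logs /var/tmp/hacmp_saved` would leave copies such as hacmp.out.20000120084517 in that directory. Files that do not exist are skipped silently.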
. HACMP log files

The following files are text files that can be viewed with vi. Each log entry contains a timestamp.

/usr/adm/cluster.log: records the state of HACMP; generated by the HACMP daemons.
/tmp/hacmp.out: records the detailed output of the HACMP scripts.
/usr/sbin/cluster/history/cluster.mmdd: records the HACMP events as they occur.
/tmp/cm.log: generated by the clstrmgr process; it is overwritten each time HACMP restarts.

. HACMP for AIX structure

Application layer
HACMP software layer
LVM & TCP/IP layer
AIX layer
Physical network layer
Physical hard disk layer
Hardware layer

For the physical network layer, physical hard disk layer, hardware layer, LVM & TCP/IP layer and AIX layer, we can use AIX system commands to see whether the hardware and the system have problems. In general: use errpt to check that no hardware errors are logged, lsvg -o to check that the volume groups that should be varied on are varied on, mount to check that the file systems that should be mounted are mounted, and netstat -i (or ifconfig en*) to check the state of the service IP. Also check that the service IPs of the cluster nodes can ping each other and that the standby IPs can ping each other. If every node shows the expected results, the hardware, LVM & TCP/IP and AIX layers have no problem, and the problem is probably in the application layer or the HACMP software layer. Otherwise the problem is in the corresponding layer. For the HACMP software layer, run vi /tmp/hacmp.out; if an EVENT FAILED entry is present, the problem is probably in this layer. If hacmp.out contains no information for the time period in question, the problem may be in the application layer. Here are some rules for HACMP troubleshooting:

. Save the relevant log files first, especially those that will be overwritten.
. Try to reproduce the problem. Do not be misled by the way users describe it.
. Reproduce the problem methodically. If several causes are possible, test them one at a time rather than several at once.
. Do not judge the problem from experience alone; judge it from the results of the tests.
. Isolate the source by diagnosing layer by layer, following the layers described above.
. Start testing in a simple environment. Do not try to test in a complex environment.
. Make only one change at a time, otherwise it is impossible to know which change solved the problem.
. Do not overlook any possibility. Pay attention to every detail of the system, including power, plugs and cable connections.
. Keep a record of the tests and solutions for reference in future troubleshooting.
. Call the IBM service hotline and describe the symptoms and the results of the tests you have made. They will reproduce the problem in the call center's test center and, if necessary, send an engineer on site to solve it.

Three IBM HACMP Dual-Machine System Management and Maintenance

This section describes some basic management and maintenance commands for the HACMP dual-machine software. These commands are used frequently in the daily operation of an HACMP dual-machine system.

1 Starting the HACMP dual-machine system

To start the HACMP dual-machine system, log in with root privileges and run one of the following commands on the command line.

    # smit clstart
or
    # /usr/sbin/cluster/etc/rc.cluster -boot -N -i

Note that the node on which the HACMP software is started first becomes the primary node, which owns the resources and the key services. The node started afterwards becomes the standby node. Besides HACMP itself, starting the dual-machine system also starts the Informix and SCP applications.
2 Stopping the HACMP dual-machine system

To stop the HACMP dual-machine software on a node, log in to that node with root privileges and run one of the following commands on the command line.

    # smit clstop
or
    # clstop -gr

If the node is the primary node and the HACMP software on it is running normally, pay attention to the three shutdown modes offered by clstop:

1 Forced: the dual-machine software is stopped without calling any of the customer application's predefined stop routines.
2 Graceful: the customer application's predefined stop routines are called when the dual-machine software is stopped.
3 Takeover: the node stops the dual-machine software and releases its resources, and the other node is asked to take them over.

If the node is the standby node, the choice of shutdown mode makes little difference. In addition, HACMP will shut down Manager and Informix.

3 Querying the status of the HACMP dual-machine system

While the dual-machine system is running, you often need to know its current state so that any problem can be handled and recovered promptly, which keeps the system highly available and fault tolerant. To query the status, log in to the node you want to check and first verify that the HACMP dual-machine software has been started on it.

    # lssrc -g cluster

If the system shows the following information, the HACMP dual-machine software has been started normally.
    Subsystem    Group      PID     Status
    clstrmgr     cluster    22500   active
    clsmuxpd     cluster    23674   active
    clinfo       cluster    28674   active

After confirming that the HACMP dual-machine software has started normally, run the following command to check the current state of the dual-machine system.

    # /usr/sbin/cluster/clstat -a

If the dual-machine system is working normally, the system displays information similar to the following.

    clstat - HACMP for AIX Cluster Status Monitor
    ---------------------------------------------
    Cluster: SCP_Cluster (80)    Thu Jan 20 08:45:17 TAIST 2000
    State: 2    SubState: STABLE
    Node: mscp1    State: UP
        Interface: mscp1_svc (0)    Address: 192.9.1.60    State: UP
        Interface: mscp1_tty (1)    Address: 0.0.0.0       State: UP
    Node: mscp2
        Interface: mscp2_svc (0)    Address: 192.9.1.61    State: UP
        Interface: mscp2_tty (1)    Address: 0.0.0.0       State: UP

7 Common System Status Query Commands

# lsdev -C -s scsi    lists information about each SCSI device, such as logical unit number, hardware address and device file name.
# ps -ef    lists information about all running processes, such as process numbers and process names.
# netstat -rn    lists the network card status and routing information.
# netstat -in    lists the network card status and network configuration information.
# df -k    lists the mounted logical volumes and their sizes.
# mount    lists the mounted logical volumes and their mount points.
# uname -a    lists information such as the system ID number, system name and OS version.
# hostname    lists the network name of the system.
# lsvg -l rootvg, lsvg -p rootvg    display volume group information, such as which physical disks and logical volumes it contains.
# lslv -l datalv, lslv -p datalv    display logical volume information, such as which disks it resides on and whether it is mirrored.

Eight Network Fault Location Methods

Diagnostic procedure when the network is not working:

. Use ifconfig to check that the network card state is UP.
. Use netstat -i to check the network card statistics. Ierrs/Ipkts or Oerrs/Opkts greater than 1% indicates a problem.
. Ping the machine's own network card address (IP address).
. Ping the address of another machine. If it cannot be reached, use diag to test the network card in this machine.
. Within the same network, the subnet masks should be consistent.
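The 1% error-ratio check on netstat -i output can be computed rather than estimated by eye. The following is a sketch only: the function name nic_error_rates is hypothetical, and it assumes the usual netstat -i layout in which the last five columns are Ipkts, Ierrs, Opkts, Oerrs and Coll.

```shell
# nic_error_rates
# Reads `netstat -i` output on stdin and prints, for each interface row,
# "name input_error_pct output_error_pct".
# Assumption: the last five columns are Ipkts Ierrs Opkts Oerrs Coll.
# Fields are counted from the right because the Address column can be empty.
nic_error_rates() {
    awk '
        NR > 1 && NF >= 6 {
            ipkts = $(NF-4); ierrs = $(NF-3)
            opkts = $(NF-2); oerrs = $(NF-1)
            in_pct  = (ipkts > 0) ? 100 * ierrs / ipkts : 0
            out_pct = (opkts > 0) ? 100 * oerrs / opkts : 0
            printf "%s %.2f %.2f\n", $1, in_pct, out_pct
        }'
}
```

It might be run as `netstat -i | nic_error_rates`; any interface reporting a value above 1.00 in either column matches the condition described in the procedure above.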

Basic methods of network configuration:

(1) To modify the network address, host name and so on, you must use the chdev command:
    # chdev -l inet0 -a hostname=myhost
    # chdev -l en0 -a netaddr='9.3.240.58' -a netmask='255.255.255.0'
(2) View the network card status:
    # lsdev -Cc if
(3) Confirm the network address:
    # ifconfig en0
(4) Start the network card:
    # ifconfig en0 up
(5) Configure routes. There are two ways to add a route:
    Permanent route:
    # chdev -l inet0 -a route='10.57.0.0','9.3.240.59'
    Temporary route:
    # route add 10.47.1.2 9.3.240.59
    Use the command netstat -rn to view the routing table.

Common command list:

Any XXXX, ####, ****, or x is to be substituted. fn = filename, dir = directory, | = pipe symbol.

bosboot -a -d /dev/hdiskX  - rebuilds boot record/image on boot device (hdiskX)
cat  - view contents of a file
cat /tmp/****.1  - view a file, look at output
cat fn fn > newfile  - combines two files into a single file
cd  - returns you to your default dir
cd /  - puts you in the root dir
cd /xxxx  - changes you to a dir anywhere in the system
cd ..  - drops you out of 1 dir at a time
cd xxxxx  - changes you to a dir in the current dir
cfgmgr  - auto-configures devices
cfgmgr -v &  - (-v) shows processes, (&) puts in background
chps -s xx hd#  - increases paging space (xx = # of additional PPs)
cp oldfn newfn  - copy a file
cp oldfn dirn  - copy a file to another directory
crontab -l  - list crontab entries for the current user
Ctrl V  - pages down 1 page
Ctrl 6  - pages up 1 page
del fn  - same as rm -i, prompts to remove fn
df  - shows status of file systems (no inodes)
df -ik  - (k) shows status in 1024-byte (1 KB) blocks (only AIX 4)
diag -a  - updates changes in hardware configuration
diag *****  - (***** = a device type, such as tape, disk ... fastpath)
diag -cd rmtX  - resets tape drive
dosformat  - formats a diskette to DOS
dosdir  - list files on a DOS-formatted diskette
dosread xx yy  - copies DOS file xx to AIX file yy
doswrite yy xx  - copies AIX file yy to DOS file xx
errpt  - generates a one-line synopsis of logged errors
errpt | pg  - list error log 1 page at a time (1st column is ID)
errpt -a  - displays detailed information of logged errors
errpt -s mmddhhmmyy  - select entries posted later than date
errpt -aj XXXXXXX  - list detailed error by ID number (XXX = 1st column)
errpt -d S  - list software errors
errpt -j XXXXXXX  - list summary report by ID number
errpt -aN XXXXXX  - list detailed report by resource name column
errpt -N XXXXXXX  - list summary report by resource name column
errclear 0  - clears error log
errclear -N xxxxx 0  - clears error log by resource name, 0 = all entries
errclear -j xxxxx 0  - clears error log by ID number
finger  - same as who but with more details
flcopy  - copies a diskette to another diskette
format  - formats a diskette in the default diskette drive
format -l  - formats in lower density: 1.44 on 2.88 / 720 on 1.44
hostname  - responds with host system name
host (hostname)  - responds with internet address
instfix -ik APAR#  - lists whether the APAR fix was completely installed
lppchk -v  - checks install status of LPPs
lppchk -v 2> /dev/lpX  - sends output of lppchk to printer lpX
lpstat -a all  - view all printer queues
lptest 80 5 > /dev/lp0  - send test pattern to lp0
ls  - list names of files & directories in current dir
ls -lia  - list details of files, current dir & subdir
ls -al  - list details of files or dir in current dir
lsattr -El xxxxxx  - list specified settings of a device
lsdev -C | sort -d -f  - list system hardware (devices)
lsdev -C | grep 00-0X  - list resources for an adapter
lsdev -Cc xxxxx -H  - list devices (xxx = tty, printer, disk, memory, adapter)
lsdev -Cc tape  - list tape devices
lsdev -Cs pci  - list PCI devices
lsdev -Cs isa  - list ISA devices
lscons  - lists the assigned console
lscfg  - list hardware (same as diags list)
lscfg -rl mem* | pg  - lists the memory on PCI bus machines
lscfg -vl XXXXX  - list config info from a device (rmt0, hdisk, etc.)
lscfg -vl sysplanar0  - lists the machine type, model, S/N on SMP
lsfs  - list all file systems; data from "df" cmd
lslpp -l | grep broken  - lists incomplete PTFs
lslv -m hd5  - finds boot drive under PV1 column
lsps -a  - checks available paging space
lsps -s  - checks available paging space
lspv  - lists information about the physical volumes
lspv hdisk#  - list drive info
lspv -l hdisk#  - lists the logical volumes on the disk
lsuser -f ALL  - lists all attributes for all users
lsvg  - lists volume groups
lsvg -p XXXXXX  - lists disks in volume group (xxxxx = volume group name)
more  - reads files and displays the text one screen at a time
mpcfg -df  - list all settings the machine is set to (SMP)
mpcfg -c 11 1  - changes to fast IPL on SMP machines
mv fn (path fn)  - move and rename a file
oslevel  - shows AIX version (3.2.4 and above)
pg  - reads and displays text one screen at a time
pdisable  - makes unavailable or shows all disabled ttys
pdisable tty#  - disables a tty
penable  - makes available or shows all enabled ttys
penable tty#  - enables a tty
ps -el | pg  - look at processes running on system
pwd  - lists what dir you are currently in
r  - repeats last command
rm -i *******  - remove a file & will prompt you if you are sure
rmdev -l xxxxx  - removes a device (its definition stays in the data base)
rmdev -l xxxxx -d  - removes a device and deletes it from the data base
set -o vi  - sets up to view commands that have been run
:wq  - write (save) and quit file
Esc k  - used with set command to list last command
k, l  - k = list next command ran, l = steps you through command
i  - use with set command, inserts characters
j  - steps you backwards
cw  - removes a word, just type in the new word (use with Esc)
a, x, r  - a = add text, x = delete text, r = replace text (r letter)
R  - lets you type over letters or words
smit *****  - (***** = tape, disk, tty, etc. fastpath)
su  - stands for switch user (not super user)
su  - switches to root ID or prompts you for password
su xxxxxx  - switches to xxxxxx's ID
tar -cvf /dev/rmtX /etc  - will copy /etc to a tape drive
tar -tvf /dev/rmtX  - will read a tape drive
tctl -f /dev/rmtX rewoffl  - rewind & eject tape
tctl -f /dev/rmtX.1 fsf 3  - forward-advances a tape to be read by tar
tctl -f  - list available commands (-f flag is not correct)
tctl retension  - retensions tape in tape drive
&  - put any command in background with process ID
uptime  - how long since last IPL and how many users on system
vmstat # #  - reports virtual memory statistics and more
iostat # #  - reports CPU, disk & CD-ROM statistics
    (use with vmstat & iostat: 1st # = how many seconds between repeats, 2nd # = how many times)
who  - shows users on system
who am i  - shows user ID, your terminal & tty number

Use the following with other commands.
-----------------------------------------
> /tmp/****.1  - creates a file (used with lsxxx command)
> /dev/lp#  - redirects output to a printer (use with a command)
| grep  - useful to search for text in a file
| pg  - use after any command to view one page at a time
>  - greater-than sign
/  - slash sign
\  - backslash sign
>>  - double redirect will add text to end of file
&  - put any command in background with process ID

MUST unmount file system 1st to run fsck & dfsck / only use with a problem
-----------------------------------------
fsck xxxxxxx  - will check a file system for errors & prompt
dfsck /XXXX /XXXX  - will check 2 different file systems at the same time

The FOLLOWING command lines will delete a group of devices as a group. The # signs are the hdisk #'s that you want to delete (this is an example).
-----------------------------------------
for disk in # # #  - this line and the next 3 lines work together
do  - the prompt will be > (remember to hit enter)
rmdev -l hdisk${disk} -d  - the prompt will be > (braces around disk; these may change on a printout)
done  - the prompt will be >

SSA Related Commands
-----------------------------------------
lsattr -El ssaX  - list attributes of ssaX
lscfg -vl ssaX  - list VPD of SSA adapters
lsdev -C | grep ssa  - list all SSA devices
lslpp -l | grep ssa  - list SSA device drivers
maymap -ap  - maymap display of SSA loop
maymap -alph  - maymap display of SSA loop
lscfg -vl pdisk*  - list VPD of pdisks
ssaxlate -l hdiskX  - list hdisk to pdisk assignment
ssaxlate -l pdiskX  - list pdisk to hdisk assignment
ssa_rescheck -l hdiskX  - show hdisk reservation status

The FOLLOWING CMDS LIST, COPY, AND RESTORE for cpio, tar, dd, backup, DOS:
Note: fd0 is just a device, so you may change it.
-----------------------------------------
                     List                        Copy
cpio                 cpio -itv < /dev/fd0        [...]
tar                  tar -tvf /dev/fd0           tar -cvf /dev/fd0 fn
dd                   [...]                       dd if=fn of=/dev/fd0
backup (by inode)    restore -tf /dev/fd0        backup -0 -uf /dev/fd0 fn
backup (by name)     restore -tf /dev/fd0        find / -print | backup -i -f /dev/fd0
DOS                  dosdir                      doswrite -a (aix fn) (fn.ext)

To restore
-----------------------------------------
cpio -iv fn [...]

[...] > /dev/lpx  - to list sys config/VPD
lsuser -f ALL > /dev/lpx  - to list users
lsdev -Cc tty -H  - to list all ttys
lsdev -Cc lp -H  - to list all lps
lsattr -El ttyx > /dev/lpx  - to list ttyx parameters (do for each tty)
lsattr -El lpx > /dev/lpx  - to list lpx parameters (do for each lp)
lpstat > /dev/lpx  - to list queues
lsfs > [...]

When reposting, please cite the original address: https://www.9cbs.com/read-37287.html
