LINK:
http://www.eygle.com/case/sga2.htm
Case Description: This is a large-scale production system problem. The system has a large number of user-processes users request less response. The new process is constantly trying to establish a connection connection to the database version: 9.2.0.3 Operating system: Solaris8
1. Check the following error message in the Alert file log, indicating that the disk asynchronous IO has problems:
Warning: Aiowait Timed Out 2 TimeStue Aug 26 15:33:32 2003Warning:
Aiowait Timed Out 2 Timestue Aug 26 15:33:34 2003Warning: Aiowait Timed
Out 2 Timestue Aug 26 15:33:36 2003Warning: Aiowait Timed Out 2
Timestue aug 26 15:33:38 2003Warning: Aiowait Timed Out 2 Timestue
Aug 26 15:33:43 2003Warning: Aiowait Timed Out 1 Timestue Aug 26
15:33:46 2003Warning: Aiowait Timed Out 1 Timestue Aug 26 15:33:49
2003Warning: Aiowait Timed Out 1 Timestue Aug 26 15:33:51
2003Warning: Aiowait Timed Out 1 Timestue Aug 26 15:33:52
2003Warning: Aiowait Timed Out 1 Timestue Aug 26 15:33:53
2003Warning: Aiowait Timed Out 1 Times ...........
We know that there is a problem with asynchronous IO on some versions of Sun, and the asynchronous IO default is open.
SQL> Show parameter disk_a
Name Type Value --------------------------------- ------------------------------ Disk_asynch_io boolean TRUE
For this issue, we deactivate the asynchronous IO write of the database.
2. Shared Memory Question Alert file also records the following error message:
Tue aug 26 21:37:40 2003
Warning: Einval Creating Segment of Size 0x0000000190400000
FIX SHM Parameters in / etc / system or equivalent
This information shows that the kernel parameter is set too small or and the SGA does not match us. We check the System configuration file.
$ cat /etc/system...................set shmsys: shminfo_shmmax = 4096000000 set shmsys: shminfo_shmmin = 1set shmsys: shminfo_shmmni = 200set shmsys: shminfo_shmseg = 200set semsys: seminfo_semmap = 1024set semsys: seminfo_semmni = 2048set semsys: seminfo_semmns = 2048set semsys: seminfo_semmnu = 2048set semsys: seminfo_semume = 200set semsys: seminfo_semmsl = 2048
We have found that the maximum shared memory setting is only 4G
3. Check the SGA settings
SQL * Plus: Release 9.2.0.3.0 - Production on Tuesday Aug 2002 26 21:46:35 2003
. Copyright (c) 1982, 2002, Oracle Corporation All rights reserved.Connected to: Oracle9i Enterprise Edition Release 9.2.0.3.0 - 64bit ProductionWith the Partitioning, OLAP and Oracle Data Mining optionsJServer Release 9.2.0.3.0 - Production
SQL> SHOW SGA
Total System Global Area 6695660272 Bytesfixed Size 740080 Bytesvariable size 2399141888 Bytesdatabase buffers 4294967296 BYTESREDO BUFFERS 811008 BYtes
We found that the SGA settings are close to 7G, which is the reason for the error prompt in step 2.
4. Exchange area problem We use TOP tools to check the system health
# / usr / local / bin / Toplast PID: 16899; Load Averages: 0.82, 0.81, 0.83 21: 49: 051230 Processes: 1228 Sleeping, 1 Running, 1 on CPucpu State: 50.1% iDLE, 7.4% User, 8.6% Kernel , 33.9% iowait, 0.0% swapMemory: 8192M real, 118M free, 12G swap in use, 11G swap freePID USERNAME THR PRI NICE SIZE RES STATE TIME CPU COMMAND15751 oracle 11 44 0 6456M 6408M sleep 0:02 0.49% oracle15725 oracle 11 58 0 6458M 6410M Sleep 0:02 0.46% Oracle251 Root 12 48 0 7096K 1944K Sleep 126540 Oracle 11 58 0 6458M 6411M Sleep 0:01 0.45% Oracle16766 Root 1 43 0 3744K 2248K CPU / 1 0: 01 0.41% TOP16408 Oracle 11 58 0 6457M 6410M Sleep 0:01 0.34% Oracle15989 Oracle 11 58 0 6458m 6409M Sleep 0:01 0.34% Oracle15919 Oracle 11 58 0 6457M 6409M Sleep 0:02 0.30% Oracle16404 Oracle 11 58 0 6457M 6409M Sleep 0:00 0.28 % ORACLE16327 ORACLE 11 55 0 6457M 6410M Sleep 0:00 0.27% Oracle14870 Oracle 11 58 0 6457M 6412M Sleep 0:05 0.24% Oracle16851 Oracle 11 35 0 6457M 6411M Sleep 0:00 0.22% Oracle16467 Oracle 11 58 0 6 457M 6409M Sleep 0:00 0.21% Oracle16163 Oracle 11 58 0 6457M 6408M Sleep 0:03 0.21% Oracle15159 Oracle 11 58 0 6457M 6408M Sleep 0:05 0.21% Oracle
Memory: 8192m Real, 118M Free, 12G Swap in Use, 11g Swap Free We found that the system is only 8G RAM, the physical memory only 118M can be used now SWAP area use 12G we initially make the following judgment: SGA is set too large (nearly 7G) This is also the first step of our first step to see WARNING: AIOWAIT TIMED OUT 1 TIMES The reason why the database performance caused a sharp drop in database performance, resulting in a rapid response of the user request, congestion, accumulated until the database freezes 5. the solution to this problem is mainly due to SGA caused by improper settings, we'll reduce the SGA set: SQL> show sgaTotal System Global Area 3591870848 bytesFixed Size 735616 bytesVariable Size 1442840576 bytesDatabase Buffers 2147483648 bytesRedo Buffers 811008 bytes At this time, the database reduces the exchange, reaching stable operation, and user requests can get a quick response. Problem solve completion.
6. System status adjustment After the system is health:
$ TOP
Last Pid: 12745; loading averages: 0.46, 0.79, 0.65 22:22:49
228 Processes: 227 Sleeping, 1 on CPU
CPU States: 92.3% iDLE, 5.0% User, 1.6% Kernel, 1.1% iowait, 0.0% SWAP
Memory: 8192m REAL, 3817M Free, 4015M Swap in Use, 15G Swap Free
Pid Username THR PRI Nice Size Res State Time CPU Command
12610 Oracle 1 51 0 3511M 22M Sleep 0:04 1.96% Oracle
12595 Oracle 1 48 0 3511m 22m Sleep 0:03 0.92% Oracle
12630 Oracle 1 38 0 3511M 21M Sleep 0:01 0.84% Oracle
12614 Oracle 1 46 0 3511m 22m Sleep 0:01 0.64% Oracle
12620 Oracle 1 58 0 3511M 22M Sleep 0:01 0.53% Oracle
12709 Oracle 1 48 0 3511M 21M SLEEP 0:00 0.45% Oracle
265 root 11 38 0 7032K 1920K Sleep 3:16 0.42% PICLD
12729 Oracle 1 0 0 3511M 20M Sleep 0:00 0.26% Oracle
12741 Oracle 1 58 0 2768K 1760K CPU / 3 0:00 0.19% TOP
12745 Oracle 1 44 0 3506M 16M Sleep 0:00 0.17% Oracle
12711 Oracle 1 48 0 3506M 16M Sleep 0:00 0.11% Oracle
12738 Oracle 1 43 0 3506M 16M Sleep 0:00 0.06% Oracle
7606 Oracle 1 45 0 17M 6928K Sleep 0:07 0.05% TNSLSNR12721 Oracle 1 34 0 3506m 16M Sleep 0:00 0.05% Oracle
12723 Oracle 1 53 0 3506M 16M Sleep 0:00 0.05% Oracle
After the system is adjusted, it has been stable to run.
A little summary: This case and the other very similarity that I mentioned in front, the database problem caused by the SGA settings itself is not complex. This kind of problem should be avoided in the database planning and construction stage. Time, the problem I am more like a psychological test. When all bosses are standing behind you, you can find and solve problems cool quickly. About Aiowait Timed Out on Sun has a lot of general situations and incentives I have a corresponding case description. -Eygle