Problem Description: The engineer of Binzhou Netcom in the morning of October 25th reported that the OSS application system was running slowly. The specific operation was through the OSS system to check the order. For a long time, it can return the results, which seriously affects the normal use of customers.
Problem processing: 1. Log in to the database host, use the SAR command to see the value of the IDLE is 0, the resources of the CPU have been exhausted:
BZ_DB1 # SAR 2 4
Sunos BZ_DB1 5.8 Generic_108528-19 Sun4u 10/26/04
10:56:46% USR% SYS% WIO% IDLE10: 56: 48 1 4 95 010: 56: 50 1 5 94 010: 56: 52 0 6 93 010: 56: 54 1 6 93 0
Average 1 5 94 0
2. Use the TOP command to see that there are two processes that significantly occupy the high CPU utilization, the following is the result of the TOP command:
BZ_DB1 # TOP
Last PID: 1664; Load Averages: 3.26, 3.24, 3.69 159 Processes: 152 Sleeping, 2 Running, 2 Zombie, 1 Stopped, 2 on CPucpu States: 1.5% iDle, 72.5% User, 17.9% Kernel, 8.0% iowait, 0.0 % SwapMemory: 2.0g Real, 233M Free, 2.0G Swap in Use, 3.4G Swap Free
PID Username Thr Pr NCE SIZE RES State Time FLTS CPU Command 27420 Oracle 1 10 0 1.3G 1.2G CPU01 22.9H 2 31.94% Oracle 27418 Oracle 1 10 0 1.3G 1.2G Run 23.0H 6 26.86% Oracle 5943 Oracle 1 59 0 1.3 G 1.2G Sleep 25:26 37 4.92% Oracle 6295 Oracle 1 55 0 1.3G 1.2G Run 25:14 74 4.90% Oracle 7778 Oracle 1 43 0 1.3G 1.2G Sleep 11:43 110 4.86% Oracle 13270 Oracle 1 59 0 1.3G 1.2G Sleep 210.6H 0 0.96% Oracle 13056 Oracle 1 48 0 1.3G 1.2G Sleep 303: 30 0 0.37% Oracle 10653 Root 1 58 0 2560K 1624K CPU00 0:00 0 0.32% TOP 18827 Oracle 1 58 0 1.3g 1.2G Sleep 18.4H 0 0.31% Oracle 12748 Oracle 258 58 0 1.3G 1.2G Sleep 555: 14 0 0.21% Oracle 10634 Oracle 1 59 0 1.3G 1.2G Sleep 0:01 0 0.21% Oracle 28458 Oracle 1 58 0 1.3g 1.2G Sleep 535: 02 0 0.18% Oracle 13075 Oracle 1 59 0 1.3G 1.2G Sleep 326: 33 0 0.15% Oracle 13173 Oracle 1 58 0 1.3G 1.2G Sleep 593: 07 0 0.13% Oracle 4927 Oracle 1 59 0 1.3G 1.2G Sleep 33.4H 0 0.11% Oracle can see these two processes are 27420 and 27418.3. Capture these two occupied CPUs The SQL statement corresponding to the process of high rate is too high:
The following uses the SQL statement I summarized:
SQL> set line 240SQL> set verify offSQL> column sid format 999SQL> column pid format 999 SQL> column S_ # format 999SQL> column username format A9 heading "ORA User" SQL> column program format a29SQL> column SQL format a60SQL> COLUMN OSname Format A9 Heading "OS User" SQL> Select P.PID PID, S.SID SID, P.SPID SPID, S.Username Username, S. OSUSER OSNAME, P. Serminal, P.Program Program , P. Background, S.Status, RTRIM (Substr (A.SQL_Text, 1, 80)) SQLFROM V $ Process P, V $ Session S, V $ SQLAREA A where p.addr = s.paddrand s.sql_address = a .address ( ) and p.spid Like '% & 1%'; Enter Value for 1: 27420 (note that the PID corresponding to the process of the highest CPU should be entered)
Get the following SQL statement:
SELECT NVL (SUM (Localcharge), 0), NVL (SUM (Usage), 0) from localusage where to_char (endtime, 'yyyymmdd') = 20041016and Localcharge> 0 and Caller Like '0543886%';
27418 The corresponding SQL statement is as follows: SELECT NVL (SUM (Localcharge), 0) from localusage where to_char (endtime, 'yyyymmdd') = 20041016 and Caller Like '0543888%;
4. Use the relevant users to connect to the database, check their execution plan: SQL> Connect Wacos / OsSconnected.
SQL> @? / Rdbms / admin / utlxplan.sqltable created.
SQL> Set autotrace on
SQL> Set Timing On
SQL> SELECT NVL (SUM (Localcharge), 0), NVL (SUM (Usage), 0) from localusage where to_char (endtime, 'yyyyymmdd') = 20041016 and localcharge> 0 and Caller Like '0543886%';
NVL (SUM (Localcharge), 0) NVL (SUM (Usage), 0) ------------------------------- -------- 0 0
Elapsed: 00: 02: 56.37
Execution Plan ------------------------------------------------ ---------- 0 Select Statement Optimizer = Choose (COST = 13435 Card = 1 Bytes = 5 3)
1 0 Sort (aggregate) 2 1 Partition Range (all) 3 2 Table Access (Full) of 'localusage' (COST = 13435 card = 1 81 bytes = 9593) statistics ------------- -------------------------------------------- 258 Recursive Calls 0 dB block gets 88739 consistent gets 15705 physical reads 0 redo size 580 bytes sent via SQL * Net to client 651 bytes received via SQL * Net from client 2 SQL * Net roundtrips to / from client 8 sorts (memory) 0 sorts (disk) 1 rows Processed
It was found that the Localusage Table was a full-tunition scan, and there was no return to 2 minutes. SQL> SELECT NVL (SUM (Localcharge), 0) from localusage where to_char (endtime, 'yyyymmdd') = 20040816 and Caller Like '0543888%;
NVL (SUM (Localcharge), 0) ----------------------- 27.6
Elapsed: 00: 03: 56.46
Execution Plan ------------------------------------------------ ---------- 0 Select Statement Optimizer = Choose (COST = 13435 Card = 1 bytes = 4 0)
1 0 sort (aggregate) 2 1 partition range (all) 3 2 Table access (full) of 'localusage' (COST = 13435 card = 3 615 bytes = 144600)
Statistics --------------------------------------------------- --------- 0 recursive calls 0 db block gets 88588 consistent gets 15615 physical reads 0 redo size 507 bytes sent via SQL * Net to client 651 bytes received via SQL * Net from client 2 SQL * Net roundtrips to / From Client 0 Sorts (Memory) 0 Sorts (Disk) 1 ROWS Processed
This SQL statement has a result returned and found that it is also a full table scan for the Localusage table, but the speed is also very slow, and it has been used for 3 minutes. SQL> SELECT Count (*) from localusage; count (*) ---------- 5793776
The table has more than 5.79 million records, the amount of data is large, and the full table scan is no longer suitable.
5. Check the type of the table:
SQL> SELECT INDEX_NAME, TABLE_NAME, STATUS, Partitioned from user_indexes where table_name = 'localusage';
Index_name Table_name Status Par ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- -------------- -------- --- I_LOCALUSAGE_SID LOCALUSAGE N / A YESUI_LOCALUSAGE_ST_SEQ LOCALUSAGE N / A YESSQL> SELECT index_name, table_name, locality FROM user_part_indexes where table_name = 'LOCALUSAGE "
Index_name table_name locali --------------------------------------------- ------------- ------ I_LOCALUSAGE_SID LOCALUSAGE LOCALUI_LOCALUSAGE_ST_SEQ LOCALUSAGE LOCAL
It is found that the table is a partition table and established a partition index on the ServiceID, Startime, and CDRSEquence columns. The index type is the local index. 6. View the index key value of the partition index:
SQL> SELECT INDEX_NAME, Column_Name, INDEX_OWNER from DBA_IND_COLUMNS Where Table_Name = 'localusage'
Index_name column_name index_owner ----------------------------------------------------------------------------------------------------- ----------------------- i_localusage_sid serviceid wacosui_localusage_st_seq starttime wacosui_localusage_st_seq CDRSEQUENCE WACOS
It is found that there is no index on the EndTime and Caller columns, which is also the final reason for the SQL statement to make a full table scan.
7. Decide to create a new partition index to eliminate full mete scans:
(1). First look at the Localusage Table Partition:
SQL> SELECT Partition_name, TableSpace_name from user_tab_partitions where table_name = 'localusage';
Partition_name TableSpace_name ---------------------------------------------------------------------------------------------------------------------------- ------------ LOCALUSAGE_200312 WACOSLOCALUSAGE_200401 WACOSLOCALUSAGE_200402 WACOSLOCALUSAGE_200404 WACOSLOCALUSAGE_200405 WACOSLOCALUSAGE_200406 WACOSLOCALUSAGE_200407 WACOSLOCALUSAGE_200409 WACOSLOCALUSAGE_200410 WACOSLOCALUSAGE_200411 WACOSLOCALUSAGE_200403 WACOSLOCALUSAGE_200408 WACOSLOCALUSAGE_200412 WACOS13 rows selected.
. (2) create local partitioned indexes on the caller column: SQL> set timing onSQL> create index I_LOCALUSAGE_CALLER on localusage (caller) LOCAL (PARTITION LOCALUSAGE_200312, PARTITION LOCALUSAGE_200401, PARTITION LOCALUSAGE_200402, PARTITION LOCALUSAGE_200404, PARTITION LOCALUSAGE_200405, PARTITION LOCALUSAGE_200406, PARTITION LOCALUSAGE_200407, PARTITION LOCALUSAGE_200409, PARTITION LOCALUSAGE_200410, PARTITION LOCALUSAGE_200411, PARTITION LOCALUSAGE_200403, PARTITION LOCALUSAGE_200408, PARTITION LOCALUSAGE_200412) TABLESPACE wacosSTORAGE (INITIAL 6553600 NEXT 6553600 MAXEXTENTS unlimited PCTINCREASE 0) PCTFREE 5 NOLOGGING;
Index created.
ELAPSED: 00: 06: 27.90 (due to large data volume, 6 minutes)
8. Review again: SQL> SELECT NVL (SUM (Localcharge), 0), NVL (SUM (Usage), 0) from localusage where to_char (endtime, 'YYYYMMDD') = 20041016and localcharge> 0 and caller like '0543886 % ';
NVL (SUM (Localcharge), 0) NVL (SUM (Usage), 0) ------------------------------- -------- 0 0
Elapsed: 00: 00: 03.00
Execution Plan ------------------------------------------------ ---------- 0 Select Statement Optimizer = Choose (COST = 22 Card = 1 Bytes = 53) 1 0 Sort (AggRegate) 2 1 Partition Range (ALL) 3 2 Table Access (by local index rowid) Of 'localusage' (cost = 22 card = 181 bytes = 9593) 4 3 Index (Range scan) of 'i_localusage_caller' (COST = 14 card = 65063) statistics ---------- ------------------------------------------------ 0 Recursive calls 0 db block gets 16813 consistent gets 569 physical reads 0 redo size 580 bytes sent via SQL * Net to client 651 bytes received via SQL * Net from client 2 SQL * Net roundtrips to / from client 0 sorts (memory) 0 sorts (disk 1 ROWS Processed
This time, the speed is obviously faster, and the result is returned for 3 seconds. SQL> SELECT NVL (SUM (Localcharge), 0) from localusage where to_char (endtime, 'yyyymmdd') = 20040816 and Caller Like '0543888%;
NVL (SUM (Localcharge), 0) ----------------------- 27.6
Elapsed: 00: 00: 24.73
Execution Plan ------------------------------------------------ ---------- 0 Select Statement Optimizer = Choose (COST = 22 Card = 1 Bytes = 40) 1 0 Sort (AggRegate) 2 1 Partition Range (All) 3 2 Table Access (by local index rowid) Of 'localusage' (COST = 22 card = 3615 bytes = 144600) 4 3 index (Range scan) of 'i_localusage_caller' (cost = 14 card = 65063)
Statistics --------------------------------------------------- --------- 0 recursive calls 0 db block gets 129336 consistent gets 7241 physical reads 0 redo size 507 bytes sent via SQL * Net to client 651 bytes via received SQL * Net from client 2 SQL * Net roundtrips to / From client 0 Sorts (Memory) 0 Sorts (Disk) 1 Rows Processed This SQL statement has taken an index, and the result is returned for 24 seconds, and the performance has improved significantly. 9. Continue to observe the use of CPU: BZ_DB1 # SAR 2 4Sunos BZ_DB1 5.8 Generic_108528-19 Sun4u 10/26/04
12:09:38% USR% SYS% WIO% IDLE12: 09: 40 1 2 0 9712: 09: 42 1 0 0 0 9812: 09: 44 2 2 2 9412: 09: 46 1 0 2 97
Average 1 1 97bz_db1 # vmstat 3 Procs Memory Page Disk Faults CPU RBW SWAP Free Re M2 M2 M4 in Sy CS US Sy ID 0 0 0 9736008 2603896 0 0 0 0 0 0 0 4 3 3 0 4294967196 0 0 9183336 1596392 0 2 66 0 0 0 0 0 0 0 0 434 689 1255 1 0 99 0 0 0 9183336 1596280 32 152 32 0 0 0 0 0 0 0 0 396 1046 1119 1 1 98 0 0 0 9183335 0 0 0 0 28 22 22 0 601 1165 1777 2 1 98 0 0 0 918336 1595608 0 0 0 136 0 0 0 0 0 0 0 0 552 1155 1797 1 1 98 0 0 0 The results showed normal use of CPU usage using the results. At this point, this problem is solved.