Analysis of problems with high diagnosis and solving CPU utilization

xiaoxiao2021-03-06  40

Author: Allan (allan@itpub.net)

Oracle Database often encounters a high CPU utilization, this time is mostly a SQL statement in the database. This SQL statement has greatly consumed CPU resources, causing the entire system. Of course, the reason for the SQL statement that causes severe performance is multifaceted, and specific reasons should be analyzed. The following is how to diagnose and solve the high CPU utilization.

Operating system: SOLAIRS8

Database: Oracle

9.2.0

.4

Problem Description: Field engineers report the database very slow, almost all application operations are not working properly.

First land the host, execute the TOP to find that the CPU resources are almost exhausted, there are many processes that occupy the CPU, and the memory and I / O are not high, the specific:

Last Pid: 26136; Load Averages: 8.89, 8.91, 8.12

216 Processes: 204 Sleeping, 8 Running, 4 On CPU

CPU States: 0.6% iDLE, 97.3% User, 1.8% Kernel, 0.2% iowait, 0.0% SWAP

Memory: 8192m Real, 1166M Free, 14M Swap in Use, 8179M Swap Free

Pid Username THR PRI NICE SIZE

RES

State

Time CPU Command

25725 Oracle 1 50 0 4550M 4508M CPU2

12:23

11.23% Oracle

25774 Oracle 1 41 0 4550M 4508M Run

14:25

10.66% Oracle

26016 Oracle 1 31 0 4550M 4508M RUN

5:41

10.37% Oracle

26010 Oracle 1 41 0 4550M 4508M Run

4:40

9.81% Oracle

26014 Oracle 1 51 0 4550M 4506M CPU6

4:19

9.76% Oracle

25873 Oracle 1 41 0 4550M 4508M Run

12:10

9.45% Oracle

25723 Oracle 1 50 0 4550M 4508M Run

15:09

9.40% Oracle

26121 Oracle 1 41 0 4550M 4506M CPU0

1:13

9.28% Oracle

So first check the alarm log Alert file for the database, there is nothing to find out, the log shows that the database is running normally, and the rule of the database itself exists.

Then check what the Oracle process is doing this occupied CPU resources, use the following SQL statement:

SELECT SQL_TEXT, SPID, V $ Session.program, Process from

V $ SQLAREA, V $ SESSION, V $ Process

Where v $ sqlarea.address = V $ session.sql_address

And v $ sqlarea.hash_value = v $ session.sql_hash_value

And v $ session.paddr = v $ process.addr

And v $ process.spid in (pid);

Use the PID in the PID in TOP to use the PID in the PID in the CPU, get the SQL statement executed by the corresponding Oracle process, found that the process that occupies the CPU resource is executing the same SQL statement: SELECT D.DOMAINNAME, D .mswitchdomainid, a.serviceid, a.Servicecode, A.USERTYPE, A.STATUS, A.NotifyStatus, to_char (A.Datecreated, 'YYYY-MM-DD HH24: MI: SS') Datecreated, VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn aND a.ServiceCode like c.code || '%' and a.serviceSpecID = 1 and a .status! = '4' and a.status! = '10' and a.Servicecode Like '010987654321%' and subsidiaryid = 99999999

Basically, this SQL has caused a large number of system CPU resources. What is the reason for this SQL such a large amount of CPU resources, let's take a look at the database's process waiting for the event:

SQL> SELECT SID, Event, P1, P1text from V $ session_wait;

Sid Event P1 P1Text

---------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------

12 latch free 4.3982e 12 Address

36 Latch Free 4.3982E 12 Address

37 latch free 4.3982e 12 address

84 Latch Free 4.3982E 12 Address

102 latch free 4.3982e 12 address

101 Latch Free 4.3982E 12 Address

85 latch free 4.3982e 12 address

106 Latch Free 4.3982E 12 Address

155 Latch Free 4.3982E 12 Address

151 Latch Free 4.3982E 12 Address

149 Latch Free 4.3982E 12 Address

147 Latch Free 4.3982E 12 Address

1 PMON Timer 300 Duration

From the above query we can see that most of the Latch Free is waiting for events, and then check what these Latch wait is generated:

SQL> SELECT SPID FROM V $ Process Where Addr in

(SELECT PADDR from V $ Session WHERE SID IN (84, 102, 101, 106, 155, 151));

Spid

----------------

25774

26010

25873

25725

This shows that Latch Free wait for the event that the SQL statement above is waiting, which takes up a lot of CPU resources. Let's take a look at the waiting type of Latch waiting, according to the following SQL statement: SQL> SELECT LATCH #, Name, Gets, Misses, Sleeps

From v $ latch

Where Sleeps> 0

ORDER by Sleeps;

Latch # name Gets Misses Sleeps

---------- -------------------------------------------------------------------------------------------------------------------------------- ---------------------------------------------------------------------------------------------------------------------------------------

15 Messages 96876 20 1

159 Library Cache Pin Allocation 407322 43 1

132 DML Lock Allocation 194533 213 2

4 session allocation 304897 48 3

115 redo allocation 238031 286 4

17 enqueue hash chains 277510 85 5

7 session idle bit 2727264 314 16

158 Library Cache Pin 3881788 5586 58

156 Shared Pool 2771629 6184 662

157 Library Cache 5637573 25246 801

98 Cache Buffers Chains 1722750424 758400 109837

From the above query, you can see that the most important Latch wait is Cache Buffers, this latch Waiting indicates that data stocks compete in separate blocks These Latch, let's see this lative lative Latch and its corresponding type:

SQL> SELECT Addr, Latch #, Gets, Misses, Sleeps

From v $ latch_children

Where Sleeps> 0

And latch # = 98

ORDER by Sleeps DESC;

AddR Latch # gets miss Sleeps

-------------------------------------------------------------------------------- --------

000004000A3DFD10 98 10840661 82891 389

000004000A698C70 98 159510 2 244

0000040009B21738 98 104269771 34926 209

0000040009B227A8 98 107604659 35697 185

000004000A3E0D70 98 5447601 18922 156000004000A6C2BD0 98 853375 7 134

0000040009B24888 98 85538409 25752 106

...............

Then we come to see more than SLEEP more objects:

SQL> Select Distinct A.OWNER, A.SEGMENT_NAME, A.SEGMENT_TYPE FROM

DBA_EXTENTS A,

(Select DBARFIL, DBABLK

From x $ bh

WHERE HLADDR IN

(SELECT ADDR

From (SELECT ADDR

From v $ latch_children

Order by Sleeps DESC)

WHERE ROWNUM <5)) B

WHERE A.RELATIVE_FNO = B. DBARFIL

And a.block_id <= b.dbablk and a.block_id a.blocks> b.dbablk;

Owner segment_name segment_type

-------------------------------------------------- -------------------------

Test i_service_servicespecid index

Test i_service_subsidiaryid Index

Test Service Table

Test mswitchdomain table

TEST i_SERVICE_SC_S INDEX

.......................

We see several objects in the SQL statement in the beginning of the SQL statement, so let's take a look at the beginning of the SQL execution plan:

SQL> Set Autotrace TRACE EXPLAIN

SQL> Select D. DomainName, D. MswitchDomainid, A.ServiceId, A.Servicecode, A.USERTYPE, A.STATUS, A.NOTIFYSTATUS, TO_CHAR (A.DATECREATED, 'YYYY-MM-DD HH24: MI: SS') DATECREATED, VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn aND a.ServiceCode like c.code || '%' And A.Servicespecid = 1 and A.Status! = '4' and a.status! = '10' and a.Servicecode Like '010987654321%' and subsidiaryid = 999999999; Execution PLAN

-------------------------------------------------- ------------

0 Select Statement Optimizer = Choose

1 0 NESTED LOOPS

2 1 nested loops

3 2 NESTED LOOPS

4 3 Table Access (Full) of 'SubbureAunumber'

5 3 Table Access (by index rowid) of 'GatewayLoc'

6 5 Index (Unique Scan) of 'PK_GATEWAYLOC' (UNIQUE)

7 2 Table Access (by index rowid) of 'mswitchdomain'

8 7 Index (Unique Scan) of 'PK_MSwitchDomain' (Unique)

9 1 Table Access (by Index Rowid) of 'Service'

10 9 and-equal

11 10 Index (Range Scan) of 'i_service_serviceSpecid' (Non

-Unique)

12 10 INDEX (Range Scan) of 'i_service_subsidiaryID' (Non-

Unique)

Depending on the beginning of the object to cause the Latch Free waiting in the Latch Free waiting, I feel that the index on the service table has problems, and it seems to have too many scans, so the same SQL statement is the same in other land. Execute on the database to view the appropriate execution plan:

SQL> Set Autotrace TRACE EXPLAIN

SQL> Select D. DomainName, D. MswitchDomainid, A.ServiceId, A.Servicecode, A.USERTYPE, A.STATUS, A.NOTIFYSTATUS, TO_CHAR (A.DATECREATED, 'YYYY-MM-DD HH24: MI: SS') DATECREATED, VIPFLAG, STATUS2, CUSTOMERTYPE, CUSTOMERID FROM service a, gatewayloc b, subbureaunumber c, mswitchdomain d WHERE b.mswitchdomainid = d.mswitchdomainid and b.gatewaysn = c.gatewaysn aND a.ServiceCode like c.code || '%' And A.Servicespecid = 1 and A.Status! = '4' and a.status! = '10' and a.Servicecode Like '010987654321%' and subsidiaryid = 999999999; Execution PLAN

-------------------------------------------------- ------------

0 Select Statement Optimizer = Choose

1 0 Table Access (By Index Rowid) of 'Service'

2 1 nested loops

3 2 NESTED LOOPS

4 3 Nested Loops

5 4 Table Access (Full) of 'SubbureAunumber'

6 4 Table Access (By Index Rowid) of 'Gatewayloc'

7 6 INDEX (UNIQUE SCAN) of 'PK_GATEWAYLOC' (UNIQUE)

8 3 Table Access (by index rowid) of 'mswitchdomain'

9 8 index (unique scan) of 'pk_mswitchdomain' (unique)

10 2 INDEX (RANGE SCAN) of 'i_service_sc_s' (non-unique)

Compare two execution plans, find index i_service_servicespecid and i_service_subsidiaryid should not be taken, so compare the number of indexes on both local service tables:

SQL> SELECT INDEX_NAME from User_indexes where Table_Name = 'Service'

Index_name

------------------------------

I_service_accountnum

I_service_cid

I_service_dateactivated

I_service_pricepland

I_service_sc_s

I_Service_ServiceCode

I_service_servicespecid

I_service_subsidiaryidID

PK_SERVICE_SID

SQL> SELECT INDEX_NAME from User_indexes where Table_Name = 'Service'; Index_Name

------------------------------

I_service_accountnum

I_service_cid

I_service_dateactivated

I_service_sc_s

I_Service_ServiceCode

PK_SERVICE_SID

Discover how many I_Service_PricePlanid, i_service_servicespecid, i_service_servicespecid, i_service_servicespecid, i_service_servicespecid, i_service_servicespeiRiD, i_service_service_service_servicespecid, i_service_service, and these indexes have caused the SQL statement to use the index that should not be used, causing the Latch Free waiting and CPU occupied A very high culprit, so I delete the three indexes, re-executing the corresponding SQL statement, soon, the result is that the utilization rate of the CPU will fall immediately as normal, and the observations are as follows:

Last Pid: 26387; Load Averages: 1.61, 1.38, 1.21

195 Processes: 194 Sleeping, 1 On CPU

CPU States: 96.2% iDLE, 1.6% User, 1.7% Kernel, 0.5% iowait, 0.0% SWAP

Memory: 8192m REAL, 1183M Free, 14M Swap in Use, 8179M Swap Free

Pid Username THR PRI NICE SIZE

RES

State

Time CPU Command

26383 Oracle 1 59 0 4550M 4506M Sleep 0:12 4.52% Oracle

409 root 15 59 0 7168K 7008K Sleep 173.1H 0.53% PICLD

25653 Oracle 1 59 0 4550M 4508M SLEEP

2:12

0.48% Oracle

26384 Root 1 59 0 2800K 1912K CPU2 0:00 0.21% TOP-3.5B8-Sun4u

25569 Oracle 1 59 0 4550M 4508M Sleep 0:12 0.09% Oracle

25717 Oracle 1 59 0 4550M 4507M Sleep 0:07 0.05% Oracle

25571 Oracle 1 59 0 4550M 4507M Sleep 0:10 0.04% Oracle

25681 Oracle 1 59 0 4550M 4508M Sleep 0:10 0.04% Oracle

25544 Oracle 1 58 0 4554M 4501M Sleep 0:14 0.03% Oracle

25703 Oracle 1 59 0 4550M 4506M Sleep 0:23 0.03% Oracle

..................

For the case where the CPU utilization is too high, if the SQL statement is relatively low, it can be diagnosed and solved according to this idea. Of course, the specific problem has to be analyzed, there are many ways to solve the problem, but here is Throwing the jade, as long as it can eventually meet the purpose of our solving the problem.

About the Author:

Photo Temporary Net Name COOLYL

9CBS Emag Oracle Electronics Magazine Editor's ITPUB Oracle Management Master. Good at the maintenance of the database, has its own unique experience for the installation, adjustment, and backup of the database. At the same time, there is also a major training for large companies in China, and there is a certain training experience. Database maintenance and support work for many large projects has been done, which has considerable practical experience in Oracle's maintenance, and is good at solving problems on site. He has worked for a large software company to do Oracle databases, customers all over the country, especially telecommunications, and government industries. He is currently based in North China Branch, DBA, responsible for the maintenance of more than 40 database systems in North China, rich in large database management experience. The editor of the "Oracle Database DBA Special Technology" and the main author.

Mail address: allan@itpub.net

转载请注明原文地址:https://www.9cbs.com/read-61665.html

New Post(0)