Fixed experience in SQL Server2000 database ****************************************************** ******************************************* AUTHOR: Huangshan Guangming top mail: Leimin@jxfw.com Version: 1.0.0 date: 2004-1-30 (For reproduced, please indicate the source! If there is a problem, please send me :-)) **************** *********************************************************** *********** The background of a story I said is like this, using SQL Server 2000 in a POS project to do a reception database, IBM's DB2 is a background database. The environment where the front desk database is such an operating system is Windows2000 Server (10 users), the database is SQLServer2000 (E) SP3, Application is a cashier system (a real-time trading system). The hardware configuration is: P4 XRON 2.4G * 2, 36G HDD * 5 made RAID5, 1G Memory, HP DDS4 tape drive, the capacity of the database is generally kept around 5g. Because the data is more important, and the data capacity is not large, our backup strategy is a full backup of POS_DB every day (a loop for a loop for a week in the evening), all backup on the hard disk (Master, MSDB) , POS_DB). This keeps double insurance. 1. Fault broke: 2003-12-26 13:00 Customer Report All POS cranes and Server runs very slow.
After restarting the server (start alarm to check the RAID card) We found this information in the "System Log" in WINDEOWS 2000 Server: Error: 823, Severity: 24, State: 2 I / O Error (Torn Page) Detected During Read at Offset 0x0000001BF96000 in file d: /data/pos_db.mdf '. SQL Server's "Error Log" is available in "Error Log": 2003-12-10 03: 34: 22.23 SPID56 Error: 823, Severity: 24, State : 2 2003-12-10 03: 34: 22.23 SPID56 I / O Error (Torn Page) Detected During Read at Offset 0x00000074964000 In File 'D: /Data/pos_db.mdf' .. from MSDN: I / O Logical check failure: If a read Windows API call or a write Windows API call for a database file is successful, but specific logical checks on the data are not successful (a torn page, for example), an 823 error is raised The following error. Message is an example of an 823 error for an I / o Logical Check Failure: 2003-09-05 16: 51: 18.90 Spid17 error: 823, Severity: 24, State: 2 2003-09-05 16: 51: 18.90 SPID17 I / O Error (Torn Page) Detected During Read at Offset 0x00000094004000 in File 'f: /sqldata/mydb.mdf' .. to resolve this problem, first run the DBCC CHECKDB statement on the database that is associated with the file in the error message. If the DBCC CHECKDB statement reports errors, correct those errors before you troubleshoot this problem. If the problem persists even after the DBCC CHECKDB errors have been corrected, or if the DBCC CHECKDB statement does not report any errors, review the Microsoft Windows NT system event log for any system errors or disk-related errors. You can also contact your hardware vendor to run any appropriate diagnostics. I / O logic Check failed: If there is a Windows program to be successful when reading and writing database files, there is no success in detailed data logic check (such as: incomplete pages), SQL Server returns an error in the MSG 823.
Here is an I / O logic check failed to fail MSG 823: 2003-09-05 16: 51: 18.90 SPID17 Error: 823, Severity: 24, State: 2 2003-09-05 16: 51: 18.90 SPID17 I / O Error (Torn Page) Detected During Read at Offset 0x00000094004000 In File 'f: /sqldata/mydb.mdf' .. To resolve such a problem, first perform DBCC Checkdb in this database (the database file of the error message prompt). If the DBCC Checkdb is wrong, correct these errors before you fix an error. If these error messages are kept until the DBCC Checkdb run, or DBCC Checkdb does not report any errors, check the event viewer for the Windows NT system and system errors or disk errors. You can also contact hardware vendors to run the correct diagnostic tool. Broken :- (, the database file has a problem, check the OS's Event Viewer, we found there to have an error message before a week (only the offset address of Offset). Hurry to check HDD, really found in RAID5 first Fast HDD bright red light (too much dust, it is difficult to see) Execute DBCC CHECKDB ('POS_DB') check: Server: MSG 8909, Level 16, State 1, Line 1 Table Error: Object ID 26342838, Index ID 35207 Page ID (1: 50978). The pageid in the page header = (32230: -2048732002). Server: MSG 8939, Level 16, State 1, Line 1 Table Error: Object ID 859150106, Index ID 255, Page (1 : 238770). TEST (is_on (buf_ioerr, bp-> bstat) && bp-> Berrcode) failed. Values are 2057 and -1. Server: MSG 8928, Level 16, State 1, Line 1 Object ID 861246123, INDEX ID 0 : Page (1: 57291) Could Not Be Processed. See Other Errors for Details. Server: MSG 2511, Level 16, State 1, Line 1 Table Error: Object ID 862626116, Index ID 0. Keys Out of Order on Page (1 : 269310), Slots 0 and 1.haha, there is a lot of tables have false associations (please record the Object ID of each wrong table) from MSDN: Error number MSG 823: Represents SQLServer in reading data and writing The data is detected when the hardware device has a problem or there is a problem. Torn page: The meaning is incomplete page 0x0000001BF96000: This is the number of bytes from the data file to Torn Page.
Error number MSG 8939: Everyone can take a look: http://support.microsoft.com/default.aspx? Kbid = 320434 FIX: When you run checkdb, a large-capacity insertion (Bulk INSERT, BCP, etc.) with Tablock prompts may result in Error 8929 and 8965 Error Number MSG 8928: Yes, the information associated with 8939, the error number MSG 8965: Yes, the information associated with 8939, you can find the relevant information to the following address: http://support.microsoft.com /default.aspx?scid=kb;n-us; 826433 PRB: Additional SQL Server Diagnostics Added to Detect Unreported I / O Problems http://support.microsoft.com/default.aspx?scid=kb;n-us; 828339 PRB: Error message 823 may indicate hardware problems or system problems http://support.microsoft.com/default.aspx?scid=kb;en-us;308795 FIX: CheckDB May Not Fix Error 8909 or Error 8905 fault diagnosis: RAID has an HDD bad, causing database file damage 2. Replacing HDD2003-12-28 23:00 Now reflects the benefits of RAID5, breaking an HDD, the system can run as usual, but the system log and SQLServer log still have MSG823 An error message. Bind the new HDD to the original RAID5 in accordance with the RAID card, successfully completed :-) Check the database's integrity DBCC Checkdb ('POS_DB') with all_errormsgs, the same Error before replacing HDD Information, it seems that the database file is still problematic. - There is a strange problem 1, since it is a 5 HDD RAID5, why there is a HDD bad that affects the damage of the database file, does not understand? ? ? :-( 3. Recovery Database 2003-12-29 00:30 There is no way, using backup data set recovery database (what is important to backup) Use master Go Restore Database POS_DB from from Disk = 'D: / DatabaseBackup / POS_DB_BACKUP .Dat 'restarts the MSSQLSERCVER service, net stop mssqlserver / net start mssqlserver uses DBCC Check the database's integrity DBCC Checkdb (' pOS_DB ') with all_errormsgs, the error message before recovery, has not changed .-- 奇怪 问题 2, SQLServer BACKUP does not verify the integrity of the database, the full backup of the database is problematic. Anger !! It seems that it can only be repaired by the tool (- Record the number of records recorded before modifying, to repair the database Comparison).
In Run Query Analyzer: ALTER DATABASE POS_DB SET SINGL_USER GO DBCC CHECKDB ( 'POS_DB', repair_allow_data_loss) WITH TABLOCK GO ALTER DATABASE POS_DB SET MULTI_USER GO CHECKDB has three parameters: REPAIR_ALLOW_DATA_LOSS Performs all repairs done by REPAIR_REBUILD, including row and The page is assigned and unassigned to correct allocation errors, structural rows or pages, and delete corrupted text objects. These repairs may result in some data loss. Repair operations can be done under user transactions to allow users to roll back. If rollback fixes, the database still contains errors and should be recovered from backup. If an error is missing, any repair depending on the repair is missing because the repair level is provided. After the repair is complete, the database is backed up. REPAIR_FAST is small, not time consuming, such as repairing additional keys in the non-aggregated index. These repairs can be completed quickly and will not have the risk of losing data. Repair_Rebuild Performs all fixes completed by Repair_Fast, including repair required for a longer period of time (such as rebuilding indexes). Do not have the risk of losing data when performing these fixes. The first run, we will find: DBCC results for 'TABLE_NAME' There are 1 rows in 1 pages for object 'TABLE_NAME' The error has been repaired CHECKDB found 0 allocation errors and 1 consistency errors in table '(Object ID... 26342838) '(object ID 26342838). CHECKDB fixed 0 allocation errors and 1 consistency errors in table' (Object ID 26342838) '(object ID 26342838). such information are numerous and have "the error has been repaired" tips . However, there is still such information: Checkdb Found 0 Allocation Errors and 19 Consistency Errors in Database 'POS_DB'. Checkdb Fixed 0 Allocation Errors and 19 consistency errors in database 'pos_db'. Run again, or the same error. Worse: =) It seems that this way is unable to repair such measurement errors. failure! ! ! Take a closer look at the SQL Server Bol found that there is a very useful parameter of the physical_only physical_only only to check the integrity of the physical structure of the check page and the record title, and the consistency between the page object ID and the index ID and the allocation structure. This check is designed to check the physical consistency of the database at a lower overhead while also detecting a hunching and common hardware failure of user data security. Physical_only always means NO_INFOMSGS, and cannot be used with any fix options.
Run again: DBCC CHECKDB ( 'POS_DB') with NO_INFOMSGS, PHYSICAL_ONLY then run: DBCC CHECKDB ( 'POS_DB', repair_allow_data_loss) WITH TABLOCK This will return an error message number 8952.8956 is: Server: Msg 8952, Level 16, State 1, Line 1Table Error: Database 'POS_DB', INDEX 'POS_REFER.IDX2_POS_REFER' (Id 861246123) (Id 861246123) (Idra OR INVALID Key for the Keys: Server: MSG 8956, Level 16, State 1, Line 1Index Row (1: 26315: 23) with values (PLU_ID = '6922825200240' and PRD_AGGR_ID = 10006 and EVNT_ID = NULL and RGST_MDE = 0 and SUBPRD_NBR = 0 and STR_ID = 12 and PRD_AGGR_ID = 10006 and SUBPRD_NBR = 0 and STR_ID = 12 and PLU_ID = '6922825200240' and EVNT_ID = NULL and RGST_MDE = 0) points to the data row identified by () according to the instructions on the MSDN:.. This problem does not cause any data or index corruption the problem is in the metadata which is corrected only by dropping and re -creating the indexes. These issues do not cause damage to data or indexes, and the metadata of these issues is correct, just deleting re-establish indexes. It seems that the problem is modified. Run DBCC Checkdb again, run again: DBCC Checkdb ('POS_DB'), and Message has no error message. OK Successfully Repair :-) 4. Check the repaired database and back up the database to check the DBCC CHECKDB error-related table, and compare the number before the DBCC is executed, and find a small 40 records.