Source: http://www.ddvip.net/Ob/Ob/linux/index6/56.htm

Finding and resolving program errors on Linux
Steve Best (sbest@us.ibm.com), JFS core team member, IBM

There are various ways to monitor a running user-space program: you can run it under a debugger and single-step through it, add print statements, or instrument it with analysis tools. This article describes several methods you can use to debug programs that run on Linux. We review four classes of debugging problems: segmentation faults, memory overruns, memory leaks, and hangs.

This article walks through four scenarios for debugging Linux programs. In the first scenario, we use two sample programs with memory-allocation problems and debug them with the MemWatch and Yet Another Malloc Debugger (YAMD) tools. In the second scenario, we use the strace utility, which traces system calls and signals, to find out where a program goes wrong. In the third scenario, we use the Linux kernel's Oops facility to track down the cause of a segmentation fault, and we show how to set up the kernel source-level debugger (KGDB) to solve the same problem with the GNU debugger (GDB); KGDB lets you debug a Linux kernel remotely with GDB over a serial connection. In the fourth scenario, we use the magic key sequence available on Linux to display information about the component that is triggering a hang.

Common debugging techniques

When your program contains a bug, it is likely that somewhere in the code a condition you believe to be true is actually false. Finding the bug is a process of confirming what you believe to be true until you find something that is false. The following are examples of the kinds of conditions you might believe to be true:

At a certain point in the source code, a variable has a particular value.
At a given point, a structure has been set up correctly.
For a given if-then-else statement, the if branch is the one that is executed.
When a subroutine is called, it receives its parameters correctly.

Finding the bug is a matter of confirming each of these. If you believe that a variable should have a particular value when a subroutine is called, check whether that is the case. If you believe that the if branch is taken, check it. Usually your assumption will be correct, but eventually you will find one that does not match. As a result, you will know where the error is.

Debugging is a task you cannot avoid. There are many ways to go about it: printing messages to the screen, using a debugger, or simply thinking carefully about how the program runs and making an educated guess about the problem. Before you can fix a bug, you must locate its source. For a segmentation fault, for example, you need to know on which line of code the fault occurs. Once you have found the offending line, determine the values of the variables in that method, how the method was called, and the details of how the error arises. A debugger makes finding all of this information straightforward. If no debugger is available, there are other tools you can use. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a built-in debugger.)
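Whatever method you choose, one lightweight way to turn the kind of assumption described above into a check the program performs for itself is C's assert() macro. The helper below is our own illustration and is not taken from the original article; the function name and the 32-character limit are invented for the example:

#include <assert.h>
#include <string.h>

/* Hypothetical helper: we assume callers always pass a non-NULL name
 * shorter than 32 characters.  assert() turns each assumption into a
 * runtime check that aborts with the file and line number if it fails. */
void set_user_name(char dest[32], const char *name)
{
    assert(name != NULL);        /* assumption: caller never passes NULL */
    assert(strlen(name) < 32);   /* assumption: name fits in the buffer  */
    strcpy(dest, name);
}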
Useful memory and kernel tools

You can use a variety of debugging tools on Linux to trace user-space and kernel problems. Build and debug your source code with the following tools and techniques:

User-space tools:
- Memory tools: MemWatch and YAMD
- strace
- GNU debugger (GDB)
- Magic key sequence

Kernel tools:
- Kernel source-level debugger (KGDB)
- Built-in kernel debugger (KDB)
- Oops

This article discusses a class of problems that are not easy to find by manually inspecting the code and that occur only under rare circumstances. Memory errors typically show up only under particular combinations of conditions, and sometimes you discover them only after the program has been deployed.

First scenario: memory debugging tools

As the standard programming language on Linux systems, C gives us a great deal of control over dynamic memory allocation. This freedom, however, can lead to serious memory-management problems, and these problems can cause a program to crash or to degrade over time. Memory leaks (where malloc() memory is never released with a corresponding free() call) and buffer overruns (writing past the end of memory that was allocated for an array, for example) are common problems and can be hard to detect. This section discusses a few debugging tools that greatly simplify detecting and isolating memory problems.

MemWatch

MemWatch, written by Johan Lindh, is an open source memory-error detection tool for C (see the Resources section for a download link). Simply by adding a header file to your code and defining MEMWATCH in your gcc statement, you can track memory leaks and corruption in your program. MemWatch supports ANSI C; it provides a results log and detects double frees, erroneous frees, unfreed memory, overflows, and so on.

Listing 1. Memory sample (test1.c)

#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
    char *ptr1;
    char *ptr2;

    ptr1 = malloc(512);
    ptr2 = malloc(512);
    ptr2 = ptr1;
    free(ptr2);
    free(ptr1);
}
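Before looking at what MemWatch reports for this program, it helps to see the general idea behind this kind of tool. Header-based memory debuggers typically redefine malloc() and free() with preprocessor macros so that every call is recorded together with the file and line that issued it. The sketch below is our own illustration of that idea, not MemWatch's actual implementation; a real tool also keeps a table of live allocations so it can flag double frees and report unfreed blocks when the program exits.

#include <stdio.h>
#include <stdlib.h>

/* Illustration only: log every allocation and release with its origin. */
static void *trace_malloc(size_t n, const char *file, int line)
{
    void *p = malloc(n);
    fprintf(stderr, "alloc %4zu bytes at %p (%s:%d)\n", n, p, file, line);
    return p;
}

static void trace_free(void *p, const char *file, int line)
{
    fprintf(stderr, "free  %p (%s:%d)\n", p, file, line);
    free(p);
}

/* Any source file that sees these macros now reports each call it makes. */
#define malloc(n) trace_malloc((n), __FILE__, __LINE__)
#define free(p)   trace_free((p), __FILE__, __LINE__)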
The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the first block is set to point to the second block. As a result, the address of the second block is lost, producing a memory leak.

Now compile memwatch.c along with Listing 1. Here is a makefile example:

test1:
	gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1

When you run the test1 program, it produces a report about the leaked memory. Listing 2 shows the sample memwatch.log output file.

Listing 2. memwatch.log file for test1

MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh
...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE FE FE FE FE FE FE FE FE FE FE FE ..............}

Memory usage statistics (global):
N)umber of allocations made: 2
L)argest memory usage      : 1024
T)otal of all alloc() calls: 1024
U)nfreed bytes totals      : 512

MemWatch shows you the line that actually causes the problem. If you free a pointer that has already been freed, it tells you. The same goes for memory that is never freed. The end of the log displays statistics, including how much memory was leaked, how much was used, and the total amount allocated.
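For comparison, a corrected version of Listing 1 (our own fix, not part of the original article) releases the second block before its address is lost and frees each pointer exactly once, so MemWatch has nothing to report:

#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
    char *ptr1 = malloc(512);
    char *ptr2 = malloc(512);

    free(ptr2);      /* release the second block while we still have its address */
    ptr2 = ptr1;     /* now the aliasing is harmless */
    free(ptr1);      /* each allocation is freed exactly once */
    return 0;
}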
YAMD

The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. At the time this article was written, the latest version of YAMD was 0.32. Download yamd-0.32.tar.gz (see Resources). Run make to build the program; then run make install to install the program and set up the tool.

Once you have downloaded YAMD, try it on test1.c. Delete the #include "memwatch.h" line and make the following small change to the makefile:

Using YAMD with test1

test1:
	gcc -g test1.c -o test1

Listing 3 shows the output from YAMD on test1.

Listing 3. YAMD output on test1

YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal deallocation of this block
Address 0x40025e00, size 512
...
ERROR: Multiple freeing
At free of pointer already freed
Address 0x40025e00, size 512
...
WARNING: Memory leak
Address 0x40028e00, size 512
WARNING: Total memory leaks:
1 unfreed allocations totaling 512 bytes
*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes
2 allocations
Average of 512 bytes per allocation
Max bytes allocated at one time: 1024
24 K alloced internally / 12 K mapped now / 8 K max
Virtual program size is 1416 K
End.

YAMD shows that we have already freed the memory and that a memory leak exists. Let's try YAMD on another sample program, shown in Listing 4.

Listing 4. Memory sample (test2.c)

#include <stdlib.h>
#include <stdio.h>

int main(void)
{
    char *ptr1;
    char *ptr2;
    char *chptr;
    int i = 1;

    ptr1 = malloc(512);
    ptr2 = malloc(512);
    chptr = (char *)malloc(512);
    for (i; i <= 512; i++) {
        chptr[i] = 's';
    }
    ptr2 = ptr1;
    free(ptr2);
    free(ptr1);
    free(chptr);
}
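Listing 4 repeats the pointer aliasing from Listing 1 and adds a second bug that YAMD will flag below: the for loop writes indices 1 through 512, and chptr[512] lies one byte past the end of the 512-byte block. A corrected loop (our own fix, shown here for reference) stays within bounds:

    for (i = 0; i < 512; i++) {   /* valid indices of a 512-byte block are 0..511 */
        chptr[i] = 's';
    }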
Start YAMD by using the ./run-yamd script:

./run-yamd /usr/src/test/test2/test2

Listing 5 shows the output from using YAMD on the sample program test2. YAMD tells us that there is an out-of-bounds condition in the for loop.

Listing 5. YAMD output on test2

Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
*********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal allocation of this block
Address 0x4002be00, size 512
ERROR: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question ...
Will dump core after checking heap.
Done.

MemWatch and YAMD are both useful debugging tools, and they require different approaches. With MemWatch, you need to add the include file memwatch.h and turn on two compile-time flags. YAMD requires only the -g option on the link statement.

Electric Fence

Most Linux distributions include an Electric Fence package, but you can also download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate. If there is a fencepost error (running past the end of an array), your program immediately terminates with a protection fault. By combining Electric Fence with GDB, you can track down exactly which line tried to access the protected memory. Detecting memory leaks is another of Electric Fence's capabilities.
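The sketch below is our own illustration of the guard-page idea described above; it is not Electric Fence's actual code. The allocator places the caller's buffer so that it ends at a page with no access permissions, so writing even one byte past the end faults immediately instead of silently corrupting other data. (In practice you simply link your program against the library, for example with -lefence, rather than writing this yourself.)

#include <sys/mman.h>
#include <unistd.h>
#include <stddef.h>
#include <stdint.h>

/* Illustration of the guard-page technique: data pages followed by one
 * inaccessible page.  An overrun hits the PROT_NONE page and faults. */
static void *guarded_alloc(size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t span = ((size + page - 1) / page + 1) * page;   /* data + guard page */
    uint8_t *base = mmap(NULL, span, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return NULL;
    mprotect(base + span - page, page, PROT_NONE);          /* the guard page */
    return base + span - page - size;                       /* buffer ends at the guard */
}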
Second scenario: using strace

The strace command is a powerful tool that shows all of the system calls issued by a user-space program. strace displays the arguments of each call and its return value in symbolic form. strace receives its information from the kernel and does not require the kernel to be built in any special way. The trace information is useful to send to both application and kernel developers. In Listing 6, formatting a partition fails; the listing shows the beginning of the strace output for the file-system creation utility (mkfs). strace pinpoints which call causes the problem.

Listing 6. Beginning of strace output on mkfs

execve("/sbin/mkfs.jfs", ["mkfs.jfs", "-f", "/dev/test1"], &
...
open("/dev/test1", O_RDWR|O_LARGEFILE) = 4
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
ioctl(4, 0x40041271, 0xbffe128) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: warning - cannot setb"..., 98mkfs.jfs: warning -
cannot set blocksize on block device /dev/test1: Invalid argument
) = 98
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
open("/dev/test1", O_RDONLY|O_LARGEFILE) = 5
ioctl(5, 0x80041272, 0xbffe124) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: Can\'t determine device"..., ..._exit(1) = ?

Listing 6 shows that an ioctl call caused the mkfs program used to format the partition to fail. The BLKGETSIZE64 ioctl is failing. (BLKGETSIZE64 is defined in the source code that issues the ioctl.) The BLKGETSIZE64 ioctl is being added to all devices in Linux, and in this case the logical volume manager does not support it yet. Therefore, the mkfs code was changed to call the earlier ioctl when the BLKGETSIZE64 ioctl call fails; this allows mkfs to work with the logical volume manager.
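A sketch of that kind of fallback is shown below. This is our own illustration of the pattern just described, not the actual mkfs.jfs source; it assumes a file descriptor that is already open on the block device. BLKGETSIZE64 reports the size in bytes, while the older BLKGETSIZE reports it in 512-byte sectors.

#include <sys/ioctl.h>
#include <linux/fs.h>
#include <stdint.h>

/* Try the newer byte-count ioctl first; fall back to the older
 * sector-count ioctl on drivers that do not support it. */
static int get_device_size(int fd, uint64_t *bytes)
{
    unsigned long sectors;

    if (ioctl(fd, BLKGETSIZE64, bytes) == 0)
        return 0;                              /* size reported in bytes */

    if (ioctl(fd, BLKGETSIZE, &sectors) == 0) {
        *bytes = (uint64_t)sectors * 512;      /* older ioctl uses 512-byte sectors */
        return 0;
    }
    return -1;                                 /* neither ioctl is supported */
}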
Third scenario: using GDB and Oops

You can use GDB, the Free Software Foundation's debugger, from the command line to find errors, or you can use it from one of several graphical tools built on it, such as the Data Display Debugger (DDD). You can use GDB to debug user-space programs or the Linux kernel. This section discusses only running GDB from the command line.

Start GDB with the command gdb programname. GDB loads the executable's symbols and displays an input prompt so you can start using the debugger. There are three ways to view a process with GDB:

Use the attach command to start viewing a process that is already running; attach stops the process.
Use the run command to execute the program and start debugging it from the beginning.
Look at an existing core file to determine the state the process was in when it terminated. To view a core file, start GDB with the following command:

gdb programname corefilename

To debug with a core file, you need the program executable and the source files as well as the core file itself. To start GDB with a core file, use the -c option:

gdb -c core programname

GDB shows which line of code caused the program to core dump.

Before you run a program or attach to one that is already running, list the source code where you believe the error is, set breakpoints, and then start debugging. You can view comprehensive GDB online help and a detailed tutorial by using the help command.

KGDB

The KGDB program (remote-host Linux-kernel debugger through GDB) provides a mechanism for debugging the Linux kernel using GDB. KGDB is an extension of the kernel that allows you, when running GDB on a remote host, to connect to the machine running the KGDB-extended kernel. You can then break into the kernel, set breakpoints, examine data, and so on (similar to the way you would use GDB on an application). One of the main features of this patch is that the remote host running GDB connects to the target machine (running the kernel to be debugged) during the boot process. This lets you begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that GDB can be used to debug it.

Two machines are required to use KGDB: one is a development machine and the other is a test machine. A serial line (null-modem cable) connects the machines through their serial ports. The kernel you want to debug runs on the test machine; GDB runs on the development machine. GDB communicates over the serial line with the kernel being debugged.

Follow these steps to set up a KGDB debugging environment:

1. Download the patch for your version of the Linux kernel.
2. Build the component into the kernel, because that is the easiest way to use KGDB. (Note that there are two ways to build most kernel components: as a module, or built directly into the kernel. For example, the Journaled File System (JFS) can be built as a module or directly into the kernel. With the GDB patch, we can build JFS directly into the kernel.)
3. Apply the kernel patch and rebuild the kernel.
4. Create a file named .gdbinit and place it in your kernel source subdirectory (in other words, /usr/src/linux). The .gdbinit file contains the following four lines:

set remotebaud 115200
symbol-file vmlinux
target remote /dev/ttyS0
set output-radix 16
5. Add "gdb" to the append line in LILO, the boot loader used to select which kernel is used when booting the machine:

image=/boot/bzImage-2.4.17
    label=gdb2417
    read-only
    root=/dev/sda8
    append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"

Listing 7 is an example of a script that pulls the kernel and modules built on the development machine over to the test machine. You need to change the following items:

best@sfb: the user ID and machine name.
/usr/src/linux-2.4.17: the directory of the kernel source tree.
bzImage-2.4.17: the name of the kernel that will be booted on the test machine.
rcp and rsync: these must be allowed to run on the machine where the kernel was built.

Listing 7. Script to pull the kernel and modules over to the test machine

set -x
rcp best@sfb:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp best@sfb:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a best@sfb:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo

Now we can start the GDB program on the development machine by changing to the directory where the kernel source tree begins. In this example, the kernel source tree is at /usr/src/linux-2.4.17. Type gdb to start the program. If everything is working, the test machine will stop during the boot process. Enter the GDB command cont to continue the boot process. One common problem is that the null-modem cable may be connected to the wrong serial port. If GDB does not start, switch the cable to the second serial port; this causes GDB to start.

Using KGDB to debug a kernel problem

Listing 8 shows the modified code in the jfs_mount.c source file: we created a null-pointer dereference in the code so that a fault occurs at line 109.

Listing 8. Modified jfs_mount.c code

int jfs_mount(struct super_block *sb)
{
...
int *ptr;                         /* line 1 added */
jFYI(1, ("\nMount JFS\n"));
/*
 * read/validate superblock
 * (initialize mount inode from the superblock)
 */
if ((rc = chkSuper(sb))) {
        goto errout20;
}
108     ptr = 0;                  /* line 2 added */
109     printk("%d\n", *ptr);     /* line 3 added */
Listing 9 shows the GDB exception after the mount command is issued against the file system. KGDB provides several commands, such as displaying data structures and variable values and showing all of the tasks in the system, what state they are in, where they are consuming CPU, and so on. Listing 9 shows the information that the back trace provides for this problem; the where command is used to produce the back trace, which tells how the calls were executed up to the point where the code stopped.

Listing 9. GDB exception and back trace

mount -t jfs /dev/...

Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109             printk("%d\n", *ptr);
(gdb) where
#0  jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1  0xc01a0dbb in jfs_read_super ... at super.c:280
#2  0xc0149ff5 in get_sb_bdev ... at super.c:620
#3  0xc014a89f in do_kern_mount ... at super.c:849
#4  0xc0160e66 in do_add_mount ... at namespace.c:569
#5  0xc01610f4 in do_mount ... at namespace.c:683
#6  0xc01611ea in sys_mount ... at namespace.c:716
#7  0xc01074a7 in system_call () at af_packet.c:1891
#8  0x0 in ?? ()
(gdb)

The next section looks at this same JFS problem, but without a debugger set up: if you execute the code in Listing 8 in a non-KGDB kernel environment, you work from the Oops message that the kernel may generate.

Oops analysis

An Oops (also called a panic) message contains the details of a system failure, such as the contents of the CPU registers. With Linux, the traditional way to debug a system crash has been to analyze the Oops message that is sent to the system console at the time of the crash. Once you have captured the details, the message can be passed to the ksymoops utility, which attempts to convert the code to instructions and map the stack values to kernel symbols. In many cases this information is enough for you to determine a possible cause of the failure. Note that an Oops message does not include a core file.

Let's assume the system has just produced an Oops message. As the person who wrote the code, you want to solve the problem and determine what caused the Oops; or, as the developer who is shown the Oops message, you want to be given as much information about the problem as possible so it can be solved in a timely manner. The Oops message is one part of the equation, but it is not much help unless it has been run through the ksymoops program.

Figure: the process of formatting an Oops message.

ksymoops needs several things: the Oops message output, the System.map file from the kernel that is running, and /proc/ksyms, vmlinux, and /proc/modules. Complete instructions on how to use ksymoops are in the kernel source tree at /usr/src/linux/Documentation/oops-tracing.txt or on the ksymoops man page. ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.

First, save the Oops message in a file so you can run it through the ksymoops utility.
Listing 10 shows the Oops message created by mounting the JFS file system, produced by the three lines of code added to the JFS mount code in Listing 8.

Listing 10. Oops message after processing by ksymoops

ksymoops 2.4.0 on i686 2.4.17. Options used
... 15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at virtual address 00000000
... 15:59:37 sfb1 kernel: c01588fc
... 15:59:37 sfb1 kernel: *pde = 00000000
... 15:59:37 sfb1 kernel: Oops: 0000
... 15:59:37 sfb1 kernel: CPU:    0
... 15:59:37 sfb1 kernel: EIP:    0010:[jfs_mount+60/704]
... 15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688] [get_sb_bdev+563/736] [do_kern_mount+189/336] [do_add_mount+35/208] [do_page_fault+0/1264]
... 15:59:37 sfb1 kernel: Call Trace: [...]
... 15:59:37 sfb1 kernel: ...
... 15:59:37 sfb1 kernel: Code: 8b 2d 00 00 00 00 55 ...
>>EIP; c01588fc <jfs_mount+3c/2c0>   <=====
...
Trace; c0106cf3 ...
Code;  c01588fc <jfs_mount+3c/2c0>
00000000 <_EIP>:
Code;  c01588fc <jfs_mount+3c/2c0>   <=====
   0:   8b 2d 00 00 00 00         mov    0x0,%ebp   <=====

The ksymoops output identifies the failing instruction: the EIP points into jfs_mount at offset 0x3c (60 decimal, as shown in the [jfs_mount+60/704] entry). One way to find the corresponding source line is to run the objdump utility on the jfs_mount.o file and look at offset 0x3c. objdump disassembles a module's functions so you can see what assembler instructions your C source generates. Listing 11 shows what objdump produces; comparing it with the jfs_mount C code shows that the NULL dereference comes from line 109. Offset 0x3c is important because it is the location the Oops message identifies as the problem.

Listing 11. Assembler listing of jfs_mount

109     printk("%d\n", *ptr);

objdump -d jfs_mount.o

jfs_mount.o:     file format elf32-i386

Disassembly of section .text:

00000000 <jfs_mount>:
   0:   55                        push   %ebp
...
  2c:   e8 cf 03 00 00            call   400
  31:   89 c3                     mov    %eax,%ebx
  33:   58                        pop    %eax
  34:   85 db                     test   %ebx,%ebx
  36:   0f 85 55 02 00 00         jne    291
  3c:   8b 2d 00 00 00 00         mov    0x0,%ebp   << problem line
  42:   55                        push   %ebp
kdb

The built-in kernel debugger of the Linux kernel (kdb) is a patch for the Linux kernel that provides a means of examining kernel memory and data structures while the system is running. Note that kdb does not require two machines, but it does not allow source-level debugging the way KGDB does. You can add commands that, given the identity or address of a data structure, format and display essential system data structures. The current command set lets you control kernel operations, including the following:

Single-stepping a processor
Stopping upon access to (or modification of) a specific virtual memory location
Stopping upon access to a register in the input/output address space
Stack trace back for the current active task as well as for all other tasks (by process ID)
Instruction disassembly

You do not want to end up in a situation such as an allocation overrun that occurs thousands of calls later. Our team once spent many long hours tracking down an odd memory-corruption problem. The application worked on our development workstation, but on the new production workstation it would fail after a large number of calls to malloc(). The real problem was an overrun that happened back around call number one million. The new system showed the problem because the reserved malloc() area was laid out differently, so the scattered overrun landed in a different place and destroyed different data. We solved the problem using several different techniques, one using a debugger, another adding tracing to the source code. Around that point in my career I began looking at memory debugging tools, hoping to solve these types of problems faster and more efficiently. One of the first things I now do when starting a new project is to run MemWatch and YAMD to see whether they point out memory-management problems. Memory leaks are a common problem in applications, but you can use the tools described in this article to resolve them.

Fourth scenario: using a magic key sequence to get a back trace

If your keyboard still works when Linux hangs, you can use the following method to help track down the source of the hang. With these steps, you can display the currently running process and back traces of all processes by using the magic key sequence. The kernel you are running must have been built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode: Ctrl+Alt+F1 switches you to text mode, and Ctrl+Alt+F7 returns you to X Windows. While in text mode, press <Alt+ScrollLock>, then <Ctrl+ScrollLock>. These magic keystrokes give a stack trace of the currently running process and of all processes, respectively. Look in /var/log/messages: if everything was set up correctly, the system has already converted the kernel's symbolic addresses for you. The back trace is written to the /var/log/messages file.

Conclusion

Many different tools are available to help debug programs on Linux. The tools described in this article can help you solve many coding problems. Tools that show the locations of memory leaks, overruns, and the like can resolve memory-management problems, and I find MemWatch and YAMD very helpful. Using the Linux kernel patch that allows GDB to work on the Linux kernel helped in solving problems on the file system I work on for Linux. In addition, the strace utility helped determine where a file-system utility was failing during a system call. The next time you have to squash a bug on Linux, try one of these tools.
Resources

Download MemWatch.
Download YAMD.
Download Electric Fence.
Check out the Dynamic Probes debugging facility.
Read the article "Linux software debugging with GDB".