From: http://www-900.ibm.com/developerWorks/cn/Linux/SDK/L-DEBUG/INDEX.SHTML
The main methods for finding and fixing program errors on Linux
Steve Best (sbest@us.ibm.com), JFS core team member, IBM
August 2002
You can monitor a running user-space program in several ways: you can run it under a debugger and single-step through it, add print statements, or add a tool to analyze the program. This article describes several methods for debugging programs that run on Linux. We review four debugging scenarios that involve segmentation faults, memory overruns and leaks, and hangs.
This article covers four scenarios for debugging Linux programs. In Scenario 1, we use two sample programs with memory allocation problems and debug them with the MEMWATCH and Yet Another Malloc Debugger (YAMD) tools. In Scenario 2, we use the strace utility, which traces system calls and signals, to find where a program fails. In Scenario 3, we use the Oops functionality of the Linux kernel to resolve a segmentation fault, and show how to set up the kernel source level debugger (kgdb) to solve the same problem with the GNU debugger (gdb); kgdb is a remote gdb for the Linux kernel that works over a serial connection. In Scenario 4, we use a magic key sequence available on Linux to display information about the component that is causing a hang.
General debugging strategies
When your program contains a bug, it is likely that somewhere in the code a condition that you believe to be true is actually false. Finding the bug is a process of checking the things you believe to be true until you find one that is false.
The following are examples of the kinds of things you might believe to be true:
  At a certain point in the source code, a variable has a certain value.
  At a given point, a structure has been set up correctly.
  For a given if-then-else statement, the if part is the path that is executed.
  When a subroutine is called, the routine receives its parameters correctly.
Finding the bug means confirming each of these beliefs. If you believe that a variable should have a specific value when a subroutine is called, check that it does. If you believe that an if construct is executed, check that it is. Usually your assumptions will be correct, but eventually you will find one that does not hold. That is where the bug is.
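One way to put this strategy into practice is to write each belief down as an assertion, so that the program itself reports the first one that turns out to be false. The following is a minimal sketch in C, not something taken from the tools discussed in this article; the record structure and the record_name_length function are hypothetical names invented for the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record type, used only for this illustration. */
struct record {
    int id;            /* expected to be positive once the record is set up */
    const char *name;  /* expected to be non-NULL once the record is set up */
};

/*
 * Returns the length of the record's name. Each assert() states one thing
 * we believe to be true on entry to the routine.
 */
size_t record_name_length(const struct record *rec)
{
    size_t len = 0;

    assert(rec != NULL);        /* the routine received its parameter */
    assert(rec->id > 0);        /* the structure was set up correctly */
    assert(rec->name != NULL);  /* the variable has a usable value */

    while (rec->name[len] != '\0')
        len++;
    return len;
}
```

If an assertion fails, the standard assert() macro prints the failed expression with its file and line number and aborts the program, which points directly at the false assumption.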
Debugging is a task you cannot avoid. There are many ways to go about it, such as printing messages to the screen, using a debugger, or simply thinking through the program's execution and making an educated guess about the problem.
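Printed messages are easier to follow when every message carries the file and line that produced it. The sketch below shows one possible helper for this; trace_format and TRACE are hypothetical names, and the 256-byte buffer is an arbitrary choice for the example.

```c
#include <stdarg.h>
#include <stdio.h>

/*
 * Formats "file:line: message" into buf. Returns 0 on success, or -1 if
 * the "file:line: " prefix does not fit. A message that is too long for
 * the remaining space is truncated.
 */
int trace_format(char *buf, size_t size, const char *file, int line,
                 const char *fmt, ...)
{
    va_list args;
    int used;

    used = snprintf(buf, size, "%s:%d: ", file, line);
    if (used < 0 || (size_t)used >= size)
        return -1;

    va_start(args, fmt);
    vsnprintf(buf + used, size - (size_t)used, fmt, args);
    va_end(args);
    return 0;
}

/* Prints one trace line to stderr, tagged with the calling location. */
#define TRACE(...)                                                    \
    do {                                                              \
        char trace_buf_[256];                                         \
        if (trace_format(trace_buf_, sizeof trace_buf_,               \
                         __FILE__, __LINE__, __VA_ARGS__) == 0)       \
            fprintf(stderr, "%s\n", trace_buf_);                      \
    } while (0)
```

A call such as TRACE("i=%d", i); placed before and after a suspect statement shows how far execution got and what the values were at that point.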
Before you can fix a bug, you must find its source. For example, with a segmentation fault, you need to know on which line of code the fault occurs. Once you have found that line, determine the values of the variables in that method, how the method was called, and specifically why the error occurs. Using a debugger makes finding all of this information simple. If a debugger is not available, there are other tools you can use. (Note that a debugger may not be available in a production environment, and the Linux kernel does not have a debugger built in.)
Useful memory and kernel tools
You can use debugging tools on Linux to track down user-space and kernel problems in a variety of ways. Use the following tools and techniques to build and debug your source code:
User-space tools:
  Memory tools: MEMWATCH and YAMD
  strace
  GNU debugger (gdb)
  Magic key sequence
Kernel tools:
  Kernel source level debugger (kgdb)
  Built-in kernel debugger (kdb)
  Oops
This article discusses a class of problems that are not easy to find by visually inspecting the code and that may occur only under rare circumstances. A memory error often shows up only under a combination of circumstances, and sometimes you can discover memory bugs only after you deploy your program.
Scenario 1: Memory debugging tools
As the standard programming language on Linux systems, C gives us a great deal of control over dynamic memory allocation. This freedom, however, can lead to serious memory management problems, and these problems can cause a program to crash or to degrade in performance over time.
Memory leaks (in which memory obtained with malloc() is never released by a corresponding free() call) and buffer overruns (for example, writing past the memory that was allocated for an array) are common problems that can be difficult to detect. This section discusses several debugging tools that greatly simplify detecting and isolating memory problems.
MEMWATCH
MEMWATCH, written by Johan Lindh, is an open source memory error detection tool for C that you can download yourself (see Resources). By adding a header file to your code and defining MEMWATCH in your gcc statement, you can track memory leaks and corruption in your program. MEMWATCH supports ANSI C; it provides a log of the results and detects double frees, erroneous frees, unfreed memory, overflow, and so on.
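Listing 1. Memory sample (test1.c)
#include <stdlib.h>
#include <stdio.h>
#include "memwatch.h"

int main(void)
{
  char *ptr1;
  char *ptr2;

  ptr1 = malloc(512);
  ptr2 = malloc(512);   /* this block is never freed */

  ptr2 = ptr1;          /* deliberate bug: the second block's address is lost */
  free(ptr2);           /* frees the first block */
  free(ptr1);           /* deliberate bug: frees the first block again */
}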
The code in Listing 1 allocates two 512-byte blocks of memory, and then the pointer to the second block is overwritten with the address of the first block. As a result, the address of the second block is lost, which causes a memory leak, and the first block is freed twice.
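For comparison, here is a sketch of the same allocation pattern without the two defects. It is not part of the original sample; allocate_and_release is a hypothetical function name, and resetting the pointers to NULL after free() is one common defensive habit rather than a requirement.

```c
#include <stdlib.h>

/*
 * Allocates two 512-byte blocks as Listing 1 does, but keeps both
 * addresses and releases each block exactly once.
 * Returns 0 on success, -1 if either allocation failed.
 */
int allocate_and_release(void)
{
    char *ptr1 = malloc(512);
    char *ptr2 = malloc(512);
    int status = (ptr1 != NULL && ptr2 != NULL) ? 0 : -1;

    /* No "ptr2 = ptr1" here: each pointer still refers to its own block. */
    free(ptr2);     /* free(NULL) is a harmless no-op */
    ptr2 = NULL;    /* a later accidental free(ptr2) would now do nothing */
    free(ptr1);
    ptr1 = NULL;

    return status;
}
```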
Now compile memwatch.c together with Listing 1. The following is an example makefile:
test1:
	gcc -DMEMWATCH -DMW_STDIO test1.c memwatch.c -o test1
When you run the test1 program, it produces a report of the leaked memory. Listing 2 shows the example memwatch.log output file.
Listing 2. test1 memwatch.log file
MEMWATCH 2.67 Copyright (C) 1992-1999 Johan Lindh
...
double-free: <4> test1.c(15), 0x80517b4 was freed from test1.c(14)
...
unfreed: <2> test1.c(11), 512 bytes at 0x80519e4
{FE FE FE FE FE FE FE FE FE FE ............}
Memory usage statistics (global):
N)umber of allocations made: 2
L)argest memory usage: 1024
T)otal of all alloc() calls: 1024
U)nfreed bytes totals: 512
MEMWATCH gives you the actual line that has the problem. If you free a pointer that has already been freed, it tells you. The same goes for memory that is never freed. The section at the end of the log displays statistics, including how much memory was leaked, how much was used, and the total amount allocated.
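To give a feel for how this kind of report can be produced, the following sketch records the file and line of every allocation in a small table. It illustrates the general technique only and is not MEMWATCH's implementation; track_malloc, track_free, track_report, the two macros, and the fixed table size are all hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_TRACKED 64   /* arbitrary table size for this sketch */

struct tracked_block {
    void *addr;          /* NULL marks an unused slot */
    size_t size;
    const char *file;
    int line;
};

static struct tracked_block tracked[MAX_TRACKED];

/* Allocates memory and records where the request came from. */
void *track_malloc(size_t size, const char *file, int line)
{
    void *p = malloc(size);
    int i;

    if (p == NULL)
        return NULL;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].addr == NULL) {
            tracked[i].addr = p;
            tracked[i].size = size;
            tracked[i].file = file;
            tracked[i].line = line;
            break;
        }
    }
    return p;   /* limitation: untracked if the table is full */
}

/* Frees a block; returns -1 if the pointer is not a live tracked block. */
int track_free(void *p, const char *file, int line)
{
    int i;

    if (p == NULL)
        return 0;
    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].addr == p) {
            tracked[i].addr = NULL;
            free(p);
            return 0;
        }
    }
    fprintf(stderr, "bad or double free at %s(%d)\n", file, line);
    return -1;  /* the real free() is skipped to avoid corrupting the heap */
}

/* Lists every block that is still allocated; returns the unfreed bytes. */
size_t track_report(void)
{
    size_t total = 0;
    int i;

    for (i = 0; i < MAX_TRACKED; i++) {
        if (tracked[i].addr != NULL) {
            fprintf(stderr, "unfreed: %s(%d), %lu bytes\n", tracked[i].file,
                    tracked[i].line, (unsigned long)tracked[i].size);
            total += tracked[i].size;
        }
    }
    return total;
}

#define TRACK_MALLOC(n) track_malloc((n), __FILE__, __LINE__)
#define TRACK_FREE(p)   track_free((p), __FILE__, __LINE__)
```

Replaying the sequence from Listing 1 through these wrappers makes the second free return -1 and leaves 512 bytes in the final report, the same two findings that appear in Listing 2.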
YAMD
The YAMD package, written by Nate Eldredge, finds dynamic memory allocation problems in C and C++. The latest version of YAMD at the time of writing was 0.32. Download yamd-0.32.tar.gz (see Resources). Run the make command to build the program; then run the make install command to install the program and set up the tool.
Once you have downloaded YAMD, use it on test1.c. Remove the #include of memwatch.h and make the following small change to the makefile:
test1 with YAMD
gcc -g test1.c -o test1
Listing 3 shows the output from YAMD on test1.
Listing 3. test1 output with YAMD
YAMD version 0.32
Executable: /usr/src/test/yamd-0.32/test1
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal deallocation of this block
Address 0x40025e00, size 512
...
ERROR: Multiple freeing At
free of pointer already freed
Address 0x40025e00, size 512
...
WARNING: Memory leak
Address 0x40028e00, size 512
WARNING: Total memory leaks:
1 unfreed allocations totaling 512 bytes
*** Finished at Tue ... 10:07:15 2002
Allocated a grand total of 1024 bytes 2 allocations
Average of 512 bytes per allocation
Max bytes allocated at one time: 1024
24 K alloced internally / 12 K mapped now / 8 K max
Virtual program size is 1416 K
End.
YAMD shows that we have already freed the memory and that there is a memory leak. Let's try YAMD on another sample program, shown in Listing 4.
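Listing 4. Memory code (test2.c)
#include <stdlib.h>
#include <stdio.h>

int main(void)
{
  char *ptr1;
  char *ptr2;
  char *chptr;
  int i = 1;
  ptr1 = malloc(512);
  ptr2 = malloc(512);
  chptr = (char *)malloc(512);
  for (i; i <= 512; i++) {   /* deliberate bug: valid indexes are 0 to 511 */
    chptr[i] = 's';          /* writes one byte past the block when i is 512 */
  }
  ptr2 = ptr1;               /* leaks the second block, as in Listing 1 */
  free(ptr2);
  free(ptr1);                /* double free, as in Listing 1 */
  free(chptr);
}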
You can start YAMD with the following command:
./run-yamd /usr/src/test/test2/test2
Listing 5 shows the output from running YAMD on the sample program test2. YAMD tells us that there is an out-of-bounds condition in the for loop.
Listing 5. test2 output with YAMD
Running /usr/src/test/test2/test2
Temp output to /tmp/yamd-out.1243
**********
./run-yamd: line 101: 1248 Segmentation fault (core dumped)
YAMD version 0.32
Starting run: /usr/src/test/test2/test2
Executable: /usr/src/test/test2/test2
Virtual program size is 1380 K
...
INFO: Normal allocation of this block
Address 0x40025e00, size 512
...
INFO: Normal allocation of this block
Address 0x40028e00, size 512
...
INFO: Normal allocation of this block
Address 0x4002be00, size 512
ERROR: Crash
...
Tried to write address 0x4002c000
Seems to be part of this block:
Address 0x4002be00, size 512
...
Address in question is at offset 512 (out of bounds)
Will dump core after checking heap.
Done.
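The out-of-bounds write at offset 512 comes from the loop in Listing 4, which runs from 1 to 512 inclusive over a 512-byte block. A corrected loop might look like the sketch below; fill_buffer is a hypothetical helper name introduced only for this example.

```c
#include <stddef.h>

/*
 * Fills every element of a buffer of `size` bytes with `value`.
 * Valid indexes run from 0 to size - 1, so the loop starts at 0 and
 * stops before `size`.
 */
void fill_buffer(char *buffer, size_t size, char value)
{
    size_t i;

    for (i = 0; i < size; i++)
        buffer[i] = value;
}
```

Passing the size alongside the pointer keeps the loop bound tied to the allocation, so changing the size passed to malloc() cannot silently leave the loop writing past the end.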
MEMWATCH and YAMD are both useful debugging tools, and they require different approaches. With MEMWATCH, you need to add the include file memwatch.h and turn on two compile-time flags. YAMD requires only the -g option for the link statement.
Electric Fence
Most Linux distributions include a package for Electric Fence, but you can also download it. Electric Fence is a malloc() debugging library written by Bruce Perens. It allocates protected memory just after the memory you allocate. If there is a fencepost error (running off the end of an array), your program immediately exits with a protection error. By combining Electric Fence with gdb, you can track down exactly which line tried to access the protected memory. Another feature of Electric Fence is the ability to detect memory leaks.
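The idea of placing protected memory directly after an allocation can be sketched with the POSIX mmap() and mprotect() calls. This is an illustration of the technique under simplifying assumptions, not Electric Fence's actual code: guarded_alloc and guarded_free are hypothetical names, the caller must remember the size, and the returned pointer is not aligned for types wider than a byte.

```c
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns `size` usable bytes whose end sits directly against an
 * inaccessible page, or NULL on failure. Touching the byte just past the
 * block faults immediately instead of silently corrupting other data.
 */
void *guarded_alloc(size_t size)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t page, data_pages, total;
    unsigned char *base;

    if (size == 0 || page_size <= 0)
        return NULL;
    page = (size_t)page_size;
    data_pages = (size + page - 1) / page;
    total = (data_pages + 1) * page;     /* data pages plus one guard page */

    base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Remove all access rights from the last page: this is the fence. */
    if (mprotect(base + data_pages * page, page, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    /* Position the block so that it ends exactly where the fence begins. */
    return base + data_pages * page - size;
}

/* Releases a block from guarded_alloc(); `size` must match the request. */
void guarded_free(void *ptr, size_t size)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_pages = (size + page - 1) / page;
    unsigned char *guard = (unsigned char *)ptr + size;

    munmap(guard - data_pages * page, (data_pages + 1) * page);
}
```

With a block obtained this way, the off-by-one write in Listing 4 would raise a segmentation fault at the faulting statement, which a debugger can then report by line.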
Scenario 2: Using strace
The strace command is a powerful tool that shows all of the system calls issued by a user-space program. Strace displays the arguments to the calls and the return values in symbolic form. Strace receives its information from the kernel and does not require the kernel to be built in any special way. The trace information it produces is useful to both application and kernel developers. In Listing 6, formatting a partition fails, and the listing shows the start of the strace output for the call that runs the make file system program (mkfs). Strace identifies which call is causing the problem.
Listing 6. Start of the strace on mkfs
execve("/sbin/mkfs.jfs", ["mkfs.jfs", "-f", "/dev/test1"], &
...
open("/dev/test1", O_RDWR|O_LARGEFILE) = 4
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
ioctl(4, 0x40041271, 0xbfffe128) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: warning - cannot setb" ..., 98mkfs.jfs: warning -
cannot set blocksize on block device /dev/test1: Invalid argument
= 98
stat64("/dev/test1", {st_mode=&, st_rdev=makedev(63, 255), ...}) = 0
open("/dev/test1", O_RDONLY|O_LARGEFILE) = 5
ioctl(5, 0x80041272, 0xbfffe124) = -1 EINVAL (Invalid argument)
write(2, "mkfs.jfs: can\'t determine device"..., ..._exit(1)
= ?
Listing 6 shows that the ioctl call caused the mkfs program that was used to format the partition to fail. The ioctl BLKGETSIZE64 is failing. (BLKGETSIZE64 is defined in the source code that calls ioctl.) The BLKGETSIZE64 ioctl is being added to all devices in Linux, and in this case the logical volume manager does not support it yet. Therefore, the mkfs code will be changed to call the older ioctl if the BLKGETSIZE64 ioctl call fails; this allows mkfs to work with the logical volume manager.
Scenario 3: Using gdb and Oops
You can use the gdb program, the debugger from the Free Software Foundation, from the command line to track down errors, or you can use it from one of several graphical tools such as Data Display Debugger (DDD). You can use gdb to debug user-space programs or the Linux kernel. This section covers only running gdb from the command line.
Start gdb with the gdb programname command. gdb loads the executable's symbols and then displays an input prompt so that you can start using the debugger. There are three ways to view a process with gdb:
  Use the attach command to start viewing an already running process; attach stops the process.
  Use the run command to execute the program and start debugging it from the beginning.
  Look at an existing core file to determine the state the process was in when it terminated. To view a core file, start gdb with the following command:
  gdb programname corefilename
To debug with a core file, you need the program executable and source files, as well as the core file itself. To start gdb with a core file, use the -c option:
gdb -c core programname
gdb shows which line of code caused the program to dump core.
Before you run a program or attach to an already running program, list the source code where you believe the bug is, set breakpoints, and then start debugging the program. You can view extensive online help and a detailed tutorial for gdb by using the help command.
kgdb
The kgdb program (remote host Linux kernel debugger through gdb) provides a mechanism for debugging the Linux kernel with gdb. The kgdb program is an extension of the kernel that lets you connect to a machine running the kgdb-extended kernel while you run gdb on a remote host. You can then break into the kernel, set breakpoints, examine data, and so on (similar to how you would use gdb on an application). One of the main features of this patch is that the host running gdb connects to the target machine (running the kernel to be debugged) during the boot process. This lets you begin debugging as early as possible. Note that the patch adds functionality to the Linux kernel so that gdb can be used to debug the Linux kernel.
Two machines are required to use kgdb: one is the development machine and the other is the test machine. A serial line (null-modem cable) connects the machines through their serial ports. The kernel you want to debug runs on the test machine; gdb runs on the development machine. gdb communicates with the kernel you are debugging over the serial line.
Follow these steps to set up the kgdb debugging environment:
1. Download the patch for your version of the Linux kernel.
2. Build your component into the kernel, because this is the easiest way to use kgdb. (Note that there are two ways to build most kernel components: as a module or directly into the kernel. For example, the Journaled File System (JFS) can be built as a module or directly into the kernel. With the gdb patch, we build JFS directly into the kernel.)
3. Apply the kernel patch and rebuild the kernel.
4. Create a file called .gdbinit and place it in your kernel source subdirectory (in other words, /usr/src/linux). The file .gdbinit has the following four lines in it:
   set remotebaud 115200
   symbol-file vmlinux
   target remote /dev/ttyS0
   set output-radix 16
5. Add the append=gdb line to lilo, which is the boot loader used to select which kernel to boot:
   image=/boot/bzImage-2.4.17
   label=gdb2417
   read-only
   root=/dev/sda8
   append="gdb gdbttyS=1 gdb-baud=115200 nmi_watchdog=0"
Listing 7 is an example of a script that pulls the kernel and modules that you built on your development machine over to the test machine. You need to change the following items:
  best@sfb: User ID and machine name.
  /usr/src/linux-2.4.17: Directory of your kernel source tree.
  bzImage-2.4.17: Name of the kernel that will be booted on the test machine.
  rcp and rsync: Must be allowed to run on the machine where the kernel was built.
Listing 7. Script to pull the kernel and modules to the test machine
set -x
rcp best@sfb:/usr/src/linux-2.4.17/arch/i386/boot/bzImage /boot/bzImage-2.4.17
rcp best@sfb:/usr/src/linux-2.4.17/System.map /boot/System.map-2.4.17
rm -rf /lib/modules/2.4.17
rsync -a best@sfb:/lib/modules/2.4.17 /lib/modules
chown -R root /lib/modules/2.4.17
lilo
Now we are ready to start the gdb program on the development machine by changing to the directory where the kernel source tree starts. In this example, the kernel source tree is at /usr/src/linux-2.4.17. Type gdb to start the program.
If everything is working, the test machine stops during the boot process. Enter the gdb command cont to continue the boot process. One common problem is that the null-modem cable is connected to the wrong serial port. If gdb does not start, switch the cable to the second serial port, and gdb should then start.
Using kgdb to debug kernel problems
Listing 8 shows code that was modified in the source of the jfs_mount.c file; we created a null pointer exception in the code so that it fails at line 109.
Listing 8. Modified jfs_mount.c code
int jfs_mount(struct super_block *sb)
{
...
    int *ptr;                      /* line 1 added */
    jFYI(1, ("\nMount JFS\n"));
    /*
     * read/validate superblock
     * (initialize mount inode from the superblock)
     */
    if ((rc = chkSuper(sb))) {
        goto errout20;
    }
108 ptr = 0;                       /* line 2 added */
109 printk("%d\n", *ptr);          /* line 3 added */
Listing 9 displays a gdb exception after the mount command is issued to the file system. Several commands are available from kgdb, such as displaying data structures and the values of variables, and showing what state all tasks in the system are in, where they are sleeping, where they are spending CPU, and so on. Listing 9 shows the information that the back trace provides for this problem; the where command performs a back trace, which shows the chain of calls that led to the point where the code stopped.
Listing 9. gdb exception and back trace
mount -t jfs /dev/sdb /jfs
Program received signal SIGSEGV, Segmentation fault.
jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
109 printk("%d\n", *ptr);
(gdb) where
#0 jfs_mount (sb=0xf78a3800) at jfs_mount.c:109
#1 0xc01a0dbb in jfs_read_super ... at super.c:280
#2 0xc0149ff5 in get_sb_bdev ... at super.c:620
#3 0xc014a89f in do_kern_mount ... at super.c:849
#4 0xc0160e66 in do_add_mount ... at namespace.c:569
#5 0xc01610f4 in do_mount ... at namespace.c:683
#6 0xc01611ea in sys_mount ... at namespace.c:716
#7 0xc01074a7 in system_call () at af_packet.c:1891
#8 0x0 in ?? ()
(gdb)
The next section looks at this same JFS segmentation fault without a debugger set up; it uses the Oops message that the kernel would produce if you executed the code in Listing 8 in a non-kgdb kernel environment.
Oops analysis
The Oops (or panic) message contains details of a system failure, such as the contents of the CPU registers. On Linux, the traditional way to debug a system crash is to analyze the Oops message sent to the system console at the time of the crash. Once you capture the details, you can pass the message to the ksymoops utility, which attempts to convert the code to instructions and to map stack values to kernel symbols. In many cases, this is enough information to determine a possible cause of the failure. Note that the Oops message does not include a core file.
Let's say your system has just created an Oops message. As the author of the code, you want to solve the problem and determine what caused the Oops message, or you want to give the developer of the code that produced the Oops message as much information about your problem as possible so that it can be solved in a timely manner. The Oops message is one part of the equation, but it is not helpful until it has been run through the ksymoops program. The following figure shows the process of formatting an Oops message.
Formatting an Oops message
Ksymoops needs several items: the Oops message output, the System.map file from the kernel that is running, and /proc/ksyms, vmlinux, and /proc/modules. For complete instructions on how to use ksymoops, see the kernel source file /usr/src/linux/Documentation/oops-tracing.txt or the ksymoops man page. Ksymoops disassembles the code section, points to the failing instruction, and displays a trace section that shows how the code was called.
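Mapping an address to a kernel symbol amounts to finding the last symbol that starts at or below the address and reporting the distance from it. The sketch below shows that lookup on an in-memory table; it is not ksymoops code, struct ksym and resolve_symbol are hypothetical names, and a real table would be loaded from System.map.

```c
#include <stddef.h>

/* One entry of a symbol table sorted by ascending address. */
struct ksym {
    unsigned long addr;
    const char *name;
};

/*
 * Finds the last symbol whose address is <= addr and stores the distance
 * from that symbol in *offset. Returns the symbol name, or NULL if addr
 * lies below the first symbol (in which case *offset is left unchanged).
 */
const char *resolve_symbol(const struct ksym *table, size_t count,
                           unsigned long addr, unsigned long *offset)
{
    const char *name = NULL;
    size_t i;

    for (i = 0; i < count && table[i].addr <= addr; i++) {
        name = table[i].name;
        *offset = addr - table[i].addr;
    }
    return name;
}
```

For example, if jfs_mount started at 0xc01588c0 (a start address inferred here by subtracting the reported offset, not a value taken from a real System.map), the address c01588fc would resolve to jfs_mount with offset 0x3c, which is 60 in decimal.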
First, save the Oops message in a file so that you can run it through the ksymoops utility. Listing 10 shows the Oops message created by the mount command that mounts the JFS file system; the problem is produced by the three lines of code added to the JFS mount code in Listing 8.
Listing 10. Oops message after being processed by ksymoops
ksymoops 2.4.0 on i686 2.4.17. Options used
... 15:59:37 sfb1 kernel: Unable to handle kernel NULL pointer dereference at
virtual address 0000000
... 15:59:37 sfb1 kernel: c01588fc
... 15:59:37 sfb1 kernel: *pde = 00000000
... 15:59:37 sfb1 kernel: Oops: 0000
... 15:59:37 sfb1 kernel: CPU: 0
... 15:59:37 sfb1 kernel: EIP: 0010:[jfs_mount+60/704]
... 15:59:37 sfb1 kernel: Call Trace: [jfs_read_super+287/688]
[get_sb_bdev+563/736] [do_kern_mount+189/336] [do_add_mount+35/208] [do_page_fault+0/1264]
... 15:59:37 sfb1 kernel: Call Trace: [
... 15:59:37 sfb1 kernel: [
... 15:59:37 sfb1 kernel: Code: 8b 2d 00 00 00 00 55 ...
>>EIP; c01588fc
...
Trace; c0106cf3
Code; c01588fc
00000000 <_EIP>:
Code; c01588fc
   0: 8b 2d 00 00 00 00    mov 0x0,%ebp <=====
Code; c0158902
   6: 55                   push %ebp
Next, you need to determine which line of code in jfs_mount is causing the problem. The Oops message tells us that the problem is caused by the instruction at offset 3c. One way to find the line is to run the objdump utility on the jfs_mount.o file and look at offset 3c. Objdump disassembles a module function so that you can see which assembler instructions were generated from your C source code. Listing 11 shows what you would see from objdump; looking next at the C code for jfs_mount, we can see that the null pointer dereference comes from line 109. Offset 3c is important because that is the location that the Oops message identified as the cause of the problem.
Listing 11. Assembler listing of jfs_mount
109 printk("%d\n", *ptr);
objdump jfs_mount.o
jfs_mount.o: file format elf32-i386
Disassembly of section .text:
00000000
   0: 55                   push %ebp
  ...
  2c: e8 cf 03 00 00       call 400
  31: 89 c3                mov %eax,%ebx
  33: 58                   pop %eax
  34: 85 db                test %ebx,%ebx
  36: 0f 85 55 02 00 00    jne 291
  3c: 8b 2d 00 00 00 00    mov 0x0,%ebp << problem line above
  42: 55                   push %ebp
kdb
The Linux kernel debugger (kdb) is a patch to the Linux kernel that provides a means of examining kernel memory and data structures while the system is running. Note that kdb does not require two machines, but it does not let you debug at the source level as kgdb does. You can add commands that format and display essential system data structures, given an identifier or the address of the data structure.
The current command set allows you to control kernel operations, including the following:
  Single-stepping a processor
  Stopping upon execution of a specific instruction
  Stopping upon access to (or modification of) a specific virtual memory location
  Stopping upon access to a register in the input-output address space
  Stack back trace for the current active task and for all other tasks (by process ID)
  Instruction disassembly
Chasing memory overruns
You do not want to be in a situation where an allocation overrun shows up only after thousands of calls. Our team spent a long time tracking down an odd memory corruption problem. The application worked on our development workstation, but on the new product workstation it failed after calling malloc(). The real problem was an overrun that occurred after approximately one million calls. The new system showed the problem because the reserved malloc() area was laid out differently, so the offending memory was located in a different place and destroyed something different when the overrun occurred. We solved this problem using several different techniques, one using a debugger and another adding tracing to the source code. It was around this point in my career that I started to look at memory debugging tools, hoping to solve these types of problems faster and more efficiently. One of the first things I do when starting a new project is to run both MEMWATCH and YAMD to see whether they point out memory management problems. Memory leaks are a common problem in applications, but you can use the tools described in this article to solve them.
Scenario 4: Getting a back trace using the magic key sequence
If your keyboard is still functional when Linux hangs, use the following method to help find the source of the hang. With these steps, you can display a back trace of the currently running process and of all processes using the magic key sequence.
The kernel you are running must be built with CONFIG_MAGIC_SYSRQ enabled. You must also be in text mode. Ctrl+Alt+F1 puts you into text mode, and Ctrl+Alt+F7 returns you to X Windows. While in text mode, press the magic key sequence.
Conclusion
There are many different tools available to help debug programs on Linux. The tools described in this article can help you solve many coding problems. Tools that show the location of memory leaks, overruns, and the like can solve memory management problems, and I find MEMWATCH and YAMD helpful. Using a Linux kernel patch that allows gdb to work on the Linux kernel helped in solving problems in the file system that I work on in Linux. In addition, the strace utility helped determine where a file system utility failed during a system call. The next time you need to squash a bug in Linux, try one of these tools.
Resources
  Download MEMWATCH.
  Download YAMD.
  Download ElectricFence.
  Check out the Dynamic Probes debugging facility.
  Read the article "Linux software debugging with GDB" (developerWorks, February 2001).
  Visit the IBM Linux Technology Center.
  Find more Linux articles in the developerWorks Linux zone.
About the author
Steve Best works in the IBM Linux Technology Center in Austin, Texas. He is currently working on the Journaled File System (JFS) for Linux project. Steve has extensive experience with operating systems, with a focus on file systems, internationalization, and security.