CFLAGS in Detail

xiaoxiao, 2021-03-06

CFLAGS are one of the keys to the performance and stability of a Gentoo system. Appropriate CFLAGS strike a balance between performance and compile time, whereas bad CFLAGS can cause compilation failures or even damage the system. So how do you find the needle you need in the sea of CFLAGS?

The CFLAGS in this chapter mainly target gcc 3.4 (GNU Compiler Collection - http://gcc.gnu.org/) on x86. If you use another compiler (such as ICC or the Compaq C Compiler) or another platform (such as PowerPC or Alpha), more than 50% of this chapter may not apply to you.

The following requirements for servers and workstations were gathered by the author from the network. Servers and desktops certainly have more requirements than these; only the items relevant to designing CFLAGS are listed here. The sorted list follows:

1. Server system:

o Runs for long periods (24 hours a day, 365 days a year, without a break)

o Must be very stable (uptime of 99.999% [Note] or more)

o High security (make no mistake, CFLAGS have a lot to do with security)

o As long as the machine stays up, performance can also be taken into account

o Value is not the first consideration

o Interactive response does not need to be very fast; good enough is enough

2. Desktop, workstation:

o Uptime is not as long (switched on only when the user wants to use it)

o Need not be as stable (most of the time the user is right there to deal with problems; uptime can drop to 99.99% or less)

o Performance must also be considered

o Fast response (for example, when loading a page, displaying it within three seconds is better than displaying it within four)

So the priorities when designing CFLAGS for a desktop system are:

1. Program startup time

2. Fast response

3. High performance

4. Stability may be slightly lower (within an acceptable range)

Reducing the size of executables reduces memory usage and saves some disk space. In addition, the biggest performance bottleneck of a desktop system is the disk drive, so smaller files also indirectly reduce the number of disk accesses, which speeds up program startup and improves responsiveness on first execution.

CFLAGS options

Let's take a look at several GCC options. Only the characteristics of these options are covered here; please refer to man gcc for detailed descriptions.

Adding options also increases compile time. Every option increases compile time; an option specifically marked "increases compile time" adds a large amount of it (more than the other options do).

Safe options

First, the safe options:

· Option

o Parameters and usage [Note]

o Recommendation

· -O

o -O (-O1), -O0, -O2, -O3, -Os

o The degree of optimization depends on the number that follows (stability may decrease as it rises). -Os is a special level that optimizes for code size.

o Use -Os to reduce program load time.

· -fforce-mem -fforce-addr

o -fforce-mem, -fno-force-mem -fforce-addr, -fno-force-addr

o Force memory values (mem) or memory addresses (addr) to be copied into registers before calculations are performed on them. Enabling these two options can produce better code.

o These are good things, enable them! -fforce-mem is enabled at -O2, -O3 and -Os, so if you use one of those three levels, adding -fforce-addr is enough.

· -fomit-frame-pointer

o -fomit-frame-pointer, -fno-omit-frame-pointer

o Do not keep the frame pointer in a register for functions that do not need one. This avoids the instructions that save, set up and restore the frame pointer, and it also frees an extra register in many functions. This option may make debugging impossible on some platforms! On platforms that support debugging without a frame pointer, this option is enabled at -O, -O2, -O3 and -Os.

o Unfortunately, x86 happens to be one of the platforms where debugging becomes impossible with this option. But... do you debug on your desktop? If the answer is no, you can safely enable this option.

· -finline-functions

o -finline-functions, -fno-inline-functions

o Integrates all simple functions into their callers. The compiler decides heuristically which functions are worth integrating. Enabled at -O3.

o Although this option increases program size, it is good for performance. I suggest enabling it and using the following option to set the inlining threshold.

· -finline-limit

o -finline-limit=n

o n is the size, in pseudo instructions, that determines whether a function can be inlined. The default is 600.

o The smaller this value, the faster the program starts and the slower it runs. For a desktop, I recommend -finline-limit=400.

· -fmove-all-movables -freduce-all-givs

o -fmove-all-movables, -fno-move-all-movables -freduce-all-givs, -fno-reduce-all-givs

o These are loop optimizations: -fmove-all-movables moves all loop-invariant computations out of the loop, and -freduce-all-givs strength-reduces all general induction variables. The compiled executable may be faster or slower; the result depends heavily on the program.

o Although the effect depends on the program, in most cases these two options produce smaller and faster code, so I suggest enabling them!

· -freorder-blocks -freorder-functions

o -freorder-blocks, -fno-reorder-blocks -freorder-functions, -fno-reorder-functions

o Improve performance and reduce executable size by reordering blocks of code.

o These are also good things, so I suggest enabling them. The downside is longer compile time.

o -freorder-blocks is enabled at -O2 and -O3 and disabled at -Os. -freorder-functions is enabled at -O2, -O3 and -Os.

· -fexpensive-optimizations

o -fexpensive-optimizations, -fno-expensive-optimizations

o Performs a number of minor optimizations that are relatively expensive in compile time. Enabled at -O2, -O3 and -Os.

o Although compile time increases, it can reduce program size, so enabling it is recommended.

· -falign-functions -falign-labels -falign-loops -falign-jumps

o -falign-functions, -falign-functions=n -falign-labels, -falign-labels=n -falign-loops, -falign-loops=n -falign-jumps, -falign-jumps=n

o Align the start of the item (function, label, loop or jump target) to the next power of two not smaller than n, skipping up to n bytes.

o I know this is very abstract and explaining it would take a lot of space, so please use the defaults, namely -falign-functions, -falign-loops and -falign-jumps without =n. Enabled at -O2 and -O3; disabled at -Os.

· -frename-registers

o -frename-registers, -fno-rename-registers

o Makes use of the registers left over after register allocation. This optimization benefits most on CPUs with many registers (such as ARM and PowerPC; x86 is not one of them). It makes debugging more difficult.

o Although the effect is not obvious on x86, it is still useful, and x86-64 provides more registers, so enabling it is still recommended. Enabled at -O3.

· -fweb

o -fweb, -fno-web

o Constructs webs, as commonly used for register allocation, to make better use of registers. However, it also makes debugging more difficult.

o Among the safe options, this one leans toward the experimental. I recommend enabling it, but if programs become unstable, please turn it off. Enabled at -O3.
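The rounding rule behind the -falign-* entry in the list above can be made concrete with a small sketch. It is written in Python purely for illustration and does not invoke GCC; the helper names `alignment_boundary` and `padding_for` are hypothetical, and the rule is assumed to be the one the GCC manual describes (align to the next power of two not smaller than n, and only when fewer than n bytes have to be skipped).

```python
def alignment_boundary(n):
    """Return the smallest power of two that is >= n."""
    boundary = 1
    while boundary < n:
        boundary *= 2
    return boundary


def padding_for(address, n):
    """Return how many bytes would be skipped to align `address`.

    Returns 0 when the address is already aligned, or when aligning
    would mean skipping n bytes or more (alignment is then not applied).
    """
    boundary = alignment_boundary(n)
    padding = (-address) % boundary
    return padding if padding < n else 0


print(alignment_boundary(24))   # 32
print(padding_for(0x1005, 32))  # 27
print(padding_for(0x1005, 24))  # 0, because 27 bytes exceeds the limit of 23
```

With n = 24 the boundary is still 32 bytes, but the padding limit is tighter, which is why leaving =n off and accepting the defaults is the simpler choice.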

Experimental options

The following options are truly experimental; if the system becomes unstable after you enable them, please turn them off.

· Option

o Parameters and usage

o Recommendation

· -ffast-math

o -ffast-math, -fno-fast-math

o Sets -fno-math-errno, -funsafe-math-optimizations, -fno-trapping-math, -ffinite-math-only, -fno-rounding-math and -fno-signaling-nans, and defines the preprocessor macro __FAST_MATH__. Although these techniques are faster, they violate IEEE or ISO rules and may well make programs compute wrong values.

o This is dangerous: it may cause wrong calculation results (1.1 + 1.2 = 1.4!? Of course, nothing that extreme...), so using it is not recommended.

· -funit-at-a-time

o -funit-at-a-time, -fno-unit-at-a-time

o Parses the whole compilation unit before generating code. This gives some additional optimizations a chance to apply, but uses more memory.

o This one is quite safe, so use it with confidence!

· -funroll-loops -fold-unroll-loops

o -funroll-loops, -fno-unroll-loops -fold-unroll-loops, -fno-old-unroll-loops

o Unrolls loops whose number of iterations can be determined at compile time, which may make the program run faster or slower. -fold-unroll-loops uses the old algorithm.

o Because this makes programs much larger, using it is not recommended.

· -funroll-all-loops -fold-unroll-all-loops

o -funroll-all-loops, -fno-unroll-all-loops -fold-unroll-all-loops, -fno-old-unroll-all-loops

o Unrolls all loops, even when the number of iterations is uncertain. In most cases this makes programs run more slowly.

o Slower and bigger, so don't use it...

· -fprefetch-loop-arrays

o -fprefetch-loop-arrays, -fno-prefetch-loop-arrays

o If the target machine supports it, generates instructions that prefetch memory before loops that access large arrays. Disabled at -Os.

o In practice, unless you run software that loops over large arrays (multimedia, databases, scientific computing), there is no need for it, so you can safely leave this option off.

· -ffunction-sections -fdata-sections

o -ffunction-sections, -fno-function-sections -fdata-sections, -fno-data-sections

o Places each function or data item into its own section. Most systems that use the ELF object format, and SPARC processors running Solaris 2, support these optimizations. Link time increases, the executable becomes larger, and debugging becomes more difficult.

o In my experience there is no particularly significant effect, and the executable becomes larger, so using it is not recommended.

· -fbranch-target-load-optimize -fbranch-target-load-optimize2

o -fbranch-target-load-optimize, -fno-branch-target-load-optimize -fbranch-target-load-optimize2, -fno-branch-target-load-optimize2

o Perform branch target register load optimization before (the first option) or after (the second option) the function prologue and epilogue are threaded.

o These are advanced optimization options; enabling them is recommended.
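To see why the -ffast-math entry above warns about wrong results, recall that floating-point addition is not associative, and that -funsafe-math-optimizations gives the compiler room to regroup such operations. The sketch below only illustrates the underlying effect in Python; it does not invoke GCC, and the variable names are illustrative.

```python
# Floating-point addition is not associative: regrouping the same
# three operands changes the rounded result.
left_to_right = (0.1 + 0.2) + 0.3
regrouped = 0.1 + (0.2 + 0.3)

print(left_to_right == regrouped)  # False
print(left_to_right, regrouped)    # 0.6000000000000001 0.6
```

The difference is tiny here, but programs that compare floating-point values exactly or accumulate many such steps can end up with visibly different answers.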

Platform-related options

Finally, the options related to x86 and x86-64:

· Option

o Parameters and usage

o Recommendation

· -mcpu (deprecated; may be removed in later versions) -mtune -march

o -mcpu=cpu-type -march=cpu-type -mtune=cpu-type

o Optimize for a particular target processor. Note that if you specify -march, the resulting executables may not run on other CPUs. If you specify only -mtune, however, GCC tunes the code for that processor without using its proprietary instruction set, so the result still runs on other CPUs. The available options are:

§ i386: The original Intel i386 processor

§ i486: Intel i486 processor (no scheduling is implemented for this chip)

§ i586, pentium: Intel Pentium processor without the MMX instruction set

§ pentium-mmx: Intel Pentium MMX processor, with MMX instruction set support

§ i686, pentiumpro: Intel Pentium Pro processor

§ pentium2: Intel Pentium II processor, with MMX instruction set support

§ pentium3: Intel Pentium III processor, with MMX and SSE instruction set support

§ pentium4: Intel Pentium 4 processor, with MMX, SSE and SSE2 instruction set support

§ k6: AMD K6 processor, with MMX instruction set support

§ k6-2, k6-3: Improved versions of the AMD K6 processor, with MMX and 3DNow! instruction set support

§ athlon, athlon-tbird: AMD Athlon processor, with MMX, 3DNow!, enhanced 3DNow! and SSE prefetch instruction support

§ athlon-4, athlon-xp, athlon-mp: Improved AMD Athlon processor, with MMX, 3DNow!, enhanced 3DNow! and full SSE instruction set support

§ k8, opteron, athlon64, athlon-fx: Processors based on the AMD K8 core, with x86-64 instruction set support (MMX, SSE, SSE2, 3DNow!, enhanced 3DNow! and the 64-bit instruction set extensions)

§ winchip-c6: IDT WinChip C6 processor, treated like an i486 with additional MMX instruction set support

§ winchip2: IDT WinChip 2 processor, treated like an i486 with additional MMX and 3DNow! instruction set support

§ c3: VIA C3 processor, with MMX and 3DNow! instruction set support (no scheduling is implemented)

§ c3-2: VIA C3-2 processor, with MMX and SSE instruction set support (no scheduling is implemented)

o Specify -march and -mtune according to your system; for example, my system uses "-march=athlon-xp -mtune=athlon-xp".

· -mfpmath

o -mfpmath=unit

o Generates floating-point code for the selected unit. The available units are:

§ 387: Use the standard 387 floating-point coprocessor. This is the default for i386.

§ sse: Use the scalar floating-point instructions provided by the SSE instruction set, which must be enabled with -msse or -msse2. On x86-64 this is the default.

§ sse,387: Use both the standard 387 floating-point unit and the SSE instruction set. This nearly doubles the number of available registers and may improve floating-point performance. Use this option with care, because it is still experimental and may cause the system to crash.

o If your processor supports SSE, you can try the faster -mfpmath=sse,387. If you are more timid (afraid of crashes), use -mfpmath=sse.

· -mmmx -msse -msse2 -msse3 -m3dnow

o -mmmx, -mno-mmx -msse, -mno-sse -msse2, -mno-sse2 -msse3, -mno-sse3 -m3dnow, -mno-3dnow

o Enable or disable support for the given instruction set.

o Enable or disable these instruction sets according to your system. The instruction sets your processor supports can be queried with cat /proc/cpuinfo.

· -maccumulate-outgoing-args

o -maccumulate-outgoing-args, -mno-accumulate-outgoing-args

o Computes the maximum space required for outgoing arguments in the function prologue. On most modern processors this is faster because of reduced dependencies, improved scheduling, and reduced stack usage when the preferred stack boundary is not equal to 2. The drawback is a larger executable.

o Because the executable becomes larger, it is not recommended.

· -malign-stringops

o -malign-stringops, -mno-align-stringops

o Controls whether the destination of inlined string operations is aligned. Not aligning it (-mno-align-stringops) reduces executable size and improves performance when the destination is already aligned.

o A double-edged sword, so please decide for yourself...

· -minline-all-stringops

o -minline-all-stringops, -mno-inline-all-stringops

o By default, GCC inlines string operations only when the destination is known to be aligned to at least a 4-byte boundary. This option enables more inlining, which increases executable size but may speed up programs that depend on fast memcpy, strlen and memset.

o Because it increases code size and not many programs rely heavily on memcpy/strlen/memset, using it is not recommended.
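The advice in the -mmmx/-msse entry above, to check /proc/cpuinfo before choosing instruction set options, can be sketched as a small script. This is an illustrative sketch rather than an official tool: the function name `instruction_set_options` and the sample text are made up, and the mapping assumes the flag names Linux normally reports (notably, SSE3 appears as "pni").

```python
# Mapping from /proc/cpuinfo flag names to GCC options.
# Linux reports SSE3 under the name "pni".
CPUINFO_TO_GCC = {
    "mmx": "-mmmx",
    "sse": "-msse",
    "sse2": "-msse2",
    "pni": "-msse3",
    "3dnow": "-m3dnow",
}


def instruction_set_options(cpuinfo_text):
    """Return the -m options matching the first "flags" line in cpuinfo_text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [opt for flag, opt in CPUINFO_TO_GCC.items() if flag in present]
    return []


# Made-up sample resembling an Athlon XP entry.
sample = "model name\t: AMD Athlon(tm) XP\nflags\t\t: fpu vme mmx sse 3dnow\n"
print(instruction_set_options(sample))  # ['-mmmx', '-msse', '-m3dnow']
```

On a real Linux system the same function could be fed the contents of /proc/cpuinfo, for example `instruction_set_options(open("/proc/cpuinfo").read())`.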

Other options

Other CFLAGS, unrelated to optimization:

· -pipe

o -pipe

o Use pipes rather than temporary files on disk for communication between the stages of compilation. The GNU toolchain supports this option.

o It shortens compile time somewhat, so it is recommended.

Suggested CFLAGS

Depending on the purpose, each system needs different CFLAGS. The suggested CFLAGS are listed below. Set the empty -march= and -mtune= values to your CPU type, and adjust the options in square brackets to what your system supports:

· The safest CFLAGS, with nothing extra. Before reporting an error (sig 11, crashes, calculation errors), please recompile once with these CFLAGS. If the problem still occurs, report it to Gentoo Bugzilla (http://bugs.gentoo.org/):

o -march=i686 -mtune=i686 -O2 -pipe

· Slightly faster CFLAGS (recommended for those who demand stability):

o -march= -mtune= -O2 -pipe -fomit-frame-pointer

· Performance-oriented CFLAGS (recommended for those who want their software to run fast):

o -march= -mtune= -mfpmath=sse,387 [-mmmx -msse -msse2 -msse3 -m3dnow] -minline-all-stringops -pipe -O3 -fomit-frame-pointer -fforce-addr -finline-functions -finline-limit=800 -fmove-all-movables -freduce-all-givs -freorder-blocks -freorder-functions -fexpensive-optimizations -falign-functions -falign-labels -falign-loops -falign-jumps -frename-registers -fweb -funit-at-a-time -funroll-loops -fprefetch-loop-arrays -ffunction-sections -fdata-sections -fbranch-target-load-optimize -fbranch-target-load-optimize2

· CFLAGS focused on file size and loading speed (recommended if programs are started often):

o -march= -mtune= -mfpmath=sse,387 [-mmmx -msse -msse2 -msse3 -m3dnow] -maccumulate-outgoing-args -malign-stringops -pipe -Os -fomit-frame-pointer -fforce-addr -finline-functions -finline-limit=400 -fmove-all-movables -freduce-all-givs -freorder-blocks -freorder-functions -fexpensive-optimizations -frename-registers -fweb -funit-at-a-time -fbranch-target-load-optimize -fbranch-target-load-optimize2

· Compromise CFLAGS (a compromise between loading speed and performance; most people will probably want these):

o -march= -mtune= -mfpmath=sse,387 [-mmmx -msse -msse2 -msse3 -m3dnow] -pipe -Os -fomit-frame-pointer -fforce-addr -finline-functions -finline-limit=400 -fmove-all-movables -freduce-all-givs -freorder-blocks -freorder-functions -fexpensive-optimizations -falign-functions -falign-labels -falign-loops -falign-jumps -frename-registers -fweb -funit-at-a-time -fbranch-target-load-optimize -fbranch-target-load-optimize2
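As a rough illustration of how one of the sets above ends up in a Gentoo configuration, the sketch below assembles a CFLAGS line of the kind placed in /etc/make.conf. The helper name `make_conf_line` is hypothetical, and athlon-xp is used only because it is the CPU type the author mentions for his own system; substitute your own.

```python
def make_conf_line(cpu_type, options):
    """Build a CFLAGS assignment suitable for /etc/make.conf."""
    flags = ["-march=" + cpu_type, "-mtune=" + cpu_type] + list(options)
    return 'CFLAGS="' + " ".join(flags) + '"'


# The "slightly faster" set from the list above.
slightly_faster = ["-O2", "-pipe", "-fomit-frame-pointer"]

print(make_conf_line("athlon-xp", slightly_faster))
# CFLAGS="-march=athlon-xp -mtune=athlon-xp -O2 -pipe -fomit-frame-pointer"
```

Keeping the options in a list makes it easy to start from the safest set and add one group at a time, rebuilding and testing between steps.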

Please note that the CFLAGS listed here are not the best, and there is no such thing as the best CFLAGS. The point of this chapter is to understand the purpose and meaning of each option and to combine them into the CFLAGS that suit you best; the CFLAGS provided here are only a reference. If you have any suggestions about the CFLAGS listed here, you are welcome to discuss them with everyone in the GOT discussion area (http://forums.gentoo.org.tw/) :)

Note: You may feel that an uptime of 99.999% is very high, but it is actually the baseline for server uptime. Let us calculate: 1 day = 86,400 seconds and 1 year = 365 days, so 1 year = 31,536,000 seconds. At 99.999% uptime, the downtime is 315.36 seconds, which is about five minutes. Considering that a server boots relatively slowly (there are many system checks and programs to start), that amounts to about 1 to 2 reboots a year. The 99.99% of a desktop system allows ten times that figure, and a desktop system boots quickly and is switched off after use.
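The downtime arithmetic in the note can be reproduced with a few lines of Python. This is only a sketch of the same calculation; the function name `downtime_seconds` is illustrative, and a 365-day year is assumed, as in the text.

```python
SECONDS_PER_YEAR = 365 * 86400  # 31,536,000 seconds


def downtime_seconds(uptime_percent):
    """Seconds of allowed downtime per year, rounded to 0.01 s."""
    return round(SECONDS_PER_YEAR * (100 - uptime_percent) / 100, 2)


print(downtime_seconds(99.999))  # 315.36
print(downtime_seconds(99.99))   # 3153.6
```

The second figure, about 52.5 minutes a year, is the tenfold allowance mentioned for desktop systems.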

Note: Adding "no-" in front of an option name (after the -f or -m prefix) turns that option off. For example, -fforce-mem turns force-mem on and -fno-force-mem turns it off.
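The naming rule in this note is mechanical enough to express in code. The sketch below is a hypothetical helper, not something GCC provides, and it only makes sense for simple on/off switches that start with -f or -m (not for options that take a value, such as -march=).

```python
def negate_option(option):
    """Turn -fNAME into -fno-NAME (or -mNAME into -mno-NAME), and back."""
    prefix, name = option[:2], option[2:]
    if prefix not in ("-f", "-m"):
        raise ValueError("only -f and -m switches have a no- form: " + option)
    if name.startswith("no-"):
        return prefix + name[3:]
    return prefix + "no-" + name


print(negate_option("-fforce-mem"))  # -fno-force-mem
print(negate_option("-mno-sse2"))    # -msse2
```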

When reposting, please cite the original address: https://www.9cbs.com/read-75770.html
