Comparison of the heap and the stack

xiaoxiao  2021-03-19

/// Q: Make a class that can only be allocated on the stack and cannot be allocated on the heap.

class foo {
private:
    void* operator new(size_t);  // declaring operator new private forbids calls to new
};

int main()
{
    foo f;             // OK: allocated on the stack
    foo* p = new foo;  // error: operator new is private
    return 0;
}

/ R: This makes any code that allocates an object of the class on the heap fail to compile.

/ RR

IMO, generally it is impossible to do this without support from the OS or compilation environment, since it is difficult to determine whether the object is on the heap or not ...

For some very special systems this is possible, since you can control the heap address/length on some embedded systems, but we still do not have a general solution ...

/ RRR

It does block foo * p = new foo;

/// Q: Make a class that can only be allocated on the heap and cannot be allocated on the stack.

To make the class allocatable only on the heap, make the destructor protected.

Add a public member function destroy() { delete this; }

Then allocate with A * a = new A;

and when the object is to be destroyed, call a->destroy();

/ R

The destructor is called by default: even if the programmer calls destroy(), the system still calls the destructor when a stack object goes out of scope.

So I don't think it is possible to prevent users from allocating on the stack ...

/ RR

The destructor is called inside destroy(), which releases the memory. The system does not need to call it again, and with a protected destructor the compiler rejects a stack object outright, because it cannot generate the implicit destructor call at scope exit. Write a test program and try it.
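
A minimal sketch of the technique being debated (the class name is illustrative): with a protected destructor, defining a stack object fails to compile, while new plus destroy() works.

class HeapOnly {
public:
    void destroy() { delete this; }  // the only way to destroy the object
protected:
    ~HeapOnly() {}                   // protected: blocks stack objects and plain delete
};

int main()
{
    // HeapOnly h;                   // error: ~HeapOnly() is protected here
    HeapOnly* p = new HeapOnly;      // heap allocation is fine
    p->destroy();                    // runs the destructor, then frees the memory
    return 0;
}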

/// SITG

1. Use the set_new_handler function to handle out-of-memory failures during allocation (a sketch follows this list).

2. Allocate one large block up front, and then sub-allocate from within the already-allocated block.

3. For N identical objects, you can handle it like this: allocate them all at once with something like new[4000], and use a linked list to track the objects. To use one, take it from the list and mark its node as in use; to release it, simply mark the node as unused. This cuts down the cost of repeatedly allocating and releasing memory (see the pool sketch after this list).
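
A minimal sketch of point 1, using the standard std::set_new_handler facility (the handler below just reports and aborts; a real one might release a reserve block so the failed allocation can be retried):

#include <cstdio>
#include <cstdlib>
#include <new>

void OutOfMemory()
{
    std::fputs("allocation failed: out of memory\n", stderr);
    std::abort();                        // a real handler might free reserved memory and return
}

int main()
{
    std::set_new_handler(OutOfMemory);   // invoked when operator new cannot get memory
    int* big = new int[1000];            // on failure, OutOfMemory() runs
    delete[] big;
    return 0;
}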
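And a sketch of point 3: a fixed pool of identical objects threaded onto a free list, so "allocate" and "release" become cheap list operations. The Obj payload follows the post's idea and the 4000 count follows the post; everything else is illustrative.

#include <cstddef>

struct Obj {
    int data;          // the object's payload (illustrative)
    Obj* next;         // free-list link, used only while the object is free
};

class ObjPool {
public:
    ObjPool() : storage_(new Obj[kCount]), freeList_(nullptr) {
        for (std::size_t i = 0; i < kCount; ++i) {  // chain every node onto the free list
            storage_[i].next = freeList_;
            freeList_ = &storage_[i];
        }
    }
    ~ObjPool() { delete[] storage_; }

    Obj* Acquire() {                     // "allocate": pop a node off the free list
        if (freeList_ == nullptr) return nullptr;   // pool exhausted
        Obj* obj = freeList_;
        freeList_ = obj->next;
        return obj;
    }
    void Release(Obj* obj) {             // "free": push the node back onto the list
        obj->next = freeList_;
        freeList_ = obj;
    }

private:
    static const std::size_t kCount = 4000;  // as in the post: new[4000]
    Obj* storage_;
    Obj* freeList_;
};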

Thank you, SITG, so detailed. I really should study hard; I still don't understand what these gurus are saying. Once I leave Tsinghua there will be no more chances like this ... the more I learn, the more questions I have.

Heap and stack basics

The heap and the stack are two basic concepts that C/C++ programming inevitably runs into. Both can be found in any book on data structures; they are both fundamental data structures, although the stack is the simpler of the two.

In the concrete setting of C/C++ programming, however, the two concepts are not parallel. Looking at the underlying machine code reveals that the stack is a data structure provided by the machine itself, while the heap is provided by the C/C++ function library.

Specifically, modern (serially executing) computers support the stack directly at the instruction level: a dedicated register points to the current top of the stack, and dedicated machine instructions implement the push and pop operations. This mechanism is highly efficient but limited: it directly holds only system data types such as integers, pointers, and floating-point numbers, and it does not directly support other data structures.

Because of these characteristics, the stack is used very frequently in programs. Subroutine calls are carried out directly on the stack: the machine's CALL instruction implicitly pushes the return address onto the stack and then jumps to the subroutine's address, and the subroutine's RET instruction implicitly pops the return address off the stack and jumps back to it. Automatic variables in C/C++ are a direct use of the stack, which is why automatic variables become invalid as soon as the function returns.

Unlike the stack, the heap data structure is not supported by the system (neither the machine nor the operating system); it is provided by the function library. The basic malloc/realloc/free functions maintain an internal heap data structure. When a program asks these functions for new memory, they first try to find available space in the internal heap. If none is available, they use system calls to grow the program's data segment; the newly obtained space is first organized into the internal heap and then returned to the caller in an appropriate form. When the program frees memory, the space is returned to the internal heap structure, where it may be consolidated (for example, merged with adjacent free space into a larger free block) to serve later allocation requests. This elaborate allocation mechanism is effectively a buffer pool (cache) for memory allocation, and it exists for the following reasons:

1. System calls may not support allocations of arbitrary size. Some systems' calls only support requests of fixed sizes and their multiples (allocation by page); that would be very wasteful for the many small allocations a program makes.

2. System calls that request memory can be expensive, since they may involve a transition between user mode and kernel mode.

3. Unmanaged memory allocation easily leads to fragmentation under complex patterns of allocation and release.

Comparing the heap and the stack

From the above, the stack is a facility provided by the system: it is fast and efficient, but limited, and the data it holds is inflexible. The heap is a facility provided by the function library: it is flexible and convenient and accommodates a wide range of data, but its efficiency is somewhat lower. The stack is a system data structure and is unique to each process/thread; the heap is an internal data structure of the library and is not necessarily unique, and memory allocated from different heaps cannot be used interchangeably.

Stack space can be allocated statically or dynamically. Static allocation is done by the compiler, for example for automatic variables (auto). Dynamic stack allocation is done by the alloca function; it never needs to be released (it is freed automatically), and there is no release function for it. For portable programs, dynamic stack allocation is discouraged! Heap allocation is always dynamic: although all data space is released back to the system when the program ends, explicitly pairing every allocation with a release is a basic element of a good program.
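
A small sketch of the two lifetimes just described: the automatic variable vanishes when the function returns, while the heap block persists until it is explicitly freed.

#include <cstdlib>

int* Demo()
{
    int local = 42;                              // stack: created by the compiler,
                                                 // reclaimed automatically on return
    int* dyn = (int*)std::malloc(sizeof(int));   // heap: dynamic allocation
    if (dyn != nullptr)
        *dyn = local;
    return dyn;                                  // the heap block outlives the call
}

int main()
{
    int* p = Demo();
    std::free(p);                                // heap memory must be freed explicitly
    return 0;
}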

Using the heap

1. Heap implementation

Traditionally, the operating system and the runtime library ship with an implementation of the heap. At the start of a process, the operating system creates a default heap called the "process heap". If no other heap is created, allocations use the process heap.

A language runtime can also create separate heaps within the process. (For example, the C runtime creates a heap of its own.) Besides these dedicated heaps, applications, or any of the many loaded dynamic-link libraries (DLLs), can create and use separate heaps. Win32 provides a rich set of APIs for creating and using private heaps. For detailed guidance on the heap functions, see MSDN.

When applications or DLLs create private heaps, those heaps live in the process's address space and are accessible process-wide. Data allocated from a given heap must be released to that same heap. (You cannot allocate from one heap and free to another.)

In all virtual memory systems, the heap sits on top of the operating system's virtual memory manager. The language runtime's heap also resides on top of virtual memory. In some cases these heaps are layered on top of the operating system heap, while the language runtime performs its own memory management by carving up large allocated blocks. Bypassing the operating system heap and using the virtual memory functions directly can be more advantageous for allocating and using blocks.

A typical heap implementation consists of a front-end and a back-end allocator. The front-end allocator maintains free lists of fixed-size blocks. On an allocation call, the heap first tries to find a free block in the front-end lists; if that fails, it is forced to allocate a large block from the back end (reserving and committing virtual memory) to satisfy the request. Common implementations have per-allocation overhead, which costs execution cycles and also reduces the available storage.

Knowledge Base article Q10758, "Managing Memory with calloc() and malloc()", contains more background on these topics. A detailed discussion of heap implementation and design can also be found in "Dynamic Storage Allocation: A Survey and Critical Review" by Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles, in Proceedings of the International Workshop on Memory Management, Kinross, Scotland, UK, September 1995 (http://www.cs.utexas.edu/users/oops/papers.html).

The Windows NT implementation (Windows NT version 4.0 and later) uses 127 free lists of 8-byte-aligned blocks ranging from 8 to 1,024 bytes in size, plus a "large block" list. The large-block list (free list[0]) holds blocks larger than 1,024 bytes. The free lists link their entries into doubly linked lists. By default, the process heap performs coalescing. (Coalescing is the operation of combining adjacent free blocks into one larger block.) Coalescing costs extra cycles but reduces the internal fragmentation of heap blocks.

A single global lock protects the heap against multithreaded use. (See "Server Performance and Scalability Killers" by George Reilly on the MSDN Online Web Workshop, http://msdn.microsoft.com/workshop/server/iis/tencom.asp.) This single global lock essentially serializes access to the heap data structure, guarding it against arbitrary cross-thread access. If heap operations are frequent, the single global lock can seriously hurt performance.

2. What are the common heap performance problems?

Here are the most common problems you run into when using the heap:

Slowdown caused by allocation. Allocation simply takes a long time. The most likely cause is that the free lists contain no suitable blocks, so the runtime allocator code spends cycles searching for a larger free block or allocating a fresh block from the back-end allocator.

Slowdown caused by freeing. Free operations consume more cycles mainly when coalescing is enabled. During coalescing, each free operation must "find" its neighboring blocks, remove them, splice them into a larger block, and reinsert the larger block into the free list. During that search, memory may be touched at random, causing cache misses and degrading performance.

Slowdown caused by contention. Contention arises when two or more threads access data at the same time and one must wait for the other to finish. Contention always causes trouble; it is the biggest problem currently encountered on multiprocessor systems. When a large number of applications or DLLs that use memory blocks heavily run multithreaded (or on a multiprocessor system), everything slows down. Using a single lock, the common solution, means that all heap operations are serialized. Serialization causes threads to context-switch while waiting for the lock; picture the stop-and-go traffic at an intersection with a flashing red light. Contention usually causes thread and process context switches. Context switches are expensive, but more expensive still is the loss of the processor cache's contents and the cost of rebuilding that data when the thread later resumes.

Slowdown caused by heap corruption. Corruption results from the application using the heap incorrectly. Typical scenarios include freeing an already-freed block, using a block after it has been freed, and out-of-bounds overwrites of a block. (Corruption is beyond the scope of this article; see the Microsoft Visual C++ debug documentation for details on memory overwrites, leaks, and the like.)

Slowdown caused by frequent allocations and reallocations. This is very common with scripting languages: strings are repeatedly allocated, grown by reallocation, and released. Don't do this; if possible, allocate large strings up front and use buffers. Another approach is to minimize concatenation operations.

Contention is the problem behind slow allocation and slow freeing alike. Ideally one would like a heap with no contention and fast allocation/free. Unfortunately, no such general-purpose heap exists yet, though perhaps it will in the future.

In all server systems (such as IIS, MSProxy, database stacks, web servers, Exchange, and others), the heap lock is a genuinely large bottleneck. The more processors there are, the worse the contention gets. Try to minimize your use of the heap.

Now that you understand the problems that come with using the heap, wouldn't you like a magic wand that makes them all vanish? I wish there were one. But there is no magic that speeds the heap up, so don't expect great changes in the last week before the product ships. If you plan your heap strategy in advance, the situation will improve greatly. Adjusting the way you use the heap, and reducing the number of heap operations, is a good strategy for improving performance.

How can you reduce heap operations? You can cut the number of heap operations by exploiting locality within your data structures. Consider the following example:

struct ObjectA {
    // ObjectA's data
};

struct ObjectB {
    // ObjectB's data
};

// Using ObjectA and ObjectB together

//
// Option 1: use a pointer
//
struct ObjectB {
    struct ObjectA *pObjA;
    // ObjectB's data
};

//
// Option 2: embed
//
struct ObjectB {
    struct ObjectA objA;
    // ObjectB's data
};

//
// Option 3: aggregate, using ObjectA and ObjectB inside another object
//
struct ObjectX {
    struct ObjectA objA;
    struct ObjectB objB;
};

Avoid using pointers to associate two data structures. When a pointer associates the two, objects A and B in the example above are allocated and freed separately. That adds extra overhead, a practice we want to avoid.

Embed pointed-to child objects inside the parent object. A pointer in an object means a dynamic element (about 80% of the time) and an extra dereference to a new location with no locality. Embedding increases locality and reduces the need for further allocate/free calls, which improves application performance.

Merge small objects into larger objects (aggregation). Aggregation reduces the number of blocks allocated and freed. If several developers each design and build different parts of a system, the result will be many small objects that could be merged. The challenge of aggregation is to find the right aggregation boundaries.

Use inline buffers that satisfy 80% of needs (a.k.a. the 80-20 rule). In some cases a memory buffer is needed to hold string or binary data, and the total number of bytes is not known in advance. Make an estimate, and inline a buffer that can satisfy 80% of the requests. For the remaining 20%, allocate a new buffer and keep a pointer to it. This reduces allocate/free calls and increases data locality, fundamentally improving the code's performance (see the sketch below).

Allocate objects (blocks) in chunks. Chunking means allocating multiple objects as a group. If you track a list of items, for example a list of {name, value} pairs, there are two options: option one is to allocate each name-value pair separately; option two is to allocate a structure that holds several (say, five) pairs. In general, storing groups of pairs reduces the node count, and only when extra space is required do you need an additional linked-list pointer. Chunking is processor-cache friendly, especially toward the L1 cache, because it provides greater locality: with chunked allocation, many of the data blocks end up in the same virtual page.
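
A minimal sketch of the inline-buffer idea; the 64-byte inline size is an assumed "80% case", not a figure from the article.

#include <cstddef>
#include <cstring>

class InlineString {
public:
    explicit InlineString(const char* s) {
        std::size_t len = std::strlen(s) + 1;
        data_ = (len <= sizeof(inline_)) ? inline_        // common case: no heap hit
                                         : new char[len]; // rare case: spill to the heap
        std::memcpy(data_, s, len);
    }
    ~InlineString() {
        if (data_ != inline_)
            delete[] data_;              // free only a spilled buffer
    }
    const char* c_str() const { return data_; }

private:
    InlineString(const InlineString&);             // non-copyable (sketch only)
    InlineString& operator=(const InlineString&);
    char inline_[64];                    // handles roughly 80% of strings inline
    char* data_;
};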

Use _amblksiz correctly. The C runtime (CRT) has its own custom front-end allocator, which obtains blocks of size _amblksiz from the back end (the Win32 heap). Setting _amblksiz to a higher value can potentially reduce the number of calls to the back end. This applies only to programs that use the CRT extensively.

The benefits of the above techniques vary with object type, size, and workload, but they are always a win in performance and scalability. The code becomes somewhat specialized, but if thought through carefully it remains easy to manage.

3. Other techniques to improve performance

Here are some more techniques for improving speed:

Use the Windows NT 5 heap

Thanks to the effort and hard work of several colleagues, several major improvements were made in Microsoft Windows(R) 2000 in early 1998:

Improved locking inside the heap code. The heap code uses one lock per heap. The global lock protects the heap data structure against multithreaded use. Unfortunately, under heavy traffic the heap can still get bogged down on this global lock, leading to high contention and low performance. In Windows 2000, the critical sections inside the heap code minimize the window in which contention is possible, thereby improving scalability.

Use the "Lookaside" list. The stack data structure uses all idle items of blocks using a fast cache between 8 and 1,024 bytes (incremental increment). Fast caches were initially protected in the global lock. Now, use the LOOKASIDE list to access these fast cache idle lists. These lists do not require locking, but use 64-bit interlock operations, thus improving performance.

The algorithms for the internal data structures were improved as well.

These improvements eliminate the need for an allocation cache in many cases, but they do not preclude other optimizations. Evaluate your code with the Windows NT 5 heap; it is optimal for blocks smaller than 1,024 bytes (1 KB), that is, blocks served by the front-end allocator. GlobalAlloc() and LocalAlloc() are built on the same heap and are generic mechanisms for accessing the per-process heap. If you want high localized performance, use the Heap* APIs to access the per-process heap, or create your own heap for your allocation operations. If you need to operate on large blocks, you can use the VirtualAlloc()/VirtualFree() operations.
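
A minimal sketch of the private-heap route mentioned above, using the Win32 Heap* APIs; the sizes and flags here are illustrative.

#include <windows.h>

int main()
{
    // Create a growable private heap: 64 KB initial size, no maximum.
    HANDLE hHeap = HeapCreate(0, 64 * 1024, 0);
    if (hHeap == NULL)
        return 1;

    // Allocate a zero-initialized 256-byte block from the private heap.
    void* p = HeapAlloc(hHeap, HEAP_ZERO_MEMORY, 256);
    if (p != NULL) {
        // ... use the block ...
        HeapFree(hHeap, 0, p);   // data must be freed to the same heap
    }

    HeapDestroy(hHeap);          // destroys the heap and all its blocks
    return 0;
}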

The above improvements shipped in Windows 2000 Beta 2 and Windows NT 4.0 SP4. After the improvements, the contention rate on the heap lock dropped significantly. This benefits all direct users of Win32 heaps. The CRT heap is built on top of the Win32 heap, but it uses its own small-block heap, so it does not benefit from the Windows NT improvements. (Visual C++ version 6.0 also has an improved heap allocator.)

Use an allocation cache

An allocation cache allows allocated blocks to be cached and reused. This reduces the number of allocate/free calls against the process heap (or global heap) and allows blocks that have been allocated once to be reused as much as possible. In addition, an allocation cache permits gathering statistics, so you can understand object usage at a higher level.

Typically, a custom heap allocator is implemented on top of the process heap. A custom heap allocator behaves much like the system heap; the main difference is that it provides a cache on top of the process heap for its allocations. The cache is designed around a set of fixed sizes (such as 32 bytes, 64 bytes, 128 bytes, and so on). This is a good strategy, but such a custom allocator loses the "semantic information" about which objects are being allocated and freed.

In contrast with a custom heap allocator, an "allocation cache" is implemented as a per-type allocation cache. Besides providing all the benefits of a custom heap allocator, it can also retain a great deal of semantic information. Each allocation-cache handler is associated with one target binary object. It can be initialized with a set of parameters representing the concurrency level, the object size, and the number of elements to keep in the free list. The allocation-cache handler object maintains its own private pool of freed entities (never exceeding the specified threshold) and uses a private protection lock. Together, the allocation cache and the private lock reduce traffic to the main system heap, providing increased concurrency, maximal reuse, and higher scalability.

A cleanup routine is needed to periodically scan all the allocation-cache handlers and reclaim unused resources. If no activity is found, the pool of allocated objects is released, which improves performance.

Every allocate/free activity can be audited. The first level of information includes the total numbers of objects, allocation calls, and free calls. The semantic relationships between objects can be derived by examining their statistics, and those relationships can then be used to reduce memory allocation with one of the many techniques described above (a per-type cache sketch follows).
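
A minimal per-type allocation-cache sketch. The Request type, its payload size, and the threshold are illustrative; the design described in the article would also gather statistics and use a cleanup pass.

#include <cstddef>
#include <mutex>
#include <new>

class Request {
public:
    static void* operator new(std::size_t size) {
        if (size != sizeof(Request))         // a derived class: fall through
            return ::operator new(size);
        std::lock_guard<std::mutex> guard(lock_);
        if (freeList_ != nullptr) {          // cache hit: reuse a freed block
            void* p = freeList_;
            freeList_ = freeList_->next;
            --cached_;
            return p;
        }
        return ::operator new(size);         // cache miss: go to the heap
    }

    static void operator delete(void* p, std::size_t size) {
        if (p == nullptr) return;
        if (size != sizeof(Request)) { ::operator delete(p); return; }
        std::lock_guard<std::mutex> guard(lock_);
        if (cached_ < kMaxCached) {          // keep the block in the private pool
            FreeNode* node = static_cast<FreeNode*>(p);
            node->next = freeList_;
            freeList_ = node;
            ++cached_;
        } else {
            ::operator delete(p);            // pool at threshold: really free it
        }
    }

private:
    struct FreeNode { FreeNode* next; };
    static const std::size_t kMaxCached = 64;   // free-list threshold
    static FreeNode* freeList_;
    static std::size_t cached_;
    static std::mutex lock_;
    char payload_[128];                      // the object's actual data
};

Request::FreeNode* Request::freeList_ = nullptr;
std::size_t Request::cached_ = 0;
std::mutex Request::lock_;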

An allocation cache also serves as a debugging aid, helping you track down objects that were never completely cleaned up. By examining stack back-traces and the signatures kept alongside the uncleaned objects, you can even find the exact offending caller.

The MP heap

The MP heap is a package for multiprocessor-friendly distributed allocation, available in the Win32 SDK (Windows NT 4.0 and later). Originally implemented by JVert, this heap abstraction is built on top of the Win32 heap package. The MP heap creates several Win32 heaps and tries to distribute allocation calls across them so as to reduce contention on any single lock.

This package is a good step: an improved, MP-friendly custom heap allocator. However, it provides no semantic information and lacks statistics functionality. The MP heap is generally used as an SDK library. If you build reusable components with this SDK, you will benefit greatly. However, if the SDK library is built into every DLL, the working set grows.

Rework algorithms and data structures

For code to scale on a multiprocessor machine, its algorithms, implementation, data structures, and hardware use must all scale dynamically. Look at the data structures that are allocated and freed most often, and ask: "Can I accomplish this with a different data structure?" For example, if a read-only list of items is loaded at application initialization, the list does not have to be a linearly linked list; a dynamically allocated array works very well. A dynamically allocated array reduces the number of heap blocks and the fragmentation in memory, enhancing performance (see the sketch below).
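
A small sketch of that suggestion; the item type and count are illustrative. One contiguous dynamically allocated array replaces a node-per-item linked list, so the whole read-only list costs a single heap block instead of one block per item.

#include <vector>

struct Item {
    int id;
    int value;
};

// Loaded once at startup, then treated as read-only.
std::vector<Item> LoadItems()
{
    std::vector<Item> items;
    items.reserve(128);           // one up-front allocation (assumed size)
    for (int i = 0; i < 128; ++i) {
        Item it = { i, i * 10 };  // placeholder data for the sketch
        items.push_back(it);
    }
    return items;                 // all items share one contiguous block
}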

Reduce the number of small objects to lighten the load on the heap allocator. For example, we once used five separate objects on the server's critical processing path, each allocated and freed separately. Caching these objects together cut the heap calls from five to one and significantly reduced the heap load, which matters especially when handling more than 1,000 requests per second.

If you use "Automation" structures, consider removing "Automation BSTR" from your mainline code, or at least avoid repeated BSTR operations. (BSTR concatenation causes excessive reallocation and allocate/free operations.)

Summary

Heap operations carry a large overhead on all platforms. Each piece of code has its own specific requirements, but every design can apply the basic principles discussed here to reduce interaction with the heap.

1. Evaluate your code's use of the heap.

2. Improve your code to use fewer heap calls: analyze the critical paths and the fixed data structures.

3. Use methods of quantifying heap calls before implementing custom wrappers.

4. If you are dissatisfied with performance, ask the OS group to improve the heap. More such requests mean more attention to improving the heap.

5. Ask the C runtime group to make a small allocation wrapper over the heap provided by the OS. As the OS heap improves, the cost of the C runtime's heap calls will go down.

Operating systems (the Windows NT family) are constantly improving their heaps. Watch for these improvements and take advantage of them.
