Gregor Noriskin
Microsoft CLR Performance Group
Applicable to: Microsoft .NET Framework
Summary: Learn about the .NET Framework Common Language Runtime from a performance perspective, discover best practices for managed code performance, and learn how to measure the performance of your managed applications.
Download the CLR Profiler. (330 KB)
Contents
Software Development Is Like Juggling
The .NET Common Language Runtime
Managed Data and the Garbage Collector
The Allocation Profile
The Profiling API and the CLR Profiler
Hosting the Server GC
Finalization
The Dispose Pattern
A Note on Weak References
Managed Code and the CLR JIT
Value Types
Exception Handling
Threads and Synchronization
Reflection
Late Binding
Security
COM Interop and Platform Invoke
Performance Counters
Other Tools
Conclusion
Resources
Software Development Is Like Juggling
Software development can be compared to juggling. A juggler typically keeps at least three objects in the air, and there is no upper limit on the number of objects that can be juggled. When you first learn to juggle, you focus on each individual ball as you catch and throw it. As you become more proficient, you begin to watch the pattern of the balls rather than concentrating on each single ball. When you have mastered juggling, you can once again concentrate on a single ball, keeping that ball balanced while continuing to throw and catch the others. You know intuitively where the balls are going and can put your hand in the right place to catch and throw them. But how is this like software development?
Different roles in software development juggle different sets of three: project and program managers juggle features, resources, and time, while software developers juggle correctness, performance, and security. People always try to juggle more items, but as anyone who has learned to juggle will tell you, adding even a single ball makes keeping them all in the air significantly harder. Strictly speaking, if you are juggling fewer than three balls you are not really juggling at all. As a software developer, if you are not considering the correctness, performance, and security of the code you are writing, you are not doing your job. When you first start to consider correctness, performance, and security, you will find that you can only concentrate on one aspect at a time. As they become part of your everyday work, you will find that you no longer need to focus on any particular aspect, because they are integrated into the way you work. Once you have mastered them, you will be able to make trade-offs intuitively and adjust your focus accordingly. As with juggling, the key is practice.
Writing high-performance code is itself a matter of juggling three things: setting goals, measuring, and understanding the target platform. If you don't know how fast the code has to be, how do you know when you are done? If you don't measure and profile your code, how do you know whether you have met your goals, or why you have not? If you don't understand the target platform, how do you know what to optimize when you miss your goals? These principles apply to the development of almost any high-performance code, whatever the target platform. No article on writing high-performance code would be complete without mentioning all three. Although all three are equally important, this article focuses on the latter two, because they are specific to writing high-performance applications that target the Microsoft® .NET Framework.
The basic principles of writing high-performance code on any platform are:

1. Set performance goals.
2. Measure, measure, measure.
3. Understand the hardware and software platforms that your application targets.
The .NET Common Language Runtime
At the core of the .NET Framework is the Common Language Runtime (CLR). The CLR provides all of the runtime services for your code: Just-In-Time compilation, memory management, security, and a number of other services. The CLR was designed with high performance in mind. That said, there are ways you can take advantage of that performance and ways you can give it away. This article looks at the Common Language Runtime from a performance perspective, identifies best practices for managed code performance, and shows how to measure the performance of your managed application. It is not intended to be a comprehensive discussion of the performance characteristics of the .NET Framework. For the purposes of this article, performance includes throughput, scalability, startup time, and memory usage.
Managed Data and the Garbage Collector
When using managed code in performance-critical applications, one of the biggest concerns is the cost of CLR memory management, which is performed by the garbage collector (GC). The cost of memory management is the sum of the cost of allocating the memory associated with a type instance, the cost of managing that memory over the lifetime of the instance, and the cost of freeing the memory when it is no longer needed.
A managed allocation is usually very cheap, in most cases taking less time than a C/C++ malloc or new. This is because the CLR does not need to scan a free list to find the next available contiguous block of memory large enough to hold the new object; it always keeps a pointer to the next available position in memory. Managed heap allocation can be thought of as "stack-like". An allocation can trigger a collection if the GC needs to free memory in order to satisfy it, in which case the allocation is more expensive than a malloc or new. Pinned objects also affect allocation cost. A pinned object is one that the GC has been instructed not to move during a collection, typically because its address has been passed to a native API.
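To make the pinning discussion concrete, here is a minimal sketch (my example, not from the original article) that pins a byte array with System.Runtime.InteropServices.GCHandle before its address would be handed to native code; the commented-out NativeWrite call is a hypothetical placeholder for a real native API.

```csharp
using System;
using System.Runtime.InteropServices;

class PinningExample
{
    static void Main()
    {
        byte[] buffer = new byte[1024];

        // Pin the buffer so the GC cannot relocate it while
        // native code holds its address.
        GCHandle handle = GCHandle.Alloc(buffer, GCHandleType.Pinned);
        try
        {
            IntPtr address = handle.AddrOfPinnedObject();
            // NativeWrite(address, buffer.Length); // hypothetical native call
            Console.WriteLine("Pinned at 0x{0:x}", address.ToInt64());
        }
        finally
        {
            // Free the handle as soon as possible; long-lived pins
            // fragment the heap and make collections more expensive.
            handle.Free();
        }
    }
}
```

Note that the pin is released in a finally block so that it is held for the shortest possible time, which limits its impact on the GC.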
Unlike with malloc or new, there is a cost to managing memory over the lifetime of an object. The CLR GC is generational, which means it does not collect the entire heap on every collection. The GC still needs to know, however, whether live objects in the rest of the heap are roots for objects in the portion of the heap being collected. Memory that contains objects holding references to objects in younger generations is expensive to manage over the lifetime of those objects.
The GC is a mark-and-sweep, generational garbage collector. The managed heap contains three generations: generation 0 contains all new objects, generation 1 contains somewhat longer-lived objects, and generation 2 contains long-lived objects. The GC collects the smallest section of the heap it can while still freeing enough memory for the application to continue running. The collection of a generation includes the collection of all younger generations; for example, a generation 1 collection also collects generation 0. Generation 0 is sized dynamically according to the size of the processor cache and the application's allocation rate, and a generation 0 collection usually takes less than 10 milliseconds. Generation 1 is sized dynamically according to the application's allocation rate, and a collection usually takes between 10 and 30 milliseconds. The size of generation 2 depends on the allocation profile of the application, as does the time it takes to collect it. Generation 2 collections are where the cost of managing your application's memory has by far the greatest performance impact.
Tip   The GC is self-tuning and adjusts itself according to the memory requirements of the application. In most cases, programmatically invoking a collection will hinder rather than help. "Helping" the GC by calling GC.Collect will more than likely not improve your application's performance.
The GC may relocate live objects during a collection. If those objects are large, the cost of relocation is high, so such objects are allocated in a special area of the heap called the Large Object Heap. The Large Object Heap is collected, but it is never compacted; that is, large objects are never relocated. Large objects are those larger than about 80 KB; note that this may change in future versions of the CLR. The Large Object Heap is collected during full (generation 2) collections, so the allocation rate and death rate of large objects can have a significant effect on the cost of managing your application's memory.
The Allocation Profile
The overall allocation profile of a managed application defines how hard the garbage collector has to work to manage the memory associated with the application. The harder the GC has to work, the greater the number of CPU cycles taken by the GC and the less time the CPU spends running application code. The allocation profile is a function of the number of objects allocated, the size of those objects, and their lifetimes. The most obvious way to relieve GC pressure is simply to allocate fewer objects. Designing applications for extensibility, modularity, and reuse using object-oriented design techniques will almost always result in an increased number of allocations. There is a performance penalty for abstraction and "elegance".
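As one illustrative way of reducing allocation count (my example, not from the original article): concatenating strings in a loop allocates a new, progressively larger string on every iteration, while a StringBuilder reuses a single growing buffer.

```csharp
using System;
using System.Text;

class AllocationExample
{
    // Allocates a new string object on every iteration, producing
    // many short-lived objects for the GC to collect.
    static string ConcatNaive(string[] parts)
    {
        string result = "";
        foreach (string p in parts)
            result = result + p;
        return result;
    }

    // Reuses one internal buffer, so far fewer objects are allocated.
    static string ConcatBuffered(string[] parts)
    {
        StringBuilder sb = new StringBuilder();
        foreach (string p in parts)
            sb.Append(p);
        return sb.ToString();
    }

    static void Main()
    {
        string[] parts = { "a", "b", "c", "d" };
        Console.WriteLine(ConcatNaive(parts));    // prints abcd
        Console.WriteLine(ConcatBuffered(parts)); // prints abcd
    }
}
```

Both methods produce the same result; they differ only in the allocation pressure they put on the GC.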
A GC-friendly allocation profile has some objects that are allocated at application startup and live for the lifetime of the application, with all other objects being short-lived. Long-lived objects contain few or no references to short-lived objects. The further the allocation profile deviates from this, the harder the GC has to work to manage the application's memory.
A GC-unfriendly allocation profile has objects that survive into generation 2 and then die, or has short-lived objects allocated on the Large Object Heap. Objects that survive long enough to reach generation 2 and then die are the most expensive objects to manage. As I mentioned before, objects in older generations that hold references to objects in younger generations also increase the cost of a collection.
A typical real-world allocation profile falls somewhere between the two profiles described above. An important metric for your allocation profile is the percentage of total CPU time spent in the GC. You can get this number from the .NET CLR Memory: % Time in GC performance counter. If the mean value of this counter is above 30%, you should probably take a closer look at your allocation profile. This does not necessarily mean that your allocation profile is bad; there are some memory-intensive applications for which this level of GC time is unavoidable and normal. This counter is the first thing you should look at when you hit a performance problem; it will immediately show whether your allocation profile is the issue.
Tip   If the .NET CLR Memory: % Time in GC performance counter indicates that your application is spending more than 30% of its total time in the GC, you should take a closer look at your allocation profile.
Tip   A GC-friendly application will have significantly more generation 0 collections than generation 2 collections. You can determine this ratio by comparing the .NET CLR Memory: # Gen 0 Collections and .NET CLR Memory: # Gen 2 Collections performance counters.
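The performance counters above can be read with tools such as Performance Monitor. As a side note (my example, not from the original article): in CLR 2.0 and later, the same per-generation collection counts are also exposed programmatically through GC.CollectionCount, which the following sketch uses to observe the generation 0 to generation 2 ratio.

```csharp
using System;

class CollectionRatioExample
{
    static void Main()
    {
        // Churn through many short-lived allocations; these die in
        // generation 0 and drive up the gen 0 collection count.
        for (int i = 0; i < 100000; i++)
        {
            byte[] temp = new byte[128]; // dies immediately
        }

        int gen0 = GC.CollectionCount(0);
        int gen2 = GC.CollectionCount(2);

        // In a GC-friendly profile, gen 0 collections far outnumber gen 2.
        Console.WriteLine("Gen 0 collections: {0}", gen0);
        Console.WriteLine("Gen 2 collections: {0}", gen2);
    }
}
```

Because every generation 2 collection also collects generations 0 and 1, the generation 0 count is always at least as large as the generation 2 count; what matters is how much larger it is.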
The Profiling API and the CLR Profiler
The CLR includes a powerful profiling API that third parties can use to write custom profilers for managed applications. The CLR Profiler is an unsupported sample allocation-profiling tool, written by the CLR product team, that uses this profiling API. The CLR Profiler lets developers see the allocation profile of their managed applications.
Figure 1   CLR Profiler main window
The CLR Profiler includes a number of very useful views of the allocation profile, including: a histogram of allocated types, allocation and call graphs, a timeline showing GCs of the various generations and the resulting state of the managed heap after those collections, and a call tree showing per-method allocations and assembly loads.

Figure 2   CLR Profiler allocation graph
Tip   For details on how to use the CLR Profiler, see the readme file included in the ZIP archive.
Note that the CLR Profiler has a high performance overhead and significantly changes the performance characteristics of your application. Stress-related bugs will likely disappear while you are running your application under the CLR Profiler.
Hosting the Server GC
Two different garbage collectors are available for the CLR: a workstation GC and a server GC. Console and Windows Forms applications host the workstation GC, and ASP.NET hosts the server GC. The server GC is optimized for throughput and multiprocessor scalability. The server GC suspends all threads running managed code for the entire duration of a collection, including both the mark and sweep phases, and the GC runs in parallel on all CPUs available to the process, on high-priority, dedicated, CPU-affinitized threads. Threads running native code during a GC are suspended only when the native call returns. If the server application you are building will run on multiprocessor machines, it is highly recommended that you use the server GC. If your application is not hosted by ASP.NET, you will have to write a native application that explicitly hosts the CLR.
Tip   If you are building scalable server applications, host the server GC. See Implement a Custom Common Language Runtime Host for Your Managed App.
The workstation GC is optimized for low latency, which is typically what client applications need. No one wants a client application to pause noticeably while a GC runs; client performance is usually measured in responsiveness rather than raw throughput. The workstation GC is a concurrent GC, meaning it performs the mark phase while managed code is still running. The workstation GC suspends threads running managed code only when the sweep phase needs to run. In the workstation GC, collection runs on a single thread and therefore on only one CPU.
Finalization
The CLR provides a mechanism by which an object is automatically cleaned up before the memory associated with the instance is released. This mechanism is called finalization. Typically, finalization is used to release native resources, such as database connections or operating-system handles, that are being used by an object.
Finalization is an expensive feature, and it increases the pressure on the GC. The GC tracks objects that require finalization in a finalizable queue. If, during a collection, the GC finds an object that is no longer live but requires finalization, that object is moved to the freachable queue. Finalization runs on a separate thread called the finalizer thread. Because the entire state of the object may be needed while its finalizer runs, the object and everything it points to are promoted to the next generation. The memory associated with the object, or the graph of objects it references, is released only at a subsequent GC.
Resources that need to be released should be wrapped in as small a finalizable object as possible; for example, if your class requires references to both managed and unmanaged resources, you should wrap the unmanaged resources in a new finalizable class and make that class a member of your class. The parent class should not be finalizable. This means that only the class containing the unmanaged resources will be promoted (assuming the class containing the unmanaged resource does not hold a reference back to the parent class). Also remember that there is only one finalizer thread. If a finalizer blocks that thread, subsequent finalizers will not be called, resources will not be freed, and your application will leak.

Tip   Finalizers should be kept as simple as possible, and should never block.
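A minimal sketch of the small-wrapper idea described above (my example; the commented-out release call is a hypothetical stand-in for whatever native release function applies):

```csharp
using System;

// Small finalizable wrapper: if its finalizer ever has to run, only
// this tiny object (not its owner) is promoted to the next generation.
class NativeHandleWrapper
{
    private IntPtr handle;

    public NativeHandleWrapper(IntPtr h) { handle = h; }

    public bool IsReleased { get { return handle == IntPtr.Zero; } }

    public void Release()
    {
        if (handle != IntPtr.Zero)
        {
            // ReleaseNativeHandle(handle); // hypothetical native call
            handle = IntPtr.Zero;
            GC.SuppressFinalize(this); // cleanup done, skip finalization
        }
    }

    // Keep the finalizer trivial; it must never block.
    ~NativeHandleWrapper() { Release(); }
}

// The owner holds a reference to the wrapper but is not itself
// finalizable, so it is never promoted just because cleanup is pending.
class Connection
{
    private NativeHandleWrapper native = new NativeHandleWrapper(new IntPtr(1));
    public void Close() { native.Release(); }
}
```

The Release method is idempotent, so it is safe to call it from both the owner's Close method and the wrapper's finalizer.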
Tip   Make only the wrapper class that cleans up unmanaged resources finalizable.
Finalization can be thought of as a substitute for reference counting. An object that is reference counted keeps track of how many other objects hold references to it (which leads to some well-known problems), so that it can release its resources when the reference count reaches zero. The CLR does not implement reference counting, so it needs to provide a mechanism for releasing resources automatically when no more references to the object are held. Finalization is that mechanism. Finalization is typically needed only when the lifetime of an object requiring cleanup is not explicitly known.
Dispose Pattern
In the case of objects whose lifetimes are explicitly known, unmanaged resources associated with an object should be released as soon as they are no longer needed. This process is called "disposing" of the object. The dispose pattern is implemented through the IDisposable interface (though implementing it yourself would also be straightforward). To make a finalizable class disposable, that is, to allow instances of the class to be disposed explicitly, you have the object implement the IDisposable interface and implement a Dispose method. The Dispose method calls the same cleanup code as the finalizer and informs the GC that the object no longer needs to be finalized by calling the GC.SuppressFinalize method. It is good practice to have both the Dispose method and the finalizer call a common cleanup function, so that only one body of cleanup code needs to be maintained. Also, if the semantics of the object are such that a Close method makes more sense than a Dispose method, a Close method should be implemented as well; a database connection or a socket, for example, is logically "closed". The Close method can simply call the Dispose method.
It is always good practice to provide a Dispose method for a class with a finalizer, because one can never know in advance how that class will be used, for example, whether its lifetime will be explicitly known or not. If a class you are using implements the dispose pattern, and you explicitly know when you are done with the object, definitely call Dispose.
Tip   Provide a Dispose method for all classes that are finalizable.

Tip   Suppress finalization in your Dispose method.

Tip   Call a common cleanup function from both Dispose and the finalizer.

Tip   If an object you are using implements IDisposable, and you know that the object is no longer needed, call its Dispose method.
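The pattern described above might be sketched as follows (my sketch of the conventional shape, using a hypothetical unmanaged handle and a commented-out hypothetical native close call):

```csharp
using System;

class DatabaseConnection : IDisposable
{
    private IntPtr handle = new IntPtr(1); // hypothetical unmanaged handle
    private bool disposed = false;

    public bool IsDisposed { get { return disposed; } }

    // Common cleanup function, called from both Dispose and the finalizer.
    private void Cleanup()
    {
        if (!disposed)
        {
            // CloseNativeHandle(handle); // hypothetical native call
            handle = IntPtr.Zero;
            disposed = true;
        }
    }

    public void Dispose()
    {
        Cleanup();
        // Tell the GC this object no longer needs to be finalized.
        GC.SuppressFinalize(this);
    }

    // Close is the more natural name for a connection; it forwards to Dispose.
    public void Close() { Dispose(); }

    ~DatabaseConnection() { Cleanup(); }
}
```

Because Cleanup guards on the disposed flag, calling Close or Dispose more than once is harmless, and the finalizer does nothing if the object was already disposed.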
C# provides a very convenient way to automatically dispose of objects: the using keyword marks a block of code, after which Dispose is called on a number of disposable objects.
The C# using keyword

    using (DisposableType T)
    {
        // do some work with T
    }
    // T.Dispose() is called automatically
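The compiler expands a using block into a try/finally, so Dispose runs even if the block throws. The following sketch (my example) shows the equivalence with a trivial IDisposable type:

```csharp
using System;

class Resource : IDisposable
{
    public static int DisposeCount = 0;
    public void Dispose() { DisposeCount++; }
}

class UsingExpansionExample
{
    static void Main()
    {
        // The using form...
        using (Resource r = new Resource())
        {
            // work with r
        }

        // ...is equivalent to this expansion: Dispose is guaranteed
        // to run even if the block throws.
        Resource r2 = new Resource();
        try
        {
            // work with r2
        }
        finally
        {
            if (r2 != null) r2.Dispose();
        }

        Console.WriteLine(Resource.DisposeCount); // prints 2
    }
}
```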
A Note on Weak References
Any reference to an object that is on the stack, in a register, in another object, or in any other GC root will keep the object alive during a GC. That is usually a good thing, since it normally means the application is not done with the object. There are times, however, when you want to hold a reference to an object without affecting its lifetime. For such cases the CLR provides a mechanism called weak references. Any strong reference (that is, a reference that roots an object) can be turned into a weak reference. One example of when you might want to use weak references is when creating an external cursor object that can traverse a data structure without affecting the lifetime of its objects. Another example is when creating a cache that is flushed when there is memory pressure, that is, when a GC happens.

Creating a weak reference in C#
    MyRefType mrt = new MyRefType();
    // ...

    // Create a weak reference
    WeakReference wr = new WeakReference(mrt);
    mrt = null; // object is no longer rooted
    // ...

    // Has the object been collected?
    if (wr.IsAlive)
    {
        // Get a strong reference to the object
        mrt = (MyRefType)wr.Target;
        // object is rooted and can be used again
    }
    else
    {
        // Recreate the object
        mrt = new MyRefType();
    }
Managed Code and the CLR JIT
Managed assemblies, the unit of distribution for managed code, consist of a processor-independent instruction set known as Microsoft Intermediate Language (MSIL or IL). The CLR's Just-In-Time (JIT) compiler compiles IL into optimized native x86 instructions. The JIT is an optimizing compiler, but because compilation happens at run time, and only the first time a method is called, the number of optimizations it performs needs to be balanced against the time spent compiling. Typically this is not significant for server applications, since startup time and responsiveness are generally not an issue, but it is important for client applications. Note that startup time can be improved by compiling at install time using NGEN.exe.
Many of the optimizations performed by the JIT have no programming patterns associated with them; that is, you cannot explicitly code for them. Some, however, do. The next section discusses a few of the latter.
Tip   Improve the startup time of client applications by compiling them at install time using the NGEN.exe utility.
Inlining
There is a cost associated with every method call: arguments need to be pushed on the stack or stored in registers, and the method prolog and epilog need to be executed. The call cost of some methods can be avoided by moving the body of the called method into the body of the caller. This is called method inlining. The JIT uses a number of heuristics to decide whether a method should be inlined. The following is a list of some of the more significant of those heuristics (note that this list is not exhaustive):
• Methods that are greater than 32 bytes of IL will not be inlined.
• Virtual functions are not inlined.
• Methods that have complex flow control will not be inlined. Complex flow control is any flow control other than if/then/else, for example, switch or while.
• Methods that contain exception-handling blocks are not inlined, though methods that throw exceptions are still candidates for inlining.
• If any of a method's formal arguments are structs, the method will not be inlined.
I would think carefully before explicitly coding for these heuristics, because they may change in future versions of the JIT. Don't compromise the correctness of a method in an attempt to guarantee that it will be inlined. You may have noticed the interesting fact that the C++ keyword inline does not guarantee that the compiler will inline a method (though __forceinline does). Property get and set methods are generally good candidates for inlining, since they mostly just initialize or return private data members.
Tip   Don't compromise the correctness of a method in an attempt to guarantee inlining.
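To illustrate, here is a simple property whose accessors are small, non-virtual methods with trivial flow control, the kind the heuristics above favor (my example; whether the JIT actually inlines them can only be confirmed by inspecting the generated code):

```csharp
using System;

class Counter
{
    private int count;

    // Small, non-virtual, simple flow control: a good inlining candidate.
    public int Count
    {
        get { return count; }
        set { count = value; }
    }

    public void Increment() { Count = Count + 1; }
}

class InliningExample
{
    static void Main()
    {
        Counter c = new Counter();
        for (int i = 0; i < 5; i++)
            c.Increment();
        Console.WriteLine(c.Count); // prints 5
    }
}
```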
Range Check Elimination
One of the many benefits of managed code is automatic range checking: every time you access an array using array[index] semantics, the JIT emits a check to make sure the index is within the bounds of the array. In loops with large numbers of iterations and small numbers of instructions executed per iteration, these range checks can be expensive. There are cases where the JIT will detect that the range checks are unnecessary and will eliminate them from the body of the loop, checking only once before the loop begins. There is a programming pattern in C# that ensures the range checks are eliminated: explicitly testing against the length of the array in the for statement. Note that subtle deviations from this pattern will prevent the checks from being eliminated; adding a value to the index, for example, defeats the optimization.
Range check elimination in C#
    // Range check will be eliminated
    for (int i = 0; i < myArray.Length; i++)
    {
        Console.WriteLine(myArray[i].ToString());
    }

    // Range check will NOT be eliminated
    for (int i = 0; i < myArray.Length + y; i++)
    {
        Console.WriteLine(myArray[i + x].ToString());
    }

The optimization is particularly noticeable when searching large jagged arrays, since both the inner and outer loop range checks can be eliminated.

Optimizations That Require Variable Usage Tracking

A number of JIT compiler optimizations require the JIT to track the usage of formal arguments and local variables, for example, where they are first and last used in the method body. In CLR 1.0 and 1.1, the JIT can track usage for a maximum of 64 variables in total. An example of an optimization that requires usage tracking is enregistration. Enregistration means storing a variable in a processor register rather than on the stack frame (that is, in memory). Access to enregistered variables is significantly faster than access to variables on the stack frame, even if the frame variable happens to be in the processor's cache. Only 64 variables can be considered for enregistration; all other variables are pushed on the stack. Besides enregistration, there are other optimizations that depend on usage tracking. The number of formal arguments plus locals in a method should be kept below 64 to ensure the maximum number of JIT optimizations. Keep in mind that this number may change in future versions of the CLR.

Tip   Keep methods short. There are a number of reasons to do so, including method inlining, enregistration, and JIT compilation time.

Other JIT Optimizations

The JIT compiler performs a number of other optimizations: constant and copy propagation, loop-invariant hoisting, and several others. There are no programming patterns required to take advantage of these optimizations; they are free.

Why Don't I See These Optimizations in Visual Studio?
When you start an application from the Debug menu in Visual Studio, or by pressing F5, all JIT optimizations are disabled, regardless of whether it is a Release or a Debug build. When a managed application is started under a debugger, even if it is not a Debug build of the application, the JIT emits non-optimized x86 instructions. If you want the JIT to emit optimized code, start the application from Windows Explorer, or use Ctrl+F5 from within Visual Studio. If you want to view the optimized disassembly and compare it with the non-optimized code, you can use cordbg.exe.

Tip   Use cordbg.exe to see the disassembly of both optimized and non-optimized code emitted by the JIT. After starting your application with cordbg.exe, you can set the JIT mode by typing the following:

    (cordbg) mode JitOptimizations 1
    JIT's will produce optimized code

    (cordbg) mode JitOptimizations 0
    JIT's will produce debuggable (non-optimized) code.

Value Types

The CLR provides two different families of types: reference types and value types. Reference types are always allocated on the managed heap and are passed by reference (as their name implies). Value types are allocated on the stack, or inline as part of an object on the heap, and are passed by value by default, though you can also pass them by reference. Value types are very cheap to allocate, and assuming they are kept small and simple, they are cheap to pass as arguments. A good example of the appropriate use of a value type is a Point value type that contains x and y coordinates.

The Point value type

    struct Point
    {
        public int x;
        public int y;
        // ...
    }

Value types can also be treated as objects; for example, object methods can be called on a value type, and it can be cast to object or passed where an object is expected. Whichever way it happens, as soon as a value type is converted to a reference type, it has to go through boxing.
When a value type is boxed, a new object is allocated on the managed heap and the value is copied into the new object. This is a potentially expensive operation, and it can reduce or entirely negate the performance gained by using value types. The process of implicitly or explicitly converting the boxed value back to a value type is called unboxing.

Boxing/unboxing a value type

C#:

    int BoxUnboxValueType()
    {
        int i = 10;
        object o = (object)i; // i is boxed
        return (int)o + 3;    // i is unboxed
    }

MSIL:

    .method private hidebysig instance int32 BoxUnboxValueType() cil managed
    {
      // Code size       20 (0x14)
      .maxstack  2
      .locals init (int32 V_0, object V_1)
      IL_0000:  ldc.i4.s   10
      IL_0002:  stloc.0
      IL_0003:  ldloc.0
      IL_0004:  box        [mscorlib]System.Int32
      IL_0009:  stloc.1
      IL_000a:  ldloc.1
      IL_000b:  unbox      [mscorlib]System.Int32
      IL_0010:  ldind.i4
      IL_0011:  ldc.i4.3
      IL_0012:  add
      IL_0013:  ret
    } // end of method Class1::BoxUnboxValueType

If you implement custom value types (struct in C#), you should consider overriding the ToString method. If you do not, calls to ToString on your value type will cause the type to be boxed. The same is true for the other methods inherited from System.Object, in this case Equals, though ToString is probably the most commonly called method. If you want to know whether and when your value type gets boxed, you can use the ILDASM.exe utility to look for the box instruction in the MSIL (as shown above).

Overriding ToString() in C# to prevent boxing

    struct Point
    {
        public int x;
        public int y;

        // This will prevent the type from being boxed when ToString is called
        public override string ToString()
        {
            return x.ToString() + "," + y.ToString();
        }
    }

Note that when creating collections, for example an ArrayList of floats, every item added to the collection will be boxed. You should consider using an array, or creating a custom collection class, for your value types.
Implicit boxing when using a collection class in C#

    ArrayList al = new ArrayList();
    al.Add(42.0F);          // implicitly boxed because Add() takes object
    float f = (float)al[0]; // unboxed

Exception Handling

It is common practice to use error conditions as normal flow control. For example, when trying to programmatically add a user to an Active Directory instance, you can simply attempt the add, and if the E_ADS_OBJECT_EXISTS HRESULT comes back, you know the user already exists in the directory. Alternatively, you could search the directory for the user and only attempt the add if the search fails.

Using errors as normal flow control in this way is a performance anti-pattern in the context of the CLR. Error handling in the CLR is done with structured exception handling. Managed exceptions are very cheap until you throw them. In the CLR, when an exception is thrown, a stack walk is required to find the appropriate exception handler. Stack walking is an expensive operation. Exceptions should be used as their name implies: for exceptional or unexpected circumstances.

• Tip   For performance-critical methods, consider returning an enumerated result for expected outcomes, rather than throwing an exception.
• Tip   There are a number of .NET CLR Exceptions performance counters that will tell you how many exceptions are being thrown in your application.
• Tip   If you are using VB.NET, use exceptions rather than On Error Goto; the error object is unnecessary overhead.

Threads and Synchronization

The CLR provides rich threading and synchronization features, including the ability to create your own threads, a thread pool, and various synchronization primitives. Before taking advantage of the threading support in the CLR, you should carefully consider your use of threads. Keep in mind that adding threads can actually reduce rather than increase throughput, and it will certainly increase memory utilization.
In server applications that will run on multiprocessor machines, adding threads can significantly improve throughput by parallelizing execution (although this depends on how much lock contention there is, for example, whether execution is in effect serialized). In client applications, adding a thread to show activity and/or progress can improve perceived responsiveness, at a small cost in throughput.

If the threads in your application are not dedicated to specific tasks, or do not hold special state, you should consider using the thread pool. If you have used the Win32 thread pool in the past, the CLR thread pool will be familiar. There is a single thread pool instance per managed process. The thread pool is intelligent about the number of threads it creates, and tunes itself according to the load on the machine.

Threading cannot be discussed without discussing synchronization. All the throughput gains of multithreading can be nullified by poorly written synchronization logic. Lock granularity can greatly affect the overall throughput of your application, both because of the overhead of creating and managing locks, and because locks can serialize execution. I will use the example of adding nodes to a tree to illustrate this point. If the tree is going to be a shared data structure, accessed by multiple threads during the execution of the application, it needs to be synchronized. You could choose to lock the entire tree while adding a node, which means you only incur the overhead of creating one lock, but other threads attempting to access the tree will likely block. This would be an example of a coarse-grained lock. Alternatively, you could lock each node as you traverse the tree, which means you incur the overhead of creating a lock on every node, but other threads will not block unless they attempt to access the specific node you have locked. This is an example of a fine-grained lock.
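As a minimal sketch of handing work to the CLR thread pool rather than creating a dedicated thread, consider the following. The SquareOnPool helper is made up for illustration; ThreadPool.QueueUserWorkItem and ManualResetEvent are the actual System.Threading APIs.

```csharp
using System;
using System.Threading;

public class PoolDemo
{
    // Runs a small work item on a thread-pool thread and waits for it.
    public static int SquareOnPool(int value)
    {
        int result = 0;
        // Event used by the work item to signal that it has finished
        ManualResetEvent done = new ManualResetEvent(false);

        // Queue the work; the pool decides which of its threads runs it
        ThreadPool.QueueUserWorkItem(delegate(object state)
        {
            int n = (int)state;
            result = n * n;
            done.Set();
        }, value);

        done.WaitOne();   // Block until the work item signals completion
        return result;
    }
}
```

In a real application you would queue many independent work items rather than block on each one; this sketch blocks only so the result can be observed.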
Locking the subtree being operated on is probably a more appropriate lock granularity. Note that in this example you would probably use a shared lock (RWLock), because multiple readers should be able to access the tree simultaneously.

The simplest and most efficient way to perform synchronized operations is to use the System.Threading.Interlocked class. The Interlocked class offers a number of low-level atomic operations: Increment, Decrement, Exchange, and CompareExchange.

Using the System.Threading.Interlocked class in C#:

using System.Threading;
//...
public class MyClass
{
    void MyClass() //Constructor
    {
        //Increment a global instance counter atomically
        Interlocked.Increment(ref MyClassInstanceCounter);
    }

    ~MyClass() //Finalizer
    {
        //Decrement a global instance counter atomically
        Interlocked.Decrement(ref MyClassInstanceCounter);
        //...
    }
    //...
}

The most commonly used synchronization mechanism is probably the Monitor, or critical section. A Monitor lock can be used directly, or by using the lock keyword in C#. The lock keyword synchronizes a given block of code for a particular object. From a performance perspective, a Monitor lock is relatively cheap if contention for it is low, but more expensive if contention is high.

The C# lock keyword:

//Thread will attempt to obtain the lock
//and block until it does
lock(mySharedObject)
{
    //A thread will only be able to execute the code
    //within this block if it holds the lock
}//Thread releases the lock

The RWLock provides a shared locking mechanism: for example, the lock can be shared between "readers", but not between "writers". In the cases where such a lock is applicable, an RWLock can yield better throughput than a Monitor, which allows only a single reader or writer to hold the lock at a time. The System.Threading namespace also includes the Mutex class.
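Unlike the Interlocked and lock samples above, the article shows no code for the RWLock, so here is a minimal sketch using System.Threading.ReaderWriterLock. The SharedCounter class is made up for illustration; the try/finally shape guarantees the lock is released even if the protected code throws.

```csharp
using System;
using System.Threading;

// Hypothetical shared value protected by an RWLock: many concurrent
// readers are allowed, but writers get exclusive access.
public class SharedCounter
{
    private ReaderWriterLock rwLock = new ReaderWriterLock();
    private int value;

    public int Read()
    {
        // Multiple threads may hold the reader lock at the same time
        rwLock.AcquireReaderLock(Timeout.Infinite);
        try
        {
            return value;
        }
        finally
        {
            rwLock.ReleaseReaderLock();
        }
    }

    public void Increment()
    {
        // The writer lock is exclusive: no readers or other writers
        rwLock.AcquireWriterLock(Timeout.Infinite);
        try
        {
            value++;
        }
        finally
        {
            rwLock.ReleaseWriterLock();
        }
    }
}
```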
A Mutex is a synchronization primitive that can be used for cross-process synchronization. Be aware that it is significantly more expensive than a critical section, and should only be used where cross-process synchronization is required.

Reflection

Reflection is a mechanism provided by the CLR for obtaining type information programmatically at run time. Reflection depends heavily on the metadata embedded in managed assemblies. Many reflection APIs require searching and parsing that metadata, which are expensive operations.

The reflection APIs can be divided into three performance buckets: type comparisons, member enumerations, and member invocations, each progressively more expensive. Type comparison operations—in this case typeof (C#), GetType, is, IsInstanceOfType, and so on—are the cheapest of the reflection APIs, although they are by no means free. Member enumerations allow you to programmatically inspect the methods, properties, fields, events, constructors, and so on, of a class. A member enumeration might be used, for example, in a design-time scenario; this is how the Property Browser in Visual Studio enumerates the properties of Custom Web Controls. The most expensive reflection APIs are those used to dynamically invoke the members of a class, or to dynamically emit, JIT, and execute a method. Certainly there are late-bound scenarios that require dynamically loading assemblies, instantiating types, and invoking methods, but this loose coupling comes with an explicit performance trade-off. In general, the reflection APIs should be avoided in performance-sensitive code paths. Note that even if you do not use reflection directly, an API that you use may use it, so watch out for the transitive use of the reflection APIs as well.
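The two cheaper buckets can be sketched as follows (member invocation, the most expensive bucket, is illustrated in the late-binding section). The helper names are made up for illustration; typeof, is, and Type.GetMethods are the actual APIs named above.

```csharp
using System;
using System.Reflection;

public class ReflectionTiers
{
    // Cheapest bucket: type comparison. The "is" operator checks an
    // object's runtime type without parsing member metadata.
    public static bool IsString(object o)
    {
        return o is string;
    }

    // Middle bucket: member enumeration. GetMethods must search and
    // parse assembly metadata, which is noticeably more expensive.
    public static int CountPublicInstanceMethods(Type t)
    {
        MethodInfo[] methods = t.GetMethods(
            BindingFlags.Public | BindingFlags.Instance);
        return methods.Length;
    }
}
```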
Late Binding

Late-bound calls are a feature that uses reflection under the covers. Visual Basic.NET and JScript.NET both support late-bound calls. For example, you do not have to declare a variable before using it; the variable is, in effect, of type Object, and is converted to the correct type at run time. A late-bound call is orders of magnitude slower than a direct call. Unless you really need late-bound behavior, you should avoid it in performance-critical code paths.

• Tip If you are using VB.NET and do not explicitly need late binding, you can tell the compiler to disallow it by including Option Explicit On and Option Strict On at the top of your source files. These options force you to declare and strongly type your variables, and turn off implicit conversions.

Security

Security is a necessary and integral part of the CLR, and there is a performance cost associated with using it. In the case where code is Fully Trusted and the security policy is the default, the impact of security on the throughput and startup time of your application should be minimal. Code that is less than fully trusted (for example, code from the Internet or intranet zones), or a reduced MyComputer Grant Set, will increase the performance cost of security.

COM Interop and Platform Invoke

COM Interop and Platform Invoke expose native APIs to managed code in an almost transparent manner; calling most native APIs typically requires no special code, though it may require a few mouse clicks. As you would expect, there is a cost involved in calling native code from managed code, and vice versa. That cost has two components: a fixed cost associated with the transition between native and managed code, and a variable cost associated with any marshaling of arguments and return values that may be required.
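The most expensive reflection bucket, member invocation, is essentially what a late-bound call does under the covers. The sketch below shows it explicitly in C# via Type.InvokeMember, next to the equivalent early-bound call; the Calculator class and both helper methods are made up for illustration. Note how the late-bound path looks the member up by name in metadata and boxes the arguments into an object[].

```csharp
using System;
using System.Reflection;

// Hypothetical target type for the calls below.
public class Calculator
{
    public int Add(int a, int b)
    {
        return a + b;
    }
}

public class LateBoundDemo
{
    // Early-bound: the call is resolved at compile time and is cheap.
    public static int DirectAdd(Calculator c)
    {
        return c.Add(2, 3);
    }

    // Late-bound: the member name is resolved in metadata at run time,
    // and the int arguments are boxed. Orders of magnitude slower.
    public static int LateBoundAdd(Calculator c)
    {
        object result = typeof(Calculator).InvokeMember(
            "Add",
            BindingFlags.InvokeMethod | BindingFlags.Public | BindingFlags.Instance,
            null,                      // default binder
            c,                         // target instance
            new object[] { 2, 3 });    // arguments are boxed
        return (int)result;
    }
}
```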
The fixed cost of the transition for both COM Interop and Platform Invoke is small: typically fewer than 50 instructions. The cost of marshaling between the various managed types depends on how different the representations are on either side of the boundary. Types that require a significant amount of conversion are more expensive. For example, all strings in the CLR are Unicode strings; if you call a Win32 API through Platform Invoke that requires an ANSI character array, every character in the string has to be narrowed. If a managed integer array is passed where a native integer array is expected, however, no marshaling is required.

Because there is a performance cost associated with calling into native code, you should make sure that the cost is justified. If you are going to make a native call, make sure the work the native code does justifies the cost of making the call—make calls "chunky" rather than "chatty". A good way to measure the cost of a native call is to measure the performance of a native method that takes no arguments and has no return value, and then measure the performance of the native method you actually want to call. The difference is the marshaling cost.

• Tip Make "chunky" rather than "chatty" COM Interop and Platform Invoke calls, and make sure the cost of the call is justified by the amount of work the call does.

Note that there are no threading models associated with managed threads. If you are going to use COM Interop, you need to make sure that the thread making the call is initialized to the correct COM threading model. This is typically done using the MTAThreadAttribute and STAThreadAttribute (although it can also be done programmatically).

Performance Counters

Many Windows performance counters are exposed for the .NET CLR.
These performance counters should be a developer's weapon of choice when first diagnosing a performance issue, or when trying to identify the performance characteristics of a managed application. I have already mentioned a few of the counters that relate to memory management and exceptions. There are performance counters for almost every aspect of the CLR and the .NET Framework. These performance counters are always available and are non-invasive; they have low overhead and do not change the performance characteristics of your application.

Other Tools

Besides the performance counters and the CLR Profiler, you will want to use a conventional profiler to determine which methods in your application are taking the most time and being called most often. Those will be the first methods you optimize. A number of commercial profilers that support managed code are available, including Compuware DevPartner Studio Professional Edition 7.0 and Intel® VTune™ Performance Analyzer 7.0. Compuware also produces a free managed-code profiler called DevPartner Profiler Community Edition.

Summary

This article only begins to examine the CLR and the .NET Framework from a performance perspective. There are many other aspects of the architecture of the CLR and the .NET Framework that affect the performance of your application. The best guidance I can give any developer is: do not make any assumptions about the performance of the platform your application is targeting or the APIs you are using. Measure everything!

Good luck.

Resources

Compuware DevPartner Studio Professional Edition 7.0
Intel VTune Performance Analyzer 7.0
Compuware DevPartner Profiler Community Edition
Jan Gray, Writing Faster Managed Code: Know What Things Cost, MSDN
Rico Mariani, Garbage Collector Basics and Performance Hints, MSDN
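In the spirit of "measure everything", here is a minimal timing-harness sketch for comparing code paths when a full profiler is not at hand. The MeasureDemo class and its nested Action delegate are made up for illustration, and DateTime.Now has coarse resolution, so this is only suitable for operations repeated many times; it does illustrate warming up once so one-time JIT cost is not included in the measurement.

```csharp
using System;

public class MeasureDemo
{
    // Hypothetical delegate for the operation being measured.
    public delegate void Action();

    // Runs the action once to warm it up (so JIT cost is excluded),
    // then times the given number of iterations in milliseconds.
    public static double TimeMilliseconds(Action action, int iterations)
    {
        action();  // warm-up: triggers JIT compilation of the action

        long start = DateTime.Now.Ticks;            // 100ns units
        for (int i = 0; i < iterations; i++)
        {
            action();
        }
        long elapsedTicks = DateTime.Now.Ticks - start;
        return elapsedTicks / 10000.0;              // ticks -> milliseconds
    }
}
```

When comparing two approaches, measure both with the same iteration count and compare the totals rather than trusting a single run.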