Yukon CLR Basic Architecture
Overview
With the release of SQL Server "Yukon" Beta 1 (also referred to as SQL Server 2005), people found that Microsoft had integrated many new features into Yukon. The most eye-catching is the integration of the Windows .NET Framework Common Language Runtime (CLR) into the database engine. Microsoft first showed this feature to the world at the Professional Developers Conference (PDC) in July 2000.
For developers working with SQL Server 2000, code written inside the database was largely limited to T-SQL. Yukon, which hosts the CLR, introduces many powerful new features: we can use Visual Basic .NET, C#, or other object-oriented languages to complete tasks that are difficult in T-SQL. For example, to call system functions (the Win32 API or COM components) we previously had to write extended stored procedures, which interact with the database engine through the Open Data Services (ODS) layer. Now the managed objects supported by the CLR can conveniently call the .NET Framework Class Library (FCL). These managed objects include:
· Managed stored procedures
· Managed functions
· Managed triggers
· Custom complex data types
· Indexes on custom complex types
· Custom aggregate functions
First, we must understand the advantages of integrating the CLR into the database engine and some of the key features the CLR provides.
The CLR is a managed execution environment. "Managed" means that many tasks that used to be the programmer's responsibility, such as memory management, can now be delegated to the CLR. The CLR controls managed code while it executes: before the code is compiled to native code, the CLR verifies its type safety and checks its security permissions. The CLR can also detect when running code attempts to manipulate memory directly or to call non-.NET code.
Executable .NET code is loaded into the CLR as an assembly. An assembly contains intermediate language (IL) instructions, metadata describing the code, and other resources (such as a list of the other files it references). Source code is not stored in the assembly; it is compiled to IL that the CLR can read, and the CLR's just-in-time (JIT) compiler converts the IL into native code (usually x86 code) before it executes. Once compiled, the native code is kept in a cache for a limited lifetime. In other words, .NET code is not interpreted: it is always compiled to native code when it runs.
The CLR manages memory in its own way. If an object or a group of objects is no longer referenced by any running code, its memory is in theory free; the CLR, however, reclaims that memory only when resources run short. The CLR invokes the garbage collector when additional memory is needed, so programmers largely do not have to worry about memory leaks.
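The garbage collection behavior described above can be observed with a small C# sketch. It is written against a current .NET runtime rather than the Whidbey beta, the class and method names are made up for illustration, and forcing a collection with GC.Collect is done only to make the effect visible; normally the CLR decides when to collect. No output is promised, because whether the object has been reclaimed at a given moment is up to the runtime.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class GcSketch
{
    // Allocate in a separate, non-inlined method so that no local variable
    // in the caller keeps the array alive.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static WeakReference AllocateUnreferenced()
    {
        return new WeakReference(new byte[1024 * 1024]);
    }

    public static void Main()
    {
        WeakReference weak = AllocateUnreferenced();
        Console.WriteLine("Alive before collection: " + weak.IsAlive);

        // Demonstration only: ask the collector to run now.
        GC.Collect();
        GC.WaitForPendingFinalizers();

        Console.WriteLine("Alive after collection: " + weak.IsAlive);
    }
}
```

A weak reference is used because it lets the program ask about an object without keeping that object alive.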
All code runs in an application domain. In the CLR, application domains isolate different pieces of executing code within a single process; in other words, code in one application domain does not affect code running in another. A process that hosts the CLR has at least one application domain and may have several, as determined by the host process and the hosted code. When the CLR loads an assembly, it loads it (and JIT-compiles it) into a specific application domain, and the same assembly can be loaded into several application domains in the same process. An assembly can also be loaded domain-neutral, in which case a single copy of the assembly serves every application domain in the process.
Hosting
A Yukon database engine that hosts the CLR has some special needs. The database engine manages resources very carefully, for example by moving memory between different caches as it observes how resource usage in the database changes. The resource management of the database engine and that of the CLR therefore have to cooperate.
Versions 1.0 and 1.1 of the CLR provide a hosting API that gives the host some control over the process, such as setting the size of the CLR thread pool. That API is not powerful enough for the role the CLR plays in Yukon. Consider the following cases:
Memory management
The Yukon database engine fully controls the memory it has been allocated and shifts it between different caches, such as the stored procedure cache and the data cache. The engine therefore needs a more flexible way to control when the CLR's garbage collector runs and how much memory the CLR allocates or releases.
Thread management
The CLR performs asynchronous tasks and various concurrent operations using threads, and it has its own managed thread pool to schedule the work queued to it. SQL Server, however, uses a more refined mechanism to handle potentially huge numbers of concurrent requests, so SQL Server needs to control the CLR's thread management.
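To make the managed thread pool concrete, here is a minimal C# sketch of code that queues work to it. It targets a current .NET runtime, not the Whidbey beta, and it runs outside SQL Server, so it shows only the default, unhosted behavior; the class and method names are illustrative.

```csharp
using System;
using System.Threading;

public static class ThreadPoolSketch
{
    // Queue `count` work items to the CLR thread pool and wait for all of them.
    public static int RunOnPool(int count)
    {
        int completed = 0;
        using (var done = new CountdownEvent(count))
        {
            for (int i = 0; i < count; i++)
            {
                ThreadPool.QueueUserWorkItem(_ =>
                {
                    Interlocked.Increment(ref completed);
                    done.Signal();
                });
            }
            done.Wait();   // block until every work item has signalled
        }
        return completed;
    }

    public static void Main()
    {
        int workers, io;
        ThreadPool.GetMaxThreads(out workers, out io);
        Console.WriteLine("Max worker threads: " + workers);
        Console.WriteLine("Completed items: " + RunOnPool(8));
    }
}
```

The code never creates a thread itself; it only hands work to the pool. That is the layer a host such as Yukon has to be able to influence.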
Because of this subtle relationship between how the CLR operates and what SQL Server needs, it became clear that the CLR's hosting interface had to be extended. With the release of Whidbey (the next generation of Microsoft Visual Studio .NET), .NET will therefore provide a larger and more powerful hosting API. (The version of .NET released with Yukon and the version Yukon supports will be the same.)
Features of the Whidbey Hosting API
In Whidbey, seven key parts of the hosting API are extended:
1. Memory management
The CLR now allows the host (Yukon) to replace the Windows and C runtime library routines used for ordinary memory allocation, so the host can control and track how much memory is allocated or released. The host can also replace the standard memory notification mechanism with its own, so that it can trigger the garbage collector. The host may also refuse a memory allocation; in that case the CLR reports the consequences of the failed allocation back to the host, so that the host can apply an appropriate policy.
Yukon uses this mechanism to influence memory allocation, which lets it control the size of its caches and prevent situations that could cause memory paging.
2. Threads
The CLR now abstracts a new concept, the task, from the thread. The host can control how tasks are assigned and scheduled, including starting, ending, and synchronizing them. The CLR and the host also notify each other of significant events (for example, when a task leaves the unschedulable state or enters it, the other side is notified). The host can also provide the means to configure the CLR thread pool. This level of integration lets Yukon keep its own scheduling model while still allowing the CLR to run asynchronous tasks.
3. I/O completion
The hosting API now lets the host carry out I/O on behalf of the CLR, and the CLR is notified when an I/O operation completes. Yukon can thus be fully aware of all I/O operations in the process.
4. Synchronization
If the CLR gives up control of tasks, there must be a way to synchronize different tasks. The host must therefore provide ways to create critical sections, mutexes, events, reader/writer locks, and monitors. Through these primitives the host can detect deadlocks. Yukon can thus correctly schedule the tasks the CLR requests, improve overall reliability, and resolve deadlocks that arise during synchronization.
5. Managed and unmanaged code
When managed code makes a P/Invoke call and transitions to native code, problems can arise, because there is currently no way to detect which locks are held when the code leaves the CLR. The new hosting API lets a task notify the host when it leaves the CLR. For example, when code runs outside the CLR, Yukon switches the task to non-fiber scheduling. Some unmanaged code, such as the data access API, is transparent to the database engine and is treated differently from other unmanaged code.
6. Application-domain-neutral code
Loading an assembly domain-neutral lets all application domains share a single copy of the assembly's compiled code. In versions before Whidbey, the CLR offered three options: load all assemblies domain-neutral; load no assemblies domain-neutral; or load only strong-named assemblies domain-neutral.
It became clear that this does not give the host fine enough control. A host may need to load system assemblies domain-neutral while loading all user assemblies, whether strong-named or weak-named, into individual application domains. This lets the system assemblies stay loaded when an application domain containing user assemblies is shut down.
Yukon uses this feature to load all user assemblies into the application domain of their database, while loading the in-process provider assemblies and other system assemblies domain-neutral.
7. Assembly resolution
Previously, a host could customize assembly resolution only by handling the failure of the default resolution. The host could then load the assembly into a managed byte array and pass that array to the assembly resolver. This mechanism is not flexible enough for Yukon, because most of its assemblies are loaded by its own custom mechanism. Letting normal resolution fail and then passing byte arrays incurs significant performance overhead. The hosting API therefore has to allow more interaction with the assembly loader.
The hosting API now lets the host decide whether the CLR resolves an assembly or the host loads it itself, and the host can return the assembly in an unmanaged buffer to avoid extra memory copies.
Yukon stores user assemblies in the database rather than in the file system, so it resolves user assemblies with its own mechanism instead of the standard one. The CLR version used by Yukon is tied to Whidbey rather than being the latest version installed on the machine. Yukon can still execute code compiled for earlier versions of the CLR, as long as Whidbey is compatible with that code. Fortunately, the CLR team has worked hard to stay compatible with previous versions, so most managed code should run in Yukon on the Whidbey platform without being recompiled.
Assembly management
Let us look at how the CLR runs in Yukon, starting with how Yukon manages and stores assembly code. Yukon does not rely on the standard assembly loading process (in which the CLR takes over locating an assembly when it is loaded); instead, assemblies are stored in the database itself. A database backup is therefore complete with its assemblies and has no references to file system assemblies that might change between backup and restore.
Use the CREATE ASSEMBLY command to add an assembly to the database. The syntax is:
CREATE ASSEMBLY <assembly identifier>
FROM <path to the assembly file>
This command loads into the database not only the named assembly but also any non-system assemblies that it references. For example, suppose an assembly Customer calls another assembly Util:
CREATE ASSEMBLY Customer
FROM 'c:/build/customer/customer.dll'
This command saves both the Customer assembly and the Util assembly to the database. System assemblies that Customer and Util reference do not need to be added to the database, because Yukon already knows about them. The list of system assemblies is not configurable. Yukon distinguishes user assemblies from system assemblies; system assemblies are the standard assemblies loaded from the file system.
The assembly identifier (Customer in this example) must be unique within the database. Yukon uses this identifier in place of the assembly's original four-part name; the original assembly name is still stored, and Yukon guarantees that an assembly is stored only once in a database.
Assemblies stored in the database can be viewed through the sys.assemblies system view (Table 3-1), and the byte stream of each assembly through the sys.assembly_files system view (Table 3-2).
Field name             Data type        Description
name                   sysname          Assembly name (unique within a schema)
principal_id           int              ID of the schema owner
assembly_id            int              Assembly ID (unique within the database)
permission_set         tinyint          CAS permission set of the assembly
explicitly_registered  bit              Whether the assembly was added to the database explicitly or because another assembly depends on it
create_date            datetime         Time the assembly was added
version_major          int              Major version number of the assembly
version_build          int              Build number of the assembly
version_revision       int              Revision number of the assembly
culture_info           nvarchar(30)     Culture of the assembly (NULL means neutral)
public_key             varbinary(8000)  Public key of the assembly (NULL means weak-named)
Table 3-1 - sys.assemblies system view
Field name   Data type      Description
assembly_id  int            ID of the assembly the file belongs to (unique within the database)
name         nvarchar(260)  File name
file_id      int            File ID (unique within an assembly)
content      image          Byte stream of the file
Table 3-2 - sys.assembly_files system view
One point needs to be made: loading an assembly into the database requires access to the file system, so an assembly can only be loaded by a Windows user account (not by a pure SQL Server login) or by the sa user. This is because Windows file system access control cannot identify SQL Server logins. The sa account can be used because by default it is treated as a system administrator account.
There are several options when deleting an assembly: you can delete only the specified assembly, the assembly together with its associated files, or the assembly together with all the assemblies it depends on.
Code Access Security (CAS)
What is assembly code allowed to do once it has been loaded into the database? One of the main goals of integrating the CLR into the database engine is to reduce the reliance on, and the risk of, extended stored procedures. Yukon therefore tries to restrict what an assembly inside it can do to operations that are safe, which at the very least benefits the stability of the database.
On top of the traditional security mechanism based on user permissions, the CLR has an additional security layer that checks what a piece of code may do. This layer is called Code Access Security (CAS). CAS grants permissions based on where code originates, not just on who is executing it. In the past, CAS has been used to grant different execution permissions to code from different locations. In Yukon, however, all CLR code is loaded from the database, so the origin-based CAS definitions cannot distinguish one assembly from another. Yukon therefore defines three "buckets" into which assemblies are loaded. Each bucket has a different permission set, and what code may do is determined by the bucket its assembly is loaded into.
The three CAS "buckets" are SAFE, EXTERNAL_ACCESS, and UNSAFE. Table 3-3 describes each bucket in detail.
CAS bucket | Permitted operations
SAFE | Can access data and use the CLR class library. Cannot access external resources (such as the file system and the network). Multithreading and thread synchronization, non-read-only static fields, unsafe code, and interop are prohibited. Code in this bucket must be verified as type-safe by the CLR.
EXTERNAL_ACCESS | Same as the SAFE bucket, plus access to external resources such as the file system, the network, and the event log, as long as they are accessed through the CLR. Using interop to reach external resources is prohibited. Code in this bucket must be verified as type-safe by the CLR.
UNSAFE | The CAS subsystem places no restrictions on the code, and code in this bucket does not have to be type-safe.
Table 3-3 - Security definitions of the CAS buckets
Code in the UNSAFE bucket carries the same security risks as an extended stored procedure. In other words, direct memory manipulation and locks (including deadlocks) can destabilize the process.
By default, Yukon loads all assemblies into the SAFE bucket. This can be changed by adding a PERMISSION_SET clause to the CREATE ASSEMBLY command, for example:
CREATE ASSEMBLY Utilities
FROM 'c:/assemblies/utilities.dll'
WITH PERMISSION_SET = EXTERNAL_ACCESS
Adding an assembly to the database is itself controlled by permissions, and the permissions required depend on the CAS bucket the assembly is loaded into. Table 3-4 lists the permissions needed to add an assembly to each CAS bucket:
CAS bucket | Required permissions | Notes
SAFE | CREATE ASSEMBLY; REFERENCES | These permissions are granted to the db_owner role
EXTERNAL_ACCESS | CREATE ASSEMBLY; REFERENCES; EXTERNAL ACCESS | The EXTERNAL ACCESS permission must be granted in the master database
UNSAFE | CONTROL SERVER | CONTROL SERVER is granted to the sysadmin server role
Table 3-4 - Permissions for adding an assembly to each CAS bucket
Static members and application domains
The way application domains are mapped onto this environment gives the static members of CLR types some special characteristics in code hosted by Yukon. Let us quickly review what an application domain is and how the CLR uses it. All code executes under the control of the CLR, inside an application domain. An application domain is comparable to a process: it lets code run in an isolated unit. Code in different application domains must exchange data explicitly (both sides call the .NET Framework libraries to do so). If a serious problem occurs in one application domain (such as an unhandled exception), only that application domain is affected; other domains are not.
Under this isolation mechanism:
1. Each assembly is loaded separately into each application domain unless it is explicitly loaded domain-neutral, which means that each application domain has its own copy of the JIT-compiled code.
2. The static members of a type are private to each application domain, unless the assembly is loaded domain-neutral.
In the current Yukon beta, a single application domain is used per database. This means that if two pieces of code run in the same database, the CLR does not isolate them from each other even if they run in different transactions. In other words, code running in one transaction can change static state seen by another transaction, regardless of whether the first transaction has committed. This breaks a golden rule of transactions: isolation. For this reason, code running in the SAFE or EXTERNAL_ACCESS bucket cannot have static fields that are not read-only; code running in the UNSAFE bucket has no such restriction.
Does that solve the problem? Unfortunately, only partly. Leaving the UNSAFE bucket aside (it can perform potentially hazardous operations anyway), what is wrong with allowing only read-only static fields? The problem is that in the CLR type system, read-only means different things for value types and for reference types. For a value type, the field is the data itself (the memory is allocated where the field is declared), so a read-only value-type field never changes once it has been initialized. A reference-type field, however, is only a reference to a block of memory allocated on the garbage-collected heap. A read-only reference-type field guarantees only that the reference does not change; the state of the object it points to can still change. Code running in one transaction can therefore still see changes made by another transaction, even when it is loaded into the SAFE bucket.
The object referenced by any static reference-type field must therefore also be immutable once it has been constructed, that is, its state must never change. This is the only way to guarantee that changes made in one transaction are truly isolated from other concurrent transactions.
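The difference between a read-only value-type field and a read-only reference-type field can be seen in a short C# sketch. The names are made up for illustration, and the two methods merely stand in for code running in two different transactions; nothing here talks to a database.

```csharp
using System;
using System.Collections.Generic;

public static class ReadOnlySketch
{
    // Read-only value-type field: the data itself cannot change.
    public static readonly int Limit = 10;

    // Read-only reference-type field: only the reference is fixed.
    // The list it points to can still be modified by any caller.
    public static readonly List<int> Shared = new List<int>();

    // Stands in for code running in "transaction A".
    public static void WriterA() { Shared.Add(42); }

    // Stands in for code running in "transaction B".
    public static int ReaderB() { return Shared.Count; }

    public static void Main()
    {
        int before = ReaderB();
        WriterA();
        int after = ReaderB();
        // The count grows by one although Shared is declared readonly.
        Console.WriteLine(before + " -> " + after);
    }
}
```

Assigning a new value to Limit or a new list to Shared would not compile, yet the contents of the existing list remain shared mutable state. That is why the referenced object itself has to be immutable.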
Adding an assembly to the database is only half the story. We tend to assume that a public method of a public class can simply be called, but the code in an assembly is not automatically accessible to other code. We must explicitly expose the methods and types of an assembly so that other database code and client programs can reference them.
Managed stored procedure
Stored procedures are the bread and butter of many database applications because of their well-known advantages. Now that the CLR is integrated into the database engine, we can reach CLR code through managed stored procedures. A method that is to be exposed as a managed stored procedure must meet some conditions:
· The containing class must be public
· The method must be public
· The method must be static
Consider Example 3-1. Foo.Method1 is not accessible because the Foo class is not public. Bar.Method2 is not accessible either: although the Bar class is public, Method2 is not. Baz.Method3 is also not accessible: although both the class and the method are public, the method is not static, so the database engine does not know how to create a Baz instance on which to call Method3. Only Quux.Method4 can be exposed, because both the class and the method are public and the method is static.
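// Not accessible: the class is not public.
class Foo
{
    public static void Method1()
    {
    }
}
// Not accessible: the method is not public.
public class Bar
{
    static void Method2()
    {
    }
}
// Not accessible: the method is not static.
public class Baz
{
    public void Method3()
    {
    }
}
// Accessible: public class, public static method.
public class Quux
{
    public static void Method4()
    {
    }
}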
Example 3-1 - Accessibility of CLR methods
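The three conditions can also be checked mechanically. The following C# sketch uses reflection to apply them to the four classes of Example 3-1. The helper IsExposable is a hypothetical illustration and is not how Yukon itself validates a method.

```csharp
using System;
using System.Reflection;

class Foo { public static void Method1() { } }
public class Bar { static void Method2() { } }
public class Baz { public void Method3() { } }
public class Quux { public static void Method4() { } }

public static class ExposureCheck
{
    // True only for a public static method of a publicly visible class.
    public static bool IsExposable(Type type, string methodName)
    {
        const BindingFlags all = BindingFlags.Public | BindingFlags.NonPublic |
                                 BindingFlags.Static | BindingFlags.Instance;
        MethodInfo method = type.GetMethod(methodName, all);
        return method != null && type.IsVisible && method.IsPublic && method.IsStatic;
    }

    public static void Main()
    {
        Console.WriteLine("Foo.Method1:  " + IsExposable(typeof(Foo), "Method1"));   // False
        Console.WriteLine("Bar.Method2:  " + IsExposable(typeof(Bar), "Method2"));   // False
        Console.WriteLine("Baz.Method3:  " + IsExposable(typeof(Baz), "Method3"));   // False
        Console.WriteLine("Quux.Method4: " + IsExposable(typeof(Quux), "Method4"));  // True
    }
}
```

Each of the first three classes fails exactly one condition, so only Quux.Method4 passes.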
When an assembly is loaded into the database, the public static methods of its public classes do not automatically become managed stored procedures; an entry point for each method must be declared explicitly as needed. The syntax is:
CREATE PROCEDURE <stored procedure name>
AS EXTERNAL NAME <assembly identifier>:<class name>::<method name>
For example:
CREATE PROCEDURE MyMethod4
AS EXTERNAL NAME Utils:Quux::Method4
Parameter passing
In practice, few stored procedures take no parameters. Normally the data to operate on, or at least a key identifying that data, is passed to the stored procedure as parameters. We therefore need a way to pass parameters to a managed stored procedure.
Consider the C# code in Example 3-2. It contains three methods, each slightly different.
· Method1 takes x by value; changes made to the parameter inside the method are not visible to the caller.
· Method2 takes x as an output parameter. The C# compiler ensures that the method assigns x before returning and does not read an incoming value; the new value of x is visible to the caller.
· Method3 takes x by reference; the method can read the caller's value, and any change to it is visible to the caller.
public class Params
{
    // x is passed by value.
    public static void Method1(int x)
    {
    }
    // x is an output parameter.
    public static void Method2(out int x)
    {
        x = 42;
    }
    // x is passed by reference.
    public static void Method3(ref int x)
    {
        x = x + 2;
    }
}
Example 3-2 - Methods for passing parameters
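Before mapping these methods to T-SQL, it may help to see the three parameter modes from the point of view of a C# caller. This sketch uses illustrative method names (ByValue, ByOut, ByRef) in place of Method1 to Method3, and ByValue assigns to its parameter purely to show that the caller does not see the change.

```csharp
using System;

public static class ParamsSketch
{
    public static void ByValue(int x) { x = 99; }        // caller never sees 99
    public static void ByOut(out int x) { x = 42; }      // must assign; cannot read x first
    public static void ByRef(ref int x) { x = x + 2; }   // may read and modify

    public static void Main()
    {
        int a = 5;
        ByValue(a);
        Console.WriteLine(a);   // 5

        int b;                  // no initial value needed for an out argument
        ByOut(out b);
        Console.WriteLine(b);   // 42

        int c = 3;
        ByRef(ref c);
        Console.WriteLine(c);   // 5
    }
}
```

The C# caller must write out or ref explicitly, so the two modes cannot be confused on the .NET side.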
The following shows how to expose each method as a managed stored procedure (assuming the code has been loaded as an assembly with the identifier Parameters):
Method1 is straightforward:
CREATE PROCEDURE Method1
@x int
AS EXTERNAL NAME Parameters:Params::Method1
We can call the stored procedure Method1 like this:
EXEC Method1 5
Method2 is slightly more complex, but T-SQL has an OUTPUT parameter syntax, so this is also simple:
CREATE PROCEDURE Method2
@x int OUTPUT
AS EXTERNAL NAME Parameters:Params::Method2
We can call the stored procedure Method2 as follows:
DECLARE @x int
SET @x = 0
EXEC Method2 @x OUTPUT
SELECT @x
The result of this SELECT is 42.
Finally, Method3 is more complicated than it first appears. Although .NET distinguishes out parameters from in/out (ref) parameters, T-SQL does not. The OUTPUT clause simply declares that a changed value is returned; it says nothing about whether the method reads the incoming value. The declaration for Method3 is therefore the same as for Method2, even though the C# compiler would not let the two methods treat x in the same way.
CREATE PROCEDURE Method3
@x int OUTPUT
AS EXTERNAL NAME Parameters:Params::Method3
We can call the stored procedure Method3 like this:
DECLARE @x int
SET @x = 3
EXEC Method3 @x OUTPUT
SELECT @x
The result of this SELECT is 5.
One more point deserves attention. For both Method2 and Method3 we set the variable @x to a definite value before calling the managed stored procedure, so that the parameter passed in is not NULL. If we omit that line, the call fails with a NullReferenceException. This reveals a hidden factor: SQL Server data types and .NET data types are different, and a type conversion takes place when values move from one environment to the other. Our example shows that some types that can be NULL in SQL Server cannot be null in .NET.
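The mismatch between nullable SQL types and non-nullable .NET value types can be illustrated with the System.Data.SqlTypes namespace, whose SqlInt32 structure can hold a SQL NULL. This is a sketch against a current .NET runtime. The helper ToIntOrDefault is made up for illustration, and the exception type shown belongs to the SqlTypes library; it is not necessarily what the Yukon beta reports for a NULL argument.

```csharp
using System;
using System.Data.SqlTypes;

public static class NullSketch
{
    // Convert a nullable SQL integer to a plain int,
    // substituting a fallback value for NULL.
    public static int ToIntOrDefault(SqlInt32 value, int fallback)
    {
        return value.IsNull ? fallback : value.Value;
    }

    public static void Main()
    {
        SqlInt32 unset = SqlInt32.Null;   // what an unset @x corresponds to
        Console.WriteLine(ToIntOrDefault(unset, 0));             // 0
        Console.WriteLine(ToIntOrDefault(new SqlInt32(3), 0));   // 3

        try
        {
            int raw = unset.Value;        // a plain int cannot represent NULL
            Console.WriteLine(raw);
        }
        catch (SqlNullValueException)
        {
            Console.WriteLine("NULL has no int value");
        }
    }
}
```

Checking IsNull before reading the value is the managed-side counterpart of setting @x before the call in T-SQL.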
Return values
A managed stored procedure can return the return value of the .NET method. When declaring a managed stored procedure, however, you do not specify the return type (unlike a user-defined function, whose declaration states the type of its return value). The return type is therefore declared only on the .NET method:
public class RetVals
{
    public static int GetUltimateAnswer()
    {
        return 42;
    }
}
Example 3-3 - A method with a return value
We can expose this method in the database (assuming the assembly containing this code has been loaded into the database with the identifier ReturnValues):
CREATE PROCEDURE GetUltimateAnswer
AS EXTERNAL NAME ReturnValues:RetVals::GetUltimateAnswer
We can then retrieve the return value of the .NET method:
DECLARE @x int
EXEC @x = GetUltimateAnswer
SELECT @x
The return value in this case is 42.
User-Defined Functions (UDF)
User-defined functions are simple and very similar to stored procedures. However, there are some additional points to keep in mind when defining, declaring, and using them.
The syntax for declaring a user-defined function that calls a .NET method is:
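CREATE FUNCTION <function name>
(
<parameter list>
)
RETURNS <return type>
AS EXTERNAL NAME <assembly identifier>:<class name>::<method name>
For example, consider the following .NET code: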
public class Calc
{
    public static int Add(int x, int y)
    {
        return x + y;
    }
}
We can expose this .NET code with the following declaration (assuming the assembly has been loaded with the identifier Calculator):
CREATE FUNCTION MyAdd
(
@x int,
@y int
)
RETURNS int
AS EXTERNAL NAME Calculator:Calc::[Add]
Note: The method name Add is escaped with square brackets because ADD is already reserved in T-SQL.
This function can be called like this:
SELECT dbo.MyAdd(5, 10)
The result of this SELECT is 15.
Note: You cannot use OUTPUT parameters in a function.
So far we have seen only the simplest kind of user-defined function, which hides some of their complexity. User-defined functions involve several concepts that stored procedures do not:
· Data access
Whether the function accesses the in-process managed provider. If it does not, the optimizer does not need to initialize the in-process managed provider. (When a stored procedure requests data access, the CLR infrastructure initializes the in-process provider automatically.)
· System data access
If the function accesses user data but not system data, the optimizer can take this into account when building a WAITFOR query. In that case the function can be re-executed in response to a notification event, without the additional resources that access to system data would require.
· Precision
A function is precise if the value it returns is exact: it does not use floating-point arithmetic and has no rounding error.
· Determinism
A deterministic function always returns the same value for the same input parameters, regardless of variable factors such as the current date and time. For example, a function that returns the sum of two numbers is deterministic, whereas a function that returns the current time is non-deterministic.
· External access
Whether the function accesses resources outside the database, such as the file system or the registry.
These concepts matter to the optimizer because they determine whether a function can be used in an index. A function that can be indexed must have the following properties:
1. Deterministic - The index value for a particular row must be the same at different points in time; otherwise the physical structure of the index pages becomes invalid.
2. Precise - If the value a function returns is subject to rounding error, the index value cannot be determined reliably.
3. No data access - If the function accesses data, its input parameters alone do not determine its output, so it is not deterministic.
4. No external access - As with data access, external factors can affect the determinism of the function.
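A few plain C# methods may make these properties more tangible. The sketch below is only an illustration of the definitions; the method names are made up, and nothing in it is specific to SQL Server.

```csharp
using System;

public static class DeterminismSketch
{
    // Deterministic and precise: integer arithmetic on the inputs only.
    public static int Add(int x, int y) { return x + y; }

    // Deterministic but not precise: floating-point arithmetic can round.
    public static double AddFloat(double x, double y) { return x + y; }

    // Non-deterministic: the result depends on the clock, not only on the input.
    public static long TicksPlus(long offset) { return DateTime.UtcNow.Ticks + offset; }

    public static void Main()
    {
        Console.WriteLine(Add(5, 10) == Add(5, 10));    // True
        Console.WriteLine(AddFloat(0.1, 0.2) == 0.3);   // False: rounding error
        Console.WriteLine(TicksPlus(0));                // varies from run to run
    }
}
```

An index built on Add would stay valid because the stored value can always be recomputed. An index built on TicksPlus would be stale the moment it was written.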
For T-SQL functions, the database engine can determine these properties from the built-in functions they call (for example, NEWID() is non-deterministic, which makes any function that calls it non-deterministic). The CLR class library, however, contains a huge number of functions, so the database engine would need some kind of "super verifier" to determine whether a function is deterministic. For CLR functions you therefore specify these properties yourself with System.Data.Sql.SqlFunctionAttribute. The attribute can be configured as shown in Table 3-5.
Name | Type | Default | Notes
IsDeterministic | Boolean | False | Marks the function as deterministic or non-deterministic
IsPrecise | Boolean | False | Marks the function as precise or imprecise
DataAccess | DataAccessKind | None | Specifies whether the function accesses the in-process managed provider; the default is no access, the alternative is Read
SystemDataAccess | SystemDataAccessKind | None | Specifies whether the function accesses system data; the default is no access, the alternative is Read
Table 3-5 - SqlFunctionAttribute properties
The database engine can detect whether a function has accessed the in-process managed provider, and it verifies that this matches what the attribute declares. The other properties, however, are taken on trust, so it is possible to declare a function deterministic when in fact it is not. Set IsDeterministic to true only if you are absolutely certain that the function is deterministic; otherwise the function may invalidate the index pages.
Examples 3-4 and 3-5 show a function marked with the SqlFunction attribute in C# and in VB.NET, respectively.
[SqlFunction(IsDeterministic = true)]
public static int Add(int x, int y)
{
    return x + y;
}
Example 3-4 - Declaring a deterministic function in C#
<SqlFunction(IsDeterministic:=True)> _
Public Shared Function Add(ByVal x As Integer, ByVal y As Integer) As Integer
    Return x + y
End Function
Example 3-5 - Declaring a deterministic function in VB.NET
Note that the SqlFunction attribute is optional. If it is omitted, the default values apply. Because it can be omitted, a large amount of existing CLR code can be reused without modification.
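How an optional attribute with default values might be read can be sketched with ordinary reflection. FunctionInfoAttribute below is a hypothetical stand-in defined only for this example; it is not the real SqlFunctionAttribute, and the lookup does not claim to reproduce what the database engine does internally.

```csharp
using System;
using System.Reflection;

// Hypothetical stand-in for SqlFunctionAttribute; not the real SQL Server type.
[AttributeUsage(AttributeTargets.Method)]
public sealed class FunctionInfoAttribute : Attribute
{
    public bool IsDeterministic { get; set; }   // defaults to false
    public bool IsPrecise { get; set; }         // defaults to false
}

public static class Calc
{
    [FunctionInfo(IsDeterministic = true, IsPrecise = true)]
    public static int Add(int x, int y) { return x + y; }

    // No attribute: a host would fall back to the defaults.
    public static int Sub(int x, int y) { return x - y; }
}

public static class AttributeSketch
{
    // A method counts as deterministic only if it is explicitly declared so.
    public static bool IsDeclaredDeterministic(MethodInfo method)
    {
        var info = (FunctionInfoAttribute)Attribute.GetCustomAttribute(
            method, typeof(FunctionInfoAttribute));
        return info != null && info.IsDeterministic;
    }

    public static void Main()
    {
        Console.WriteLine(IsDeclaredDeterministic(typeof(Calc).GetMethod("Add")));   // True
        Console.WriteLine(IsDeclaredDeterministic(typeof(Calc).GetMethod("Sub")));   // False
    }
}
```

Sub is just as deterministic as Add, but without the declaration a host has no way of knowing that. This is the price of keeping the attribute optional.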
Conclusion
In this article we explained the basic architecture that supports managed code inside the database and two forms of managed code: managed stored procedures and managed functions.
By integrating the CLR with the SQL Server engine, the Yukon database gains more powerful features. We can develop stored procedures in object-oriented languages such as C#, VB.NET, Visual C++, and Visual J# (we could even use PHP or assembly language), and so build database applications in an object-oriented way. This has clear advantages over the earlier T-SQL-only approach. Development also moves into Microsoft Visual Studio .NET, which makes it more convenient, faster, and more efficient to solve complex business cases. At the same time, we should keep in mind the relationship between the database engine and the CLR: the database engine is in charge, controlling memory allocation and responding to client requests. The CLR gives Yukon components that can access system and network resources, so that we can easily call the class library provided by the .NET Framework (FCL), while CLR components can in turn access the database through the in-process managed provider.
In terms of security, Yukon has both the user-based SQL Server security model and the CLR's code-based security model. CAS gives us fine-grained control over the security of CLR code and gives DBAs more ways to control the CLR's access to the "outside".
Yukon is scheduled for release in 2005. We have introduced only some of its new features here; many articles on Microsoft's MSDN cover others, and you will find that Yukon offers many more practical new features.

