Using Partitions in a Microsoft SQL Server 2000 Data Warehouse
Summary: This article describes how to use partitions to improve the manageability, query performance, and load speed of data warehouses in SQL Server 2000 Enterprise Edition. It discusses horizontal partitioning of dimensional schemas in both the relational database and Analysis Services cubes.

Contents: Overview. Using Partitions in the SQL Server 2000 Relational Data Warehouse. Using Partitions in SQL Server 2000 Analysis Services. Summary. Appendix: VBScript Code Example for Cloning a Partition.

Overview

This article discusses the role of data partitions in the data warehouse. Both the relational data warehouse and Analysis Services cubes support partitioned data. The logical concept of a partition is the same in the two Microsoft® SQL Server™ engines: data is horizontally partitioned by a key, such as date. In the relational database, partitioning is implemented by creating separate physical tables (for example, one table per month) and defining a union view over the member tables. Similarly, Analysis Services in SQL Server Enterprise Edition supports explicit cube partitions. In both the relational database and the online analytical processing (OLAP) engine, the complexity of the physical storage is hidden from the analytical user. The advantages of partitioning a data warehouse are:
Substantially reduced query time.
Improved load time and maintainability of the database.
A solution to the data-pruning problem that arises when aged data must be removed from the active database.

The technique requires building a more complex data staging application than a non-partitioned system would need. This article presents best practices for designing, implementing, and maintaining horizontally partitioned data warehouses. Because an effective partitioning plan can greatly improve query performance, we encourage the partitioning of large Analysis Services systems. For relational data warehouses, partitioning is an effective and well-performing solution to certain maintenance problems, but it is not a blanket recommendation.

Using Partitions in the SQL Server 2000 Relational Data Warehouse

A partitioned view joins horizontally partitioned data from a set of member tables so that the data appears to come from a single table. SQL Server 2000 distinguishes between local and distributed partitioned views. In a local partitioned view, all member tables and the view reside on the same instance of SQL Server. In a distributed partitioned view, at least one member table resides on a different (remote) server. Distributed partitioned views are not recommended for data warehouse applications.

Dimensional data warehouses are structured around facts and dimensions, and are physically implemented as star or snowflake schemas or, less commonly, as fully denormalized flat tables that combine facts and dimensions. Because the dimensional schema is the most common data warehouse structure, this article focuses on partitioning star or snowflake schemas; the recommendations apply reasonably well to other common data warehouse architectures.

Advantages of partitioning

Data pruning: Many data warehouse administrators archive aged data periodically. For example, a clickstream data warehouse may keep only three to four months of detail online. Other common rules keep 13 months, 37 months, or 10 years online; when old data passes out of the active window, it is archived and deleted from the database. This rolling-window structure is a practical necessity for very large data warehouses. Without partitioned tables, deleting old data from the database requires a very large DELETE statement, for example:

DELETE FROM fact_table WHERE date_key < 19990101

This statement is expensive to execute, and is likely to take more time than the load process into the same table. With partitioned tables, by contrast, the administrator redefines the UNION ALL view to exclude the oldest table, and then drops that table from the database (assuming it has been backed up); the process completes almost instantly. As we discuss later, however, partitioned tables are costly to maintain. If data pruning is the only reason to partition, the designer should consider instead deleting old data from an unpartitioned table in small batches: a process that deletes 1000 rows at a time (using the SET ROWCOUNT 1000 command) can run at low priority until all the desired data is removed, as in the sketch that follows. This technique can be used effectively on large systems, and it is more straightforward than building the partition management system that partitioned tables require. Depending on load volumes and system usage it is appropriate for some systems, and it should be benchmarked on your system.
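The following is a minimal sketch of this batched-delete approach, assuming the fact_table and date_key names from the example above; the batch size and cutoff value are illustrative and should be tuned against your own load window.

SET ROWCOUNT 1000   -- limit each DELETE to 1000 rows
WHILE 1 = 1
BEGIN
    DELETE FROM fact_table WHERE date_key < 19990101
    IF @@ROWCOUNT = 0 BREAK   -- nothing left to delete
END
SET ROWCOUNT 0      -- restore normal row-limit behavior

Because each DELETE is its own small transaction, locks are held briefly and the transaction log grows modestly, so the loop can run alongside normal activity.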
Load speed: The fastest way to load data is into an empty table, or a table with no indexes. By loading into smaller partitioned tables, the efficiency of the incremental load process improves greatly.

Maintainability: Once the data staging application has been built to support partitioning, the entire system becomes easier to maintain. Maintenance activities, including loading data and backing up and restoring tables, can be performed in parallel, which can improve performance dramatically. The process of incrementally populating downstream cubes can be sped up and simplified as well.

Query speed: Query speed should not be a consideration in the decision to partition a relational data warehouse; query performance is similar against partitioned and unpartitioned fact tables. On a well-designed partitioned database, the relational engine includes in a query plan only the partitions needed to resolve the query. For example, if the database is partitioned by month and a query is constrained to January 2000, the query plan includes only the January 2000 partition. Such a query performs correctly against the partitioned table, and about as well as it would against a consolidated table with a clustered index on the partitioning key.
Disadvantages of partitioning

The main disadvantage of partitioning is that the administrator must build an application to manage the partitions. It would be inappropriate to put a partitioned relational data warehouse into production before that application has been designed, tested, and run through a trial period; one purpose of this article is to discuss the issues and design decisions behind a partition management application.

Query design constraints: For the best query performance, all queries should place conditions on the filter key directly in the fact table. A query that places the constraint on a second table, such as a date dimension table, will include all partitions.

Factors to consider in design

Dimensional data warehouses are structured around facts and dimensions, physically implemented as star or snowflake schemas and occasionally as fully denormalized flat tables. The administrator of a dimensional data warehouse typically partitions only the fact tables; there is little benefit to partitioning a dimension table, although in some cases a very large dimension table with more than 10 million members may warrant it. Non-dimensional relational data warehouses can also be partitioned, and the general points in this article still apply.

An effective partitioning plan can be developed only in light of the system's architecture and design goals. Even with the same schema design, a relational data warehouse that exists only to populate Analysis Services cubes might adopt a different partitioning structure than one that analysts query directly. A system with a rolling window must be partitioned by time; other systems need not be. If the data warehouse includes Analysis Services cubes, Microsoft recommends that the partitions of the relational data warehouse and of the Analysis Services database be parallel. This simplifies the maintenance application: it creates a new cube partition at the same time that it creates a new table in the relational database, and the administrator has to master only one partitioning strategy. Nevertheless, an application may have good reasons to partition the two databases differently; the only cost is the added complexity of the database maintenance application.

Overview of partition design

Partitioned tables in a SQL Server database can use updatable or queryable (non-updatable) partitioned views. In both cases, the table partitions are created with CHECK constraints that guarantee each partition holds the correct data. An updatable partitioned view supports INSERT (and UPDATE and DELETE) operations through the view and pushes them down to the correct member table. This is convenient, but a data warehouse application typically needs to bulk load, which cannot be done through a view. The following table summarizes the requirements, advantages, and disadvantages of updatable and queryable partitioned views.
Updatable partitioned view
Requirements: The partitioning key is enforced by CHECK constraints and is part of the primary key; the partitioning key is not part of any other database constraint; a UNION ALL view is defined over the member tables.
Advantages: Query performance: the query plan includes only the member tables needed to resolve the query. Maintenance application simplicity: data can be loaded through the UNION ALL view and is inserted into the appropriate member table.
Disadvantages: Load performance: loading data through the view is slow enough to make this approach impractical for most data warehouse applications. Flexibility: the database design may require additional constraints on the partitioning key.

Queryable partitioned view
Requirements: The partitioning key is enforced by CHECK constraints; a UNION ALL view is defined over the member tables.
Advantages: Query performance: the query plan includes only the member tables needed to resolve the query. Load performance: data loads efficiently when loaded directly into the member tables. Storage: although a primary key with its associated index is recommended, the partitioned view does not require them.
Disadvantages: The view can include at most 256 member tables. A maintenance application must be created to manage the partitions and loads.

The recommended practice is to define primary keys and to design the fact table as a local (single-server) partitioned union view. In most cases this definition yields an updatable partitioned view, but the data warehouse maintenance application should be designed to bulk load most data directly into the member tables, rather than through the view.

Syntax example

The following code illustrates the syntax for defining the member tables and the union view, and for inserting data into the view:

-- Create the 1999 fact table.
CREATE TABLE [dbo].[sales_fact_19990101] (
    [date_key]      [int] NOT NULL
        CHECK ([date_key] BETWEEN 19990101 AND 19991231),
    [product_key]   [int] NOT NULL,
    [customer_key]  [int] NOT NULL,
    [promotion_key] [int] NOT NULL,
    [store_key]     [int] NOT NULL,
    [store_sales]   [money] NULL,
    [store_cost]    [money] NULL,
    [unit_sales]    [float] NULL)

ALTER TABLE [sales_fact_19990101]
ADD PRIMARY KEY ([date_key], [product_key], [customer_key], [promotion_key], [store_key])

-- Create the 2000 fact table.
CREATE TABLE [dbo].[sales_fact_20000101] (
    [date_key]      [int] NOT NULL
        CHECK ([date_key] BETWEEN 20000101 AND 20001231),
    [product_key]   [int] NOT NULL,
    [customer_key]  [int] NOT NULL,
    [promotion_key] [int] NOT NULL,
    [store_key]     [int] NOT NULL,
    [store_sales]   [money] NULL,
    [store_cost]    [money] NULL,
    [unit_sales]    [float] NULL)

ALTER TABLE [sales_fact_20000101]
ADD PRIMARY KEY ([date_key], [product_key], [customer_key], [promotion_key], [store_key])
-- Create the UNION ALL view.
CREATE VIEW [dbo].[sales_fact]
AS
SELECT * FROM [dbo].[sales_fact_19990101]
UNION ALL
SELECT * FROM [dbo].[sales_fact_20000101]

-- Insert a few rows of data, for example:
INSERT INTO [sales_fact] VALUES (19990125, 347, 8901, 0, 13, 5.3100, 1.8585, 3.0)
INSERT INTO [sales_fact] VALUES (19990324, 576, 7203, 0, 13, 2.1000, 0.9450, 3.0)
INSERT INTO [sales_fact] VALUES (19990604, 139, 7203, 0, 13, 5.3700, 2.2017, 3.0)
INSERT INTO [sales_fact] VALUES (20000914, 396, 8814, 0, 13, 6.4800, 2.0736, 2.0)
INSERT INTO [sales_fact] VALUES (20001113, 260, 8269, 0, 13, 5.5200, 2.4840, 3.0)

To verify that partition elimination is working, use Query Analyzer to display the query plan for a statement such as:

SELECT TOP 2 * FROM sales_fact WHERE date_key = 19990324

You should see that only the 1999 table is included in the query plan. Compare this plan with the one generated on the same schema after the primary keys are dropped: the 2000 table is still excluded. Then compare these plans with the one generated on a schema in which the CHECK constraints on date_key have been removed: with the constraints gone, both the 1999 and 2000 tables are included in the query.

Note that in general it is good practice to use the TOP N syntax when exploring data in large tables, because it returns results quickly with minimal server resources. It matters especially when examining query plans on partitioned tables, because the plan generated by a plain "SELECT *" statement is hard to read. To the casual observer it appears that the query plan includes all the member tables of the UNION ALL view, even though only the relevant tables are actually used during query execution.

Applying conditions directly to the fact table

To get the best query performance, all queries should place conditions on the filter key directly in the fact table. A query that places the constraint on a second table, such as the date dimension table, will include all partitions. A standard star-join query works well against the UNION ALL fact table: place conditions on the attributes of any unpartitioned dimension tables and build the star query WHERE clause in the standard way, but state the condition on the partitioning (date) dimension directly on the fact table's date key. Queries against a partitioned dimensional schema are designed exactly as against an unpartitioned schema, except that date conditions are most effective when placed directly on the date key in the fact table. If the date is the first column of each member table's clustered index, the cost of touching all partitions to resolve a particular query is relatively modest. Predefined queries, such as those that generate standard reports or that incrementally update downstream databases, should be tuned in this way wherever possible. The comparison below illustrates the difference.
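To illustrate the point, here are two ways of asking the same question against the example schema above; the first enables partition elimination, the second does not. The date_dim table and its calendar_month column are assumed here purely for illustration and are not part of the example schema.

-- Partition-friendly: the restriction is placed directly on the
-- fact table's partitioning key, so only the 1999 member table
-- appears in the query plan.
SELECT TOP 10 product_key, SUM(store_sales) AS sales
FROM sales_fact
WHERE date_key BETWEEN 19990101 AND 19990131
GROUP BY product_key

-- Not partition-friendly: the date restriction reaches the fact
-- table only through the join to the date dimension, so every
-- member table is included in the plan.
SELECT TOP 10 f.product_key, SUM(f.store_sales) AS sales
FROM sales_fact f
JOIN date_dim d ON f.date_key = d.date_key
WHERE d.calendar_month = 'January 1999'
GROUP BY f.product_key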
Choosing the partition key

The fact table can be partitioned on multiple dimensions, but most applications partition by date. As described above, date partitioning supports a simple rolling-window administration; older partitions can even be kept in a different location or indexed more lightly, and most queries against the data warehouse filter on a date. For an application partitioned by date, the design decisions are:

How much data should be kept online? The primary basis for this decision is the business requirement, weighed against the cost of keeping very large volumes of data online.

How should the date key be designed? It is widely accepted that data warehouses should use surrogate keys for dimension and fact tables. For a fact table partitioned on date, the recommended practice is a "smart" integer surrogate key of the form yyyymmdd. As an integer, this key takes only 4 bytes, compared with 8 bytes for a datetime. Many data warehouses instead use a natural date key of type datetime.

How large should each partition be? Although the example above uses yearly partitions, most systems partition at a finer grain: month, week, or day. User queries may tend to cover a month or a week, but the most important considerations are the overall size and manageability of the system. Recall that any SQL query can reference at most 256 tables; a data warehouse that keeps more than a few months of data online will exceed that limit with daily partitions. As a rule of thumb, a fact table partitioned only by date is best partitioned by week.

How should the range of each partition be defined? The BETWEEN syntax is the most direct, the most readable, and the most efficient to execute. Consider ranges of the following form:

date_key < 19990101
date_key BETWEEN 19990101 AND 19990131
date_key BETWEEN 19990201 AND 19990229
...
date_key BETWEEN 19991201 AND 19991231
date_key > 19991231

Note the first and last partitions: even if you believe no data will ever fall into them, it is good practice to define the partitions so that every possible date value is covered. Note also that the February partition includes February 29 even though 1999 is not a leap year; this structure avoids having to evaluate leap years when the partitions and constraints are created.

Should partitions be merged as time passes? To minimize the number of active partitions, the partition management application can be designed to consolidate daily partitions into weekly or monthly partitions. This approach is discussed in detail below, in the section on populating and maintaining partitions.

The detailed points raised in this discussion of date partitioning apply equally to other candidate partition keys. Data loading: if incoming data is strongly aligned with another dimension, for example if data for each store or subsidiary is delivered by a separate source system, that dimension is a natural partition key. Cube data queries: although there is no technical reason to partition the relational database and the Analysis Services cubes in the same way, it is common practice, and the maintenance application is simplified if this assumption is made. Thus, even if the relational database exists only to populate Analysis Services cubes, the general query patterns should still be considered when the partition key is chosen.
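As a small illustration of the "smart" integer date key described above, the yyyymmdd key can be derived from a datetime value during staging; CONVERT style 112 is the ISO yyyymmdd format.

-- Derive the 4-byte integer surrogate key from a datetime value.
DECLARE @event_date datetime
SET @event_date = '20001113'
SELECT CONVERT(int, CONVERT(char(8), @event_date, 112)) AS date_key
-- Returns 20001113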
Naming conventions

The rules for naming the member tables should follow naturally from the partition design. For the greatest generality, use the full start date of the partition in the name: even if partitioning is yearly, [sales_fact_yyyymmdd] is preferable to [sales_fact_yyyy]. If the database supports partitions of multiple sizes, the naming convention should reflect the time span of each partition: for example, a monthly partition would use a name like sales_fact_20001101m, and a daily partition sales_fact_20001101d. The member table names are hidden from end users, who access data through the view, so the member table names should be designed for the maintenance application.

Downstream cubes

If the relational database serves only to populate Analysis Services cubes and is not queried directly, it is not necessary to define the UNION ALL view.
In that case, the application is not bound by the 256-table limit, although the UNION ALL view remains a convenient way to query the warehouse directly, and partitioning the relational data warehouse without defining such a view is not recommended.

Managing the partitions

A partitioned fact table should not go into production until the management of the partitions has been automated and tested. The partition management system is a fairly simple application, whose general requirements are discussed below. The discussion assumes partitioning by date.

Metadata

A robust partition management system is driven by metadata. That metadata can be stored anywhere, as long as it can be accessed programmatically. Most data warehouse systems use custom metadata tables defined on the data warehouse SQL Server, or Microsoft SQL Server Meta Data Services. Whatever the storage mechanism, the metadata must include the following information about each partition: the partition name; the date the partition was created; the range of data the partition holds; the date the partition came online (was added to the UNION ALL view); the date the partition went offline (was dropped from the view); and the date the partition was dropped. As part of the data warehouse's overall management system, the metadata should also track when, and how much, data is loaded into each partition.
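As a concrete starting point, here is a minimal sketch of such a metadata table; the table name and columns are illustrative rather than a prescribed schema.

-- Illustrative partition metadata table; adapt to your own design.
CREATE TABLE [dbo].[partition_metadata] (
    [partition_name]  [sysname]  NOT NULL PRIMARY KEY, -- e.g. sales_fact_20001101m
    [date_created]    [datetime] NOT NULL,
    [range_start_key] [int]      NOT NULL,  -- first date_key in the partition
    [range_end_key]   [int]      NOT NULL,  -- last date_key in the partition
    [date_online]     [datetime] NULL,      -- added to the UNION ALL view
    [date_offline]    [datetime] NULL,      -- removed from the view
    [date_dropped]    [datetime] NULL,      -- table dropped from the database
    [rows_loaded]     [int]      NOT NULL DEFAULT 0)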
Creating new partitions

The first task of the partition management system is to create new partitions. A task should be scheduled to run periodically and create the table that will serve as the next partition. There are many effective ways to perform this task. The recommended approach is to use SQL-DMO (Distributed Management Objects) to create the new table with the same structure as an existing partition, but with a new table name, new index names, a new partition key CHECK constraint, a new filegroup, and so on:

Get the definition of a template table, usually the most recent partition. Modify the Name properties of the table and its indexes, the Text property of the CHECK constraint, and any other changed properties. Instantiate the new table with the Add method.

With a smart naming convention, this task can be accomplished in a few lines of code. As discussed later, your data warehouse system may include Analysis Services cubes. If so, the program that creates a partition in the RDBMS can go on to create the corresponding cube partition using Decision Support Objects (DSO).

Populating the partitions

As mentioned earlier, data can be loaded through the UNION ALL view. In theory this is a great feature of the partitioned table structure, but in practice it is not recommended for data warehouse applications: the data warehouse bulk load cannot go through the UNION ALL view, and for any warehouse large enough to need partitioned tables, loading through the view would be far too slow. Instead, the data warehouse application must be designed so that each load cycle places data quickly into the appropriate target table. If the data staging application is implemented with SQL Server Data Transformation Services (DTS), a Dynamic Properties task can easily change the destination table of a Data Pump task or a Bulk Insert task. As long as the new partition has not yet been added to the UNION ALL view, the data can be loaded without any system downtime.

The data staging application should also be designed to handle new data that does not belong to the current partition. This special case can arise when the data warehouse load does not complete overnight; other systems must handle old data arriving continuously. The system design must take these exceptions, their frequency, and their volumes into account. If old data arrives in sufficiently small volumes, the simplest design is to load all rows that do not belong in the current partition through the updatable UNION ALL view.

Redefining the UNION ALL view

Once the incremental load has completed successfully, the UNION ALL view must be revised. SQL-DMO is again recommended for this task: use the Alter method to change the Text property of the View object. The list of partitions to include in the view definition is best drawn from the metadata table described above. A sketch of the rolling-window redefinition follows.
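The following sketch shows the effect of one rolling-window cycle on the example view, assuming a new member table sales_fact_20010101 has already been created and loaded; in a real system the view text would be generated from the partition list in the metadata table, for example through SQL-DMO.

-- Bring the newest partition online and take the oldest offline.
ALTER VIEW [dbo].[sales_fact]
AS
SELECT * FROM [dbo].[sales_fact_20000101]
UNION ALL
SELECT * FROM [dbo].[sales_fact_20010101]
GO

-- After the old member table has been backed up, dropping it
-- completes the data pruning almost instantly.
DROP TABLE [dbo].[sales_fact_19990101]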
Merging partitions

On the surface, there seems to be little point in merging several partitions into a single larger one. However, for a warehouse that loads daily but keeps data online for a long time, the following regimen can improve performance noticeably:

Create a text file containing the data to be loaded, sorted in the order of the clustered index. Bulk load it into the empty daily partition. Create all nonclustered indexes. Bring the new partition online by re-creating the UNION ALL view.

Then, each week, create and populate a new weekly partition by inserting the daily partitions' rows into it, create its indexes, and regenerate the UNION ALL view; the daily partitions can then be dropped. As data ages further, it can be moved in the same way into weekly or even monthly partitions, so that more data can be kept online within the UNION ALL view. A sketch of the weekly consolidation step follows.
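This is a minimal sketch of the weekly consolidation, using the example schema and the naming convention suggested earlier (a d suffix for daily tables, w for weekly); a real implementation would generate these statements from the partition metadata.

-- Create the weekly partition: same structure as the daily
-- partitions, but with a week-wide CHECK constraint.
CREATE TABLE [dbo].[sales_fact_20001203w] (
    [date_key]      [int] NOT NULL
        CHECK ([date_key] BETWEEN 20001203 AND 20001209),
    [product_key]   [int] NOT NULL,
    [customer_key]  [int] NOT NULL,
    [promotion_key] [int] NOT NULL,
    [store_key]     [int] NOT NULL,
    [store_sales]   [money] NULL,
    [store_cost]    [money] NULL,
    [unit_sales]    [float] NULL)

-- Consolidate the week's daily partitions into it.
INSERT INTO [dbo].[sales_fact_20001203w]
SELECT * FROM [dbo].[sales_fact_20001203d]
UNION ALL SELECT * FROM [dbo].[sales_fact_20001204d]
-- ...and so on, through [dbo].[sales_fact_20001209d]

-- Index the weekly table, then regenerate the UNION ALL view to
-- reference it, and drop the daily tables.
ALTER TABLE [sales_fact_20001203w]
ADD PRIMARY KEY ([date_key], [product_key], [customer_key],
                 [promotion_key], [store_key])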
Using Partitions in SQL Server 2000 Analysis Services

Analysis Services in SQL Server Enterprise Edition explicitly supports partitioned cubes, the equivalent of partitioned tables in a relational database. For a cube of moderate to large size, partitions can greatly improve query performance and load performance, and make cube maintenance easier. Partitions can be defined along one or more dimensions, but cubes are often partitioned only along the date dimension. The incremental loading of a partitioned cube, including the creation of new partitions, should be performed by a custom application.

Note: Partitions can be stored locally, or distributed across multiple physical servers. Although partitions distributed across servers may benefit very large systems, our testing indicates that a distributed-partition solution offers the greatest benefit when the cube reaches tens of terabytes. This article considers only local partitioned cubes.

Advantages of partitioning

Query performance: Partitioning can greatly improve the query performance of a cube; even a medium-sized cube, built on a database of about 100 GB, will benefit. The advantages of cube partitioning are most significant under multi-user load. The query performance gain of any particular application varies with the cube structure, the usage patterns, and the partition design. If queries typically need only a month of data and the cube is partitioned by month, a query touches only one partition. In general, for a large cube that would otherwise be held in a single partition, a well-designed local partitioning strategy can be expected to improve query performance by 100% to 1000%.

Pruning old data: As with the relational data warehouse, the Analysis Services system administrator may choose to keep only recent data in the cube. With a single partition, the only way to purge the old data is to reprocess the cube. If the cube is partitioned along the date dimension, the administrator can simply drop the oldest partitions without taking the system offline.

Maintenance: From a management point of view, a partition is a unit of data that can be added and dropped independently, without affecting other partitions; this helps manage the life cycle of data in the system. Each partition is stored in its own set of files. Because these partition files are relatively small, backup and restore operations on them are easier to manage; this is especially true when the partition files are smaller than 2 GB, in which case ordinary archive and restore utilities remain effective. If part of a cube is corrupted, or is found to contain incorrect or inconsistent data, only the affected partitions need to be reprocessed, which is much faster than reprocessing the whole cube. In addition, to save space, the storage mode and aggregation design of old partitions can be changed, and old partitions can be merged.

Different partitions can use different data sources, so a single cube can combine data from multiple relational databases. For example, in a corporate data warehouse, data from Europe and North America might reside on different servers. If the cube is partitioned accordingly, the logical cube can merge these entirely separate data sources. The relational schemas on the multiple source servers must be virtually identical for this to work properly.

Load performance: Because multiple partitions can be loaded in parallel, a partitioned cube can load much faster than an unpartitioned one. (As discussed later, parallel partition processing requires a third-party tool or a simple custom tool.) On a multiprocessor computer, the performance improvement is striking: a parallel processing tool should be able to drive CPU utilization to 90%, which is typically achieved by concurrently processing one to two partitions for every two processors.
For example, on a four-processor server with all processors dedicated to processing the cube, you would process two to four partitions simultaneously. Attempting to process more partitions at once than you have processors degrades performance significantly. One partition for every two processors is relatively conservative; the ideal number depends on the speed of the data flow from the source database, the aggregation design, the storage, and other factors. In some cases it is more efficient to fully reprocess a partition than to process it incrementally, an option that is rarely realistic when the entire cube is held in a single partition.

Disadvantages of partitioning

The main disadvantage of partitioning is that the administrator must build an application to manage the partitions. It would be inappropriate to move a partitioned cube into production before the partition management application has been designed, tested, and run through a trial period; one purpose of this article is to discuss the issues and design decisions behind such an application. In addition, metadata operations, such as browsing the cube definition, slow down as the number of partitions grows. The partitioned cube is a burden on administrators rather than end users, and a cube with far too many partitions becomes a management problem in itself.

Factors to consider in design

An effective partitioning plan must weigh several factors:
Number of partitions: Analysis Services places no practical limit on the number of partitions in a cube, but managing a cube with thousands of partitions would be very challenging. Moreover, past some number of partitions, the overhead of merging result sets from many partitions outweighs the query performance gained from partition selectivity. Because it depends on the cube design, the query patterns, and the hardware, it is difficult to state a firm rule for that number; however, a design of about one partition per GB of cube storage (on the order of ten million source rows per partition) is safe. In other words, a 100 GB cube (roughly one billion facts) on appropriate hardware should comfortably support 100 partitions. If a partition design calls for substantially more partitions than this, test the performance of alternative partitioning plans.

Loading and maintenance: Data may flow into the cube along one dimension, such as time. To support the staging application that populates and incrementally updates the cube, that dimension is a natural partition slice; the date dimension is usually the first partitioning dimension. Other applications receive data divided along other lines, such as geographic region or customer group. Because different partitions can use different data sources, the cube population application can efficiently load data from a distributed data warehouse or other source systems.

Query performance: Designing effective partitions requires an understanding of the common patterns of user queries. The ideal partitioning dimension is highly selective for the most detailed user queries. For example, because many queries concentrate on the most recent periods, partitioning by date usually improves query performance. Similarly, many users may query along geographic or organizational lines. To maximize query performance, the aim is for queries to involve as few partitions as possible.

Partitions are easier to manage if the slices are static, or if changes to them are easy to predict. For example, partitions by state are relatively static: the application designer can expect ample warning before a fifty-first state is added. By contrast, partitions sliced along the product dimension are likely to change over time, because new products may be added frequently. The designer can still partition along a dynamic dimension, but should recognize that the management system will be more complex. A dimension marked as a changing dimension cannot be used to partition the cube. In any case, it is wise to create an "other" partition to hold data for unanticipated dimension members.

Partition slices and filters

As with relational partitions, the administrator must define the data to be included in each Analysis Services partition. The RDBMS performs this function with CHECK constraints; Analysis Services uses the partition slice. The slice is set to a single member in a dimension, such as [Dates].[1999] or [Dates].[1999].[Q1]. In the Analysis Manager Partition Wizard, the slice is set on the screen titled "Select the data slice (optional)". In DSO, the slice is accessed and set through the SliceValue property of the levels of the partition's dimensions; a syntax example appears in the appendix to this article. The definition of each partition also includes information about the source of the data that flows into the partition; this information belongs in the partition metadata.
Administrators can set a partition's data source and fact table in the Partition Wizard, or programmatically with DSO. When the partition is processed, the SliceValue setting is automatically transformed into a filter on the source data. The partition definition may include an optional additional filter, the SourceTableFilter property, which can be used to refine the query that populates the partition. When the partition is processed, the WHERE clause of the query sent to the source data combines the default condition derived from the slice definition with any additional filter defined in the SourceTableFilter property. For the partition to work, the slice and the filters must be defined correctly and consistently.

The role of the partition slice is to improve query performance. The Analysis Services engine uses the information in the slice definitions to direct each query only to the partitions that contain relevant data.
On a partitioned cube without defined slices, queries resolve accurately, but performance is not optimized, because all partitions must be examined. The role of the filters and the source metadata is to define the data that flows into each partition. These elements must be defined correctly, or the cube as a whole will contain incorrect data. When a partition is processed, Analysis Services restricts the data stored in the partition to rows that match the slice; however, no check is performed to ensure that the same data has not been loaded into other partitions as well. For example, suppose a cube is partitioned by year, and you incorrectly set a partition's slice to [Dates].[Year].[1997] but set its filter to 1998. When it is processed, the partition will contain zero rows: presumably not what you wanted. Conversely, if you already have a 1998 partition and you add a new partition for December 1998, it is quite possible to load the December 1998 data twice, and Analysis Services will not warn you. Keeping the slices and filters aligned is not difficult, but the designer of the partition management system must be aware of the problem.

Advanced slices and filters

Most partitioning strategies set the slice at one level of a dimension, placing the data for each member of that level into its own partition: "partition by year" or "partition by state," for example. Partition plans that slice a cube at mixed granularity are also common: newer data may be partitioned by day or week, while older data is partitioned by month or year. Depending on usage patterns and the distribution of the data, a more complex partition plan may be warranted. For example, suppose that 80% of a company's customers live in California, 10% live in Oregon, and the remaining 10% are spread across the rest of the country, and that most analysis focuses on the California customers. In this case, the administrator might create county-level partitions for California, a single state-level partition for Oregon, and one partition for all other regions. The slices might look like:
California counties: [All USA].[CA].[Amador] ... [All USA].[CA].[Yolo]
Oregon: [All USA].[OR]
Rest of the country: [All USA]

As discussed earlier, the source data filters must be defined correctly to ensure that these partitions are populated properly. Note that a query that combines data from California and Oregon may also need to examine the "rest of the country" partition. Analysis Services handles this without much trouble, but if a significant share of queries combine data across regions, query performance would be better if the cube were partitioned uniformly, with California simply decomposed at a finer grain. The application logic required to maintain uneven partitions is also more complex, and this style of partitioning is generally not recommended. However, with appropriate care in the application design and a clear understanding of query performance, the technique can solve certain design problems.

Aligning with partitions in the RDBMS

Given the discussion of relational partitioning in the first half of this article, the reader will naturally ask whether Analysis Services partitions must align with the relational partitions. The two partitioning strategies do not have to be identical, but the partition management application is easier to design, build, and understand if the partitions are similar. A common strategy is to partition the two systems identically, and optionally to define slices along a second or even a third dimension of the cube. The simplest strategy is to use the UNION ALL view as the fact table for all cube partitions. If the cube partitions are aligned with the relational partitions, each cube partition can instead bypass the UNION ALL view and bind directly to the member table holding its data; in this configuration, the processing queries that the cube issues against the relational database run fastest. The cost of this performance gain is that the maintenance application must ensure that the correct source table is associated with each partition. If the relational database exists only to populate the Analysis Services cubes, the system administrator can choose not to maintain the UNION ALL view for other queries, and can index the relational tables purely to optimize the single query that extracts each partition's data into the cube. In this case, the relational database acts more like a staging area than a full data warehouse.

Storage modes and aggregation plans

Each partition can have its own storage mode and aggregation plan. Data that is accessed infrequently can be lightly aggregated, or stored as ROLAP or HOLAP rather than MOLAP. Because changing these parameters requires reprocessing the partition, this flexibility is not widely used on partitions that age along the time dimension: in most cases the processing time and added system complexity make the storage savings on what are already the smallest parts of the cube hardly worthwhile. Partitions divided along other dimensions, by contrast, may sensibly carry different aggregation plans. The Usage-Based Optimization Wizard can be used to design aggregations partition by partition: the system administrator should run the wizard against the most recent partitions, and apply the aggregation design of the most recent partition to each newly created partition.

Managing the partitioned cube

Developers can build the management system for relational partitions with a variety of tools.
SQL-DMO is strongly recommended, but systems built with stored procedures, extended stored procedures, and even Perl scripts that parse table definition text have all worked well. Cube partition maintenance, by contrast, must be performed through DSO. To a developer with a traditional database background, the notion of instantiating database objects through an object model may seem strange at first. Developers can write DMO and DSO modules in a familiar scripting language such as Microsoft® Visual Basic® Scripting Edition (VBScript), Microsoft JScript®, or Perl, or in a development environment such as Visual Basic or C++.
These modules can be executed from the operating system or from SQL Server Agent, or called from a DTS package. Developers who have never used an object model should not be scared away from partitions by the requirement to build the management system with DSO; the appendix to this article provides a VBScript example that illustrates how to clone a partition from a script. If the relational data warehouse uses partitions, the cube partition management system should be designed as part of the relational partition management system. The cube management system must be able to:

Create the necessary new partitions, typically along the date dimension.
Load data into the partitions.
Drop old partitions (optional).
Merge partitions (optional).

Creating new partitions

When the partition management system creates a new date partition in the relational database, it should also create all the cube partitions corresponding to that date. Because new members may arrive along one of the partitioning dimensions, it is best to incrementally update the cube's dimensions before creating the new partitions. The simplest case is a cube partitioned only by date: the partition management system merely creates a new partition for the appropriate period (day, week, month, and so on). If the cube is partitioned along another dimension in addition to date, the partition management system adds several partitions at a time. Consider again the example of a cube partitioned by month and by state in the United States: fifty new state partitions are created each month. In this case, the application can create this month's partitions by cloning last month's partition definitions and changing the necessary properties, such as the slice and the source table, wherever they occur in each definition. Now suppose instead that the cube is partitioned by month and by brand. Brands change far more often than states; it is quite possible that a new brand has been added to the product line since the last set of partitions was created, and the maintenance application must ensure that a partition is created to hold the new brand's data. The recommended approach is:
Process the dimensions before creating the new partitions. Clone existing partitions, to preserve the continuity of storage modes and aggregation plans. Search the processed dimension for new members, and create a partition for each new member found at the partitioning level; the system must assign these partitions a default storage mode and aggregation plan.

The partition management system must be designed carefully so that slice and filter definitions are aligned, and stay accurate over time. If the relational database is partitioned, the partition management system should also update the cube partition definitions to stay synchronized with the source data as determined by the relational partitions. The cube partitions need not be reprocessed at that moment, but their definitions should be changed when necessary so that they can be reprocessed correctly in the future.

Data integrity

It is the job of the design and of the partition management system to ensure that each fact is processed into one and only one partition. Analysis Services does not check that every source row is instantiated in the cube, nor does it verify that a row is loaded into only one partition. If the same fact is delivered to two partitions, Analysis Services treats them as distinct facts: all aggregations will double-count that data, and queries will return incorrect results.
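As a safeguard, a check along the following lines can be run against the relational source to confirm that the per-partition filters tile the data exactly, with no gaps and no overlaps; the date ranges shown are those of the example partitions from earlier in this article.

-- The rows selected by the partition filters, added together, must
-- equal the total number of source rows; a smaller sum means a gap,
-- a larger sum means some rows would be loaded into two partitions.
SELECT
    (SELECT COUNT(*) FROM [dbo].[sales_fact]
     WHERE date_key BETWEEN 19990101 AND 19991231)
  + (SELECT COUNT(*) FROM [dbo].[sales_fact]
     WHERE date_key BETWEEN 20000101 AND 20001231)
        AS rows_selected_by_filters,
    (SELECT COUNT(*) FROM [dbo].[sales_fact])
        AS total_source_rows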
"Gradient update" adds new data to existing cubes or partitions, updates, and adds aggregation that is affected by the new data. "Refresh Data" discards all data and aggregates in the cube or partition, and regenerates data in a cube or partition. "All Processes" completely recreate the structure of the cube or partition, and then refresh the data and aggregation. Gradient processing requires administrators to define filter conditions on the source query to identify new data sets of cubes. Usually the screening is based on the date (stored in the event date or processing date). The DTS cube processing task provides the exact same functionality. Most systems use the DTS cube processing task to schedule multidimensional data sets. The multi-dimensional cube via gravity uses dynamic attribute tasks to change the source screening. Although the campaign update is more than the code required for refresh data, custom code in DSO also provides the same functionality. When designing a partition management system, pay special attention to the graded cubes that are being processed or the partition that has been processed in the past is required. Do not use gradient processing on an unprocessed cube or partition. The multi-dimensional data sets only on the date partition have the requirements of direct loading management. Typically, each load cycle has a single partition to be updated; the only decision point is whether it is to gradually update or refresh data. Most date vector cubes can be managed from a simple DTS package. Press Multi-dimensional data sets with multiple vector partitions and benefits: Challenge: There are a large number of partitions to handle challenges: the number of partitions may change: can load partition benefits: The performance of selective queries can be greatly improved. Most applications on multiple vector are designed to load the partition in parallel. The parallel load system can initiate multiple simultaneous running DTS packages, and their parameters have been updated with dynamic attribute tasks. Although it is feasible, this structure is inconvenient for use. Instead, many systems choose to update the partition with native DSO code. An example tool for parallel processing partitions can be obtained. Merge partitions For cubes along the date partition, the number of partitions grows over time. As mentioned earlier, the number of partitions increases to a certain extent, theoretically there is a point where a query performance begins to reduce. We tested the development project including more than 500 partitions, but did not reach this limit. Due to the other disadvantages of too many partitions, such as the source data operation, etc., will bring more difficulties to the management database, and the system administrator often does not endure it before reaching the limit. Analyze services to combine partitioning features through DSO and analysis manager. When combining two partitions, data of a partition will be merged into another partition. Two partitions must have the same storage mode and aggregation plan. After the merge is completed, the first partition is discarded, and the second partition contains the merged data. The merge processing only occurs on cube data; the data source is not accessed during the merge process. The combined process of the two partitions is high. If the system design includes a merged partition, the merge process should be programmed, not by analyzing the manager. 
Merging partitions is simple to program, requiring just a few lines of code like other DSO operations. The partition merging system must take responsibility for verifying that the merged partition carries accurate metadata about its source filters, to ensure that the partition could be repopulated from the source if that ever becomes necessary. The merge process correctly changes the slice definitions, and merges the filter definitions as far as it can. However, the merge process does not require that the two partitions come from the same source table or data source, so it is possible to create a merged partition that cannot be repopulated by reprocessing. A second issue to consider: like all partitions, merged partitions cannot be renamed. These problems can be avoided by the following good system design practices:
Use a clear naming convention.
Follow a consistent partition merging schedule.
Either align the cube partitions with the relational partitions, or do not partition the relational data warehouse.

For example, consider a "Sales" cube whose data is partitioned by week. The current week is partitioned by day, and the daily partitions are merged at the end of the week. Name the partitions Sales_yyyymmdd, where the date in the name is the first day of data in the partition. In November 2000 we would have the weekly partitions Sales_20001105, Sales_20001112, Sales_20001119, and Sales_20001126. During the next week, we create and process Sales_20001203, Sales_20001204, and so on through Sales_20001209. During the Sunday processing window, such as it is, we merge 20001204 through 20001209 into Sales_20001203, leaving only the current week partitioned by day. Alternatively, a partition can effectively be renamed by creating a new, empty partition with the desired name and merging the other partitions into it.

Dropping old partitions

Removing the oldest data from a cube partitioned by date is as simple as dropping the oldest partition (or set of partitions). Like the other operations we have discussed, this process should be managed programmatically rather than by hand in Analysis Manager. If you have followed the discussion this far, you should expect to spend only a few hours writing and testing such a module.

Summary

We recommend using local partitions in large Analysis Services cubes, those whose fact tables contain on the order of a billion rows. Partitioning can greatly improve the query performance of an Analysis Services database, and a partitioned cube is easy to maintain, especially where old data is pruned from the cube. However, a partitioned cube requires an application to manage the partitions.

Conceptually, partitioning the relational data warehouse is similar to partitioning in Analysis Services, and as with Analysis Services, you must build an application to manage the relational partitions. The case for partitioning the relational data warehouse is weaker: partitioning solves certain maintenance problems, such as pruning old data, but at the cost of system complexity, and query performance is no better than against a single well-indexed table.

Both Analysis Services and the SQL Server relational database support distributed partitions, in which the partitions reside on different servers. Distributed partitions in Analysis Services are the subject of a separate article. We do not recommend distributed relational partitions for SQL Server 2000 data warehouse systems that support ad hoc queries.

The query performance of a partitioned cube is improved by using a fairly large number of partitions. Developers of large cubes should consider partitioning along multiple dimensions, to maximize the selectivity of user queries and to create opportunities for parallel processing to improve processing performance. We recommend partitioning large Analysis Services systems. Although partitioning the relational data warehouse is an effective and well-performing solution to certain maintenance problems, we do not recommend it in general.

More information

Microsoft SQL Server Books Online contains information about indexed views. For more information, see the following resources:
The Microsoft SQL Server Web site at http://www.microsoft.com/sql/
The Microsoft SQL Server Developer Center at http://msdn.microsoft.com/sqlserver
SQL Server Magazine at http://www.sqlmag.com/
The microsoft.public.sqlserver.server and microsoft.public.sqlserver.datawarehouse newsgroups at news://news.microsoft.com
Microsoft Official Curriculum courses on SQL Server; for up-to-date course information, see http://www.microsoft.com/trainingandservices

Appendix: VBScript Code Example for Cloning a Partition

'/*********************************************************************
' File: ClonePart.vbs
'
' Description: Based on the most recent partition in the FoodMart 2000
' Sales cube, this script example creates a new partition in that
' cube. The purpose of the script is to show the kinds of DSO calls
' used to clone a partition. The resulting partition is processed,
' but no data is added to the cube. After running the script and
' examining the results, delete the partition it creates.
'
' Parameters: none
'*********************************************************************/
Call ClonePart

Sub ClonePart()
    On Error Resume Next

    Dim intDimCounter, intErrNumber
    Dim strOlapDB, strCube, strDB, strAnalysisServer, strPartitionNew
    Dim dsoServer, dsoDB, dsoCube, dsoPartition, dsoPartitionNew

    ' Initialize the server, database, and cube name variables.
    strAnalysisServer = "LocalHost"
    strOlapDB = "FoodMart 2000"
    strCube = "Sales"

    ' VBScript does not support the direct use of enumerated
    ' constants, but constants can be defined to take their place.
    Const stateFailed = 2
    Const olapEditionUnlimited = 0

    ' Connect to the Analysis server.
    Set dsoServer = CreateObject("DSO.Server")
    dsoServer.Connect strAnalysisServer
    ' If the connection failed, end the script.
    If dsoServer.State = stateFailed Then
        MsgBox "Error-Not able to connect to '" & strAnalysisServer & _
            "' Analysis server.", , "ClonePart.vbs"
        Err.Clear
        Exit Sub
    End If

    ' Certain partition management features are available only in
    ' the Enterprise and Developer Editions of Analysis Services.
    If dsoServer.Edition <> olapEditionUnlimited Then
        MsgBox "Error-This feature requires Enterprise or " & _
            "Developer Edition of SQL Server to " & _
            "manage partitions.", , "ClonePart.vbs"
        Exit Sub
    End If

    ' Make sure a valid data source exists in the database.
    Set dsoDB = dsoServer.MDStores(strOlapDB)
    If dsoDB.DataSources.Count = 0 Then
        MsgBox "Error-No data sources found in '" & _
            strOlapDB & "' database.", , "ClonePart.vbs"
        Err.Clear
        Exit Sub
    End If

    ' Find the cube.
    If (dsoDB.MDStores.Find(strCube)) = 0 Then
        MsgBox "Error-Cube '" & strCube & "' is missing.", , _
            "ClonePart.vbs"
        Err.Clear
        Exit Sub
    End If

    ' Set the dsoCube variable to the desired cube.
    Set dsoCube = dsoDB.MDStores(strCube)

    ' Find a partition in the cube.
    If dsoCube.MDStores.Count = 0 Then
        MsgBox "Error-No partitions exist for cube '" & strCube & _
            "'.", , "ClonePart.vbs"
        Err.Clear
        Exit Sub
    End If

    ' Set the dsoPartition variable to the desired partition.
    Set dsoPartition = dsoCube.MDStores(dsoCube.MDStores.Count)
    MsgBox "New partition will be based on existing partition: " & _
        Chr(13) & Chr(10) & _
        dsoDB.Name & "." & dsoCube.Name & "." & _
        dsoPartition.Name, , "ClonePart.vbs"

    ' Get the quote characters from the data source, because
    ' different data sources use different quote characters.
    Dim slQuote, srQuote
    slQuote = dsoPartition.DataSources(1).OpenQuoteChar
    srQuote = dsoPartition.DataSources(1).CloseQuoteChar

    '*****************************************************************
    ' Create the new partition based on the desired partition.
    '*****************************************************************

    ' Create a new, temporary partition.
    strPartitionNew = "NewPartition" & dsoCube.MDStores.Count
    Set dsoPartitionNew = dsoCube.MDStores.AddNew("~temp")

    ' Copy the properties of the desired partition to the
    ' new partition.
    dsoPartition.Clone dsoPartitionNew

    ' Change the name of the "~temp" partition to the name
    ' you want to use for the new partition.
    dsoPartitionNew.Name = strPartitionNew
    dsoPartitionNew.AggregationPrefix = strPartitionNew & "_"

    ' Set the fact table for the new partition.
    dsoPartitionNew.SourceTable = _
        slQuote & "sales_fact_dec_1998" & srQuote

    ' Set the FromClause and JoinClause properties of the
    ' new partition.
    dsoPartitionNew.FromClause = Replace(dsoPartition.FromClause, _
        dsoPartition.SourceTable, dsoPartitionNew.SourceTable)
    dsoPartitionNew.JoinClause = Replace(dsoPartition.JoinClause, _
        dsoPartition.SourceTable, dsoPartitionNew.SourceTable)

    ' Change the SliceValue properties of the affected levels and
    ' dimensions to change the data slice definition used by the
    ' new partition.
    dsoPartitionNew.Dimensions("Time").Levels("Year").SliceValue = "1998"
    dsoPartitionNew.Dimensions("Time").Levels("Quarter").SliceValue = "Q4"
    dsoPartitionNew.Dimensions("Time").Levels("Month").SliceValue = "12"

    ' Estimate the number of rows.
    dsoPartitionNew.EstimatedRows = 18325

    ' Add another filter. The SourceTableFilter property provides an
    ' additional opportunity to add a WHERE clause to the SQL query
    ' that will populate this partition. We use this filter to ensure
    ' that no data rows are included in the new partition: because
    ' this is sample code, we do not want to change the data in the
    ' FoodMart cube. If you want the new partition to contain data,
    ' remove this line.