Friday, December 31, 2004
Today's data processing can be roughly divided into two major classes: online transaction processing (OLTP, On-Line Transaction Processing) and online analytical processing (OLAP, On-Line Analytical Processing). OLTP is the main application of traditional relational databases and mainly handles basic, day-to-day transactions, such as bank transactions. OLAP is the main application of data warehouse systems. It supports complex analytical operations, focuses on decision support, and presents query results in an intuitive, easy-to-understand form. The following table compares OLTP and OLAP.
Aspect | OLTP | OLAP
User | Operators, lower-level managers | Decision makers, senior management
Function | Day-to-day operations | Analysis and decision support
DB design | Application-oriented | Subject-oriented
Data | Current, up to date, detailed, flat (two-dimensional), isolated | Historical, aggregated, multidimensional, integrated, consistent
Access | Reads/writes tens of records | Reads millions of records
Unit of work | Simple transaction | Complex query
Number of users | Thousands | Hundreds
DB size | 100 MB to GB | 100 GB to TB
OLAP is a class of software technology that lets analysts, managers, and executives access information quickly, consistently, and interactively from multiple angles, so that they gain a deeper understanding of the data. The goal of OLAP is to satisfy decision-support needs, or specific query and reporting needs, in a multidimensional environment, and its technical core is the concept of the "dimension".
A "dimension" is a perspective from which people observe the objective world; it is a high-level way of classifying things. A dimension usually contains hierarchical relationships, which are sometimes quite complex. By defining several important attributes of an entity as dimensions, users can compare data across different dimensions (my understanding: for example, in a two-dimensional report, we can pick either dimension and compare the data along the other). OLAP can therefore also be described as a collection of multidimensional data analysis tools.
OLAP can be implemented in several ways. Depending on how the data is stored, implementations are divided into ROLAP, MOLAP, and HOLAP.
ROLAP (Relational OLAP) is an OLAP implementation based on a relational database. The relational database is the core, and multidimensional data is represented and stored in relational structures. ROLAP splits the multidimensional structure into two kinds of tables. One is the fact table, which stores the measure data and the foreign keys. The other is the dimension tables: each dimension has at least one table storing descriptive information such as the dimension's levels and member categories.
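A minimal sketch of this fact/dimension split, using Python's built-in `sqlite3` as a stand-in relational database. All table names, columns, and rows (`fact_sales`, `dim_time`, and so on) are hypothetical and chosen only for illustration; the point is that a multidimensional question becomes a join from the fact table to a dimension table plus a `GROUP BY` on a dimension level.

```python
import sqlite3

# Hypothetical star schema: names and data are illustrative only.
def build_star_schema() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    cur = conn.cursor()
    # Dimension tables: levels and descriptive attributes of each dimension.
    cur.execute("CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, quarter TEXT)")
    cur.execute("CREATE TABLE dim_region (region_id INTEGER PRIMARY KEY, country TEXT, city TEXT)")
    cur.execute("CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
    # Fact table: the measure (sales) plus one foreign key per dimension.
    cur.execute(
        """CREATE TABLE fact_sales (
               time_id INTEGER REFERENCES dim_time(time_id),
               region_id INTEGER REFERENCES dim_region(region_id),
               product_id INTEGER REFERENCES dim_product(product_id),
               sales REAL)"""
    )
    cur.executemany("INSERT INTO dim_time VALUES (?, ?, ?)", [(1, 2004, "Q3"), (2, 2004, "Q4")])
    cur.executemany("INSERT INTO dim_region VALUES (?, ?, ?)", [(1, "China", "Beijing"), (2, "China", "Shanghai")])
    cur.executemany("INSERT INTO dim_product VALUES (?, ?, ?)", [(1, "Drinks", "Tea"), (2, "Drinks", "Coffee")])
    cur.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [(1, 1, 1, 100.0), (1, 2, 2, 80.0), (2, 1, 2, 120.0), (2, 2, 1, 90.0)],
    )
    conn.commit()
    return conn

def sales_by_quarter(conn: sqlite3.Connection):
    # "Sales per quarter": join facts to the time dimension, group on a level.
    return conn.execute(
        """SELECT t.quarter, SUM(f.sales)
           FROM fact_sales f JOIN dim_time t ON f.time_id = t.time_id
           GROUP BY t.quarter ORDER BY t.quarter"""
    ).fetchall()

if __name__ == "__main__":
    print(sales_by_quarter(build_star_schema()))  # [('Q3', 180.0), ('Q4', 210.0)]
```

Keeping descriptive attributes in the dimension tables means the fact table stays narrow (keys plus measures), which is the usual reason for this layout.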
MOLAP (Multidimensional OLAP) is an OLAP implementation based on a multidimensional data organization. The multidimensional data organization is the core; that is, MOLAP stores data in multidimensional arrays. In storage, the multidimensional data forms a "cube" structure, and in MOLAP, "rotating" (pivoting), "dicing", and "slicing" the cube are the main techniques for producing multidimensional reports.
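To make "stored as a multidimensional array" concrete, here is a small pure-Python sketch (no real MOLAP engine is implied). The cube is one flat list whose cell position is computed from the coordinates in row-major order; `slice`, `dice`, and `pivot` are illustrative helper names, and the dimension members are made up.

```python
from itertools import product

# Hypothetical dimension members; each dimension is one axis of the cube.
REGIONS = ["North", "South"]
QUARTERS = ["Q1", "Q2", "Q3"]
PRODUCTS = ["Tea", "Coffee"]

class Cube:
    """A dense cube stored as one flat array in row-major order."""

    def __init__(self, dims):
        self.dims = dims  # list of member lists, one per axis
        self.shape = [len(d) for d in dims]
        size = 1
        for n in self.shape:
            size *= n
        self.cells = [0.0] * size

    def _offset(self, coords):
        # The cell's position is computed from its coordinates, not looked up.
        off = 0
        for axis, member in enumerate(coords):
            off = off * self.shape[axis] + self.dims[axis].index(member)
        return off

    def set(self, coords, value):
        self.cells[self._offset(coords)] = value

    def get(self, coords):
        return self.cells[self._offset(coords)]

    def slice(self, axis, member):
        """Fix one dimension to one member; key the result by the remaining axes."""
        return {
            coords[:axis] + coords[axis + 1:]: self.get(coords)
            for coords in product(*self.dims)
            if coords[axis] == member
        }

    def dice(self, selections):
        """Keep only the chosen members on every axis (a sub-cube)."""
        return {coords: self.get(coords) for coords in product(*selections)}

def pivot(table):
    """Rotate a 2-D slice by swapping its row and column axes."""
    return {(col, row): v for (row, col), v in table.items()}
```

For example, after filling each cell with its own offset, `cube.slice(0, "South")` yields the 2-D quarter-by-product view for one region, and `pivot` turns it into a product-by-quarter view.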
HOLAP (Hybrid OLAP) is an OLAP implementation based on a hybrid data organization. For example, the lower (detail) level may be stored relationally while the higher (summary) level is stored as a multidimensional array. This approach offers better flexibility.
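A toy sketch of the hybrid idea, assuming the split described above: detail rows stay in a relational table (`sqlite3` here), while monthly summaries are precomputed into an in-memory, array-like store keyed by (region, month). The class name `TinyHolap` and the routing rule are hypothetical; real HOLAP products decide the split in their own ways.

```python
import sqlite3
from collections import defaultdict

class TinyHolap:
    """Hypothetical HOLAP sketch: relational detail, multidimensional summaries."""

    def __init__(self, rows):
        # Low level: detail rows (region, 'YYYY-MM-DD', amount) in a relational table.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE sales (region TEXT, day TEXT, amount REAL)")
        self.conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
        # High level: pre-aggregated (region, month) cells kept in memory.
        self.summary = defaultdict(float)
        for region, day, amount in rows:
            self.summary[(region, day[:7])] += amount  # 'YYYY-MM-DD' -> 'YYYY-MM'

    def monthly(self, region, month):
        # Summary-level question: answered from the precomputed cells.
        return self.summary.get((region, month), 0.0)

    def daily(self, region, day):
        # Detail-level question: falls through to the relational table.
        row = self.conn.execute(
            "SELECT COALESCE(SUM(amount), 0) FROM sales WHERE region = ? AND day = ?",
            (region, day),
        ).fetchone()
        return row[0]
```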
OLAP tools perform online data access and analysis for specific problems. They analyze, query, and report on data in a multidimensional way. A dimension is a particular angle from which people observe data. For example, when a company examines product sales, it usually looks at them from the perspectives of time, region, and product; here time, region, and product are dimensions. The combinations of these dimensions, together with the measures being examined, form multidimensional arrays that are the foundation of OLAP analysis. They can be written as (dimension 1, dimension 2, ..., dimension N, measure), for example (region, time, product, sales). Multidimensional analysis means applying analytical operations such as slicing, dicing, drilling (drill-down and roll-up), and rotation (pivot) to data organized in this multidimensional form. These operations let users observe the data from multiple angles and sides, and so understand the information it contains more deeply. According to how the multidimensional data is organized, common OLAP falls mainly into two types: MOLAP, based on multidimensional databases, and ROLAP, based on relational databases. MOLAP organizes and stores data in multidimensional form, while ROLAP uses existing relational database technology to simulate multidimensional data. In data warehouse applications, OLAP tools are generally the front-end tools, and they can also be combined with data mining tools and statistical analysis tools to strengthen decision analysis.
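The (region, time, product, sales) tuples and the drill/slice operations can be sketched in a few lines of Python. The facts, the helper names (`aggregate`, `slice_facts`, `month_to_quarter`), and the month-to-quarter hierarchy are assumptions for illustration. Rolling up means grouping on fewer dimensions or a coarser level; drilling down is the reverse.

```python
from collections import defaultdict

# Hypothetical facts in the form (region, time, product, sales).
FACTS = [
    ("East", "2005-01", "Tea", 10.0),
    ("East", "2005-02", "Tea", 12.0),
    ("East", "2005-02", "Coffee", 8.0),
    ("West", "2005-01", "Coffee", 5.0),
    ("West", "2005-04", "Tea", 7.0),
]
DIMS = {"region": 0, "time": 1, "product": 2}

def month_to_quarter(month: str) -> str:
    # One step up the time hierarchy: '2005-02' -> '2005-Q1'.
    year, mm = month.split("-")
    return f"{year}-Q{(int(mm) - 1) // 3 + 1}"

def aggregate(facts, dims, time_level="month"):
    """Sum sales over every dimension not listed in `dims`."""
    totals = defaultdict(float)
    for row in facts:
        key = []
        for d in dims:
            value = row[DIMS[d]]
            if d == "time" and time_level == "quarter":
                value = month_to_quarter(value)
            key.append(value)
        totals[tuple(key)] += row[3]
    return dict(totals)

def slice_facts(facts, dim, member):
    """Slice: keep only the facts for one member of one dimension."""
    return [row for row in facts if row[DIMS[dim]] == member]
```

Going from `aggregate(FACTS, ["region", "time"])` to `aggregate(FACTS, ["region", "time"], "quarter")` is a roll-up along time, and going back is a drill-down. Passing the dimensions in a different order, e.g. `["time", "region"]`, is the dictionary analogue of a pivot.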
Saturday, March 19, 2005
Derived data includes various summaries, allocations, differences, ratios, rankings, and products; in OLAP, producing such data amounts to creating derived variables.
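A short sketch of derived variables computed from a single base measure. The product names and sales figures are made up; the derived variables shown are a ratio (share of total), a difference (gap to the best seller), and a ranking.

```python
# Hypothetical base measure: sales per product for one period.
SALES = {"Tea": 50.0, "Coffee": 30.0, "Juice": 20.0}

def derive(sales):
    """Add derived variables: share of total, gap to the leader, and rank."""
    total = sum(sales.values())
    leader = max(sales.values())
    ranked = sorted(sales, key=sales.get, reverse=True)  # best seller first
    return {
        name: {
            "share": value / total,          # ratio
            "gap_to_leader": leader - value,  # difference
            "rank": ranked.index(name) + 1,   # ordering (1 = highest sales)
        }
        for name, value in sales.items()
    }
```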
OLAP is good at presenting data broken down along dimensions. As the word "dimension" suggests, it lets decision makers and analysts observe the data better (personal understanding).
The difference between OLAP and data mining is the difference between descriptive modeling and exploratory modeling. The functions and algorithms in OLAP tools (such as aggregation, allocation, ratios, and products) are descriptive modeling functions, whereas those in data mining tools (such as regression, neural networks, decision trees, and clustering) are pattern-discovery and exploratory modeling functions. Besides descriptive modeling, OLAP tools can also build complex structures, such as hierarchical dimensions and cross-dimensional references, which data mining tools do not provide.
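Allocation is a good example of a descriptive function: the result follows from a fixed formula chosen by the analyst rather than from a pattern learned from the data, as a regression or decision tree would do. A minimal sketch, assuming a shared cost is allocated in proportion to sales; the figures are hypothetical.

```python
def allocate(amount, weights):
    """Distribute `amount` across members in proportion to `weights` (e.g. sales)."""
    total = sum(weights.values())
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return {member: amount * w / total for member, w in weights.items()}

# Hypothetical: a 1000-unit overhead allocated by product sales.
shares = allocate(1000.0, {"Tea": 50.0, "Coffee": 30.0, "Juice": 20.0})
```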
OLAP tools and data mining tools are highly complementary.
Sunday, March 27, 2005
The term OLAP denotes a class of products that perform descriptive modeling in support of decision analysis.
The core requirements for OLAP are: rich dimensional structures with hierarchical references; dimensional calculations that can be specified effectively and are flexible, well structured, and fast; enough speed to support any query; and multi-user support.
Analysis is not just simple numeric totals. Summarizing large amounts of data correctly is important, but the most important information comes from various ratios, from inferences, and from trends over time.
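Two common ways of turning a time series of totals into trend information are period-over-period growth and a moving average. The following is a minimal sketch over a hypothetical monthly sales series; the function names are illustrative.

```python
def growth_rates(series):
    """Period-over-period growth: (current - previous) / previous."""
    return [(cur - prev) / prev for prev, cur in zip(series, series[1:])]

def moving_average(series, window):
    """Trailing moving average over `window` periods."""
    return [
        sum(series[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(series))
    ]

# Hypothetical monthly sales totals.
monthly = [100.0, 120.0, 90.0, 135.0]
```

A summed total alone would hide that the third month dropped by 25% before recovering, which the growth rates make visible.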
An aggregate table is a table that holds summary information derived from the fact table.
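A minimal sketch of building one aggregate table with SQL `CREATE TABLE ... AS SELECT ... GROUP BY`, using `sqlite3`. The table and column names are hypothetical: the detail fact table is summarized to the (month, store) level, dropping the product dimension, and a row count is kept alongside the summed measure.

```python
import sqlite3

def build_aggregate_table(conn: sqlite3.Connection) -> None:
    # Hypothetical aggregate table at (month, store) level.
    conn.execute(
        """CREATE TABLE agg_sales_month_store AS
           SELECT month, store, SUM(sales) AS sales, COUNT(*) AS fact_rows
           FROM fact_sales
           GROUP BY month, store"""
    )

def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE fact_sales (month TEXT, store TEXT, product TEXT, sales REAL)")
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?)",
        [
            ("2005-03", "S1", "Tea", 10.0),
            ("2005-03", "S1", "Coffee", 4.0),
            ("2005-03", "S2", "Tea", 6.0),
            ("2005-04", "S1", "Tea", 3.0),
        ],
    )
    build_aggregate_table(conn)
    return conn.execute(
        "SELECT month, store, sales, fact_rows FROM agg_sales_month_store ORDER BY month, store"
    ).fetchall()
```

Queries at the (month, store) level can then read the smaller aggregate table instead of scanning every fact row.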
When aggregating with SQL, a separate aggregation is created for each combination of summary levels (personal understanding: a combination of one hierarchy level from each of several dimensions). For example, suppose the time dimension has 3 levels, the store dimension has 3 levels, and the product dimension has 4 levels. Then there are 3 × 3 × 4 = 36 different aggregation levels, which requires 36 aggregate tables.
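The count of 36 is simply the Cartesian product of the levels, one level chosen per dimension. In this sketch only the level counts (3, 3, 4) come from the text; the level names are hypothetical.

```python
from itertools import product

# Hypothetical level names; only the counts 3, 3, 4 come from the example above.
HIERARCHIES = {
    "time": ["day", "month", "year"],
    "store": ["store", "city", "region"],
    "product": ["item", "brand", "category", "department"],
}

def aggregation_levels(hierarchies):
    """Each aggregation level picks exactly one level from every dimension."""
    return list(product(*hierarchies.values()))

levels = aggregation_levels(HIERARCHIES)
print(len(levels))  # 36
```

If each dimension could also be dropped entirely (an "all" level), the count would grow to 4 × 4 × 5 = 80, which shows how quickly the number of aggregate tables increases with the number of dimensions and levels.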