Mobile communication customer class topic data mining
1. Business Classification Big Customer: Mobile large customer definition According to the unified definition of the headquarters, the customer point is the basis for judging the big customers. The big customers are recalculated at the end of the year, determine the next year's large customer integration evaluation threshold. Big customer qualifications are only lifted during the year. The standards to achieve large customer points will be given to their corresponding large customer qualifications per month. It is incremented by large customers, and the mobile large customers have 4 VIP cards, namely diamond card, gold card, silver card, and VIP card. Ordinary Customer: In addition to the non-Shenzhou personal users outside the big customers. 2. Other classification methods can have the following two types: use clustering algorithm; human classification method. Customer Topics Top 3 Class 3 Problems: Association, Classification, Prediction, Customer Data Mining Theme can also be divided into this 3 category. l Association issues related to the relationship between the various attribute characteristics of the customers and the cross-sales of telecom products. Because of the product, customer service, etc. can be summarized into the properties of the customer, so the relationship between the client entity and other entities is also studied. More typical association:
1. Telecom cross-selling 2. Package selection problem 3. Business interaction problem l predicting that customer prediction issues are predictable attribute changes such as customers' behavioral changes or consumption. Customer's typical behavior changes have lost, increase, call behavior, change, customer information change, and other behavioral changes. Typical prediction problem:
1. Customer loss / large customers from the network 2. Potential customers forecast 3. Customer level change 4. Customer development 5. Market effect prediction L clustering problem customer clustering problem is customer characteristics. There are many problems with clustering research, and many subjects can be applied. Typical clustering problem: 1. Customer characteristics analysis 2. Consumer model 3. Abnormal customer analysis is only a typical topic in the customer topics, the telecommunications industry as a huge technical and business system, can propose research themes More than this, these can be discovered by the business person to discover the actual business, and the analyst is transformed into data mining problems, using data mining tools such as Clementine.
----------------------------
DW model for operating analysis system
Author: Happysboy
Date: 2004-04-11
First, the purpose of writing in the previous article is to introduce the Unified DW Model of China Unicom's Management Analysis System. This model has been released in the form of a concept model (CDM). In order to understand the design ideas, it is a general explanation. . This article first replied why there would be a unified DW model, then briefly describe the business sector of business analysis, and finally introduce the structure in the unified DW model. It is hoped that comrades in the construction of China Unicom's operations analysis systems or to build data warehouse design in other fields.
Second, why have a unified DW model? In China Unicom's operating analysis system, the analysis theme of the first phase of the main concern is to start around the user. Previously, for the user, the customer's concept is unclear, often confused. This is also true even in the business specifications of the headquarters unified business analysis system. Customer-related analysis topics are also, such as the age composition of customers, occupational composition, but due to the data source is not clean, the accuracy of customer data cannot be guaranteed, this part of the analysis is not the focus of the current phase. Therefore, the unified DW model we propose is designed to be designed around the user's information. The differences on users and customers have been described in detail later.
First, we have to clarify the purpose of the DW model. In the operating analysis data warehouse, we generally divide it from the layers ODS, DW, and DM layers, and the DW layer refers to the data structure under the ODS, the DM layer data, the main data structure of this layer. The task is to complete the pretreatment and precipitation of the data. The full name of ODS is Operational DataStore, which is basically stored in the data source mode, which is a data store that is biased to transaction, from this layer of data structure, we have to perform OLAP analysis is difficult because it is not a dimension modeling Although we have the main task of the process of data source to ODS to perform code transformation (converting code to ID) and data cleaning. The purpose of the ODS layer is to reduce the degree of coupling of the data warehouse system and the OLTP system, as the rules from the OLTP data source to the ODS layer are generally very simple, and the data source extracted is very low. Another purpose of the ODS layer is to provide sub-inquiry, such as a list of networked guests, high-value customer list, etc., the data must be exported from ODS when detailed. The purpose of the DM layer is simple. It is not a data precipitation, but as a temporary table required to load CUBE, mainly for performance, facilitating process processing, storage data is also the most recent period of data. Because the demand for multi-dimensional analysis is often changed, the structure of this layer is also frequent, and we consider placing this layer of data outside the logic design when designing (to keep logic design as stable). Fortunately, the structure of this layer is very simple. When a analytical topic is determined, that is, the dimension, the metrics are determined, the structure of the DM table is determined, so the creation of this table is currently the creation and data loading of the data. Work attributed to multi-dimensional analysis.
It is because of the frequent changes in user demand, we must have a powerful and stable DW layer table, which contains vast majority of analytical dimensions and metrics, and has sufficient time span data precipitation. This data precipitation must be stable, in the initial stage of operating analysis system, the concept of data precipitation is still unclear, many of the cases are as a number of dimensions, as a precipitate, but once a new dimension analysis The metric, then this summary data is not used. In addition, this DW layer must be absolutely low coupling with the OLTP system, DW should only be related to the business topic, analyze logic, and should not be unknown with a field of the OLTP system. Only in this way, our DW layer can be reused. For the continuous demand for users, DW design is also able to adapt to this change within a certain range, which requires the requirements for extensibility design. In summary, the design DW layer is to achieve the following objectives:
• Stability of data precipitation. Can guarantee data precipitation is not a white fee space, time;
• Reuse of the data structure. Can guarantee the versatility of the data structure, and the data source is not related;
• Data structure scalability. Ability to adapt to changes in demand;
It is said that the data warehouse is the theme, integrated, relatively stable, reflects the changes in history, but what is the meaning of these characters? We believe that before the theme, we must first face objects. The object here is that the entity in the telecom service, the user can be an entity, others such as customers, products, channels, competitors, etc. are entities. For each entity, to integrate their basic information, consumption information, form a stable data structure, and precipitate at time. As we mentioned in front of us, the key entity of current operating analysis is the user, and the user is divided into the user according to different dimensions, analyzing their quantity, consumption, business usage.
Third, the business management analysis of business management system is mainly the result, such as paying attention to the user's credit, not how to credit control; pay attention to the user's bill, not the user's billing offers; pay attention to the user Double price details, rather than how secondary batches, etc. At present, the data source acquired by the business analysis comes from the integrated camp system, billing system, customer service system, and settlement system, mainly from the top. For different manufacturers, their systems are different. Through analysis, their structure is naturally similar, because after all, there is a technical specification of a comprehensive accounting system there, and after a period of baptism, manufacturers can't go Have too far. They may not be the same for detailed processing details, but they can be unified for the resultability of the resultability of our attention. At present, in the ODS layer, we have not unified, because of historical reasons, we are now difficult to unify this layer, but there is no doubt that this layer can be unified to a certain extent, which will reflect the data source to ODS. Rule, not to be structural, but currently doing a unified DW layer or more realistic. Here, these resultability data that require attention can be classified as follows:
Customer Data Class here has a series of tables to store customer information, core concept is three households, customers, accounts and users. It divides the basic information of the customer, the information of the customer consumption, and the contract information of the carrier's business. The relationship between these three can imagine a tree, the customer is located root of the tree, the account is in the middle, the user is the third layer of leaf nodes. A customer has multiple accounts, with multiple users under an account, the user is the finest unit, in the business analysis, most statistical analysis is aimed at users, but because the terminology is not strict, there are many Under the occasion, mixing customers and users to talk, for example, the number of customers often said is usually the number of users (not absolute), and the user level is actually a customer level.
When a customer is entry, such as selecting a CDMA service, a set of three-family relationships will be established for this customer, of course, if a customer applies for a new business, such as 193 long distance, if you want to pay for existing accounts, you can Just build a new user if it wants to pay in another account, then build a new account. In customer information, the information we pay attention is customer type, customer level, gender, age, occupation, etc. In the account, we usually pay attention to the type of payment, pre-deposit, etc., you can first specify the information such as cash payment or bank deduction when entering the network, and record the bank name, account, etc. . In user information, there is a user's mobile phone number, service status, access time, and whether it is effective and other contract information.
Regarding the user, there is a sub-service information, or a special service, such as an incoming call display, a mobile secretary station, and the three-party calls that include which users have special services in addition to basic call functions, they usually and service properties Stored in one (in fact, it is a service attribute), such as roaming level, long distance level, etc. Some systems use mask to indicate which special service or which service properties (such as East Soft, Code of Code) are used to maintain the user's opening of which special service (such as Yadixin).
The account class has a monthly month, and the system has an on-site business link. It calculates the amount of each user's last month's consumption. This calculation is different from the user's details, his package, user type information for different benefits. Reduces. With the user's pre-deposit or payment, the system completes the account operation. If the user has no payment if the user pays the fee period (the first month of the general accusation period), this user is arrears. In the account information, we are concerned about receiving, receiving, paying, and arrears. • After the payment is completed, the cost that all users should pay, these information are stored in the monthly bill. We receive the bills sent by Unicom every month, and how much local calls are listed, long-distance fees, and how much roaming fees are available. This is from the monthly bill. The monthly bill is stored all the cost of each cost type in a certain account. Usually, the preferential cost and reduction fees are also stored here. The preferential fee is available level discount, the account level discount, the detailed level of the concession refers to the discount in the second batch of detail, and these values are also copied in the monthly bill. The account level discount refers to the user as a user. The preferential fee is in two ways in the monthly sheet, which is reflected in the column, and the monthly bill has a discount field, where a value is stored here, which is generally positive; there is a positive value; The cost type is reflected in the fee is a billing, generally negative value, so that the accumulated receivables will remove this offer.
In addition, there are also because of the price mistakes, resulting in the error of the user, the processing of different camp is different, and some may modify the monthly bill, and some are on the next month.
• Receiving information from a pin account statistics, not from the payment table, because the user pays the fee may be as a pre-deposit, this part should not be considered.
When the account is opened, the operation of the account is based on the user's pre-deposit, and the user's payment is sold through the user's payment during the next month. For pre-deposit sales, there is often a half-sales, such as at 100 yuan, but the pre-deposit is only 50 yuan, for this situation, different camp is also different processing, some is a semiler process, will be a special The cost type righteousness, some are in secondary type fees according to the predefined amount of time, such as current monthly rent, and then sell local fees.
The structure and monthly list of the sales list is similar, and each of its records indicates which fee is the fee. Because there is any arrears, it may sell users a few months in a month. So important information in the pin list includes the monthly month, cost month, cost type.
It is meaningless after the account is statistically, because the payment period has just begun, so we pay attention to the real estate fee is to realize the monthly fee, the latter is also called the arrears recycling, because of the receivance In the moon, it refers to the owed footbook.
• Paying users can pay at any time, some users have not taken a rainfall, first pay enough costs, and some do not see tears of coffins, and arrears stop. The payment is stored in the user's account, used for sale or as a pre-deposit.
Users can pay, cash, checks, credit cards, banks to pay, deductible, etc., now have more recharge cards, called payment methods. Users can pay at the business hall, or pay by the bank, through the customer center, this is called payment channel, and there is also a message, payment time, and the information we pay attention to, this is mainly used for the analysis of recharge card payment. The payment table is stored by payment, and a user can pay in a variety of ways or a variety of channels in a day. Therefore, when statistics, when analyzing payment users, pay special attention to the payment period, payment method, channel is not Summary.
• Arrears a user does not sell in a certain account, it is owed. Because of the different ways of credit control, some users arrears immediately, and some may arrears for a few months. In the arrears table, which type of fee is stored in which month owe it is. There is a difference between the arrears and the above receivables, the payment, and the above three are relatively static in one period, and the cost of occurrence, such as the receipt of a certain month, still so much. But for arrears, if you want to see the arrears, it should be reduced, and we are more concerned about the accumulated arrears. It is dynamic, and users owe 100 yuan in January, and they may owe it in February. 200 yuan or no longer arrears. If the user arrears more than a certain period of time, such as 1 year, the billing may be dull.
Some of the arrears forms of the system cannot be arrears, but they have not been recorded after the account, but in fact, there is no expense of the expense of the previous account. Pay special attention to this point when calculating the arrears, and use the biggest arrears to subtract the minimum arrears month. For example, the minimum arrears month is January 2004. The biggest arrears month is April 2004, then the arrears It is three months. If calculated by statistics, the statistics at this time should be in May 2004, the owed time is = MonthbetWeen (minimum arrears, statistics) -1.
The billing charging type data is generally not in the operating system, but a separate professional billing system, they are all available data, such as GSM details, cdma detail, SMS details, CDMA1.x traffic, 193, IP details, etc., of course, we are most concerned about GSM, CDMA details. These details are data after quadratic data. It records call information, network information, billing information, and call information, such as the primary called, long-distance type, roaming type, and other party operations. Commercial, call start time, call time, network information such as base station, MSC, etc., billing information such as basic fees, long distance charge, long-distance surcharge, special service fee, billing time, etc., there is also a corresponding preferential fee, This is the detailed level of the previously mentioned. Differential calls are different from long-distance calls and local calls, local calls, such as 70 seconds of local calls, 2 minutes, long-distance billing, for 6 seconds, such as 70 seconds long-distance call Long-distance accounting is 72 seconds.
Users can access their live calls every day or call the special service phone. The account will be counted every day, and they store them in a real-time phone bill. This table is not just simply summarizing the cost summary, will also The user's SMS fee, the monthly rent cost is added. When opening the month, it is a basic basis for the bill. In this table, there is generally a fee type. In order to distinguish it, we call it a detailed fee type. The cost type in the bill is called billing cost type, from the detailed fee type to the billing fee type presence mapping relationship.
Operating categories include business acceptance, business charges and a range of logs. Operating acceptance refers to each acceptance record in the business hall, such as user account, change number, etc., but not every acceptance is charged, or an acceptance will charge different types of costs, and the cost type here is called business expense type For example, the account fee, SIM card fee, etc. These cost information are stored in business charges.
For users, customers, accounts, there must be logging, especially for package, user status, special service, etc. The log list should record the information before and after this information, and some of the accounts record all logs, and the unified description is more difficult, and some is a specific log for specific types.
In addition to these categories, there are also a series of services such as resources and intelligent networks, and resources are mainly including cards and numbers. The smart network is actually one of mobile services, but it is a separate platform. Data, the main concern is the recharge data, the detailed data, etc. What is the business analysis activities of China Unicom? The operating analysis system is to provide decision-making support for Unicom's business activities. In this process, it is necessary to provide a range of important indicators to measure the effectiveness of business activities and guiding future strategies. At the current front end, even if the needs of users are not very clear, which indicators are concerned about business activities? This has gradually been clarified by chaos.
• The first thing of the business development Unicom is the number of users of each business. These users mainly include the number of net users, the number of users, the number of users, the number of speech users, the number of call users, etc. This is particularly concerned about the seizure of the market.
Business income Even if the number of users develops rapidly, income is not a year-on-year growth, especially the development trend of the Item CDMA business, initial users' growth is particularly rapid, but because of many preferential policies, income is slow. Business revenue is mainly analyzed around receivable, real increasing, and arrears. Currently, this piece of data is difficult to obtain, which greatly limits the role of business analysis.
• Business is concerned about the use of services, such as mobile services, call duration, number of calls, and billioning time are more important indicators. For CDMA1X services, traffic is an important indicator.
In response to the above indicators, business analysis is special for their distribution on the user package, the user's networking channel, and the former provides decision-making basis for the development of the package (product), which provides a basis for optimized channel construction.
4. Unified DW model structure By understanding the above business introduction, we can now look at the design idea of unified DW model. As mentioned earlier, at this stage our analysis focuses on user themes. Therefore, in the DW layer, a series of core data structures must be designed around the user.
To illustrate the role of the DW layer, we envisage some typical analysis:
1. On April 1, 2004, the number of users in the C network in Java, the number of users from the network;
2. On April 6, 2004, the income income of the C-net C network in Java City and the income of the above-month
3. In February 2004, the C network of Java City was received, and the growth rate was much in the same period of March;
4, in March 2004, Java City C N-net payment charge, monthly rent, local call charge, long distance fee, roaming fee, SMS fee, etc. What is the proportion of subdivision on the package?
5. As of the end of February 2004, how much is the accumulated arrears of the C network in Java? How many users are owed? Which package is wrong? Which agent has developed more arrears?
6. As of March 1, 2004, how much is the real estate of the C network in Java City (January)? What is the recycling?
7, on April 4, 2004, how much is the number of C-net calls in Java? What is the length of charging? What is the distribution on the package?
User-level data precipitation In order to meet the requirements of data precipitation stability, DW design cannot be simply summarized as a number of metrics in probably possible dimensions, as demand is constantly changing, and the demand proposed is not really useful. Demand, when this analysis needs need to join a new analysis dimension, then this summary data will be ruthlessly abolished, or you will put as much dimension as much as possible, but I am afraid it is unrealism, because Summary performance and space will not be allowed. Therefore, we need another idea to precipitate data. Very natural, the precipitation method of the aggregated data is to remove the user ID, then if we reserve the user ID? Such precipitation data can ensure stability to a certain degree, and at least the data from which to precipitate will not be white. However, pay attention to, for a user corresponding to multiple recorded data sources, you can imagine a narrow table, such as a monthly bill, which distinguishes different costs through the cost type, in the DW layer, we need to form a wide table, Each user records a month, form a number of fields, such as call, monthly rent, etc. according to the cost type. We call them as a user information table (a series). Such data precipitation can greatly guarantee the stability of the data, and if there is a new requirement, these user information tables can be expanded. The business analysis system surrounds users, focusing on user service, consumption, business use, payment behavior, arrears. Therefore, we can establish different user information tables for these user themes. Please pay attention to the user collection of each user information table. It is assumed that the following symbol description:
? U (w): Represents the entire user collection, all recorded users;
U (n): Indicates the collection of net users, indicating that all users who have not left the network are proof;
? U (b): means a user collection of the bill of accounts, that is, the user aggregation in this month's billing table;
? U (c): means a user collection that generates a call behavior of a month (day), that is, the user collection in the month (day) call details;
? U (m): means a user collection of a SMS (day), that is, the user collection in this month (day) SMS detail;
? U (x): Indicates the user collection of the CDMA1.x service, which appears in this month (day) CDMA1.X, which appears in this month.
U (a): Indicates that the user collects the user's point of time, generally pointed out the user in the arreled table;
The Unified DW layer has a series of tables for all kinds of information, and the user information we refer to us is a general meaning, and the user information table of the camp system is different. The generalized user information has a feature, the user and it is One-on-one relationship, such as the user's mobile phone number, one user has only one mobile phone number, and then the user's bill of charge is only one month. For similar cost types, call types, users, and they are all one-to-many relationships, and cannot be called user information.
User information can be divided into two categories, one is an inherent information, one is statistically out of information, the former is generally analyzed as a dimension, such as a user package, user arrears, the latter can be derived as a dimension, or act Measure, such as the user's receivable call, can be used as a cost subsidence dimension or as an account receivable. The user has two of the most basic information is the home type, and the two information acts as the dimension of almost all analytics themes, and the following user information tables contain both information, this redundancy can make it easy for statistical data . The user information table has similarity on naming, usually in fw_userinfo as a prefix, followed by 1-2 characters to represent the information type. This series of user information tables will be introduced in accordance with the classification of user information.
1, service information users are a contractual relationship of a customer ordering a certain business. We generally save a full user table in ODS, and we have to further integrate.
User Static Table (FW_USERINFOS), the user static table is placed in the user's basic information, the user ID is used as the primary key, and each user records. In the business introduction, the customer, account and the user are one-to-many relationship, so the information of the customer and account can be uniquely mapped to a user, in a static table, we are integrated with the user table in the ODS layer. , Client tables and account tables, in general, the information we have to take in the customer table includes:
? Customer type: such as personal customers, unit customers, group customers, etc .;
? Customer level: such as ordinary customers, big customers, etc .;
? Customer age: can generally calculate from the ID card ID, but the data is not very neat;
? Customer gender: can generally be calculated from the ID card ID, but the data is not very neat;
Customer occupation: such as teachers, students, civil servants, etc .; data is not clean;
From the ODS account table, the information we generally taken includes an account payment method. When the user can enter the net, you can set a payment method, such as selecting a bank account and bank code, etc .; from the perspective of analysis This dimension is actually rarely used, so in the actual project, you can properly pay the association of this table;
From the ODS user table, we have to move most of the user information directly. The main information includes the land, user status, user package, user access channels in the user, and users in the net (in the network duration). There are several special information that requires emphasis:
• Last user status, because we need to update the DW user static table daily, for users who change the state, we need to record the status before the changes in this information.
• The status is maintained, from the last state change so long, which is useful for analyzing the number of users, such as the stop for 1 month.
Is it in the network, whether the network usually needs to be judged by several fields, and different cash system judgment methods is still different, so in order to block this judgment algorithm, use this identity to distinguish whether the user is in the network, this is statistical It is very helpful in the number of network users;
User static tables daily incremental updates, its user collection is U (W). The table is generally involved in day analysis, such as business use day analysis, from the user package, user agent, call type, roaming type, etc., the number of user static tables must be associated with the number of times the time is measured. Complete statistics. In addition, for the development of daily users, the number of online users in a certain day is needed to be statistics. In the last day of the month, in order to provide support for monthly analysis, you need to back up a static table, called the moon attempt backup table (FW_USERINFOSM), it is the same as the structure of the static table, because the menstrual statistics usually go to the end of the month On the 6th, it can be completed. At that time, the user's static table was already next month, and it could not represent the last month. The backup user table is very important for each month, which is also very meaningful for supplementary historical data, so it is recommended to pay attention to the monthly backup user table even when the ETL work is not completed.
2, consumption information here consumption information refers to the cost of each user's handle, usually you will receive the bill of mobile phone monthly. It comes from the ODS monthly bill, in the bill, you can see how much local call consumption last month, how much roaming call consumption is. Information such as local calls, roaming calls, etc. is the type of bill, which is logically a primary key. In order to form user billing information, it is necessary to operate a narrow table variable width table.
User Account Information Table (FW_USERINFOB), this is the formed wide table, month_id and userid as a joint primary key, a monthly record per user, the user collection in this table is u (b). Account information includes:
• Billing should receive: After the account offer, the payment fee is generally the total cost of your bill, indicating that you have to pay so much money;
? Call charges: The cost type is the fee only and the cost type of call charges, such as long distance charges, roaming fees, etc .;
• The type of account: indicates whether there is a monthly rent fee in the account fee of the user, is there a call fee? This is also distinguished by the type of cost.
Offers: Offers typically include the details of the detailed level, account level discount, this is not the same in the ODS monthly bill, and some may use the field to represent preferential fees, and some may use a fee type.
In the unified DW model, there is no way to process all the cost types, which requires the need to have this demand according to actual needs, and add user billing information as appropriate. For example, there is a analysis requirement to count the number of account users equal to 0, then you can add a field of monthly month.
3, arrears information When the user exceeds a certain period, there is no payment. In the ODS arrears table, it usually stores how much money owed, and can be fine to each cost type. However, there is no significance of analyzing the arrears of some cost type, because the implementation of the account is not owed, there is no internal sales rule, this detail is not our attention.
User arrears information table (FW_USERINFOA), the primary key of the table is MONTH_ID and UserID, one record per user per month, where the user set here is U (a), and the U (a) may also contain users who have been away from the network. User arrears information mainly includes, in order to illustrate, it is assumed that a user began arrears from December 2003, owed by 80 yuan, owed by 70 yuan in January 2004, and in February I owe 110 yuan, then in December, January and February, in December, January and February, in March, because of April 1, March is not yet. • The minimum arrears month, because in the ODS arrears table, store the user's own arrears, these records are continuous on the month, there is no sales only the month-month month without the earliest monthly sales account. Therefore, this month can be used to calculate the arrears. For example, in the example above, the minimum arrears month is 200312, and the arrears are 3.
? Accumulated arrears, the user's arrears of arrears in the month; in the example of the above, it is accumulated in the two months, 260 yuan.
? The amount amount owed last month, the user owes the last month? For the above example, the last month refers to February, owed by 110 yuan.
This year's arrears, which means that the user owes this year. It is only the arrears amount of the owner of the year. In the above example, the value is 70 110 = 180 yuan.
The arrears information table and the above billing information table are different, and the bill is an incremental information, handled once a month, and the previous month's billing information is hardly changed or ignored. However, the arrears information is not incremental, and each month may be recycled in the previous month's arrears, so every month is based on the entire arrears table analysis.
4, business usage information refers to the user's use of mobile voice services, SMS services, data services, etc., all from ODS details, and detailed data is huge, usually need to be counted daily, monthly Statistics based on day information. When using information statistics, we have to take into account some practical data collection difficulties, because of the reason for the pass, we cannot simply follow the call time, and if you are statistics according to the collection date, it will also cause data. Inaccurate, there is some compromise here. The previous information table is generally the user ID and time as the primary key, but the information table used by the service is generally used as the primary key, that is, the user may have more than one record in one cycle class.
User call information table (FW_USERINFOCD, FW_USERINFOCM), first for daily bill, perform daily general summary, summarize according to the collection date, as the primary key to GatherDay_ID, CallDay_ID, and UserID. For monthly bills, on the basis of day call information, the monthly call information is obtained as the primary key with GatherMonth_ID, CallMonth_ID, and UserID. Plus the acquisition time and talk time is to have two statistical calibers, in fact, the final analysis must choose a time to summarize. User call information includes:
? Call: How many words have been generated, including calling some special service numbers;
? Call time: This user has a cumulative call during a certain period;
• Local billing time: Local billing time is charged in 1 minute, the user accounts in a certain period of time, the length of time is long;? Long-distance billing time: long-distance billioning is 6 seconds, the user Cumulative long-distance billing during a certain period;
This information table can be extended to a large extent, performing the processing of narrow table revolutions, because in the details, call types, long-distance types, roaming types, call time, etc. are not reflected in the current unified DW model, If there is a number of analytics and roaming calls, you can extend two fields for call information: long distances and roaming times.
User SMS information table
User value-added business information table
5. User monthly information general on the day analysis theme, we can analyze the user static information, for monthly analysis, for convenience, avoid each big table association, in advance, static backup table, user billing information table , User arrears information, user monthly call information, etc., form a very wide user monthly information table (FW_USERINFOM). This table contains almost all fields that appear in these information tables, and it should be noted that because the user collections of these information tables are different, U (W), u (b), u (a), etc. . The user collection of the user monthly information table is U (W), that is, the user's static table as a primary table, Left Join Other Table, this is bound to have some cases, this, a series of fields in the monthly information table Whether a user is on, whether it is arrears, whether it is called. These iconic fields have a significant role in statistics such as the number of account users, the number of arrears users, and the number of call users.
User monthly information is a very important table, which can count many indicators, which are simply inconvenient, such as several analysis examples given above, 3 and 5 can statistics, 1 and 2, because of day analysis, 4 involved The cost type dimension, in the user billing information form, according to the cost type, the type of time, long distance fee, roaming fee, monthly rent, etc., it is possible to count here. 5 Reporting the fee month, cannot be statistics from here, 7 involving the type of call type, roaming type, and long-distance type, and cannot be statistically.
Summary The user's series information form on the top of the sediment will provide good support for OLAP and data mining for OLAP and data mining, but at the same time, management is not only for users. proportion. In particular, some dimensions are not fully expanded in the narrow surface width of the above, such as cost types, we cannot use each fee as a field of the user billing information table. Therefore, for this data, a particular manner is performed.
Similarly, the purpose of data precipitation is to maintain stability, spreadability, and reuse, which mentioned above if the stability is not guaranteed if certain metrics based on several dimensions, because these measures are analyzed in new dimensions. This summary table will not be used. But we also inevitably use this precipitation method, which requires us to pre-know what several dimensions most concerned about the analysis theme. Below are these summary tables, each summary table has some dimensions that are not user information (the dimension of a pair of multi-relations). These summary tables combine some dimensions in the user information table, and the relevant dimensions in the user information table are performed by the user information table. We believe that the user's package type and networking channel are more commonly used dimensions.
Since this summary sedimentation is unstable, why should we deal with it? Here is just a way of thinking, in fact, from demand, some special dimensions combined with different dimensions of the user's related dimensions, we will refer to these special dimensions as subject dimensions, usually, if you don't want to be on the subject dimensions When analyzing the subject metrics, you can turn from the user information table to statistics, for example, you want to analyze the accounts for each city C network, which can be statistics from user monthly information. Here, these summary tables will be described one by one: • The important dimension of the business should pay the sum of the total receiving analysis is the type of cost, you can look at your monthly mobile phone bill, will be charged by the fee, local call fee, long distance Fees, roaming charges, etc. are listed item by item. Each user will produce multiple types of fees per month, this table is associated with monthly bills and user monthly information tables:
Subject dimension: cost type
User Dimensions: Land, Business Types, Packages, Network Channels, Fee Different Unit
Measure: Account receipt, preferential fee, etc. If the number of users is added, the user's meaning is the number of users of a particular fee, and cannot be separated from the cost type.
For analysis example 4, you can analyze the purpose of the initial analysis by the type of month, land, business type, package and cost type.
2. 2. Important dimensions of business arrears summary arrears analysis are arrears month and cost types, and one user can owe a month (continuous month), each month can also owe multiple types. This table has arrears table and user monthly information.
Subject dimensions: arrears month, cost type, general cost type is not very significant, almost owne
User dimension: land city, business type, package, entry channel, long-term time, arrears amount
Measure: Accumulated arrears, the amount of arrears this year, in addition to the number of users, the number of users is almost meaningful, because its meaning is the number of users who owes specific fee items, and the arrears of the month and cost type are all any dimension. It can't be accumulated.
3, the business real consolidated exchange is the expense of the payment, not the payment fee. When a user is sold, according to its arrears, you can sell a few months, the sales will be sold according to the fee type, but usually, if it is not a sales, or the internal sales Order, there is no analysis. The expense month information can be divided into two topics, one is the real estate topic, one is the arrears recycling theme, that is, to the month.
The real estate analysis can be performed daily, or monthly analysis, so you can have a summary summary and monthly exchange, but note that the daily collection must not contain dimensions in user monthly information, such as cost component. The former is associated by a pin account and a user static table, and the latter is associated by a pin account and user monthly information.
Subject dimension: cost month, cost type, we will go to the cost type dimension in the unified DW model;
User Dimensions: Land, Business Types, Packages, Network Channels
Measure: Real income fee
For analytical example 6, you can count the property on the market, business type, and cost month;
4, business payment summary business payment summary The finest granularity is also a daily collection, and a user can pay multiple payments through different payment methods, even the same payment method. The daily general table comes with the user's static table by payment table, the monthly exchange association payment table and user monthly information.
Subject dimension: payment method
User Dimensions: Land, Business Types, Packages, Network Channels
Measure: Payment fee, payment pen
5. Mobile business use summary a user can generate a variety of call behavior every day, can be a call or called, or a long-distance or non-long-distance call, can be roaming or non-roaming calls (such as you are a Empty man flying), can call different base stations, etc. Unified business analysis For business use, the most fine granularity is the day, daily collection can be combined with a day-to-talk detail related user static information table, monthly aggressive must be associated with user monthly information, you can think And the calculation of this table is time consuming. In the details, there are many dimensions, and if the summary is always added to all dimensions, the amount of data is close to the details, so we will have a selection to determine the subject dimension. Subject dimensions: time period, call type, long-distance type, roaming type, other operator, long distance route
User Dimensions: Land, Business Types, Packages, Network Channels
Measure: basic fees, long distance charges, preferential fees, call duration, number of calls, long-term duration (local, long distance), etc .;
In the detailed order, there are some dimensions, such as base stations, this part of call, and other dimensions such as the other party, the other party calls, etc. In the business analysis DW design, we recommend to form a separate summary table.
Other entity data precipitate outside user-level data precipitation, other entities need to be precipitated such as customers, products, channels, competitors, resources, etc., but the importance is not high, and most of them can also be from user-level The data is again summarized for precipitation. For example, users and customers are N: 1 relationships, and channels, the product is 1: 1 relationship, and the user entity is better than their particle size. It is also not described here, but also hopes to combine some specific needs and will gradually improve this part.