Data Mining Concept and Technology Reading Note 2

zhaozj 2021-02-16

2. General flow of data mining

7.1 Data Mining Environment

The data mining environment can be shown schematically as follows:

[Figure: Database 1, Database 2, ..., Database N; data mining tool; visualization tool]

7.2 Database Mining Process

1. Determine business objects

Clearly defining the business problem and recognizing the purpose of data mining is an important step in data mining. The final result of mining cannot be predicted, but the problem to be explored should be foreseeable. Data mining carried out for its own sake is blind and will not succeed.

2. Preparation of data

1) Data cleanup

Eliminate noise or inconsistencies.

2) Data integration

Combine data from a variety of data sources.

3) Data selection

Search all internal and external data related to the business object, and select the data suitable for the data mining application.

4) Data transformation

Convert the data into an analysis model built for the mining algorithm. Building an analysis model that truly fits the mining algorithm is the key to successful data mining.
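The four preparation steps above can be sketched in a few lines of Python. This is only a minimal illustration, not a prescribed method: the two sources (crm_rows, order_rows), their field names, and the helper functions clean, integrate, select, and transform are all hypothetical, and a real project would choose its own cleaning rules and target model.

```python
from typing import Dict, List, Optional

Record = Dict[str, Optional[str]]

# Two hypothetical data sources describing the same customers (made-up data).
crm_rows: List[Record] = [
    {"id": "1", "age": "34", "city": "Beijing "},
    {"id": "2", "age": "41", "city": "shanghai"},
    {"id": "2", "age": "41", "city": " Shanghai"},  # same customer, inconsistent spelling
    {"id": "3", "age": None, "city": "Wuhan"},      # missing value
]
order_rows: List[Record] = [
    {"id": "1", "total": "120.5"},
    {"id": "2", "total": "80"},
    {"id": "3", "total": "15"},
]


def clean(rows: List[Record]) -> List[Dict[str, str]]:
    """Data cleanup: drop incomplete rows, normalize text, remove duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        if any(value is None for value in row.values()):
            continue
        normalized = {k: v.strip().lower() for k, v in row.items()}
        key = tuple(sorted(normalized.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(normalized)
    return cleaned


def integrate(left, right, key="id"):
    """Data integration: combine two sources with an inner join on a shared key."""
    index = {row[key]: row for row in right}
    return [{**row, **index[row[key]]} for row in left if row[key] in index]


def select(rows, fields):
    """Data selection: keep only the fields relevant to the business object."""
    return [{f: row[f] for f in fields} for row in rows]


def transform(rows, fields):
    """Data transformation: turn records into numeric vectors for an algorithm."""
    return [[float(row[f]) for f in fields] for row in rows]


fields = ["age", "total"]
merged = integrate(clean(crm_rows), clean(order_rows))
features = transform(select(merged, fields), fields)
print(features)
```

The numeric vectors stand in for the "analysis model" here; which representation is appropriate depends on the mining algorithm chosen in the next step.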

3. Data mining

Mine the transformed data. Apart from choosing a suitable mining algorithm, the remaining work can be done automatically.

4. Result analysis

Explain and evaluate the results. The analysis method used generally depends on the data mining operation performed.

5. Assimilation of knowledge

The knowledge obtained is integrated into the organizational structure of the business information system.

3. Data mining hotspots

8.1 Data Mining of E-Commerce Website

When data mining is performed on a website, the required data comes mainly from two sources. One is the customer's background information, which comes mainly from the registration form. The other is the visitor's click stream, which is mainly used to examine customer behavior. Sometimes, however, customers are very protective of their background information and refuse to fill in that part of the registration form, which makes data analysis and mining more difficult. In that case, the customer's background information has to be inferred from the visitor's behavioral data before it can be used. As for the techniques and algorithms used for analysis and model building, website data mining does not differ much from traditional data mining, and many of the same methods and analytical ideas can be applied. The difference is that a large part of a website's data comes from the click stream, whose format differs from that of a traditional database. The main work in data mining for an e-commerce website is therefore data preparation.
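One common preparation task for click-stream data is grouping raw page requests into per-visitor sessions. The sketch below shows the idea under stated assumptions: the record layout (visitor id, timestamp, page), the sample data, the sessionize function, and the 30-minute inactivity timeout are all illustrative choices, not something the note specifies.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical click-stream records: (visitor id, timestamp, page requested).
clicks = [
    ("v1", "2021-02-16 10:00:00", "/home"),
    ("v1", "2021-02-16 10:05:00", "/product/42"),
    ("v2", "2021-02-16 10:06:00", "/home"),
    ("v1", "2021-02-16 11:30:00", "/home"),
    ("v1", "2021-02-16 11:31:00", "/cart"),
]

SESSION_GAP = timedelta(minutes=30)  # assumed inactivity timeout


def sessionize(records, gap=SESSION_GAP):
    """Group each visitor's clicks into sessions split by long inactivity."""
    by_visitor = defaultdict(list)
    for visitor, stamp, page in records:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((when, page))

    sessions = defaultdict(list)
    for visitor, events in by_visitor.items():
        events.sort()  # order each visitor's clicks by time
        last_time = None
        for when, page in events:
            # Start a new session on the first click or after a long pause.
            if last_time is None or when - last_time > gap:
                sessions[visitor].append([])
            sessions[visitor][-1].append(page)
            last_time = when
    return dict(sessions)


sessions = sessionize(clicks)
for visitor, visits in sessions.items():
    print(visitor, visits)
```

Once clicks are grouped this way, each session can be treated like a row in a traditional table, which is what lets the usual mining methods be reused.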

8.2 Data Mining of Biological Genes

Mining biological data belongs to a completely different field. It is hard to say how much value it has, but people stand to benefit greatly from it. For example, gene combinations vary endlessly: how different are the genes of a person with a certain disease from those of a healthy person? Can the differences be located and then altered so that the gene becomes normal? Answering such questions requires the support of data mining technology. Compared with ordinary data mining, mining biological or genetic information is much more complicated in terms of data complexity, data volume, and the algorithms used to analyze the data and build models. On the algorithmic side, new and better algorithms are needed. The field is still far from mature.

8.3 Data Mining of Text

In the real world, most available information is stored in text databases, which consist of large numbers of documents from various data sources. With the rapid growth of information in electronic form, text databases have developed rapidly. Most of the data stored in document databases is so-called semi-structured data, which is neither completely unstructured nor completely structured. Modeling and implementing semi-structured data has been an active topic in recent database research. In addition, information retrieval techniques have been used to handle unstructured documents. Traditional information retrieval, however, cannot keep up with the growing volume of text data that has to be processed. Document mining has therefore become an increasingly popular topic in data mining.

8.4 Web Data Mining

The Web holds a massive amount of data, and how to put that data to sophisticated use has become a research hotspot in database technology. Data mining finds implicit regularities in large amounts of data in order to solve the problem of data quality in applications. Making full use of the useful data is its most important application. Web-oriented data mining is clearly much more complicated than mining a single data warehouse, because it faces several challenges:

1. The Web is simply too large for effective data warehousing and data mining.

2. Web pages are far more complex than any traditional text document.

3. The Web is a highly dynamic source of information.

4. The Web serves a wide variety of user groups.

5. Only a small part of the information on the Web is relevant or useful.

In general, Web data mining can be divided into three categories: Web content mining, Web structure mining, and Web usage mining.
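To make one of these categories concrete, the sketch below illustrates a very small case of Web structure mining: it extracts the hyperlinks from a few pages and counts how many other pages link to each one. The pages dictionary, the LinkExtractor class, and the inbound_link_counts function are hypothetical, and counting inbound links is just one simple structural measure chosen for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collect the href targets of all <a> tags in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# A hypothetical three-page site (made-up content).
pages = {
    "/a": '<p>See <a href="/b">B</a> and <a href="/c">C</a>.</p>',
    "/b": '<p>Back to <a href="/c">C</a>.</p>',
    "/c": "<p>No outgoing links here.</p>",
}


def inbound_link_counts(site):
    """Count, for each target, how many distinct other pages link to it."""
    counts = Counter()
    for url, html in site.items():
        parser = LinkExtractor()
        parser.feed(html)
        for target in set(parser.links):
            if target != url:  # ignore self-links
                counts[target] += 1
    return counts


counts = inbound_link_counts(pages)
for url in pages:
    print(url, counts[url])
```

Content mining would instead look at the text inside each page, and usage mining at the access logs of the visitors.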

Web-oriented data mining is a complex technology, and the challenges above make it a difficult problem to solve. The emergence of XML offers an opportunity. Because XML makes it easy to combine structured data from different sources, it becomes possible to search across diverse, incompatible databases, which brings hope for solving the Web data mining problem. XML's extensibility and flexibility allow it to describe data in many kinds of applications, including the data records collected from web pages. At the same time, because XML-based data is self-describing, it can be exchanged and processed without an internal description. As an industry standard for representing structured data, XML offers many advantages to organizations, software developers, Web sites, and end users. As XML becomes a standard way of exchanging data on the Web, Web-oriented data mining is expected to become much easier.
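The self-describing property can be illustrated with a short Python sketch that reads records from two differently shaped XML documents and combines them without any schema agreed in advance. The two documents, their tag names, and the records function are invented for illustration. Only the standard-library xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# Two hypothetical sources with different tag names and different fields.
source_a = """<customers>
  <customer id="1"><name>Li</name><city>Beijing</city></customer>
</customers>"""

source_b = """<clients>
  <client id="2"><name>Wang</name><email>wang@example.com</email></client>
</clients>"""


def records(xml_text):
    """Turn each child of the root element into a dict of field -> value.

    Field names are read from the tags themselves, so no external
    description of the record layout is needed.
    """
    root = ET.fromstring(xml_text)
    result = []
    for element in root:
        record = dict(element.attrib)
        for child in element:
            record[child.tag] = (child.text or "").strip()
        result.append(record)
    return result


combined = records(source_a) + records(source_b)
for record in combined:
    print(record)
```

The combined list holds records from both sources in one uniform structure, even though one source has a city field and the other an email field.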

4. Future of data mining

At present, DMKD research is on the rise. Its overall level of research and development is comparable to that of database technology in the 1970s, and it urgently needs counterparts to the theoretical models, DBMS systems, and SQL query language of that field before DMKD applications can be widely adopted. DMKD research is expected to reach an even greater peak, and its focus may concentrate on the following areas:

Finding a formal language for description, that is, a data mining language dedicated to knowledge discovery, which may become as formalized and standardized as SQL.

Seeking visualization methods for the data mining process, so that the process of knowledge discovery can be understood by users and human-machine interaction during knowledge discovery becomes easier.

Researching data mining technology in network environments, especially the Internet: establishing DMKD servers on the Internet that cooperate with database servers to implement Web mining.

Strengthening the mining of various kinds of unstructured data, such as text data, graphic data, video and image data, sound data, and multimedia data in general.

Interactive discovery.

Maintenance and updating of knowledge.

In any case, demand and the market are the lasting driving forces. DMKD will first meet the urgent need for information, and a large number of DMKD-based decision support software products will become available. Only when information is effectively extracted from data, and knowledge is discovered in time to serve human decision-making and strategic development, will data truly become a resource comparable to materials and energy, and only then will the information era truly arrive.

10. Postscript

Please credit the original address when reposting: https://www.9cbs.com/read-25175.html
