XML and Web Data Mining Technology
Author: fuyiping Date: August 22, 2004
The new generation of XML-based WWW environments deals directly with Web data. It is not only compatible with existing Web applications but also better supports sharing and exchanging information on the Web. XML can be viewed as a semi-structured data model: the descriptions in an XML document map readily onto attributes in a relational database, which permits precise queries and model extraction.
1. The origin and development of XML
XML (Extensible Markup Language) is a simplified subset of SGML (Standard Generalized Markup Language) designed by the World Wide Web Consortium (W3C) specifically for Web applications. In general terms, XML is a meta-markup language that provides a format for describing structured data. More specifically, XML is a language similar to HTML that is designed to describe data. XML provides an application-independent way to share data. It is a new standard language for automatically describing information, and it lets computer communication extend the Internet's role from delivering information to many other human activities.
XML consists of a small set of rules. These rules can be used to create markup languages, and a compact program called a parser can process every markup language created this way. Just as HTML gave early computer users a way to display and read Internet documents, XML creates a universal language that anyone can read and write. XML addresses two Web problems that HTML cannot solve: the Internet is growing quickly while access remains slow, and a great deal of information is available while the part a user actually needs is hard to find. XML adds structural and semantic information that lets computers and servers process many forms of information. XML's extensibility therefore makes it possible not only to download large amounts of information from Web servers, but also to greatly reduce network traffic.
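The point that one generic parser can handle any newly created tag vocabulary can be illustrated with a minimal sketch. It uses Python's standard `xml.etree.ElementTree` module; the two sample documents and the helper name `outline` are assumptions made up for illustration, not part of the original article.

```python
import xml.etree.ElementTree as ET

# Two documents with unrelated, made-up tag vocabularies (hypothetical data).
WEATHER = "<forecast><city>Beijing</city><high unit='C'>31</high></forecast>"
ORDER = "<order><book>XML Basics</book><quantity>2</quantity></order>"


def outline(xml_text):
    """Return (tag, text) pairs for every element, in document order."""
    root = ET.fromstring(xml_text)
    return [(el.tag, (el.text or "").strip()) for el in root.iter()]


if __name__ == "__main__":
    # The same function processes both vocabularies without knowing them.
    for doc in (WEATHER, ORDER):
        print(outline(doc))
```

Nothing in `outline` refers to `forecast` or `order`: the parser only relies on the documents being well-formed XML.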
Tags in XML are not predefined; users must define the tags they need. XML is a self-describing language. XML uses a DTD (Document Type Definition) to describe the structure of the data. XSL (Extensible Stylesheet Language) is the mechanism that describes how documents are displayed; it is the style sheet description language of XML. XSL goes beyond the CSS (Cascading Style Sheets) used with HTML and comprises two parts: a method for transforming XML documents and a method for formatting XML documents. XLL (Extensible Linking Language) is the linking language of XML. It provides links in XML that are similar to HTML links but more powerful. With XLL, links can run in multiple directions and can exist at the object level, not just the page level. Because XML can mark up more information, it makes it easier for users to find the information they need. With XML, Web designers can create not only text and graphics but also multi-level, interdependent systems, data trees, metadata, hyperlink structures, and style sheets defined by document type.
2. Main features of XML
The characteristics of XML account for its strengths. As a markup language, XML has many features:
(1) Simple. XML has been carefully designed, and the whole specification is simple. It consists of a small set of rules that can be used to create markup languages, and a compact program, usually called a parser, can process every markup language created this way. XML can create a universal language that anyone can read and write, a property known as uniformity.
For example, tags created in XML always appear in pairs, and XML relies on the new encoding standard called Unicode.
(2) Open. Because XML derives from SGML, many mature software tools on the market can help with authoring, management, and similar tasks. Many leading companies cooperate with the W3C working groups to help ensure interoperability, to support developers, authors, and users on all systems and browsers, and to improve the XML standard. An XML parser can load an XML document programmatically. Once the document is loaded, the user can obtain and manipulate the information in the entire document through the XML Document Object Model, which speeds up network operations.
(3) Efficient and extensible. XML supports the reuse of document fragments. Users can invent and use their own tags and share them with others, and an unlimited set of tags can be defined. XML provides a framework for marking up structured data. An XML element can declare its associated data to be a retail price, a sales tax, a book title, a quantity, or any other data element. As more organizations around the world adopt XML standards, more related capabilities will follow: once data has been located, it can be delivered over the wire in any way and rendered in a browser, or passed to other applications for further processing. XML provides an application-independent way to share data. Using DTDs, different groups of people can exchange data through a common DTD. An application can use this standard DTD to verify that the data it receives is valid, and a DTD can also be used to verify one's own data.
(4) Internationalization. The standard is international and supports most of the world's writing systems. This stems from its reliance on the new Unicode encoding standard, which supports mixed text written in all of the world's major languages. In HTML, as in most word processing, a document is generally written in one particular language.
Whether that language is English, Japanese, or Arabic, if the user's software cannot read its characters, the user cannot use the document. Software that can read XML, however, can handle any combination of characters from these different languages. XML can therefore exchange information not only between different computer systems but also across national and cultural boundaries.
3. Applications of XML in Web data mining
XML has become a formal specification, and developers can mark up and exchange data in XML format. XML provides a good way to process data in a three-tier architecture. With the three-tier model, XML can be generated from existing data, and data held in XML stays separate from business rules and presentation. What drives the adoption of XML is the class of Web applications that cannot be built with standard HTML. These applications fall into four categories: those that require a Web client to mediate between two or more heterogeneous databases; those that try to shift most of the processing load from the Web server to the Web client; those that require a Web client to present the same data to different users in different views; and those that require intelligent Web agents to tailor information content to the needs of individual users. These applications are clearly closely linked to Web data mining technology, and Web-based data mining must rely on them. XML gives Web-based applications power and flexibility, so it brings many benefits to developers and users. One example is more meaningful search, because Web data can be uniquely identified by XML. Without XML, search software must understand how each database is built, which is practically impossible because almost every database describes its data in a different format.
Because integrating data from different sources remains a problem, searching across a variety of incompatible databases is practically impossible today.
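The more meaningful search described above can be sketched as follows. Because each value is identified by its tag, a query can ask for "books priced at or below some limit" without knowing how any underlying database is laid out. The catalog contents, the tag names, and the function `find_titles` are hypothetical, chosen only for illustration; the sketch uses Python's standard `xml.etree.ElementTree`.

```python
import xml.etree.ElementTree as ET

# Hypothetical catalog that merges items from two sources.
CATALOG = """
<catalog>
  <book source="store-a"><title>XML Basics</title><price>25.00</price></book>
  <book source="store-b"><title>Web Mining</title><price>40.00</price></book>
  <cd source="store-a"><title>XML Basics</title><price>12.00</price></cd>
</catalog>
"""


def find_titles(xml_text, tag, max_price):
    """Return titles of all <tag> elements whose <price> is at most max_price."""
    root = ET.fromstring(xml_text)
    return [
        item.findtext("title")
        for item in root.iter(tag)
        if float(item.findtext("price")) <= max_price
    ]


if __name__ == "__main__":
    print(find_titles(CATALOG, "book", 30))
```

A plain text search for "XML Basics" could not tell the book from the CD; the tag-based query can.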
XML makes it easy to combine structured data from different sources. Software agents on the middle tier can integrate data from back-end databases and other applications. The data can then be sent to clients or other servers for further aggregation, processing, and distribution. The extensibility and flexibility of XML allow it to describe data in many kinds of applications, from Web pages to data records, so that many applications can use the data. At the same time, because XML-based data is self-describing, it can be exchanged and processed without requiring an internal description.
With XML, users can conveniently compute and process data locally. Once data in XML format has been sent to the client, the client can use application software to parse, edit, and process it. Users can process data in different ways, not merely display it. The XML Document Object Model (DOM) allows data to be handled with scripts or other programming languages, and calculations can be carried out without a round trip to the server. XML can also be used to separate the interface through which users view the data. With a simple, flexible, open format, one can build powerful Web applications of a kind that previously could only be built on high-end databases. In addition, once data has been sent to the desktop, it can be displayed in a variety of ways.
XML also describes structured data in a simple, open, extensible way. XML complements HTML, which is widely used to describe user interfaces. HTML describes how data looks, while XML describes the data itself. Because presentation is separated from content, data defined in XML can be given whatever display form suits it, so the data is presented more sensibly. Local data can be rendered dynamically according to client configuration, user choice, or other criteria. CSS and XSL provide declarative mechanisms for displaying data. With XML, data can also be updated at a fine granularity.
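The local processing described above, where a client works on received XML through the DOM without going back to the server, might look like this minimal sketch. It uses Python's standard `xml.dom.minidom`; the order document and the helper names `text_of` and `order_total` are assumptions for illustration only.

```python
from xml.dom import minidom

# Hypothetical XML payload, as it might arrive from a server.
ORDER = """<order>
  <item><name>XML Basics</name><price>25.00</price><quantity>2</quantity></item>
  <item><name>Web Mining</name><price>40.00</price><quantity>1</quantity></item>
</order>"""


def text_of(parent, tag):
    """Return the text content of the first <tag> element under parent."""
    node = parent.getElementsByTagName(tag)[0]
    return node.firstChild.data


def order_total(xml_text):
    """Compute the order total on the client from the DOM tree."""
    dom = minidom.parseString(xml_text)
    total = 0.0
    for item in dom.getElementsByTagName("item"):
        total += float(text_of(item, "price")) * int(text_of(item, "quantity"))
    return total


if __name__ == "__main__":
    print(order_total(ORDER))
```

The same DOM tree could just as well be sorted, filtered, or rendered differently, all on the client side.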
Whenever part of the data changes, it is not necessary to resend the entire structured document. Only the changed element has to be sent from the server to the client, and the changed data does not require a refresh of the whole user interface. At present, however, as soon as one piece of data changes, the whole page must be rebuilt, which severely limits server scalability. XML also allows data to be added, such as a forecast temperature. The added information can flow into an existing page without a new page having to be sent to the browser.
XML is applied where clients must interact with different data sources. The data may come from different databases, each with its own complex format, but the client interacts with all of these databases through a single standard language: XML. Because XML is customizable and extensible, it is sufficient to express all kinds of data. After receiving the data, the client can process it or pass it between different databases. In short, in this kind of application XML solves the problem of a unified data interface. Unlike other data transfer standards, however, XML does not define a concrete specification for the data in a data file. Instead, it attaches tags to the data to express the logical structure and meaning of the data. This makes XML a specification that programs can interpret automatically.
XML is also applied to move a large share of the computational load to the client. Clients can choose and build different applications to process the data according to their own needs, while the server only has to send out the same XML file. Under the traditional client/server model, clients issue different requests and the server responds to each one separately. This not only increases the load on the server itself but also forces network administrators to investigate the various user needs and write a different program for each. If user needs are complex and changeable, keeping all business logic concentrated on the server side becomes impractical, because server-side programmers may be unable to satisfy the many application needs or keep up with changing requirements, which leaves both parties in a passive position.
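The unified data interface discussed above can be sketched as follows: two sources with different native formats are both exposed through one XML vocabulary, so a client only needs to understand that vocabulary. Everything here is hypothetical and for illustration only: the source data, the `catalog`/`book` vocabulary, and the function names. The sketch uses Python's standard library.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Two hypothetical sources with different native formats.
SOURCE_A = [{"title": "XML Basics", "cost": 25.0}]   # list of dicts
SOURCE_B = "name;price\nWeb Mining;40.0\n"           # semicolon-separated text


def record_to_xml(title, price, origin):
    """Build one <book> element in the shared (made-up) vocabulary."""
    book = ET.Element("book", {"origin": origin})
    ET.SubElement(book, "title").text = title
    ET.SubElement(book, "price").text = f"{price:.2f}"
    return book


def unified_catalog():
    """Expose both sources through a single <catalog> document."""
    catalog = ET.Element("catalog")
    for row in SOURCE_A:
        catalog.append(record_to_xml(row["title"], row["cost"], "source-a"))
    for row in csv.DictReader(io.StringIO(SOURCE_B), delimiter=";"):
        catalog.append(record_to_xml(row["name"], float(row["price"]), "source-b"))
    return catalog


if __name__ == "__main__":
    print(ET.tostring(unified_catalog(), encoding="unicode"))
```

Each client can then process the same catalog document in its own way, which is the division of labor between server and client that the text describes.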