Basic technologies: Winsock communication over the TCP protocol, and COM objects (specifically MSHTML).
Basic process: First, fetch the desired HTML document from the server over a socket connection.
Then use the interfaces provided by the MSHTML object to parse the document and extract the content of the elements you need.
PS: The socket communication used here is blocking.
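As a rough illustration of the fetch step, here is a minimal portable C++ sketch of the two pieces of text handling around the socket calls: building the request and separating the response headers from the HTML body. The helper names buildGetRequest and extractBody are hypothetical, not part of Winsock. Note also that this sketch assumes an HTTP/1.0 request with a Host header, which is not the one-line GET form used in the pseudo code; treat that as an assumption, chosen because many servers expect it.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: build an HTTP/1.0 GET request for host + path.
// Assumption: the server accepts HTTP/1.0 with a Host header.
std::string buildGetRequest(const std::string& host, const std::string& path) {
    // The path is expected to be absolute, e.g. "/main.html".
    assert(!host.empty() && !path.empty() && path[0] == '/');
    return "GET " + path + " HTTP/1.0\r\n"
           "Host: " + host + "\r\n"
           "Connection: close\r\n"
           "\r\n";  // a blank line ends the request headers
}

// Hypothetical helper: return everything after the first blank line,
// i.e. the HTML document without the HTTP response headers.
// Returns an empty string if no header/body separator is found.
std::string extractBody(const std::string& response) {
    const std::string separator = "\r\n\r\n";
    const std::string::size_type pos = response.find(separator);
    if (pos == std::string::npos) {
        return std::string();
    }
    return response.substr(pos + separator.size());
}
```

The string returned by buildGetRequest("bbs.uestc.edu.cn", "/main.html") is what would be passed to send(), and extractBody would be applied to the full buffer collected by recv() before handing the HTML to MSHTML.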
Pseudo code: WSAStartup() (initialize the Winsock library)
||
gethostbyname() (resolve the domain name to get the server's IP address; the port is generally the default, 80)
||
socket() (create a socket object for the connection)
||
connect() (connect to the server through the socket)
||
Construct the "GET" packet (the full GET command has the following format: GET http://bbs.uestc.edu.cn/main.html\r\n)
||
send() (send the request packet to the server)
||
recv() (receive the requested document from the server and store it in the data buffer)
||
shutdown(); WSACleanup() (close the connection and release the socket resources)
||
CoInitialize() (initialize the COM library in preparation for using the COM object, MSHTML; this call can be placed anywhere before the COM object is first used)
||
MSHTML::IHTMLDocument2Ptr pDoc; MSHTML::IHTMLDocument3Ptr pDoc3; MSHTML::IHTMLElementCollectionPtr pCollection; MSHTML::IHTMLElementPtr pElement;
CoCreateInstance() (create an MSHTML object and obtain a document interface pointer)
SafeArrayCreateVector() (store the contents of the received data buffer in a SAFEARRAY and load it into the document)
||
pCollection = pDoc3->getElementsByTagName(L"P") (get the "P" elements in the document; getElementsByTagName belongs to the IHTMLDocument3 interface)
||
Then traverse pCollection and take out what you need.
(Several key interfaces: pElement = pCollection->item(i, (long)0);
pElement->getAttribute("ID", 2);
BSTR tmp; pElement->get_innerHTML(&tmp);)
Also note: the strings used by COM are Unicode-based, so pay attention to string conversion:
_com_util::ConvertBSTRToString() (converts Unicode to ANSI)
The above is only the basic process; for the details of each function, check MSDN.
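One detail of the recv() step is worth sketching. On a blocking socket, a single recv() call may return only part of the response, so the call is usually repeated until it returns 0 (the server closed the connection) or a negative value (an error). The sketch below is a minimal, portable illustration of that loop. The name receiveAll is hypothetical, and the socket is abstracted as a callable with the same shape as recv(buffer, length), so the loop can be read and tried without Winsock.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: keep calling recvChunk until it reports that no
// more data is coming, and collect everything into one string.
// recvChunk(buf, len) must behave like recv(): it returns the number of
// bytes written to buf, 0 when the peer has closed the connection, or a
// negative value on error.
template <typename RecvFn>
std::string receiveAll(RecvFn recvChunk) {
    std::string data;
    char buffer[4096];
    for (;;) {
        const int received = recvChunk(buffer, static_cast<int>(sizeof buffer));
        if (received <= 0) {
            break;  // 0: connection closed; negative: error
        }
        // Sanity check: the callable must not report more than it was given.
        assert(received <= static_cast<int>(sizeof buffer));
        data.append(buffer, static_cast<std::string::size_type>(received));
    }
    return data;
}
```

With Winsock, and assuming sock is the connected SOCKET from the steps above, the call could look like receiveAll([&](char* buf, int len) { return recv(sock, buf, len, 0); }). This sketch treats an error the same as end of data; real code would likely check WSAGetLastError() to tell the two apart.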