Fetching HTML text from a remote server and parsing its contents (preliminary)

zhaozj, 2021-02-16

Basic technologies: Winsock communication over the TCP protocol, and COM objects (specifically MSHTML).

Basic process: communicate with the server over a socket to fetch the desired HTML document. Then use the interfaces provided by the MSHTML object to parse the target document and extract the content of the relevant elements.

PS: the socket communication used here is blocking.

Pseudo code:

WSAStartup() (initialize the Winsock library)

||

gethostbyname() (resolve the target domain name to get the server's IP address; the port is generally the default, 80)

||

socket() (create a socket object for the connection)

||

connect() (connect to the server through the socket)

||

Construct the "GET" request (the full GET command has the following format: GET http://bbs.uestc.edu.cn/main.html\r\n)

||

send() (send the request to the server)

||

recv() (receive the requested document from the server and store it in a data buffer)

||

shutdown(); WSACleanup() (close the connection and release the socket resources)

||

CoInitialize() (initialize the COM library in preparation for using COM objects, here MSHTML; this call can be placed anywhere before the COM objects are used)

||

MSHTML::IHTMLDocument2Ptr pDoc;
MSHTML::IHTMLDocument3Ptr pDoc3;
MSHTML::IHTMLElementCollectionPtr pCollection;
MSHTML::IHTMLElementPtr pElement;

CoCreateInstance() (create an MSHTML document object and obtain a document interface pointer)

SafeArrayCreateVector() (copy the contents of the received data buffer into a SAFEARRAY and write it into the document)

||

pCollection = pDoc3->getElementsByTagName(L"P") (get the "P" elements in the document; getElementsByTagName is a member of IHTMLDocument3)

||

Then traverse pCollection and take out whatever is needed

(Several key interfaces:

pElement = pCollection->item(i, (long)0);

pElement->getAttribute("ID", 2);

BSTR tmp; pElement->get_innerHTML(&tmp);)
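For the socket half of the flow above (the steps from gethostbyname() through recv()), the string handling can be sketched in portable C++ without any Winsock calls. This is only an illustration under stated assumptions: the names `UrlParts`, `splitUrl`, `buildGetRequest`, `RecvFn`, `receiveAll` and `responseBody` are hypothetical, the request uses an HTTP/1.0 request line with a `Host` header rather than the bare GET line shown in the flow, and the receive loop takes a callable shaped like recv() instead of a real socket.

```cpp
#include <cstddef>
#include <functional>
#include <string>

// Hypothetical helper type: the pieces of an http:// URL the later steps need.
struct UrlParts {
    std::string host;  // name to pass to gethostbyname()
    int port;          // 80 unless the URL names another port
    std::string path;  // document to request; "/" if the URL has none
};

// Split e.g. "http://bbs.uestc.edu.cn/main.html" into host, port and path.
// Assumes a plain http:// URL (or no scheme); std::stoi throws on a bad port.
UrlParts splitUrl(const std::string& url) {
    const std::string scheme = "http://";
    std::size_t start = (url.compare(0, scheme.size(), scheme) == 0) ? scheme.size() : 0;
    std::size_t slash = url.find('/', start);
    std::string authority = url.substr(
        start, slash == std::string::npos ? std::string::npos : slash - start);
    std::string path = (slash == std::string::npos) ? "/" : url.substr(slash);

    UrlParts parts{authority, 80, path};
    std::size_t colon = authority.find(':');
    if (colon != std::string::npos) {
        parts.host = authority.substr(0, colon);
        parts.port = std::stoi(authority.substr(colon + 1));
    }
    return parts;
}

// Build a minimal HTTP/1.0 GET request (assumption: not the post's bare form).
// The empty line at the end marks the end of the request headers.
std::string buildGetRequest(const std::string& host, const std::string& path) {
    std::string request = "GET " + path + " HTTP/1.0\r\n";
    request += "Host: " + host + "\r\n";
    request += "\r\n";
    return request;
}

// Shaped like recv(): fill buf with up to len bytes; return the byte count,
// 0 once the peer has closed the connection, or a negative value on error.
using RecvFn = std::function<int(char* buf, int len)>;

// Blocking read loop: append chunks to out until the peer closes.
// Returns true on a clean close, false if recvFn reported an error.
bool receiveAll(const RecvFn& recvFn, std::string& out) {
    char buffer[4096];
    for (;;) {
        int n = recvFn(buffer, static_cast<int>(sizeof buffer));
        if (n == 0) return true;
        if (n < 0) return false;
        out.append(buffer, static_cast<std::size_t>(n));
    }
}

// Return the text after the blank line that ends the response headers,
// or the whole response if no such blank line is present.
std::string responseBody(const std::string& response) {
    std::size_t pos = response.find("\r\n\r\n");
    return pos == std::string::npos ? response : response.substr(pos + 4);
}
```

With Winsock, the callable passed to `receiveAll` would typically wrap `recv(sock, buf, len, 0)` on the connected socket, and the string returned by `responseBody` is what would then be copied into the SAFEARRAY for MSHTML.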

Also note: the strings used by COM are Unicode-based, so pay attention to string conversion:

_com_util::ConvertBSTRToString() (converts Unicode to ANSI)
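`_com_util::ConvertBSTRToString()` is Windows-only and targets the ANSI code page, so it cannot be shown portably. As a rough illustration of what such a conversion involves, the sketch below deliberately swaps in a different target encoding, UTF-8, and uses `std::u16string` as a stand-in for the UTF-16 contents of a BSTR. The function name `utf16ToUtf8` is hypothetical and this is not what the COM helper does internally.

```cpp
#include <cstddef>
#include <string>

// Encode UTF-16 text as UTF-8 (assumption: UTF-8 output, not the ANSI code page).
// An unpaired surrogate is passed through as a 3-byte sequence without validation.
std::string utf16ToUtf8(const std::u16string& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = in[i];
        // Combine a high/low surrogate pair into a single code point.
        if (cp >= 0xD800 && cp <= 0xDBFF && i + 1 < in.size() &&
            in[i + 1] >= 0xDC00 && in[i + 1] <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10) + (in[i + 1] - 0xDC00);
            ++i;
        }
        if (cp < 0x80) {
            out += static_cast<char>(cp);
        } else if (cp < 0x800) {
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

ASCII characters come out as single bytes, while characters such as Chinese text in an element's inner HTML take three bytes each, which is why the narrow buffer must not be sized from the wide character count.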

The above is only a simple outline of the process; for the details of each function, check MSDN.

When reposting, please cite the original address: https://www.9cbs.com/read-15161.html
