Several Ways to Implement Interaction with the IE Browser

zhaozj2021-02-16  63

1. Introduction

Programmatically operating on the objects inside the IE browser is a very practical problem. With a DLL bound to IE, we can record the sequence of pages a user browses and analyze the user's behavior and patterns; we can filter and translate page content; and we can automatically fill in FORM fields that users would otherwise have to type in again and again. All of our examples are written in VC (Visual C++). The approach is to access IE through the interfaces of its COM objects. Because COM is a language-independent binary standard for object interaction, the same techniques can also be implemented in other languages such as VB, Delphi, and C++ Builder.

2. Enumerating IE Instances

First, let's see how to find out how many IE instances are currently running. Under Windows, an application can locate other running applications that register their objects in the Running Object Table (ROT) and interact with them. However, IE's current implementation does not register its instances in the ROT, so another method is needed. The ShellWindows collection represents the set of currently open windows that belong to the shell, and IE is a shell application, so we can reach the IE instances through it.

Below we describe how to enumerate the current IE instances in VC.
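The enumeration described next is essentially a filter over a collection: walk ShellWindows by index and keep only the entries that answer to the browser interface and hold an HTML document. The sketch below is a minimal, portable C++ model of that loop under stated assumptions, not real COM code: dynamic_cast stands in for QueryInterface, a std::vector stands in for the ShellWindows collection, and the names Unknown, WebBrowser, HtmlDocument, OtherDocument and CollectHtmlTitles are hypothetical, introduced only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Portable model, not real COM: dynamic_cast plays the role of QueryInterface.
struct Unknown {
    virtual ~Unknown() = default;
};

// Stand-in for IHTMLDocument2; only the title property is modelled.
struct HtmlDocument : Unknown {
    explicit HtmlDocument(std::string t) : title(std::move(t)) {}
    std::string title;
};

// Stand-in for a document that is not HTML (e.g. a folder view).
struct OtherDocument : Unknown {};

// Stand-in for IWebBrowser2; GetDocument() mirrors the Document property.
struct WebBrowser : Unknown {
    explicit WebBrowser(std::unique_ptr<Unknown> doc) : document(std::move(doc)) {}
    Unknown* GetDocument() const { return document.get(); }
    std::unique_ptr<Unknown> document;
};

// Hypothetical helper: walks a ShellWindows-like collection by index and
// returns the titles of the windows that are browsers showing an HTML document.
std::vector<std::string> CollectHtmlTitles(
        const std::vector<std::unique_ptr<Unknown>>& shellWindows) {
    std::vector<std::string> titles;
    const long count = static_cast<long>(shellWindows.size());           // like GetCount()
    for (long i = 0; i < count; ++i) {
        Unknown* disp = shellWindows[static_cast<std::size_t>(i)].get();  // like Item(i)
        auto* browser = dynamic_cast<WebBrowser*>(disp);                  // "QI" for IWebBrowser2
        if (browser == nullptr) continue;
        auto* doc = dynamic_cast<HtmlDocument*>(browser->GetDocument());  // "QI" for IHTMLDocument2
        if (doc == nullptr) continue;
        titles.push_back(doc->title);
    }
    return titles;
}
```

In this model OtherDocument plays the part of a browser window whose document is not HTML, which is why the loop performs two interface checks rather than one. In real VC code each dynamic_cast would become a smart-pointer construction that calls QueryInterface.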
IShellWindows is an interface of the system shell. We can declare an interface variable as follows:

    SHDocVw::IShellWindowsPtr m_spSHWinds;

and then create the instance:

    m_spSHWinds.CreateInstance(__uuidof(SHDocVw::ShellWindows));

The GetCount method of the IShellWindows interface returns the number of current instances:

    long nCount = m_spSHWinds->GetCount();

and its Item method returns each instance object (here i is a long index running from 0 to nCount - 1):

    IDispatchPtr spDisp;
    _variant_t va(i, VT_I4);
    spDisp = m_spSHWinds->Item(va);

Next we check whether the instance object is an IE browser object:

    SHDocVw::IWebBrowser2Ptr spBrowser(spDisp);
    if (spBrowser != NULL) { /* it is a browser object */ }

Having obtained the browser object, we can call its IWebBrowser2 interface to get a pointer to the current document object:

    MSHTML::IHTMLDocument2Ptr spDoc(spBrowser->GetDocument());

Windows Explorer folder windows also appear in the ShellWindows collection and also expose IWebBrowser2, so checking that spDoc is non-NULL (that the window really holds an HTML document) is the more reliable test for an IE page. Through this interface we can then operate on the document, for example reading its title property to get the document's title.

While browsing the web we often have many IE windows open. If these pages are valuable, we may want to save all of them to disk. Using the method above we can obtain every IE instance and its document object, so a simple program can save all of the currently open pages.

The method above enumerates the current IE instances, but to receive the events that each IE instance generates we need the DLL mechanism.

3. Implementing a DLL Bound to IE

In this section we describe how to create a DLL that binds to IE.

To bind to each running IE instance, we build a DLL that IE loads into every instance. IE starts up as follows: each time an IE instance starts, it looks in the registry for the CLSIDs listed under the key

    HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Explorer\Browser Helper Objects

When CLSIDs are present under this key, IE creates an instance of each listed object by calling CoCreateInstance(). Note that each CLSID must be stored as a subkey, not as a named value; for example, {DD41D66E-CE4F-11D2-8DA9-00A0249EABF4} is a valid subkey. We use a DLL rather than an EXE because a DLL runs in the same process space as the IE instance. Every such DLL must implement the IObjectWithSite interface, and in particular its SetSite method. Through this method our DLL receives an IUnknown pointer to the IE COM object. We can then call QueryInterface on this pointer to ask for the interface we want, which is the basic COM mechanism; here the only interface we need is IWebBrowser2.

In fact, what we build is a COM object; the DLL is just the packaging of that COM object. Our COM object must implement the following:

1. The SetSite method of the IObjectWithSite interface. IE passes its own pointer to our COM object through this method. We query that pointer for IWebBrowser2 and keep the result in a member interface pointer, called m_spWebBrowser2 below.

2. Once we hold the interface of the IE COM object, we need to connect to the events that occur in the IE instance. This requires two interfaces:

(1) IConnectionPointContainer. Here it is used to find, from a given event-interface IID, the specific connection point through which those events reach our DLL.
For example, we can declare:

    CComQIPtr<IConnectionPointContainer> spCPContainer(m_spWebBrowser2);

Next, to receive all the events that occur in IE, we use IConnectionPoint.

(2) IConnectionPoint. Through this interface a client can start or end an advisory loop with the object. IConnectionPoint has two main methods, Advise and Unadvise. In our application, Advise establishes a channel between the events occurring in each IE instance and our DLL, and Unadvise ends the notification relationship established by Advise.

For example, we can declare the IConnectionPoint interface as follows:

    CComPtr<IConnectionPoint> spConnectionPoint;

To connect the events that occur in the IE instance to our DLL, we first look up the connection point for the DWebBrowserEvents2 event interface:

    hr = spCPContainer->FindConnectionPoint(DIID_DWebBrowserEvents2, &spConnectionPoint);

Then we call the Advise method of IConnectionPoint so that our DLL is notified every time IE fires a new event:

    hr = spConnectionPoint->Advise((IDispatch*)this, &m_dwIDCode);

Once the events of the IE instance are connected to our DLL, we can handle all IE events in the Invoke() method of our IDispatch interface.

3. The Invoke() method of the IDispatch interface. IDispatch is an interface derived from IUnknown, and any service offered through a COM interface can also be exposed through IDispatch. Behind the scenes Invoke typically forwards to the object's vtable functions, but callers reach those functions through a single entry point and select one by an index, so we can customize Invoke to provide different services. Invoke is declared as follows:

    STDMETHOD(Invoke)(DISPID dispidMember, REFIID riid, LCID lcid, WORD wFlags,
                      DISPPARAMS* pDispParams, VARIANT* pvarResult,
                      EXCEPINFO* pExcepInfo, UINT* puArgErr);

Here DISPID is a long integer that identifies a function. A DISPID is unique only within a particular implementation of IDispatch, and each IDispatch implementation has its own IID. In our case dispidMember identifies which event occurred in the IE instance, for example DISPID_BEFORENAVIGATE2 or DISPID_NAVIGATECOMPLETE2. The other important parameter of this method is pDispParams, which points to the following structure:

    typedef struct tagDISPPARAMS {
        VARIANTARG* rgvarg;          // array of arguments; VARIANTARG is the same as VARIANT (see oaidl.idl)
        DISPID* rgdispidNamedArgs;   // DISPIDs of the named arguments
        UINT cArgs;                  // number of elements in rgvarg
        UINT cNamedArgs;             // number of named arguments
    } DISPPARAMS;

Note that every argument has the type VARIANTARG, so the kinds of parameters that can be passed between IE and our DLL are limited: only types that fit in a VARIANTARG can be passed through a dispatch interface. Also note that rgvarg stores the arguments in reverse order. For example, for the event DISPID_NAVIGATECOMPLETE2, the first element, rgvarg[0], holds the URL that IE navigated to, with type VT_BYREF | VT_VARIANT. DISPID_NAVIGATECOMPLETE2 and the other DISPIDs are already defined in the VC headers (exdispid.h), so we can use them directly.
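The Advise/Invoke round trip can be pictured with a small portable C++ model. This is a sketch under stated assumptions, not the ATL or COM API: ConnectionPoint, EventSink, DispParams and NavigationLog are hypothetical names, strings stand in for VARIANTARGs, and the two constants merely stand in for the DISPIDs a real DLL takes from exdispid.h. It shows the cookie returned by Advise, the cookie consumed by Unadvise, and the reverse argument order that puts the NavigateComplete2 URL in rgvarg[0].

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Portable model, not real COM. Real code takes DISPIDs from exdispid.h.
using Dispid = long;
constexpr Dispid kBeforeNavigate2 = 250;    // stand-in for DISPID_BEFORENAVIGATE2
constexpr Dispid kNavigateComplete2 = 252;  // stand-in for DISPID_NAVIGATECOMPLETE2

// Stand-in for DISPPARAMS: arguments stored in reverse declaration order.
struct DispParams {
    std::vector<std::string> rgvarg;
};

// Stand-in for the sink's IDispatch: IE calls Invoke once per event.
struct EventSink {
    virtual ~EventSink() = default;
    virtual void Invoke(Dispid dispidMember, const DispParams& params) = 0;
};

// Stand-in for the IConnectionPoint that IE exposes for DWebBrowserEvents2.
class ConnectionPoint {
public:
    // Registers a sink and returns a cookie, like Advise(pUnkSink, &dwCookie).
    unsigned long Advise(EventSink* sink) {
        sinks_[++lastCookie_] = sink;
        return lastCookie_;
    }
    // Removes the sink registered under the cookie; false if the cookie is unknown.
    bool Unadvise(unsigned long cookie) { return sinks_.erase(cookie) == 1; }

    // Fires an event to every advised sink. Arguments are given in
    // declaration order and packed in reverse, as DISPPARAMS does.
    void Fire(Dispid dispidMember, const std::vector<std::string>& argsInDeclOrder) {
        DispParams params{std::vector<std::string>(argsInDeclOrder.rbegin(),
                                                   argsInDeclOrder.rend())};
        for (auto& entry : sinks_) entry.second->Invoke(dispidMember, params);
    }

private:
    std::map<unsigned long, EventSink*> sinks_;
    unsigned long lastCookie_ = 0;
};

// Hypothetical sink playing the role of our DLL: records completed navigations.
class NavigationLog : public EventSink {
public:
    void Invoke(Dispid dispidMember, const DispParams& params) override {
        // NavigateComplete2(pDisp, URL): URL is the last declared argument,
        // so after reversal it sits in rgvarg[0].
        if (dispidMember == kNavigateComplete2 && !params.rgvarg.empty()) {
            visited.push_back(params.rgvarg[0]);
        }
    }
    std::vector<std::string> visited;
};
```

Only the DISPID selects the handler, which is why a single Invoke can serve every IE event; a real sink would also check the VARIANT type of each argument before using it.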

As described above, the Invoke method receives the events that occur in all IE instances; after analysis we can write this data to a file or display it in real time in a list box.

4. Microsoft's HTML Document Object Model and Its Applications

Let's see how to get the web page document. The interface of a page document is IHTMLDocument2, which we obtain by calling the get_Document method of the IE COM object:

    CComPtr<IDispatch> spDisp;
    hr = m_spWebBrowser2->get_Document(&spDisp);
    CComQIPtr<IHTMLDocument2> spHTML;
    spHTML = spDisp;

Now we hold the interface of the page object and can analyze the page. For example, the get_URL method returns the URL associated with the page, and the get_forms method returns the collection of all FORM objects in the page.

In fact, the W3C has developed the DOM (Document Object Model) standard, which covers not only HTML but also XML. The W3C defines only the interfaces; different companies implement them in different languages and in different ways. In the W3C model the page object is dynamic: a program can operate on every object the page contains, where an object can be an input box, an image, a sound, and so on. According to the W3C specification, elements of the page can also be added and deleted dynamically. In practice, very few vendors implement every feature the DOM defines. Microsoft's page object model basically follows this standard; however, its current interfaces do not support dynamically adding and deleting elements, although the basic elements of a page can be modified.
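Attribute-level access to page elements is enough to implement the kind of form auto-filling this article proposes. The following portable C++ model is a sketch under stated assumptions, not the MSHTML API: Element stands in for IHTMLElement with only getAttribute/setAttribute, Form stands in for one entry of the get_forms collection, and SavedForms, a map from URL to field name to value, is a hypothetical store of previously entered data. AutoFill is likewise a hypothetical helper.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in for IHTMLElement: only getAttribute/setAttribute are modelled.
class Element {
public:
    Element(std::string tag, std::map<std::string, std::string> attrs)
        : tag_(std::move(tag)), attrs_(std::move(attrs)) {}
    const std::string& tag() const { return tag_; }
    // Returns the attribute value, or an empty string if it is absent.
    std::string getAttribute(const std::string& name) const {
        auto it = attrs_.find(name);
        return it == attrs_.end() ? std::string() : it->second;
    }
    void setAttribute(const std::string& name, const std::string& value) {
        attrs_[name] = value;
    }

private:
    std::string tag_;
    std::map<std::string, std::string> attrs_;
};

// Stand-in for a FORM element and the elements it contains.
struct Form {
    std::vector<Element> elements;
};

// Hypothetical saved data: URL -> (field name -> value).
using SavedForms = std::map<std::string, std::map<std::string, std::string>>;

// Fills the still-empty INPUT fields of every form on the page from the data
// saved for this URL. Returns the number of fields that were filled.
int AutoFill(const std::string& url, std::vector<Form>& forms, const SavedForms& saved) {
    auto page = saved.find(url);
    if (page == saved.end()) return 0;  // nothing saved for this page
    int filled = 0;
    for (Form& form : forms) {
        for (Element& el : form.elements) {
            if (el.tag() != "INPUT" || !el.getAttribute("value").empty()) continue;
            auto field = page->second.find(el.getAttribute("name"));
            if (field == page->second.end()) continue;
            el.setAttribute("value", field->second);
            ++filled;
        }
    }
    return filled;
}
```

The function only fills INPUT fields that are still empty, so values the user has already typed are never overwritten. In real MSHTML code, setting an input box through IHTMLInputElement's value property may be preferable to a raw attribute write.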
For example, IHTMLElementCollection represents a collection of basic elements in the page, IHTMLElement represents a single basic element, and interfaces such as IHTMLOptionElement represent a specific kind of element, here OPTION. Basic elements provide the setAttribute and getAttribute methods for dynamically setting and reading an element's attribute names and values.

A common application is to check whether a page contains fields that need to be filled in. If we filled in the forms at this URL before and saved the data together with the URL, we can automatically put the saved data back into the corresponding positions of the forms. We can also collect the FORM data items that frequently need to be filled in, assign their values in advance, and then fill in those values whenever the same data items appear. After all, a FORM is an object, and the elements it contains, such as INPUT, OPTION, and SELECT, are objects too.

Another conceivable application is translating the text of a page. Because we can modify the properties of any object in the page, we can automatically translate foreign-language text into our native language. Of course, a real implementation also depends on breakthroughs in natural-language understanding, but the IE interfaces and object model give us flexible control over the whole of IE, through both its events and its page objects.

5. Summary

Above we showed how to obtain all IE instances, described in detail the mechanism of a DLL bound to an IE instance, and analyzed the object model of the web page, together with several related applications, their implementation methods, and the technical issues involved.

When reprinting, please credit the original article: https://www.9cbs.com/read-20159.html
