HtmlStreamTokenizer is an HTML parser written in pure Java. It breaks HTML into three kinds of tokens: tags, comments, and text. It is similar to the StreamTokenizer class, but it is specialized for HTML data streams, so it can be used to process HTML files. Below is an example:

    import adc.parser.*;

    HtmlStreamTokenizer tok = new HtmlStreamTokenizer(inputstream);
    HtmlTag tag = new HtmlTag();
    // Read tokens until the end of the stream.
    while (tok.nextToken() != HtmlStreamTokenizer.TT_EOF) {
        int ttype = tok.getTokenType();
        if (ttype == HtmlStreamTokenizer.TT_TAG) {
            // Parse the raw tag text into the reusable HtmlTag object.
            tok.parseTag(tok.getStringValue(), tag);
            System.out.println("Tag: " + tag.toString());
        } else if (ttype == HtmlStreamTokenizer.TT_TEXT) {
            System.out.println("Text: " + tok.getStringValue());
        } else if (ttype == HtmlStreamTokenizer.TT_COMMENT) {
            System.out.println("Comment: " + tok.getStringValue());
        }
    }

Download address: http://sourceforge.net/projects/htmltok/
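
To make the tag / comment / text split described above more concrete, here is a minimal, self-contained sketch of that kind of tokenizing. It is not the library's code and does not use the adc.parser package; the class MiniHtmlTokenizer, its Token type, and the tokenize method are hypothetical names made up for illustration, and the logic deliberately ignores details a real parser has to handle (quoted attribute values containing ">", script content, entities, and so on).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; NOT part of the htmltok library.
class MiniHtmlTokenizer {
    enum Type { TAG, COMMENT, TEXT }

    static final class Token {
        final Type type;
        final String value;

        Token(Type type, String value) {
            this.type = type;
            this.value = value;
        }

        @Override
        public String toString() {
            return type + ": " + value;
        }
    }

    // Splits the input into TAG, COMMENT and TEXT tokens, in document order.
    static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < html.length()) {
            if (html.startsWith("<!--", i)) {
                // A comment runs up to and including the next "-->".
                int end = html.indexOf("-->", i + 4);
                int stop = (end < 0) ? html.length() : end + 3;
                tokens.add(new Token(Type.COMMENT, html.substring(i, stop)));
                i = stop;
            } else if (html.charAt(i) == '<') {
                // A tag runs up to and including the next '>'.
                int end = html.indexOf('>', i + 1);
                int stop = (end < 0) ? html.length() : end + 1;
                tokens.add(new Token(Type.TAG, html.substring(i, stop)));
                i = stop;
            } else {
                // Everything up to the next '<' is plain text.
                int end = html.indexOf('<', i);
                int stop = (end < 0) ? html.length() : end;
                tokens.add(new Token(Type.TEXT, html.substring(i, stop)));
                i = stop;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Hello</p><!-- note -->")) {
            System.out.println(t);
        }
    }
}
```

The sketch returns the whole token list at once to keep it short. A stream-based tokenizer such as HtmlStreamTokenizer instead hands out one token per nextToken() call, which avoids holding a large document in memory.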