I have always wondered: the mail specifications have been published for so many years, so why are there still so many emails that do not conform to them? One cause may be the mail server; the other is programmer error. It suddenly dawned on me that it is not only cold-blooded bosses and aggressive customers who wear programmers down physically and mentally. The carelessness of our peers, whether unintentional or deliberate, does just as much damage.
I recently ran into a problem with the raw text of an email: a customer complained that the body was displayed as garbled text in our system after the email was received. It was a typical non-standard email: the header said Content-Type: text/plain with no charset parameter, and the body that followed was raw, unencoded Chinese text. The Subject, however, did follow the specification (=?GB2312?B?xxxxxxx?=).
Non-standard emails come in all shapes and colors. The most common case is that some of the headers are not encoded. In other cases the body is encoded but the Subject is not. The most annoying case is when nothing in the email is encoded at all.
Hate it as I may, the problem still had to be solved. I modified the code, and the processing logic is now as follows:
1. When starting to parse the message, first inspect the headers that may carry encoding information, and record the charset found there as headCharset. Part of the code is as follows:
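import java.util.Enumeration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Matches an RFC 2047 encoded-word: =?charset?B-or-Q?encoded-text?=
// The groups are non-greedy so that group(1) holds only the charset name.
private static final Pattern encodeStringPattern = Pattern.compile("=\\?(.+?)\\?(B|Q)\\?(.+?)\\?=", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
// Headers that are likely to contain encoded words
private final String[] CHARTSET_HEADER = new String[] {"Subject", "From", "To", "Cc", "Delivered-To"};
........
// "enum" is a reserved word since Java 5, so the variable is named headerLines
Enumeration headerLines = message.getMatchingHeaderLines(CHARTSET_HEADER);
while (headerLines.hasMoreElements()) {
    String header = (String) headerLines.nextElement();
    Matcher m = encodeStringPattern.matcher(header);
    if (m.find()) {
        // The first encoded word found decides the header charset
        this.headCharset = m.group(1);
        log.debug("Guess mail charset is " + this.headCharset);
        break;
    }
}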
2. Next, parse the mail body and check whether it specifies a charset. If it does, record it as bodyCharset. If it does not, use headCharset; if headCharset is also null, fall back to the default charset, which is usually ISO-8859-1.
3. Finally, process the mail headers. If a header carries no charset information, decode it with bodyCharset; if that is not available either, use the default charset.
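The fallback order in steps 2 and 3 can be sketched roughly as below. This is only an illustration using the standard library, not the actual code from my system: the class name CharsetResolver, its method names, and the regular expression for the charset parameter are all made up for this example, and the default of ISO-8859-1 is simply the one mentioned above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CharsetResolver {
    // Assumed default, as mentioned in step 2
    static final String DEFAULT_CHARSET = "ISO-8859-1";

    // Matches the charset parameter of a Content-Type value, quoted or unquoted
    private static final Pattern CHARSET_PARAM =
            Pattern.compile("charset\\s*=\\s*\"?([^\";\\s]+)\"?", Pattern.CASE_INSENSITIVE);

    /** Returns the charset declared in a Content-Type value, or null if there is none. */
    static String declaredCharset(String contentType) {
        if (contentType == null) {
            return null;
        }
        Matcher m = CHARSET_PARAM.matcher(contentType);
        return m.find() ? m.group(1) : null;
    }

    /** Step 2: declared body charset, else the header charset, else the default. */
    static String resolveBodyCharset(String contentType, String headCharset) {
        String declared = declaredCharset(contentType);
        if (declared != null) {
            return declared;
        }
        return headCharset != null ? headCharset : DEFAULT_CHARSET;
    }

    /** Step 3: the header's own charset, else the declared body charset, else the default. */
    static String resolveHeaderCharset(String headCharset, String contentType) {
        if (headCharset != null) {
            return headCharset;
        }
        String declared = declaredCharset(contentType);
        return declared != null ? declared : DEFAULT_CHARSET;
    }

    public static void main(String[] args) {
        // Body without charset, Subject encoded as GB2312: the body borrows GB2312
        System.out.println(resolveBodyCharset("text/plain", "GB2312"));
        // Headers not encoded, body declares GBK: the headers borrow GBK
        System.out.println(resolveHeaderCharset(null, "text/plain; charset=\"GBK\""));
        // Nothing declared anywhere: fall back to the default
        System.out.println(resolveBodyCharset("text/plain", null));
    }
}
```

The first call in main corresponds to the customer email described earlier: the body has no charset, so it is decoded with the charset guessed from the Subject.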
With the solution above, as long as either the body or the headers of a message provide encoding information, garbled text can usually be avoided. But if some wretched email encodes its body in GB2312 while its unencoded Subject was actually written in a different charset, I can only admit defeat. And if nothing in the message is encoded at all, then unless you already know which encoding the email uses and transcode it accordingly, only heaven can help. Finally, I still have to make the appeal: please follow the specification!
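For the last case, where the actual encoding is known in advance, the transcoding could look something like the sketch below. The class Transcoder and its redecode method are hypothetical names for this example. It assumes that the text was first decoded as ISO-8859-1, which maps every byte to one character and can therefore be reversed without loss, and that the running JRE ships the GB2312 charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Transcoder {
    /**
     * Recovers text that was wrongly decoded as ISO-8859-1 by turning it
     * back into the original bytes and decoding them with the real charset.
     */
    static String redecode(String garbled, String actualCharset) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, Charset.forName(actualCharset));
    }

    public static void main(String[] args) {
        // The GB2312 bytes of a two-character Chinese word
        byte[] gb = {(byte) 0xB2, (byte) 0xE2, (byte) 0xCA, (byte) 0xD4};
        // What the system sees when it falls back to the default charset
        String garbled = new String(gb, StandardCharsets.ISO_8859_1);
        String fixed = redecode(garbled, "GB2312");
        System.out.println(fixed.length()); // two Chinese characters after re-decoding
    }
}
```

This only works when the wrong decoding step was lossless. If the bytes were first decoded with a charset that replaces unknown sequences, the original bytes are gone and no transcoding can bring them back.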