Spam origins and technical roots
SMTP is a simple mail transfer protocol that lacks much of the identity authentication it needs, and this is the first reason spam has flooded the network. Under SMTP, the sender can forge most of the information that identifies a message, such as the sender address and the path the message took. By going through anonymous relays, open relays, and open proxies, a spammer can erase almost every trace of who sent the message. At present, the vast majority of spam forges its true origin, which makes it very difficult to trace how spam spreads.
SMTP also lacks some necessary behavior controls: it cannot effectively distinguish normal mail sending from spam sending, which is the second cause of spam flooding. Spam transmission usually shows recognizable behavioral features, such as sending an extremely large volume of mail in a short time or following a specific communication pattern.
Every anti-spam technique has its limits. Some have inherent defects: for example, they cannot judge spam with absolute accuracy, or they are costly to run. Others cannot be applied because of real-world constraints: for example, we cannot completely abandon the existing SMTP protocol and switch to a new mail protocol that prevents spam from being generated and spread. Therefore, relying on technical means alone cannot completely solve the spam problem.
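The "large volume in a short time" feature mentioned above can be detected with a sliding-window counter. The following Python sketch is only an illustration of that idea: the class name `SendRateMonitor`, the default limits, and the use of the sender IP as the key are all assumptions, not details taken from any particular product.

```python
from collections import defaultdict, deque


class SendRateMonitor:
    """Flags a sender that exceeds max_messages within window_seconds."""

    def __init__(self, max_messages=100, window_seconds=60.0):
        # Both limits are illustrative defaults, not recommended values.
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # sender IP -> recent timestamps

    def record(self, sender_ip, timestamp):
        """Record one message; return True if the sender looks abusive."""
        events = self._events[sender_ip]
        events.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while events and timestamp - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.max_messages


monitor = SendRateMonitor(max_messages=3, window_seconds=10.0)
flags = [monitor.record("192.0.2.1", t) for t in (0, 1, 2, 3)]
print(flags)  # the fourth message within 10 seconds exceeds the limit
```

A real filter would combine such a counter with other signals, since a legitimate mailing list can also send many messages in a burst.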
Analysis of anti-spam technology
We take one anti-spam firewall as an example and look at the techniques it uses. This anti-spam firewall uses the following protection technologies:
1 Denial-of-service attack and security protection
2 IP block list
3 Rate control
4 Dual virus scanning
5 User-defined rules
6 Spam fingerprint check
7 Mail intent analysis
8 Bayesian intelligent analysis
9 Rule-based scoring system
10 Virus protection for compressed files
Rate control, virus scanning, and virus protection for compressed files are aimed at viruses, so we do not discuss them here. It is worth mentioning, though, that the protection an ordinary firewall offers against DDoS attacks is not the same as the DoS protection in an anti-spam firewall. An anti-spam firewall guards against DoS attacks mainly by stopping a large volume of spam sent in a short period from amounting to a DoS attack.
The core anti-spam technologies are Bayesian intelligent analysis, spam fingerprint checking, the rule-based scoring system, and user-defined rules. Of these, the most central are Bayesian intelligent analysis and spam fingerprint checking. Let us go through the anti-spam filtering techniques one by one.
1 Spam fingerprint check
Many people find the idea of a spam fingerprint mysterious. In fact, a mail fingerprint is simply a combination of certain strings in the message content, also called a snapshot. It is used to recognize messages that are similar, but not identical, to messages already confirmed as spam. For example, if you receive spam often, the following phrases will not be unfamiliar: "Agent Service", "Admissions", "Cash". Have you seen them before?
These are spam fingerprints, and the idea is the same as signature-based detection in antivirus technology. The anti-spam firewall looks for messages that are similar, though not identical, to messages already confirmed as spam, and identifies spam on that basis.
Of course, the accuracy of a fingerprint check depends on the spam fingerprint library. The anti-spam firewall assigns a value to each string that appears in the email; this value is determined according to the regional characteristics of the spam in question. The strings are classified, and a statistical method is then used to calculate a combined value for the email. The firewall can also determine whether a message is similar to other messages that have been received many times (a message received many times is probably spam).
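The two ideas in this section, a weighted fingerprint library and detection of repeatedly received content, can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the library contents, the weights, the plain substring match, and the use of a SHA-256 digest over normalized text are all choices made for the example, and a real product presumably uses fuzzier matching than an exact digest.

```python
import hashlib
import re
from collections import Counter

# Hypothetical fingerprint library: known spam phrase -> weight.
FINGERPRINT_LIBRARY = {"agent service": 2.0, "admissions": 1.5, "cash": 1.0}


def fingerprint_score(text, library=FINGERPRINT_LIBRARY):
    """Sum the weights of all library phrases that occur in the text."""
    lowered = text.lower()
    return sum(w for phrase, w in library.items() if phrase in lowered)


def content_digest(text):
    """Hash a normalized copy of the text so trivially altered copies match."""
    normalized = re.sub(r"\W+", " ", text.lower()).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class RepeatCounter:
    """Counts how many times the same normalized content has been seen."""

    def __init__(self):
        self._seen = Counter()

    def observe(self, text):
        digest = content_digest(text)
        self._seen[digest] += 1
        return self._seen[digest]


counter = RepeatCounter()
print(fingerprint_score("Cash back from our AGENT SERVICE"))
print(counter.observe("Cash now!"), counter.observe("cash  NOW"))
```

The substring match is deliberately naive (it would also match "cash" inside "cashier"); it only shows how per-string values add up to a combined score for one message.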
2 Bayesian intelligent analysis
Calling this "Bayesian intelligent analysis" may sound like hype. That suspicion comes mostly from sitting through artificial intelligence courses at school and from fatigue at seeing the word "smart" everywhere; after all, any technology that can attach "intelligent" to its name sounds a good deal more advanced. In fact, this intelligent analysis is an application of statistics, although it does make anti-spam filtering smarter. Enough digression. We will not go into Bayes' theorem itself today and will go straight to the Bayesian anti-spam algorithm. From the algorithm we can see that this intelligent analysis essentially combines the IP block list and the spam fingerprint check with statistical rules. The Bayesian anti-spam algorithm is as follows:
1) Collect a large amount of spam and non-spam mail and build a spam set and a non-spam set.
2) Extract token strings such as ABC32 and ¥234 from each message and count how often each token string occurs (its word frequency). Process every message in the spam set and the non-spam set in this way.
3) Each mail set corresponds to one hash table: hashtable_good for the non-spam set and hashtable_bad for the spam set. Each table stores the mapping from token string to word frequency.
4) For each hash table, calculate the probability of each token string: P = (word frequency of the token string) / (length of the corresponding hash table).
5) Taking hashtable_good and hashtable_bad together, infer the probability that a new message is spam when a given token string appears in it. In mathematical terms: let event $A$ be "the mail is spam", let $t_1, t_2, \ldots, t_n$ be token strings, and let $P(A \mid t_i)$ be the probability that the mail is spam when token string $t_i$ appears in it. Let $P_1(t_i)$ be the value of $t_i$ in hashtable_good and $P_2(t_i)$ the value of $t_i$ in hashtable_bad. Then

$$P(A \mid t_i) = \frac{P_2(t_i)}{P_1(t_i) + P_2(t_i)}$$

6) Build a new hash table, hashtable_probability, that stores the mapping from each token string $t_i$ to $P(A \mid t_i)$.
7) At this point the learning process on the spam and non-spam sets is complete. Using hashtable_probability, the probability that a newly arrived message is spam can be estimated.
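The training steps can be sketched in Python roughly as follows. This is a minimal illustration, not the firewall's actual implementation: the `tokenize` regular expression, the function names, and the reading of "length of the hash table" as the number of distinct token strings in the table are all assumptions made for the example. The per-token spam probability is taken as the spam-table value divided by the sum of the spam-table and non-spam-table values.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a message into token strings such as 'abc32' or '¥234'."""
    # Assumption: a token is a run of word characters or currency signs.
    return re.findall(r"[\w¥$]+", text.lower())


def build_table(messages):
    """Steps 2-3: map each token string to its word frequency."""
    table = Counter()
    for message in messages:
        table.update(tokenize(message))
    return table


def train(good_messages, bad_messages):
    """Steps 3-6: return hashtable_probability (token -> P(A | token))."""
    hashtable_good = build_table(good_messages)
    hashtable_bad = build_table(bad_messages)

    # Step 4: frequency divided by table length (assumed: distinct tokens).
    p_good = {t: n / len(hashtable_good) for t, n in hashtable_good.items()}
    p_bad = {t: n / len(hashtable_bad) for t, n in hashtable_bad.items()}

    # Steps 5-6: every token occurs in at least one table, so p1 + p2 > 0.
    hashtable_probability = {}
    for token in set(p_good) | set(p_bad):
        p1 = p_good.get(token, 0.0)
        p2 = p_bad.get(token, 0.0)
        hashtable_probability[token] = p2 / (p1 + p2)
    return hashtable_probability


table = train(
    good_messages=["meeting now", "lunch meeting"],
    bad_messages=["cash prize", "cash cash now"],
)
print(table["cash"], table["meeting"], table["now"])
```

A token seen only in spam gets probability 1 and a token seen only in non-spam gets 0; practical filters usually clamp such extremes so that one token cannot decide the result alone.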
When a new email arrives, generate its token strings as in step 2) and look up each token string in hashtable_probability to get its value. Suppose the message yields $n$ token strings $t_1, t_2, \ldots, t_n$ whose values in hashtable_probability are $P_1, P_2, \ldots, P_n$, and let $P(A \mid t_1, t_2, \ldots, t_n)$ denote the probability that the message is spam when all of these token strings appear in it. Then

$$P(A \mid t_1, t_2, \ldots, t_n) = \frac{P_1 P_2 \cdots P_n}{P_1 P_2 \cdots P_n + (1 - P_1)(1 - P_2) \cdots (1 - P_n)}$$

When $P(A \mid t_1, t_2, \ldots, t_n)$ exceeds a predetermined threshold, the message can be judged to be spam.
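The combination step can be sketched in Python as below. The function name, the neutral value 0.5 for token strings missing from the table, the clamping constant `eps`, and the 0.9 threshold are all assumptions for illustration; the source only says the threshold is "predetermined". The sketch works with logarithms, which gives the same result as the product formula but avoids underflow when a message has many token strings.

```python
import math


def spam_probability(tokens, hashtable_probability, unknown=0.5, eps=1e-6):
    """Combine per-token values P_i into P(A | t_1, ..., t_n)."""
    log_spam = 0.0  # log of P_1 * P_2 * ... * P_n
    log_ham = 0.0   # log of (1 - P_1) * (1 - P_2) * ... * (1 - P_n)
    for token in tokens:
        p = hashtable_probability.get(token, unknown)
        p = min(max(p, eps), 1.0 - eps)  # keep log() finite for 0 and 1
        log_spam += math.log(p)
        log_ham += math.log(1.0 - p)
    # spam / (spam + ham) rewritten as 1 / (1 + ham / spam).
    diff = log_ham - log_spam
    if diff > 700.0:  # math.exp would overflow; probability is ~0
        return 0.0
    return 1.0 / (1.0 + math.exp(diff))


# Hypothetical trained table and threshold.
table = {"cash": 0.9, "prize": 0.8, "meeting": 0.1}
THRESHOLD = 0.9

score = spam_probability(["cash", "prize"], table)
print(score > THRESHOLD)
```

With the values above, the product formula gives 0.72 / (0.72 + 0.02), which is above the assumed threshold, so the message would be judged to be spam.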
Relationship between anti-spam firewall and firewall
"Firewall" is a broad term. In practical terms, a firewall protects internal network resources (such as WWW servers and file servers) from external security threats by applying protective measures at different levels. Depending on what they focus on protecting, firewalls can be divided into antivirus firewalls, DDoS (distributed denial-of-service) firewalls, anti-spam firewalls, and so on. In short, an anti-spam firewall is a firewall dedicated to fighting spam.
All firewalls work in the same basic way: they analyze the packets passing through them and decide whether to let each one through or block it. In an actual deployment, a dedicated anti-spam firewall can be placed either in front of the ordinary firewall or behind it. Placing it behind is recommended; logically, it sits in series with the mail server.
a) Set up (or add) an MX record outside the firewall that points to the anti-spam firewall. If there are two MX records, the one pointing to the anti-spam firewall should have the higher priority.
b) Set up a NAT rule for SMTP on the firewall that points to the anti-spam firewall. No changes are needed on the mail server or in client software (Outlook, Foxmail, etc.).
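For step a), it helps to remember that in DNS a lower MX preference number means a higher priority: sending servers try the record with the lowest preference value first. The Python sketch below only illustrates that ordering; the host names and preference values are hypothetical.

```python
from typing import NamedTuple


class MxRecord(NamedTuple):
    preference: int  # lower value = tried first
    host: str


def delivery_order(records):
    """Return hosts in the order a sending server would try them."""
    return [r.host for r in sorted(records, key=lambda r: r.preference)]


# Hypothetical zone with two MX records.
records = [
    MxRecord(preference=20, host="mail.example.com"),
    MxRecord(preference=10, host="antispam.example.com"),
]
print(delivery_order(records))
```

Giving the anti-spam firewall the lowest preference value means incoming mail reaches it first, before it is handed on to the mail server.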
That concludes this introduction to the basics of anti-spam technology and anti-spam firewalls.
Security protection knowledge: analysis of the core technology of anti-spam firewalls