Charming Python: Combating spam with Hashcash
If you want to send spam, you have to pay a price.
Level: Intermediate
David Mertz, Ph.D.
(mertz@gnosis.cx) Developer, Gnosis Software, Inc. November 2004
Hashcash is a clever system, built on the widely used SHA-1 algorithm, that requires a requester to perform a parameterizable amount of work while keeping the evaluator's check "cheap." In other words, a sender has to do some real work to put something into your inbox. You can certainly use Hashcash to fight spam, but it has other applications as well, including keeping spam off wikis and speeding up distributed parallel applications. In this article, you'll meet David's own Python-based Hashcash implementation.
The Hashcash.org Web site (see Resources) notes that the primary purpose of the Hashcash system is as a spam-filtering protocol:
Hashcash is a denial-of-service counter-measure tool. Its main current use is to help hashcash users avoid losing email due to content-based and blacklist-based anti-spam systems.
However, I believe the technique has much wider applicability than email alone. This article discusses its use in email filtering and also suggests other uses. It also presents my own Hashcash implementation (apparently the first in Python), which the Hashcash.org site now hosts. David McNab created a Python implementation, but it uses a protocol that resembles Hashcash without being compatible with it; other developers have written partial Python implementations of Hashcash. Before getting to those topics, though, let's review what Hashcash is.

Hashcash basics

Hashcash is inspired by the idea that some mathematical results are hard to discover but easy to verify. A well-known example is factoring a large number (especially one with few factors). Multiplying some numbers together to get their product is cheap (CPU cycles, after all, cost money), but finding those factors in the first place is far more expensive. The RSA public-key cryptosystem relies on exactly this property of factoring. If someone can answer a factoring challenge, that shows they did a considerable amount of work (or sneakily obtained the factors from whoever generated the number).

For interactive challenges, factoring works well enough. For example, suppose I have an online resource and want you to pay some price to access it. I can send you a message saying, "Factor this number and I will give you the resource." Casual visitors will not get my resource; only those who demonstrate genuine interest by spending CPU cycles on the answer will.

Non-interactive challenges

Some resources, however, do not lend themselves to interactive negotiation. My email inbox is a resource I value highly. Unwanted messages take up some of my disk space and bandwidth, but worst of all, they take up my attention.
I do not mind strangers writing to me, but I would like them to take contacting me at least a little seriously. At the very least, I do not want them to be spammers: people who send me the same message they send to everyone else, hoping that a few of us will buy some product or fall for a scam. To collect a non-interactive "payment," Hashcash lets me publish a standing challenge to everyone who wants to send me email. Your message headers must include a valid Hashcash stamp; specifically, a stamp that names my recipient address. The Hashcash challenge requires "minters" to produce a string (a stamp) whose hash under the Secure Hash Algorithm (SHA-1) has many leading zero bits. The number of leading zero bits found is the bit value of a particular stamp. Given the uniformity and cryptographic strength of SHA-1, the only known way to find a stamp with bit value b is to run SHA-1 an average of 2^b times. Checking a stamp, however, takes only a single SHA-1 computation.
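The asymmetry between minting and checking is easy to see in code. The following is a minimal sketch, not taken from any Hashcash implementation, that counts the leading zero bits of a stamp's SHA-1 digest; the function names `leading_zero_bits` and `has_value` are hypothetical and chosen only for illustration. However many bits are required, the check is one hash.

```python
import hashlib


def leading_zero_bits(stamp: str) -> int:
    """Count the leading zero bits of SHA-1(stamp)."""
    digest = hashlib.sha1(stamp.encode("utf-8")).digest()  # 20 bytes = 160 bits
    value = int.from_bytes(digest, "big")
    # If the highest set bit is bit k-1, the top 160 - k bits are zero.
    return 160 - value.bit_length()


def has_value(stamp: str, bits: int) -> bool:
    """Checking a stamp costs one SHA-1 computation, whatever `bits` is."""
    return leading_zero_bits(stamp) >= bits
```

Each leading hexadecimal `0` in a digest accounts for 4 zero bits, so a stamp worth 20 bits has a hex digest starting with `00000`.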
For email, the current recommendation is a 20-bit value: to find a valid stamp, a sender needs about one million attempts (2^20 = 1,048,576). With a compiled application on a recent CPU this takes less than a second, and on an older machine only a few seconds. Before going further with Hashcash basics, though, let's pause to appreciate how strong the SHA algorithm is.

How strong is SHA?

In an event that caused a real stir in the cryptographic community, a collision was found in SHA-0 (see the link in Resources to Pascal Junod's email, which gives the details of the actual collision). The attack took roughly 2^51 operations, far fewer than the roughly 2^80 operations (plus storage) that the birthday paradox leads us to expect for finding a collision by brute force. (For more on the birthday paradox and how it applies to hash functions, see Resources.) Before you worry too much about what this kind of attack means for Hashcash, keep two points in mind. First, the attack is against SHA-0, not SHA-1 (at least not yet). Second, even on the fastest current CPUs, 2^51 operations would still take more than 9 CPU-years. Even if a similar method could be applied to SHA-1, constructing a spurious collision would cost at least as much as minting a large number of 20-bit stamps (or even 40-bit Hashcash stamps).

Back to our discussion. For Hashcash (version 1), merely having a particular SHA-1 hash value is not enough. We also want each stamp to be tied to the resource being requested; that is, a stamp for mertz@gnosis.cx should not be usable for someuser@yahoo.com. Otherwise, a spammer could mint a single high-value stamp and use it everywhere. Likewise, once a stamp for my address has been minted, I do not want every spammer who wants to email me to share it.
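The cost figures in this section are just powers of two divided by a hashing rate. The sketch below measures a rough SHA-1 rate using pure-Python `hashlib` calls (an assumption: this is much slower than an optimized C minter) and converts bit values into expected times. The names `expected_trials`, `measure_rate`, and `estimate_seconds`, and the sample prefix, are hypothetical.

```python
import hashlib
import time

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def expected_trials(bits: int) -> int:
    """A b-bit stamp takes about 2**b SHA-1 trials on average."""
    return 2 ** bits


def measure_rate(trials: int = 200_000) -> float:
    """Rough SHA-1 trials per second on this machine, in pure Python."""
    prefix = b"1:20:040927:someone@example.com::salt:"  # illustrative prefix
    start = time.perf_counter()
    for i in range(trials):
        hashlib.sha1(prefix + b"%x" % i).digest()
    return trials / (time.perf_counter() - start)


def estimate_seconds(bits: int, rate: float) -> float:
    """Expected wall-clock seconds to find a `bits`-bit result at `rate`."""
    return expected_trials(bits) / rate


if __name__ == "__main__":
    rate = measure_rate()
    print(f"~{rate:,.0f} SHA-1 trials per second")
    for bits in (20, 24, 28, 40, 51):
        secs = estimate_seconds(bits, rate)
        print(f"{bits:2d} bits: ~{secs:,.1f} s (~{secs / SECONDS_PER_YEAR:.3g} years)")
```

For reference, spreading 2^51 operations over 9 years corresponds to just under 8 million operations per second.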
Hashcash guards against such reuse with two more measures (or at least recommends them as part of the protocol). First, every stamp is dated, and users may decide to treat stamps older than a certain period as invalid. Second, Hashcash clients may (and generally should) implement a double-spend database, in which each stamp can be used only once; a stamp that has already been received is invalid (much as a postage stamp is no good once it has been cancelled). Concretely, a Hashcash version 1 stamp has the form
1:bits:date:resource:ext:salt:suffix
and contains seven fields:
1. The version number (version 0 is simpler, but has some limitations).
2. The claimed bit value. If the stamp's hash does not actually have the claimed number of leading zero bits, the stamp is invalid.
3. The date (and optionally the time) the stamp was minted. Stamps dated in the future can be considered invalid, as can those minted too long ago.
4. The resource the stamp was minted for. This may be an email address, but it could also be a URI or some other named resource.
5. An extension field for application-specific needs. Any additional data can go here, but in current use this field is usually empty.
6. A random salt that distinguishes stamps minted for the same resource on the same date. For example, two different people might email the same address of mine on the same day. Since I use a double-spend database, identical stamps from the two of them would get the second message rejected; because each uses a random salt, their complete stamps differ.
7. The suffix, which is where the actual work happens. Given the first six fields, the minter must try many successive suffixes to produce a stamp whose hash has the desired number of leading zero bits.

Now let's look at how Hashcash works in email.

Hashcash in email

In an ideal world, all senders would include Hashcash stamps in their messages, and recipients would check their validity on receipt. In real life, however, Hashcash is not yet that widespread. Even so, starting to use Hashcash (as either sender or recipient) has no effect on existing email tools. In other words, you lose nothing by using Hashcash in email. To add stamps to a message, you only need to add headers to it: one X-Hashcash: header for each To: or Cc: recipient.
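The minting procedure described above (fix the first six fields, then try successive suffixes) can be sketched as follows. This is an illustrative sketch, not the API of the author's hashcash.py module; `mint_stamp` and `hashcash_headers` are hypothetical names, and the 8-character alphanumeric salt, hexadecimal counter suffix, and YYMMDD date (matching the 040927 example) are assumptions.

```python
import hashlib
import secrets
import string
import time

SALT_ALPHABET = string.ascii_letters + string.digits


def leading_zero_bits(stamp: str) -> int:
    """Leading zero bits of SHA-1(stamp)."""
    value = int.from_bytes(hashlib.sha1(stamp.encode("utf-8")).digest(), "big")
    return 160 - value.bit_length()


def mint_stamp(resource: str, bits: int = 20, date: str = "", ext: str = "") -> str:
    """Fix the first six fields of a version 1 stamp, then search suffixes."""
    date = date or time.strftime("%y%m%d", time.gmtime())  # YYMMDD, e.g. 040927
    salt = "".join(secrets.choice(SALT_ALPHABET) for _ in range(8))  # random salt
    prefix = f"1:{bits}:{date}:{resource}:{ext}:{salt}:"
    counter = 0
    while True:
        stamp = prefix + format(counter, "x")  # successive hex suffixes
        if leading_zero_bits(stamp) >= bits:
            return stamp
        counter += 1


def hashcash_headers(recipients: list[str], bits: int = 20) -> list[str]:
    """One X-Hashcash: header per To: or Cc: recipient."""
    return [f"X-Hashcash: {mint_stamp(r, bits)}" for r in recipients]
```

A call such as `hashcash_headers(["mertz@gnosis.cx"])` returns a one-element list with a 20-bit header; in pure Python that search takes noticeably longer than the compiled minter does.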
An X-Hashcash header in a message to me might look like this (as an RFC 2822 header):
X-Hashcash: 1:20:040927:mertz@gnosis.cx::odvzhqmp:7ca28
Obviously, this should be done by the MUA (Mail User Agent), a filter, or the MTA (Mail Transport Agent) rather than by users themselves. Still, it is not too hard to do by hand, at least for experimenting. To start, check a stamp by looking at its hash:
$ echo -n 1:20:040927:mertz@gnosis.cx::odvzhqmp:7ca28 | sha
00000b50b85a61e7ba8ac4d5fed317c737706ae5
Note the leading zeros (each hexadecimal digit is 4 bits). Of course, you also need to check that the resource is one you recognize (such as one of your recipient addresses), that the stamp has not already been used, and that the date is current. In addition, a valid stamp's hash must have as many leading zero bits as it claims (and you may decide to enforce your own minimum price for accepted email: 20 bits is a semi-standard that may eventually change in line with Moore's law).

Why does this work?

Generating a 20-bit stamp takes at most a few seconds. That price is trivial when you send only a few dozen emails a day.
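The recipient-side checks listed above fit in one function. This is a sketch under stated assumptions: `check_stamp` is a hypothetical name, an in-memory `set` stands in for a persistent double-spend database, the two-day expiry and one-day allowance for clock skew are illustrative choices, and fields are assumed not to contain colons. Cheap string checks run before the hash.

```python
import hashlib
from datetime import datetime, timedelta, timezone
from typing import Optional


def leading_zero_bits(stamp: str) -> int:
    """Leading zero bits of SHA-1(stamp)."""
    value = int.from_bytes(hashlib.sha1(stamp.encode("utf-8")).digest(), "big")
    return 160 - value.bit_length()


def check_stamp(stamp: str, my_addresses: set[str], spent: set[str],
                min_bits: int = 20, max_age: timedelta = timedelta(days=2),
                now: Optional[datetime] = None) -> bool:
    """Accept a version 1 stamp only if every check passes; then mark it spent."""
    fields = stamp.split(":")  # assumes no field contains ':'
    if len(fields) != 7 or fields[0] != "1":
        return False  # not a version 1 stamp
    _, claimed, date, resource, _ext, _salt, _suffix = fields
    if not claimed.isdigit() or int(claimed) < min_bits:
        return False  # below the minimum price I accept
    if resource not in my_addresses:
        return False  # minted for someone else
    try:
        minted = datetime.strptime(date[:6], "%y%m%d").replace(tzinfo=timezone.utc)
    except ValueError:
        return False  # unreadable date
    if now is None:
        now = datetime.now(timezone.utc)
    if minted > now + timedelta(days=1) or now - minted > max_age:
        return False  # future-dated or expired
    if stamp in spent:
        return False  # double spend
    if leading_zero_bits(stamp) < int(claimed):
        return False  # the hash does not deliver the claimed bits
    spent.add(stamp)
    return True
```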
For spammers who want to send millions of messages, however, an extra second of CPU time per message is intolerable: there are only 86,400 seconds in a day. Even for spammers using zombie machines hijacked by Trojans, requiring a Hashcash stamp on every message reduces how much each zombie can send. Verifying a stamp, of course, takes only a small fraction of a second.

On the other hand, adding Hashcash generation and checking to your own MUA has no negative impact on anyone else (unlike some other anti-spam methods). For recipients who do not use the protocol, stamps are just an extra header that is easy to ignore. For senders who do not add Hashcash stamps, recipients who check X-Hashcash: headers simply have nothing to verify. If a sender does not add a stamp, checking does not make your situation any worse; it just does not make it any better. A good MUA or spam-filtering system can whitelist email carrying valid Hashcash stamps. SpamAssassin is even cleverer, giving a more favorable score to messages with more valid Hashcash bits. I think Hashcash-based whitelisting improves on interactive challenge systems such as TMDA: no challenge message can get lost on the way back, and no sender can forget to answer one. The response to the challenge travels in the original message itself (as a Hashcash stamp).

Other applications of Hashcash

Hashcash is most practical for non-interactive challenges. However, there is no reason it cannot be used in interactive contexts as well. As more tools add Hashcash support, especially multipurpose application suites such as Mozilla, using Hashcash in both interactive and non-interactive settings will become easier.
For example, if the Thunderbird mail tool gains an API call for Hashcash computation, its sibling, the Firefox Web browser, could readily use the same API to answer interactive challenges by generating Hashcash stamps.
What is a wiki?

A wiki is "the simplest online database that could possibly work." It is server software that lets users freely create and edit Web page content using a browser, with hyperlinks and a simple text syntax for creating new pages and cross-linking pages on the fly. By offering this "open editing" service, a wiki fosters an unusual mechanism for group communication: it lets every user edit not only the content but also the organization of the pages or the site. To learn more about wikis, see the link on "What is Wiki" in Resources.

Protecting wikis

Wikis sometimes suffer vandalism very similar to spam, and Hashcash looks like a good solution in this non-email context. Because wikis are usually open to anyone, one scourge of wiki communities is vandalism by wiki-crawling programs that add irrelevant commercial links to wiki sites. A wiki I help maintain was recently vandalized this way, forcing us into an unwelcome response: requiring all posters to have a user account. These accounts are granted through an automated email challenge, in which the applicant replies with a message containing a random key. Requiring such accounts, however, fundamentally goes against the spirit of a wiki.

Adding a Hashcash challenge does not prevent automated vandalism of a wiki site, but it can slow it down. If vandalizing a site takes several seconds rather than a fraction of a second, crawling wikis to insert junk becomes much less attractive. In fact, I think bit values greater than 20 are a good idea for this application; perhaps 24 or 28 bits would be a reasonable load (logged-in users could still be exempt). You might think that simply imposing a delay before accepting a wiki edit would have a similar effect, but that reasoning has a flaw.
A vandal can parallelize the attack: for example, if each site adds a 5-second delay, the vandal can use those 5 seconds to start modifying other wikis on the list. By requiring real CPU usage, as Hashcash does, parallelizing no longer lets the vandal escape the cost, since every edit consumes its own CPU time.

A wiki challenge can be interactive or non-interactive. The site could direct users to a challenge screen before sending them on to the actual editing screen, generating a random resource as the challenge for that protection screen. A better approach, however, is to make the request non-interactive. For example, in an existing wiki system you might edit a resource using a URL like:
http://somewhere.net/wiki?action=edit&id=sometopic
On a wiki protected with Hashcash, you might instead use a URL like:
http://somewhere.net/wiki?stamp=1:24:040928:Sometopic:^4:KG4E9PAK2VLJKM2Z:0000ZBRC
Before allowing the edit, the wiki server can verify the stamp. Editing still requires no account and no disclosure of personal information. The double-spend database and (short) expiration periods ensure that each stamp reflects real work for a real edit. Generating the above URL is not hard for me; I can use the following command:
hashcash -mcb 24 -x Edit Sometopic
To keep delays down, though, a Web browser could choose to generate such stamps in the background.
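On the server side, the non-interactive wiki check might look like the sketch below. It assumes, as in the example URL, that the stamp arrives in a `stamp` query parameter and that its resource field names the page to edit; `authorize_edit` is a hypothetical function, an in-memory `set` stands in for the double-spend database, and a freshness check on the date field is omitted for brevity.

```python
import hashlib
from typing import Optional
from urllib.parse import parse_qs, urlsplit


def leading_zero_bits(stamp: str) -> int:
    """Leading zero bits of SHA-1(stamp)."""
    value = int.from_bytes(hashlib.sha1(stamp.encode("utf-8")).digest(), "big")
    return 160 - value.bit_length()


def authorize_edit(url: str, spent: set[str], min_bits: int = 24) -> Optional[str]:
    """Return the topic a stamped edit URL may edit, or None to refuse."""
    values = parse_qs(urlsplit(url).query).get("stamp", [])
    if len(values) != 1:
        return None  # no stamp: fall back to a challenge screen or login
    stamp = values[0]
    fields = stamp.split(":")
    if len(fields) != 7 or fields[0] != "1" or not fields[1].isdigit():
        return None  # not a version 1 stamp
    claimed, topic = int(fields[1]), fields[3]
    if claimed < min_bits or leading_zero_bits(stamp) < claimed:
        return None  # too cheap, or claims bits it does not have
    if stamp in spent:
        return None  # each stamp pays for exactly one edit
    spent.add(stamp)
    return topic  # the resource field names the page being edited
```

Returning the topic from the stamp, rather than trusting a separate `id` parameter, keeps a stamp minted for one page from unlocking another.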
A browser minting in the background might, for instance, already have created and cached that edit URL while I am still reading http://somewhere.net/wiki?sometopic, and it might also cache edit stamps for the other wiki pages linked from the current page.

Testing CPU resources: an interactive application

Hashcash might also be useful in distributed processing tasks. Some projects, such as the Great Internet Mersenne Prime Search (GIMPS) and SETI@home, along with other efforts (such as protein folding and code breaking), borrow large numbers of volunteer machines; these are just a few named examples. Each volunteer downloads some code and runs it as part of a big task, sending intermediate results to a central server. These projects make excellent use of idle CPU cycles.

Almost all the distributed tasks I know of let anyone join. However, it is easy to imagine a task with coordination requirements in which a node that fails to finish its share within the expected time does more harm than good to the overall computation. In that case, each participating node should be required to have some minimum CPU speed. Although a benchmark of the specific kind of computation would be more accurate, Hashcash provides a reasonably general CPU baseline, since SHA-1 is a fairly typical mathematical computation. If participating nodes have Hashcash installed (rather than some custom software tool), answering a Hashcash challenge can serve as a "You must be this tall to enter this ride" style of qualification.

Checking CPU capability means requiring a high bit value within a short time; only a fast enough CPU can answer the challenge. For this purpose, the resource name must be provided semi-interactively.
Otherwise, participants could backdate their stamps and build up high values at their leisure. For example, a fast Pentium III or G4 can generate a 20-bit stamp in under a second, but a Pentium II or G3 cannot. Suppose instead we pose a 32-bit challenge that a candidate machine must answer within an hour. The requester might send an email saying, "Send me a challenge"; the collaborating server responds, "The time is 040927124732; the challenge resource is A37TQK." If the server receives a correct hash before 1:47 p.m., the requester qualifies to access the resource.

Obviously, the protocol I suggest cannot guarantee that every node will actually complete its job. Even the fastest machine can have mishaps, and users may change their minds about running the distributed software. But at least it establishes plausible qualifications.

Generalized Hashcash and my contribution

The particular fields and separators Hashcash uses are somewhat arbitrary. In fact, Hashcash version 0 uses different fields than version 1. These choices are perfectly good, but I think "actual Hashcash" is just one member of a family that we might call "Generalized Hashcash." That is, given a challenge string, one can reasonably ask: "Give me a suffix such that hashing challenge+suffix produces a b-bit partial collision (that is, b leading zero bits)." Real Hashcash is just one instance of this generic challenge.
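Generalized Hashcash reduces to one search loop over suffixes. The sketch below implements that search for any `hashlib` algorithm and then reuses it for the timed, server-issued CPU challenge described earlier. It is not the `_mint()` function from hashcash.py: `zero_bits`, `find_suffix`, and `TimedChallenge` are hypothetical names, and the challenge layout (a YYMMDDhhmmss time, a random six-character resource, and a trailing colon) is an assumption modeled on the "040927124732" and "A37TQK" example.

```python
import hashlib
import secrets
import time
from itertools import count


def zero_bits(digest: bytes) -> int:
    """Leading zero bits of a digest of any length."""
    return len(digest) * 8 - int.from_bytes(digest, "big").bit_length()


def find_suffix(challenge: str, bits: int, algorithm: str = "sha1") -> str:
    """Generalized Hashcash: find s such that H(challenge + s) has `bits` zero bits."""
    for n in count():
        suffix = format(n, "x")
        digest = hashlib.new(algorithm, (challenge + suffix).encode()).digest()
        if zero_bits(digest) >= bits:
            return suffix


class TimedChallenge:
    """Server side of a semi-interactive CPU qualification test (illustrative)."""

    def __init__(self, bits: int, window_seconds: float, issued_at: float) -> None:
        self.bits = bits
        self.deadline = issued_at + window_seconds
        # The server picks the time and a random resource, so a requester
        # cannot backdate the challenge and precompute an answer.
        issued = time.strftime("%y%m%d%H%M%S", time.gmtime(issued_at))
        self.challenge = f"{issued}:{secrets.token_hex(3).upper()}:"

    def accept(self, suffix: str, received_at: float) -> bool:
        """Accept only a correct answer that arrives before the deadline."""
        if received_at > self.deadline:
            return False  # too slow to qualify
        digest = hashlib.sha1((self.challenge + suffix).encode()).digest()
        return zero_bits(digest) >= self.bits
```

For the scenario above, the server would create `TimedChallenge(bits=32, window_seconds=3600, issued_at=time.time())`, send `challenge` to the volunteer, and call `accept()` with the returned suffix and its arrival time. A 32-bit search in pure Python is far slower than the C tool, so treat this as a model of the protocol rather than a practical client.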
There is, of course, such a thing as too much generality. Creating lots of incompatible, almost-Hashcash protocols has no benefit. For example, there is an existing Python implementation called "hashcash" that uses a challenge protocol similar to Hashcash's (and may have cryptographic value of its own), but it can hardly be used to generate Hashcash stamps. So I decided to write a Python implementation that is truly compatible with Hashcash and even accepts nearly the same command-line switches as the Hashcash tool written in C (although it is probably most practical as a module imported by other applications). Even with a little optimization help on platforms that offer it, the Python version is about 10 times slower than the optimized C version. Still, it wins on flexibility compared with C.

My hashcash.py module provides both an internal function, _mint(), and a public function, mint(). The latter generates a real Hashcash version 1 stamp, and it is the one you should use. The former, _mint(), does the underlying work of finding a Generalized Hashcash suffix. You probably will not need it, but if you want to use it (and promise to use it carefully), it is there. Hashcash variants may be practical in various specialized contexts. In any case, I hope the C tool gains a similar switch, so that it too can find generalized Hashcash suffixes, even if it comes with a warning that you are doing something you probably should not. We hackers like to dig into the inner workings of things.

Conclusion

I hope this article has given you a general sense of the possible applications of Hashcash. I think the challenge protocol described here is an extremely clever idea. The challenge now is getting more tools to handle Hashcash stamps more seamlessly.
Many MUAs, MTAs, and spam-filtering tools already handle Hashcash well, but significant gaps remain among them. Almost no non-email applications use Hashcash yet. Still, I believe the concept is attractive. If it grows in importance, it will provide a way to regulate access to electronic resources using free software and open standards, without trapping us in Digital Restrictions Management (DRM), commercialization of information, or the usual privacy leaks.

Resources
Read the original article on the developerWorks worldwide site.
Visit the Hashcash.org Web site.
David's favorite reference, Wikipedia, has an entry on Hashcash.
To learn about wikis, start with "What is Wiki."
The birthday paradox is a "paradox" only in that it contradicts common intuition. Read more about it on Wikipedia.
For details on the SHA-0 collision, see Pascal Junod's email in the archive.
The tutorial Cryptography: Part 1 (developerWorks, January 2001) introduces cryptography and its technical, mathematical, and conceptual foundations and terminology. Cryptography: Part 2 (developerWorks, February 2001) and Cryptography: Part 3 (developerWorks, March 2001) continue the tutorial.
For a thorough look at a utility for filtering spam, read "Use SpamAssassin to eliminate spam" (developerWorks, October 2002).
The Tagged Message Delivery Agent (TMDA) is a spam-filtering tool based on whitelisting rather than blacklisting; Hashcash can be integrated with TMDA.
Download David's hashcash.py module and scripts, a Python implementation of Hashcash version 1.
To dig deeper into Python, read David's other Charming Python columns on developerWorks.
In Roaming Charges: Trouble Everyday (developerWorks, October 2004), Larry Loeb describes hash collisions and examines secure hash algorithms.
Enhancing e-mail security with S/MIME describes the role of the SHA-1 algorithm in the S/MIME e-mail security protocol.
Lessons in secure messaging using Domino 6 (developerWorks, July 2004) offers another view of SHA-1's key role as a hash algorithm.