Vipul's Razor v2 README

Vipul's Razor is a distributed, collaborative spam detection and filtering network. Through user contribution, Razor establishes a distributed and constantly updated catalogue of spam in propagation, which email clients consult to filter out known spam. Detection is done with statistical and randomized signatures that efficiently spot mutating spam content. User input is validated through reputation assignments based on consensus over report and revoke assertions, which in turn are used to compute the confidence values associated with individual signatures.

Vipul's Razor v2 agent software is available from the project's homepage at http://razor.sf.net. Razor Agents are written in Perl and will work on most Unix operating systems and on other OSes for which Perl is available. Installation and usage instructions can be found in the INSTALL document in the distribution.

Vipul's Razor v2 is almost a complete rewrite of Razor v1. The following is a list of the most significant new features:

1 New Protocol

The Razor v2 protocol has been completely redesigned. The new protocol is based on the exchange of _Structured Information Strings_, which are similar to URIs and can be parsed with URI decoding libraries. The v2 protocol supports _Pipelining_, which means Razor Agents can keep a connection to the server open instead of paying the latency of a TCP 3-way handshake and 4-way teardown for a new connection every time. The new protocol semantics allow seamless introduction of new signature schemes.

2 Ephemeral Signatures

Ephemeral Signatures are short-lived signatures based on collaboratively computed random numbers. An Ephemeral Signature hashes a section of the spam message selected by a random number that changes periodically. This makes the hashing scheme a moving target: spammers can't exploit it because they don't know which part of the message will be hashed after the next random number rollover.

3 Preprocessors

Razor v2 supports several preprocessors.
Preprocessors alter the text of a spam message before a hash is computed. This version includes preprocessors that decode Base64-encoded messages, decode QP (quoted-printable) encoded messages, and convert HTML to plain text. Spammers employ several techniques that hide mutations in various encodings. Preprocessors defeat such techniques by hashing the content that a recipient actually sees in his/her mail user agent.

4 Multiple Filtration Engines

Razor v2 supports multiple engines. An engine is a logical unit that encapsulates a particular type of filtration service. Razor v2 currently supports four engines: VR1, which is equivalent to Razor v1; VR2, which is based on SHA1 signatures of body text; VR3, which is based on Nilsimsa signatures; and VR4, which is based on Ephemeral hashes. New engines can be seamlessly plugged into the service as and when required.

5 Complete Backward Compatibility with Razor v1

The VR1 engine is functionally equivalent to the Razor v1 service and uses the same database. This means users who transition from v1 to v2 still get the benefit of the several million signatures known to the v1 service.

6 Base64 signature encoding

Signatures are now encoded as base 64 numbers instead of base 16 (hex). Base 64 needs 4 characters per 3 bytes instead of 2 characters per byte, so each signature takes about 33% less space on the wire.

7 Truth Evaluation System (TeS)

Razor v2 has a transparent, back-end component known as TeS. TeS is a combination of a reputation system and pattern recognition heuristics that assigns trust to reporters and confidence values (between 0 and 100) to every signature. Users can set an acceptable confidence level in their Razor configuration. The server also publishes a recommended confidence level. TeS has been designed to eliminate the false positives on legitimate bulk email that were occasionally generated by bad reports in Razor v1.

8 Submission of entire spam messages

Razor v2 accepts the entire body text of spam messages not previously known to the system.
This lets Razor v2 compute new Ephemeral Signatures every n hours, as well as seed the database whenever a new signature scheme and/or preprocessor is introduced. Note that Razor v2 _does not_ accept the contents of legitimate email during a check dialogue; only signatures are sent when checking email.

9 Revocation

Razor v2 allows users to revoke messages that they don't consider to be spam. Revocation input is fed into TeS, which adjusts the confidence value of a signature or removes it from the database as necessary. Revocation is done through a tool called razor-revoke, which is part of the new Razor distribution.

10 Reporter Registration

Razor v2 requires reporters to be registered. This lets reporters build a reputation over time, so their reports and revocations are weighted according to their reputation value. Reporting requires users to authenticate, which is done using a CRAM-SHA1 authentication scheme.

11 Content classes

Razor v2 introduces the concept of content classes. A content class is a set of messages that represent variations on the same content. As new reports come in, Nomination servers associate them with an existing content class if a (close) match is found. Additionally, Razor v2 treats each MIME attachment as a separate content class, so spammers' MIME attachments can be tracked individually (which is very useful in the case of viruses).

$Id: README,v 1.4 2005/06/28 22:19:07 jpr5 Exp $
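The signature ideas in items 2, 3, 4 and 6 above can be tied together in a small sketch. This is not Razor's implementation (the agents are written in Perl); it is a minimal Python illustration under stated assumptions: `select_section` is a hypothetical selector, an integer seed stands in for the collaboratively computed random number, and the URL-safe Base64 alphabet is assumed for the encoded signature. The sketch decodes a Base64 body as a preprocessor would, picks a seed-dependent slice, hashes it with SHA1 (the hash VR2 applies to body text), and encodes the 20-byte digest in Base64.

```python
import base64
import hashlib
import random


def preprocess_base64(encoded_body: str) -> str:
    """Preprocessor sketch: decode a Base64 body so the hash covers
    the text the recipient actually sees."""
    return base64.b64decode(encoded_body).decode("utf-8", errors="replace")


def select_section(text: str, seed: int, fraction: float = 0.5) -> str:
    """HYPOTHETICAL selector: choose a contiguous slice covering `fraction`
    of `text`, starting at an offset derived from `seed`. Razor's real
    Ephemeral (VR4) selection rule is not reproduced here."""
    if not text:
        return text
    length = max(1, int(len(text) * fraction))
    rng = random.Random(seed)  # same seed -> same slice
    start = rng.randrange(len(text) - length + 1)
    return text[start:start + length]


def encode_signature(digest: bytes) -> str:
    """Base64 (URL-safe alphabet, padding stripped: an assumption)
    instead of hex: 4 chars per 3 bytes rather than 2 chars per byte."""
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def ephemeral_signature(text: str, seed: int) -> str:
    """SHA1 over the seed-selected section, Base64-encoded."""
    section = select_section(text, seed)
    return encode_signature(hashlib.sha1(section.encode("utf-8")).digest())


if __name__ == "__main__":
    raw = b"Buy cheap widgets now! Limited offer, click here to order."
    body = preprocess_base64(base64.b64encode(raw).decode("ascii"))
    # When the seed (the shared random number) rolls over, a different
    # slice is usually hashed, so the old signature stops matching.
    for seed in (1234, 5678):
        print(seed, ephemeral_signature(body, seed))
    digest = hashlib.sha1(body.encode("utf-8")).digest()
    print("hex:", len(digest.hex()), "chars; base64:", len(encode_signature(digest)), "chars")
```

The last line prints 40 hex characters versus 27 Base64 characters for the same 20-byte SHA1 digest, roughly the one-third saving described in item 6.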