Ingenious
spam techniques require anti-spam firms to stay one step
ahead, says Ambarish Deshpande, regional director,
India & SAARC, IronPort Systems.
The
volume of spam has been steadily increasing every year
since 2002. In addition to sheer volume, the sophistication
of spammer tactics has also grown. This flood of illegitimate
email is propelled by a powerful motive: profit.
Spammers make money by selling a wide array of marginal
products, such as herbal supplements, low-interest
mortgages and ergonomic computer products, and through criminal
activities such as credit card fraud, pornography and
illegal pharmaceutical sales. The profits behind these
endeavours are being ploughed back into new technology
and infrastructure for delivering spam.
When
spam initially became a pandemic, corporations and networks
began to deploy first-generation spam filters. These filters
primarily relied upon heuristic analysis: looking
at the words in a message and using a weighting system
to estimate the probability that the message was spam.
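
The weighting idea can be illustrated with a minimal sketch. The word list, the weights, the bias and the use of a logistic function to turn the total into a probability are all assumptions made for illustration; they are not taken from any particular vendor's filter.

```python
import math

# Hypothetical per-word weights: positive values suggest spam,
# negative values suggest legitimate mail.
WEIGHTS = {
    "cheap": 1.5,
    "viagra": 3.0,
    "mortgage": 1.2,
    "meeting": -1.0,
    "invoice": -0.5,
}
BIAS = -2.0  # assumed baseline so that mail with no known words scores low


def spam_probability(message: str) -> float:
    """Sum the weights of known words and squash the total into a 0..1 score."""
    words = message.lower().split()
    total = BIAS + sum(WEIGHTS.get(word, 0.0) for word in words)
    return 1.0 / (1.0 + math.exp(-total))


if __name__ == "__main__":
    for text in ("cheap viagra", "agenda for the meeting", "c h e a p v.i.a.g.r.a"):
        print(f"{text!r}: {spam_probability(text):.3f}")
```

Note that the last message scores low simply because none of its tokens match a known word, which is the weakness the following paragraphs describe.
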
As
these anti-spam solutions became more widespread, spammers
began to develop new, more sophisticated tactics to circumvent
the filters. This spawned a "cat and mouse" game
in which spammers would develop a new tactic to get past
filters, anti-spam vendors would add a new technique
to their "cocktail" to stop the spammers, and then
spammers would come out with another tactic to get past
even these filters, and so on.
Recently,
spam has been using increasingly sophisticated obfuscation
techniques and mutating faster than ever. Most spam now
includes blocks of text that contain words known to score
as "not spam", often technical
terms or a passage from a textbook. Other tricks involve
hiding words as white-on-white text or replacing letters
with numbers. Spammers have also become increasingly
smart in their use of URLs. Some spam contains minimal content
but includes a URL with a call to action, while other
spam attacks host their URLs on the same servers
used by legitimate websites, using free web hosting
services like Geocities.
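
To make the letter-for-number trick concrete, the sketch below shows a naive keyword check missing an obfuscated word, and a simple normalisation step recovering it. The substitution map and the `normalise` helper are hypothetical and deliberately small; a real filter would need a far broader mapping, and this one would also mangle genuine numbers and domain names.

```python
import re

# Assumed digit/symbol-for-letter swaps; real spam uses many more variants.
LEET = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s"})


def normalise(text: str) -> str:
    """Undo simple character swaps and strip separators placed between letters."""
    text = text.lower().translate(LEET)
    # Remove a dot, dash, underscore or asterisk sitting between two letters,
    # so that "v.i.a.g.r.a" collapses to "viagra".
    return re.sub(r"(?<=[a-z])[.\-_*](?=[a-z])", "", text)


def contains_keyword(text: str, keyword: str) -> bool:
    """Naive content check: plain substring match on the lowercased text."""
    return keyword in text.lower()


if __name__ == "__main__":
    message = "Cheap V1agra"
    print(contains_keyword(message, "viagra"))             # naive check on raw text
    print(contains_keyword(normalise(message), "viagra"))  # check after normalisation
```
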
These
obfuscation techniques have effectively defeated most
content-based filters. While most vendors still claim
to have spam capture rates in the high 90s, in reality
their capture rate may be in the 80s (or worse). At the
same time, content-based filters face the challenge of
occasionally deleting legitimate mail that happens to
contain words associated with spam, creating a "false
positive". The table highlights the evolution of
spam filtering, along with the limitations of each of
the approaches.

| Generation | Limitations | Example |
|---|---|---|
| 1. Heuristics | Spoofable: spammers change words so filters don't recognize spam but humans do. False positives: legitimate email often contains "spammy" words. | "C H EAP V.i.a.g.r.a" |
| 2. Signatures | Spoofable: hashbusters fool bulk detection systems by making spam look dissimilar. Reactive: writing signatures first requires collecting spam samples. | "Cheap Viagra dgjk#" |
| 3. Adaptive | Spoofable: defeated by inserting good words that only machines see. High overhead: learning systems, like Bayesian, are hard to train/maintain. | "Cheap Viagra here: http://abc.com Cancer, office, Shakespeare." |
| 4. Context Adaptive | Emerging: requires extensive vendor investment in tracking email and Web reputation. | |

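
The "hashbuster" limitation of signatures in the table can be shown with a short sketch. It assumes a signature is simply a SHA-256 hash of the normalised message body, which is a simplification chosen for illustration; the `signature` and `is_known_spam` helpers are hypothetical, and commercial systems are likely to use fuzzier fingerprints.

```python
import hashlib


def signature(body: str) -> str:
    """Exact-match fingerprint: SHA-256 of the lowercased, whitespace-collapsed body."""
    canonical = " ".join(body.lower().split())
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Signatures can only be written after a sample has been collected,
# which is the "reactive" limitation noted in the table.
known_spam = {signature("Cheap Viagra")}


def is_known_spam(body: str) -> bool:
    """Return True if the body matches a previously collected spam sample."""
    return signature(body) in known_spam


if __name__ == "__main__":
    print(is_known_spam("cheap   VIAGRA"))      # same text, different spacing and case
    print(is_known_spam("Cheap Viagra dgjk#"))  # random suffix changes the hash
```

A few random characters appended to each copy are enough to give every message a different hash, so the exact-match lookup never fires.
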
IronPort
Systems' latest industry research shows an increased prevalence
of "image-based spam", an advanced technique
that spammers have adopted to evade detection. Image-based
spam bypasses both traditional content and signature scanning:
it contains little or no text to analyse, instead including
a .gif or .jpeg file with an image.
The image contains the spam message in the form of text
and graphics, similar to an HTML email, making it difficult
for a machine to recognize the text. Image-based
spam has exploded, growing from less than 1 per cent of
all spam in June 2005 to more than 12 per cent of all
spam in June 2006.
This
represents more than five billion image-based spam messages
sent per day, 78 per cent of which pass right through
first- and second-generation spam filters. The study was
conducted using SenderBase data, which represents 25 per
cent of the world's email traffic and data from more than
100,000 ISPs, universities, and corporations around the
world.
In
late 2005, spam volumes were still increasing, but the
growth rate began to decline from the 100 per cent-plus that
spam volumes had sustained for the two previous years.
But this respite was brief. Over the last six months,
spam volumes have resumed their hyper-growth rates.
In
just two months, from April 2006 to June 2006, spam
volumes surged 40 per cent worldwide. At the same
time, spammers are focusing the intensity of their attacks.
When the sophisticated spammers launch a new wave of randomised
image spam, they will typically target a specific geographical
area, an ISP or even an enterprise.
When
this happens, as much as 50 per cent of the incoming spam
at a corporation is image-based. If the filter protecting
that corporation is not equipped to detect and block these
highly sophisticated attacks, end users are deluged with
spam for the duration of the attack, causing severe communication
disruptions and major productivity losses.
Today's
spam attacks have become too sophisticated for earlier-generation
spam systems. These systems share a common weakness:
they rely heavily on analysing content that can easily
be manipulated by a spammer. State-of-the-art anti-spam
systems must go beyond content analysis and analyse messages
in the full context in which they are sent.
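
One way to picture analysis "in the full context" is to combine a content score with signals a spammer cannot easily rewrite, such as the reputation of the sending host and of any linked URL. The sketch below is only an illustration of that idea: the field names, the weights and the linear combination are assumptions, not a description of how IronPort or any other vendor scores mail.

```python
from dataclasses import dataclass


@dataclass
class MessageContext:
    """Hypothetical inputs, each normalised to the range 0..1."""
    content_score: float      # 1.0 = content looks like spam
    sender_reputation: float  # 1.0 = sender has a good track record
    url_reputation: float     # 1.0 = linked sites have a good track record


def context_score(ctx: MessageContext,
                  w_content: float = 0.3,
                  w_sender: float = 0.4,
                  w_url: float = 0.3) -> float:
    """Weighted blend of content and reputation signals; higher means more spam-like."""
    return (w_content * ctx.content_score
            + w_sender * (1.0 - ctx.sender_reputation)
            + w_url * (1.0 - ctx.url_reputation))


if __name__ == "__main__":
    # Content padded with "good" words, but sent from a poorly rated host.
    padded_spam = MessageContext(content_score=0.1, sender_reputation=0.05, url_reputation=0.1)
    # Legitimate mail that happens to contain "spammy" words.
    newsletter = MessageContext(content_score=0.6, sender_reputation=0.95, url_reputation=0.9)
    print(f"{context_score(padded_spam):.2f}", f"{context_score(newsletter):.2f}")
```

Under these assumed weights, the padded message still scores as spam because its sender and URL are poorly rated, while the legitimate message with "spammy" words does not.
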
Maintaining
leading efficacy also requires publishing high-quality
rules in near real time. Rule quality is driven by the
size, breadth, and quality of the data that feeds the
rule generation system. Finally, the most effective
rule development systems have humans in the loop,
analysing and responding to the last few per cent of spam
messages that escaped automated defences.