Exploiting Transport-Level Characteristics of Spam

Unknown author (2008-02-15)

In the arms race to secure electronic mail users and servers fromunsolicited messages (spam), the most successful solutions employtechniques that are difficult for spammers to circumvent. Thisresearch investigates the transport-layer characteristics ofemail in order to provide a new, novel and robust defense againstspam. We find that spam SMTP flows exhibit TCP behavior consistentwith traffic competing for link access, large round trip times andresource constrained hosts. Thus, SMTP flow characteristics providesufficient statistical power to differentiate between spam andlegitimate mail (ham). We build "SpamFlow" to learn and exploitthese differences. Using machine learning feature selection weidentify the most discriminatory flow properties and effect greaterthan 90% spam classification accuracy without content or reputationanalysis. SpamFlow correctly identifies 78% of the false negativesgenerated by a popular content filtering application -- demonstratingthe power in combining SpamFlow with existing techniques. Finally, weargue that SpamFlow is not easily subvertible due to economicand practical constraints inherent in sourcing spam.