A Structure-Based Technique for Spam Detection and Email Classification
Abstract: Many techniques are available to combat the spread of unwanted email and online spam. One popular technique is the content-based Bayesian filter, but spammers have found ways to defeat such filters. A structure-based anti-spam technique takes a different approach to the spam problem by checking the structure of a message instead of its content. The structure of an email is extracted from the DOM (Document Object Model) of the HTML (HyperText Markup Language) in the email. We implemented a tree-based comparison and a quadratic weighted level scoring system to find similarities between emails. This method is used for email classification, so that similar emails can be grouped together. After an email is classified, we compare the domain of the email to the whitelisted domains. If the domains do not match, we label the email as spam. The experimental results showed a high success rate for spam detection and email classification.
Keywords: Bayesian filter, Document Object Model, Email classification, Phishing, Spam detection, Structure-based technique
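
The pipeline summarized in the abstract (extract the tag structure, score similarity level by level, then check the domain against a whitelist) can be illustrated with a minimal Python sketch. The abstract does not give the exact scoring formula, so everything specific here is an assumption made for illustration: the per-level multiset overlap, the quadratic weight `(n - d) ** 2` that favors shallow levels, the `0.8` threshold, and all function and parameter names (`structure_similarity`, `classify`, `templates`, `whitelist`). It should not be read as the authors' implementation.

```python
from collections import Counter
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not deepen the tree.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class LevelCollector(HTMLParser):
    """Counts which tag names occur at each depth of the HTML tree."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.levels = {}  # depth -> Counter of tag names

    def handle_starttag(self, tag, attrs):
        self.levels.setdefault(self.depth, Counter())[tag] += 1
        if tag not in VOID_TAGS:
            self.depth += 1

    def handle_endtag(self, tag):
        # Simplification: assumes reasonably well-nested markup.
        if tag not in VOID_TAGS and self.depth > 0:
            self.depth -= 1


def level_profile(html):
    parser = LevelCollector()
    parser.feed(html)
    parser.close()
    return parser.levels


def structure_similarity(html_a, html_b):
    """Weighted average of per-level tag overlap, in [0, 1].

    Assumed weighting: level d gets weight (n - d)^2, where n is the
    number of levels, so differences near the root count the most.
    """
    a, b = level_profile(html_a), level_profile(html_b)
    depths = set(a) | set(b)
    if not depths:
        return 1.0  # neither message contains any tags
    n = max(depths) + 1
    num = den = 0.0
    for d in range(n):
        weight = (n - d) ** 2
        ca, cb = a.get(d, Counter()), b.get(d, Counter())
        union = sum((ca | cb).values())    # multiset union size
        overlap = sum((ca & cb).values())  # multiset intersection size
        num += weight * (overlap / union if union else 1.0)
        den += weight
    return num / den


def classify(html, sender_domain, templates, whitelist, threshold=0.8):
    """Match an email to the closest known template, then check its domain.

    templates: hypothetical dict mapping a class name to reference HTML.
    whitelist: hypothetical dict mapping a class name to allowed domains.
    """
    best_class, best_score = None, 0.0
    for name, reference in templates.items():
        score = structure_similarity(html, reference)
        if score > best_score:
            best_class, best_score = name, score
    if best_class is None or best_score < threshold:
        return "unclassified"
    if sender_domain.lower() in whitelist.get(best_class, set()):
        return "legitimate"
    return "spam"
```

In this sketch, a message that reuses the layout of a known sender but arrives from a domain outside that sender's whitelist is labeled spam, which is the phishing case the keywords point to. A message whose structure matches no template well enough is left unclassified instead of being forced into a class.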