So a routine search turned up a little research paper from the University of Nebraska at Omaha.
While I’m flattered to be used as a credible source, I am upset that they:
- Used the Scamdex Email Archive without permission.
- Did not contact Scamdex to get permission.
- Used ‘screen scraping’ tools to collect the emails. In their words:
…To obtain a corpus of phishing emails, we scraped 2709 emails from Scamdex.com (“Email Scam, Internet Fraud, IdentityTheft & Phishing Resource,” n.d.). This corpus contained emails over a 3-year period from November 2006 to June 2009. These emails were submitted to Scamdex by recipients of phishing attacks.
- Did not credit Scamdex in their references.
The legality of screen scraping, a term for software tools that extract large amounts of information from a website (or its complete contents), is debatable. Generally speaking, it gets a bit tricky if commercial use is made of the result, but a lot more latitude is given for research purposes. The Electronic Frontier Foundation has a good one-pager on Fair Use.
If asked, Scamdex would have been completely happy to collaborate. We do ask (nicely) that …
“Any derived content from the Scamdex.com website must clearly show attribution to Scamdex.com as the source and must include a link to the original information.” (http://www.scamdex.com/About-Scamdex.php#use)
Scamdex is happy to be used as a research tool, but in future, please ask first and then make sure it is credited. Is that too much to ask?