Scamdex Data used in Research – if only they’d asked!

So a routine search turned up a little Research Paper from the University of Nebraska in Omaha.

Trends in Phishing Attacks: Suggestions for Future Research (2011) | Ryan M. Schuetzler | University of Nebraska at Omaha

While I’m flattered by being used as a credible source, I am upset that they:

  1. Used the Scamdex Email Archive without permission.
  2. Did not contact Scamdex to get permission.
  3. Used ‘Screen Scraping’ tools to (in their words)

    …To obtain a corpus of phishing emails, we scraped 2709 emails from (“Email Scam, Internet Fraud, IdentityTheft & Phishing Resource,” n.d.). This corpus contained emails over a 3-year period from November 2006 to June 2009. These emails were submitted to Scamdex by recipients of phishing attacks.

  4. Did not credit Scamdex in their references.

The legality of screen-scraping, a term used for software tools that extensively mine or extract information or complete contents of a website, is debatable – Generally speaking, if commercial use is made of the result then it gets a bit tricky, but for research purposes a lot more latitude is generally given. The Electronic Frontier Foundation has a good one-pager on Fair Use.

If asked, Scamdex would have been completely happy to collaborate. We do ask (nicely) that …

“Any derived content from the website must clearly show attribution to as the source and must include a link to the original information”. –

Scamdex is happy to be used as a research tool, but in future – ask first, then make sure it is credited – is that too much to ask for?

Scammers turning your WordPress Website into a Spam/Malware Distributor

So I get an email from Google complaining that several links from my son’s blog (which I will not name here) are linking to malware sites. The sample links they included were valid but completely foreign to the site, and the pages themselves were mangled versions of existing blog posts, with long lists of search engine spam and a few websites.
Needless to say, these were not of our making so I set out to investigate and clear them down as soon as possible.
The blog is concerned with my 12-year-old son’s love of all things Lego and, since he got his own laptop and discovered Minecraft, it has been languishing since his last post in July 2012.
The spam pages are not referenced from the valid blog pages in any way and have been in place since October. Only Google, chiding me about pages with my Adsense code being used to point to Malware, alerted me to the breach. Otherwise I would never have noticed.

How did it get there?

I suspect that the intrusion method was a theme I installed in October. It’s just a guess though – WordPress is so ubiquitous, I’m sure there are loads of vulnerabilities, especially if the constant stream of updates is anything to go by. Suffice it to say that they got in, and with enough authentication to allow them to upload files.

What I found

I found a couple of anonymous type directories under the ‘/wordpress’ directory: “imgxkm” and “imguut”. The content was a load of files of the form 74XXXXX.html. Each file was a complete webpage which seems to be spam content mixed with genuine blog page content. There was also an index.php.txt file which does a lot of stuff which I was in no mood to examine.
The important file, the one that makes the whole thing work is a .htaccess file. For those not in the know, this file is the Swiss-army penknife for web developers – it can make black into white and cure cancer – it can also take a mangled-looking url and make it go to a perfectly normal-looking webpage (and vice versa). Anyway, the job of this one was to take those odd looking SEO-spammy type urls and serve them up with a content-rich webpage – all without the website owner knowing a thing about it.

What Now?

I don’t have time to completely debug this issue, I’m just glad to have found it (thanks to Google’s ever vigilant search engine spam detection algorithms). If you get messages from Google relating to webpages that you don’t recognize, check for .htaccess files like this one.
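
A minimal Python sketch of that check, offered as an illustration rather than a tested tool. It walks a web root and flags directories holding the things described above: numbered .html pages, an index.php.txt, and an .htaccess sitting next to them. The regular expression assumes the "XXXXX" in 74XXXXX.html is all digits, and the function name find_suspect_dirs is made up for this example.

```python
import os
import re

# Assumption: the spam pages are "74" followed by digits, ending in .html.
SPAM_PAGE = re.compile(r"^74\d+\.html$")


def find_suspect_dirs(root, min_spam_pages=1):
    """Return sorted (directory, reasons) pairs for directories that look planted."""
    suspects = []
    for dirpath, _dirnames, filenames in os.walk(root):
        reasons = []
        spam_pages = [name for name in filenames if SPAM_PAGE.match(name)]
        if len(spam_pages) >= min_spam_pages:
            reasons.append(f"{len(spam_pages)} numbered .html page(s)")
        if "index.php.txt" in filenames:
            reasons.append("index.php.txt present")
        # An .htaccess on its own is normal, so only report it alongside other signs.
        if reasons and ".htaccess" in filenames:
            reasons.append(".htaccess present")
        if reasons:
            suspects.append((dirpath, reasons))
    return sorted(suspects)
```

Calling find_suspect_dirs on the directory that holds your WordPress install returns a list of paths with the reasons each one was flagged. A clean result only means these particular patterns were not found.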

Bonne chance!

DDoS attacks on Scamdex – an apology.

Running the Scamdex Website isn’t a full-time job but occasionally I fall foul of the lovable rogues who perpetrate these scams and who get upset when I tell people about their doings. For example, from mid-November 2012, I had a week of distributed denial of service (DDoS) attacks which effectively made the site stop responding to requests.

A day or so into the attack, I was contacted by the instigator; a nice Russian scammer who said “You see I can bring your server down, now remove the post”. He referred to a post someone had made in the Scam Tip Off Reports section of the site.

I’m sad to say that I had no option other than to comply with his threat on the grounds of ‘The Greater Good’. Cowardly you may say, but DDoS attacks are not to be taken lightly and while they were going on, no-one would be able to see anything on Scamdex.

You have all seen the effects that DDoS attacks have on even the biggest Internet presences – with all their resources and experts, they can still be reduced to server farms full of technically dead servers – Scamdex really can’t fight this.

I’m sorry if the Russian scammed someone who just might have been saved if the original post had remained online, but my duty is to the whole Internet community, above and beyond the individual. Mea Culpa!

Asterpix (who they?) vs. Google

Some Background First

A company called Asterpix has a cool tool (Searchlight) for websites (such as this one!). It analyses the page you are on, along with the website it belongs to and any recent search terms passed in (e.g. from Google), and generates a Tag Cloud of words and phrases that seem to be interesting, weighted by search terms, frequency of occurrence or other factors.
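
To make the tag cloud idea concrete, here is a rough Python sketch of frequency weighting with a boost for incoming search terms. It is a guess at the general technique, not Asterpix's actual algorithm; the stopword list, the boost of 5 and the function name tag_cloud are all assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "that", "this", "with", "are", "you", "was"}


def tag_cloud(page_text, search_terms=(), top_n=10, search_boost=5):
    """Return the top_n (word, weight) pairs for a page.

    Weight is the word's frequency on the page, plus search_boost for each
    search term the visitor arrived with.
    """
    words = re.findall(r"[a-z']+", page_text.lower())
    counts = Counter(w for w in words if len(w) > 2 and w not in STOPWORDS)
    for term in search_terms:
        counts[term.lower()] += search_boost
    return counts.most_common(top_n)
```

The weights could then be mapped to font sizes when the cloud is drawn.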

When a user clicks on one of the phrases in the tag cloud, they are taken to a fairly plain looking Google Search Results page with a (Google Adsense) ad block at top and bottom. While the search page is hosted on Asterpix’s website, the Google Adsense account is the original site owner’s.

A visitor clicks on an ad, the revenue goes to the site owner – what could be simpler?

The Problem is ….


… From CPanel to …. what??

I am an old Unix dude. I have installed more different versions of Unix than most people, everything from SCO Xenix/286 through to CentOS 5.2, and I don’t usually have many problems. But as time wears on, my brainDisk is starting to squeal and it’s not as fast at random access as it used to be, so I was really happy when I rented a server with Cpanel/WHM installed on it.
For those who don’t know, Cpanel is the web-based interface to everything you will never learn on a Unix server – plus, the WHM super system allows you to carve off a chunk and sell it or give it away to your pals, reasonably confident that they won’t/can’t screw it up.
Add in virtual web/mail/log server management and lots of useful pre-installed tools and you have a system where you rarely have to get your hands dirty under the #hood.

Well, I love Cpanel now and I have grown to rely on it (curses!), so when it comes to creating my own server to save money on a dedicated one, I find I need it to get things done (and my old stuff transferred).

The problem with CP is that it costs $$money. between $30 and $48/month. and. I. just. don’t. want. to. pay. that. any. more….. so….

Piracy is out – mainly because you need to register the license with CP and also because that’s bad!:'(

Perhaps I could install it, setup my system the way I want and then after a month or so, hand it back??

Well, no, apparently – most people (including Cpanel themselves) seem to be of the opinion that to uninstall CP, you should really re-install Linux…. kind of defeats my object here!

so…. alternatives, anyone?

There are a few – some commercial (pay $$ for), such as DirectAdmin, and some free ones (Web-CP, WebMin/VirtualMin). So I started evaluating these free Cpanel Alternatives ….

1. WebMin/VirtualMin

Looks like it will do the job – the only one of the alts that I’ve heard of and actually used before. Installs easily enough and looks nice – has a fine range of functionality but what lets it down is its non-simplicity. Cpanel’s approach is to show you a bunch of things that you may want to do and ask sensible questions (with usually relevant tooltips close by) to help you accomplish your requirements. WebMin takes the ‘I’ll help you to write the configuration files correctly’ approach – you really have to know what you’re doing and in a lot of cases, the input fields are just blank with no clue as to what to put there.

WebMin Configuring Backup Example Screenshot

This probably highlights the major difference between CPanel/WHM and the rest of the Server Admin systems out there – CP/WHM does some pretty radical things to your server when you install it and this is why it’s so hard to uninstall. The other systems kind of leave things as they are and just act as configuration helpers. As an example, see the two screenshots of the ‘backup’ functions.

Cpanel Domain Owner Backup Page

2. Web-CP

Much, much harder to install, and harder to find the installation instructions too, but seems pretty good so far.

I had problems with the PHP startup scripts being written with DOS line endings which confused the life out of me for a while until I found it.  Still not able to start the system up but suspect it’s something to do with the line that reads:

$args = trim(next($HTTP_SERVER_VARS["argv"]));

(Shouldn’t that just be ARGV for shell scripts?)

… I’ll continue and let you know how I get on.
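
For anyone hitting the same DOS line-ending problem, here is a small Python sketch that detects and strips CRLF endings from a script. The usual reason they break startup scripts is the first line: with a trailing carriage return, the system goes looking for an interpreter whose name ends in "\r". The function names are made up for this example, and this is one way to do it, not what the Web-CP installer does.

```python
def has_dos_line_endings(path):
    """True if the file contains at least one CRLF (DOS) line ending."""
    with open(path, "rb") as handle:
        return b"\r\n" in handle.read()


def convert_to_unix(path):
    """Rewrite the file with LF endings; return True if anything changed."""
    with open(path, "rb") as handle:
        data = handle.read()
    fixed = data.replace(b"\r\n", b"\n")
    if fixed == data:
        return False
    with open(path, "wb") as handle:
        handle.write(fixed)
    return True
```

Reading and writing in binary mode matters here, because text mode would quietly translate the line endings and hide the problem. Keep a copy of the original file before converting.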

Been Scammed on CraigsList? – New Service from Scamdex

Back in 2008, Scamdex found a blog devoted to outing the huge population of scammers to whom Craigslist is a God-given opportunity to rip off people the world over. The blog, Exposing scam artists who use Craigslist, is still here but hasn’t been updated for a while, which is a shame.

Scamdex’s Scam Tip Off Service


I created a service to allow people to quickly log any instances they see of scams online. There are other services that allow this but I didn’t find one that I liked. These were the criteria I felt were important:

  • Completely Anonymous (if desired) logging in several different categories.
  • No username/signup process keeps it simple and quick.
  • Few mandatory fields. If you don’t know the fax number, I won’t bust your balls!
  • High Search Engine visibility. Scamdex has always relied on Search Engines, especially Google, for indexing of its information. 85% of our visitors come to Scamdex through a search engine query.

You can see what has already been logged Here and Add information Here. The app is under continuous development and I am working on tools to aggregate similar and identical reports. Any feedback is gratefully received (just make a comment here)

New Scam Email Indexing Method (again!)

It’s my third iteration on the same basic principle: take a carefully filtered and enhanced archive of 150,000 email messages and then sort, categorize and analyze them, then put them in a defanged, indexable/searchable list format so that people can browse them.

The first was a program I wrote in Perl back in 2004. It was a POP sucker that connected to the mailbox, attempted to extract message parts and rewrote them as an HTML page. While successful, I was never happy with my efforts to disentangle nested messages and alternate body parts, which meant that a lot of emails showed up with lots of Base64 and other garbage (e.g. ScamDB_S_74.php).
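
To show what disentangling nested messages and alternate body parts involves, here is a minimal sketch using Python's standard email package. It is not the original Perl program, just an illustration of the technique: walk every part, skip the containers, and let the library undo Base64 and quoted-printable. The function name readable_parts is made up for this example.

```python
from email import message_from_bytes, policy


def readable_parts(raw_bytes):
    """Return (content_type, decoded_text) for every text part, however deeply nested."""
    msg = message_from_bytes(raw_bytes, policy=policy.default)
    parts = []
    for part in msg.walk():
        # Containers (multipart/*, message/rfc822) carry no text of their own.
        if part.is_multipart() or part.get_content_maintype() != "text":
            continue
        try:
            # get_content() undoes Base64/quoted-printable and applies the charset.
            text = part.get_content()
        except (LookupError, UnicodeError):
            # Scam mail often declares bogus charsets; fall back to a lossy decode.
            text = part.get_payload(decode=True).decode("latin-1", "replace")
        parts.append((part.get_content_type(), text))
    return parts
```

A multipart/alternative message comes back as two entries, the text/plain version and the text/html version, so the caller can pick whichever suits the page being built.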

The next try I had was to use a mail archive indexer program called ‘Hypermail’. This was mostly successful at splitting messages into component parts but was still not quite flexible enough for my needs and the indexes were way too long (e.g. HYPMAIL/date.php).

So this spring, I am trying a whole new system that I rewrote in PHP, my code of choice for the decade. I am still mailbox based, mainly so that I can prune spam that has sneaked through my filters, but that may change soon.

This is how the Scamdex Engine works:

  1. Scam Emails arrive in the honeypot mailbox.
  2. Using Thunderbird with various Add-ons, I partially manually sort the scam emails into a holding mailstore and throw away the junk.
  3. A program runs nightly which:
    1. Sorts each email in the holding mailstore into one of 5 categories (419/AFF, Auctions, Jobs, Phishing, Lottery).
    2. Adds some extra Headers to the email.
    3. Moves it to the correct mailbox archive location.
    4. Runs MHonArc to create the indexed archive and HTML-ized emails.
    5. Post-processes the MHonArc-ized pages to add a PHP index include file, update the (MySQL) database and distribute the keywords and scoring to META and the nice little graph widget.
    6. err… that’s it!

It’s not pretty or fast but it works, and I can understand it. It’s easy to fix and add to. It’s annoying having to run the process every night from scratch but until I work out how to use the MHonArc system to add/delete emails from the archive, it’s all I can do. Any suggestions about how I can do this better, let me hear them!

(send to scamblog(a)
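
The categorize-and-tag steps of the nightly run can be sketched in a few lines of Python. This is an illustration only: the real engine is PHP and its scoring rules are not described here, so the keyword lists, the header name X-Scam-Category and the function names are all invented for the example.

```python
from email import message_from_string, policy

# Hypothetical keyword lists; the five category names come from the list above.
CATEGORY_KEYWORDS = {
    "419/AFF": ["next of kin", "inheritance", "barrister", "million dollars"],
    "Auctions": ["ebay", "escrow", "auction"],
    "Jobs": ["job offer", "salary", "work from home"],
    "Phishing": ["verify your account", "password", "suspended"],
    "Lottery": ["lottery", "prize", "winning"],
}


def categorize(text):
    """Return the category whose keywords occur most often, or None if none match."""
    text = text.lower()
    scores = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def tag_message(raw_text):
    """Parse an email and add an extra header recording its category."""
    msg = message_from_string(raw_text, policy=policy.default)
    body = msg.get_body(preferencelist=("plain", "html"))
    content = body.get_content() if body is not None else ""
    category = categorize(f"{msg['Subject'] or ''} {content}")
    if category is not None:
        msg["X-Scam-Category"] = category  # hypothetical header name
    return msg
```

Plain substring counting is crude (it will happily match keywords inside longer words), but it keeps the idea visible: score, pick a winner, record the result in a header so later steps can file the message in the right archive.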

Long-needed Upgrades to Scamdex

Well, to start with, I wanted to PHP-ize everything. So I started looking at the Apache/PHP config and as usual, without backups or testing or anything, I dived in and threw the big red switch. Everything seemed to go ok but then the trouble started!
All my websites broke. Any ‘.html’ web pages that had embedded PHP in them broke really badly, whole directories of files became ‘not found’ and it kept asking me what I wanted to do with files of type httpd-php5 and so on….

Anyway, lots of hacking later and it seems to be working. I had to force all the ‘.html’ files to become ‘.php’ files, but a little bit of .htaccess rewriting allows previous search engine results to continue to work. I also had to upgrade WordPress and do a lot of tweaking of file ownerships and permissions to even allow people to see them.

and then…

and then I noticed I had a visitor. Not just any visitor – he had guessed the ‘admin’ password (and I thought it was SOOOO clever) and had made himself root and installed some shitty little spam engine. Got rid of that and locked down sshd access to impose limits on number of failed logins per IP but he got back in and this time installed a Mech Chat server.
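
The idea of limiting failed logins per IP starts with counting them. Here is a hedged Python sketch of that counting step only. It is not the configuration used on the Scamdex server; it assumes OpenSSH's usual "Failed password for ... from <ip>" log lines, and the function names and the limit of 5 are made up for the example.

```python
import re
from collections import Counter

# Matches e.g. "Failed password for invalid user admin from 203.0.113.9 port 4242 ssh2"
FAILED_LOGIN = re.compile(
    r"Failed password for (?:invalid user )?\S+ from (\d{1,3}(?:\.\d{1,3}){3})"
)


def failed_logins_by_ip(log_lines):
    """Count failed sshd password attempts per source IP (IPv4 only in this sketch)."""
    counts = Counter()
    for line in log_lines:
        match = FAILED_LOGIN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def over_limit(counts, limit=5):
    """Return the sorted list of IPs with at least `limit` failed attempts."""
    return sorted(ip for ip, attempts in counts.items() if attempts >= limit)
```

Feeding it the lines of the sshd auth log gives a list of addresses worth blocking. Counting alone does nothing about a guessed password, of course, which is how he got in the first time.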

He’ll probably get back in – Linux security isn’t my best skill – but at least he didn’t trash anything and it forced me to tidy up a bit.

Seems like he was one of our dear Romanian friends, but that might just have been IP cloaking…

Next phase is to make the scam emails look a bit nicer. I am trying out MHonArc – more flexible than Hypermail and much better de-miming than my sad pathetic efforts. Check back to see how I’m doing.