The Scamdex Scam Email Archive - Advance Fee Fraud/419

Subject:  Confirm Bio - Mem No. 512-564677
From:  Updated Registry Info <>
Date:  Fri, 23 May 2014 11:04:17 -0700

A Scam Email with the Subject "Confirm Bio - Mem No. 512-564677" was received in one of Scamdex's honeypot email accounts on Fri, 23 May 2014 11:04:17 -0700 and has been classified as an Advance Fee Fraud/419 Scam. The sender was Updated Registry Info <>, although it may have been spoofed.

Dear Fckyou,

** Notice: Your nomination has been accepted into the 2014 Business Who's Who Registry.

Please take a moment and confirm your acceptance:
Member No. 97-4652 

You have demonstrated leadership and earned a membership. 

Please confirm the info we have for your Bio. 

We look forward to working with you. 


Membership Admissions

/\/\/\ notification settings w/Alert System Notifications \/\/\
can be altered -by- writing1107 Valeria Dr.Marion.TX.78124

True. And it works with PHP's built-in XPath and XSLTProcessor classes, which are great for extracting content.   porneL Nov 27 '08 at 13:28

For really mangled HTML, you can always run it through htmltidy before handing it off to DOM. Whenever I need to scrape data from HTML, I always use DOM, or at least simplexml.   Frank Farmer Oct 13 '09 at 0:41
I've been re-researching this, and discovered that the problem I was having with DomDocument's loadXML method was due to an older linked version of libxml. I've been working on more up-to-date systems and DomDocument::loadHTML works like a charm.   Alan Storm Nov 21 '09 at 18:04
Another thing with loading malformed HTML is that it might be wise to call libxml_use_internal_errors(true) to prevent warnings that will stop parsing.   Husky May 24 '10 at 5

Well, just a comment about your "real-world consideration" standpoint. Sure, there ARE useful situations for Regex when parsing HTML. And there are also useful situations for using GOTO. And there are useful situations for variable-variables. So no particular implementation is definitively code-rot for using it. But it is a VERY strong warning sign. And the average developer isn't likely to be nuanced enough to tell the difference. So as a general rule, Regex, GOTO and variable-variables are all evil. There are non-evil uses, but those are the exceptions (and rare at that)... (IMHO)   ircmaxell Sep 7 '10 at 12:11
@mario: Actually, HTML can be properly parsed using regexes, although usually it takes several of them to do a fair job at it. It's just a royal pain in the general case. In specific cases with well-defined input, it verges on trivial. Those are the cases where people should be using regexes. Big old hungry heavy parsers are really what you need for general cases, though it isn't always clear to the casual user where to draw that line. Whichever code is simpler and easier, wins.
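The distinction drawn above (regexes for narrow, well-defined input; a real parser for arbitrary markup) can be sketched in Python's standard library. The sample HTML snippets and the `price` pattern are illustrative assumptions, not anything from the thread:

```python
import re
from html.parser import HTMLParser

# Narrow, well-defined input: a regex verges on trivial and is perfectly adequate.
snippet = '<span class="price">19.99</span>'
match = re.search(r'<span class="price">([\d.]+)</span>', snippet)
price = match.group(1)  # '19.99'

# General case: hand arbitrary (even mangled) HTML to a real parser instead.
class TextCollector(HTMLParser):
    """Collects all text nodes, regardless of how broken the tag soup is."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

collector = TextCollector()
collector.feed('<p>Mangled <b>markup<i> is fine</p>')  # unclosed <b> and <i>
text = ''.join(collector.chunks)  # 'Mangled markup is fine'
```

The parser tolerates the unclosed tags without complaint, which is exactly the robustness the regex approach loses once the input is no longer under your control.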
I have used DOMDocument to parse about 1000 html sources (in various languages encoded with different charsets) without any issues. You might run into encoding issues with this, but they aren't insurmountable. You need to know 3 things: 1) loadHTML uses the meta tag's charset to determine encoding; 2) this can lead to incorrect encoding detection if the html content doesn't include this information; 3) bad UTF-8 characters can trip the parser. In such cases, use a combination of mb_detect_encoding() and Simplepie RSS Parser's encoding / converting / stripping bad UTF-8 characters code for workarounds.   Vasu Sep 19 '10 at 6:58

Yes, but DOMDocument does not support CSS or XPath queries, just getElementById or getElementsByTagName?   umpirsky Nov 16 '10 at 9:22

My problem with loadHTML is the extra nodes it inserts, which are presumably there to "fix" the HTML but aren't actually required by the DOM spec. As such, the result of a loadHTML call is ill-defined. Would have been much better to have this sort of thing happen on saveHTML.   CurtainDog Mar 3 '11 at

DOM does actually support XPath, take a look at DOMXPath.   Vincent

That is not what the docs mean by "safe" in this context. It is safe to raise SyntaxError or ValueError (which the calling code can catch and handle appropriately if necessary), rather than going ahead and evaling "import os; do_evil_stuff.." or whatever other string was passed in...   wim 2 days ago
But that doesn't make it any "safer" than using int("31") or float("545.2222"). The only advantage that I can see is that you don't have to know beforehand what type of mathematical expression you've got (which can be useful under certain circumstances, but is not what the OP was asking).

1e3 is a number in python, but a string according to your code.   Cees Timmerman Oct 4 '12 at 13:24
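The point about "1e3" is easy to see with a minimal try/except coercion helper (a sketch of the approach being discussed, not the OP's actual code):

```python
def coerce(s):
    """Return int(s) if possible, else float(s), else the original string."""
    try:
        return int(s)
    except ValueError:
        try:
            return float(s)
        except ValueError:
            return s

print(coerce("31"))        # 31 (int)
print(coerce("545.2222"))  # 545.2222 (float)
print(coerce("1e3"))       # 1000.0 -- float() accepts scientific notation,
                           # unlike isdigit()-based checks, which leave it a string
```

This is why delegating to `int()`/`float()` catches cases like `"1e3"` that a hand-rolled `isdigit()` test misclassifies.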
It's good to have a decent, peer-reviewed roll-your-own version next to a good recommendation for a 
standard library. Sometimes I don't want to pull in another library for that one place where I need 
to parse urlencoded strings, and sometimes I might even have that library already in my dependency 
list. That both alternatives are listed as top answers is once again a great testimony to the SO 
community.   Hanno Fietz May 19 '11 at 10:38
@Hanno Fietz you mean you trust these alternatives? I know they are buggy. I know pointing out the 
bugs I see will only encourage people to adopt 'fixed' versions, rather than themselves look for 
the bugs I've overlooked.   Will May 19 '11 at 10:57
@Will - well, I would never just trust copy-and-paste snippets I got from any website, and no one 
should. But here, these snippets are rather well reviewed and commented on and thus are really 
helpful, actually. Simply seeing some suggestions on what might be wrong with the code is already a 
great help in thinking for myself. And mind you, I didn't mean to say "roll your own is better", 
but rather that it's great to have good material for an informed decision in my own code.   Hanno 
Fietz May 23 '11 at 10:55 
Anyway, assuming you are using UTF-8 or some other multi-byte character encoding, now that you've decoded one encoded byte you have to set it aside until you capture the next byte. You need all the encoded bytes that are together because you can't url-decode properly one byte at a time. Set aside all the bytes that are together, then decode them all at once to reconstruct your character.

Plus it gets more fun if you want to be lenient and account for user-agents that mangle urls. For example, some webmail clients double-encode things, or double up the ?&= chars. If you want to try to gracefully deal with this, you will need to add more logic to your code.
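The double-encoding case can be illustrated with `urllib.parse.unquote`; the mangled sample input and the `lenient_unquote` helper are assumptions for illustration, not a recommended production strategy:

```python
from urllib.parse import unquote

# A buggy client double-encoded: the '%' signs themselves became '%25'.
mangled = "caf%25C3%25A9"      # should have been "caf%C3%A9" (i.e. "café")

once = unquote(mangled)        # -> "caf%C3%A9": still percent-encoded
twice = unquote(once)          # -> "café": both UTF-8 bytes decoded together

# A lenient decoder can keep decoding while the result still changes,
# capped so a literal '%25' in clean input can't loop forever.
def lenient_unquote(s, max_rounds=3):
    for _ in range(max_rounds):
        decoded = unquote(s)
        if decoded == s:
            break
        s = decoded
    return s
```

Note that `unquote` already handles the multi-byte point from the comment above: it gathers the adjacent `%C3%A9` bytes and decodes them as one UTF-8 character rather than byte by byte.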
I imagine parse returns a list so that it maintains positional ordering and more easily allows duplicate entries.
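Python's standard library makes the same design choice: `urllib.parse.parse_qsl` returns a list of pairs precisely so that order and duplicate keys survive, while the dict-shaped `parse_qs` has to group duplicates into lists instead:

```python
from urllib.parse import parse_qsl, parse_qs

qs = "tag=red&tag=blue&page=2"

pairs = parse_qsl(qs)
# [('tag', 'red'), ('tag', 'blue'), ('page', '2')] -- order and duplicates kept

grouped = parse_qs(qs)
# {'tag': ['red', 'blue'], 'page': ['2']} -- duplicates collapsed under one key
```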
Aside from that, it's almost 5 times as fast as a nested try, except! Using lambda instead of def 
also saves 5% execution time. Tested with 32-bit Python 3.2 on 64-bit Windows 7.   Cees Timmerman 
Oct 4 '12 at 13:55
Good point, Cees. Thanks. I appreciate benchmarking too :) How about a modified version of parseStr using regular expressions? It will probably hurt performance but someone might find it useful. The new parseStr function:  parseStr = lambda x: x.isalpha() and x or x.isdigit() and int(x) or re.match('(?i)^-?(\d+\.?e\d+|\d+\.\d*|\.\d+)$',x) and float(x) or x   krzym Oct 9 '12 at

Using re is almost twice as slow as the try, except method, even with the 3% faster version that uses only match. Tested using time.time() and range(1000000) on a quadcore Intel Xeon 2.93 GHz.   Cees Timmerman Oct 9 '12 at 12:17

I ran a few tests using: parseStrRE = lambda x: x.isalpha() and x or x.isdigit() and int(x) or re.match('(?i)^-?(\d+\.?e\d+|\d+\.\d*|\.\d+)$', x) and float(x) or x and the try/except method modified to return strings if both int and float raise ValueError, for the following test cases: ['1e3', '1.e3', '123', '-1234.12', 'e', 'ee', '1e', 'e2', '3hc1']. The execution-time ratio is 2.7 (try/except) : 1.25 (parseStrRE) : 0.85 (original parseStr). The short-circuit expressions I employed speed things up, since the result might actually be returned by evaluating only a part of the expression.
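The comparison being benchmarked above can be reproduced with `timeit`; the test strings come from the comment, but the iteration count is arbitrary and absolute timings will differ by machine, so no ratio is asserted here:

```python
import re
import timeit

def parse_try(x):
    """try/except variant: returns the string when both conversions fail."""
    try:
        return int(x)
    except ValueError:
        try:
            return float(x)
        except ValueError:
            return x

# Regex variant from the thread, with the pattern precompiled.
num_re = re.compile(r'(?i)^-?(\d+\.?e\d+|\d+\.\d*|\.\d+)$')
parse_re = lambda x: (x.isalpha() and x) or (x.isdigit() and int(x)) \
    or (num_re.match(x) and float(x)) or x

samples = ['1e3', '1.e3', '123', '-1234.12', 'e', 'ee', '1e', 'e2', '3hc1']

for name, fn in [('try/except', parse_try), ('regex', parse_re)]:
    elapsed = timeit.timeit(lambda: [fn(s) for s in samples], number=10_000)
    print(f"{name}: {elapsed:.3f}s")
```

Note the short-circuit chain returns as soon as any clause yields a truthy value, which is the speedup the comment describes; it also inherits that idiom's known quirk that falsy results such as `int("0")` fall through to later clauses.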
