[ Team LiB ] |
Recipe 6.19 Matching a Valid Mail Address

6.19.1 Problem

You want to find a pattern to verify the validity of a supplied mail address.

6.19.2 Solution

Because you cannot do real-time validation of deliverable mail addresses, no single, succinct pattern will solve this problem. You must pick from several available compromise approaches.

6.19.3 Discussion

Our best advice for verifying a person's mail address is to have them enter their address twice, just as you would when changing a password. This usually weeds out typos. If both entries match, send mail to that address with a personal message such as:

    Dear someuser@host.com,

    Please confirm the mail address you gave us on Sun Jun 29 10:29:01
    MDT 2003 by replying to this message. Include the string
    "Rumpelstiltskin" in that reply, but spelled in reverse; that is,
    start with "Nik...". Once this is done, your confirmed address will
    be entered into our records.

If you get back a message where they've followed your directions, you can be reasonably assured that it's real.

A related strategy that's less open to forgery is to give them a personal identification number (PIN). Record the address and PIN (preferably a random one) for later processing. In the mail you send, ask them to include the PIN in their reply. In case your email bounces, or the message is included via a vacation script, ask them to mail back the PIN slightly altered, such as with the characters reversed, or with one added to or subtracted from each digit.

Most common patterns used for address verification or validation fail in various and sometimes subtle ways. For example, the address this&that@somewhere.com is valid and quite possibly deliverable, but most patterns that allegedly match valid mail addresses fail to let that one pass.

Parenthesized comments, which may nest, complicate matters further. You can strip them before testing an address:

    # Repeatedly remove innermost parenthesized comments until none remain
    1 while $addr =~ s/\([^()]*\)//g;

You could use the 6598-byte pattern given on the last page of the first edition of Mastering Regular Expressions to test for RFC conformance, but even that monster isn't perfect, for three reasons.

First, not all RFC-valid addresses are deliverable. For example, foo@foo.foo.foo.foo is valid in form, but in practice is not deliverable. Some people try to do DNS lookups for MX records, even trying to connect to the host handling that address's mail to check if it's valid at that site. This is a poor approach because most sites can't do a direct connect to any other site, and even if they could, mail-receiving sites increasingly either ignore the SMTP VRFY command or fib about its answer.

Second, some RFC-invalid addresses, in practice, are perfectly deliverable. For example, a lone postmaster is almost certainly deliverable, but doesn't pass RFC 822 muster: it doesn't have an @ in it.

Finally and most importantly, just because the address happens to be valid and deliverable doesn't mean that it's the right one. president@whitehouse.gov, for example, is valid by the RFC and deliverable. But it's unlikely in the extreme that that would be the mail address of the person submitting information to your CGI script.

The Email::Valid CPAN module makes a valiant (albeit provably imperfect) attempt at doing this correctly. It jumps through many hoops, including the RFC 822 regular expression from Mastering Regular Expressions, DNS MX record lookup, and stop lists for naughty words and famous people. But this is still a weak approach. The approach suggested at the beginning of the Discussion is easier to implement and less prone to error.

6.19.4 See Also

The "Matching an Email Address" section of Chapter 7 of the first edition of Mastering Regular Expressions; Recipe 18.16
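
The enter-it-twice check and the PIN confirmation described in the Discussion might be sketched roughly as follows. This is only an illustration under stated assumptions: the function names (entries_match, new_pin, register_address, reply_confirms) and the in-memory %pending table are hypothetical, sending and receiving the mail is left out, and rand() stands in for a stronger random source.

```perl
use strict;
use warnings;

# Hypothetical in-memory table of address => PIN awaiting confirmation.
# A real application would keep this in a file or database.
my %pending;

# Return the address if both entries agree after trimming surrounding
# whitespace; return nothing otherwise.
sub entries_match {
    my ($first, $second) = @_;
    return unless defined($first) && defined($second);
    for ($first, $second) { s/^\s+//; s/\s+$//; }
    return unless length($first) && $first eq $second;
    return $first;
}

# Generate a six-digit PIN. Palindromes are rejected because their
# reversal would equal the PIN that was mailed out, so a bounce or
# vacation message quoting our mail would look like a confirmation.
sub new_pin {
    my $pin;
    do { $pin = sprintf("%06d", int(rand(1_000_000))) }
        while $pin eq scalar reverse($pin);
    return $pin;
}

# Record a fresh PIN for the address and return it for mailing.
sub register_address {
    my ($addr) = @_;
    return $pending{$addr} = new_pin();
}

# True if the reply body contains the PIN with its characters reversed,
# standing alone rather than embedded in a longer run of digits.
sub reply_confirms {
    my ($addr, $body) = @_;
    my $pin = $pending{$addr};
    return 0 unless defined($pin) && defined($body);
    my $expected = scalar reverse($pin);
    return 0 unless $body =~ /(?<!\d)\Q$expected\E(?!\d)/;
    delete $pending{$addr};    # each PIN confirms only once
    return 1;
}
```

A caller would pass both form fields to entries_match, mail the PIN returned by register_address, and later hand the body of each reply to reply_confirms. A message that merely quotes the original PIN, as a bounce would, is not accepted.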