abnf

Sunday, June 16 2013

WTFWG
[01:54:25] matt [wronka.org]/navic The WHATWG can take a good idea and make it useless. Take for example their "willful violations":


A valid e-mail address is a string that matches the email production of the following ABNF, the character set for which is Unicode. This ABNF implements the extensions described in RFC 1123. [ABNF] [RFC5322] [RFC1034] [RFC1123]

email = 1*( atext / "." ) "@" label *( "." label )
label = let-dig [ [ ldh-str ] let-dig ] ; limited to a length of 63 characters by RFC 1034 section 3.5
atext = < as defined in RFC 5322 section 3.2.3 >
let-dig = < as defined in RFC 1034 section 3.5 >
ldh-str = < as defined in RFC 1034 section 3.5 >

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/


Which as resulted in bugs like 791069 against Firefox by users expecting a valid eMail address to be a valid eMail address: https://bugzilla.mozilla.org/show_bug.cgi?id=791069

If you want to fix this locally, so it actually serves its purpose, I've got two patches you can apply locally: http://matt.wronka.org/stuff/projects/icpp/mozilla/

The second is the more-correct patch, and will also help if you happen to have a non-ASCII local part. It sounds like even with this applied, if you have a long domain name (more than 63 characters) Mozilla might complain but I haven't verified this, just looked at the comments in the code and bugzilla.