
Sunday, June 16 2013

WTFWG
[01:54:25] matt [wronka.org]/navic The WHATWG can take a good idea and make it useless. Take for example their "willful violations":


A valid e-mail address is a string that matches the email production of the following ABNF, the character set for which is Unicode. This ABNF implements the extensions described in RFC 1123. [ABNF] [RFC5322] [RFC1034] [RFC1123]

email = 1*( atext / "." ) "@" label *( "." label )
label = let-dig [ [ ldh-str ] let-dig ] ; limited to a length of 63 characters by RFC 1034 section 3.5
atext = < as defined in RFC 5322 section 3.2.3 >
let-dig = < as defined in RFC 1034 section 3.5 >
ldh-str = < as defined in RFC 1034 section 3.5 >

This requirement is a willful violation of RFC 5322, which defines a syntax for e-mail addresses that is simultaneously too strict (before the "@" character), too vague (after the "@" character), and too lax (allowing comments, whitespace characters, and quoted strings in manners unfamiliar to most users) to be of practical use here.

The following JavaScript- and Perl-compatible regular expression is an implementation of the above definition.

/^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$/


Which has resulted in bugs like 791069 against Firefox, filed by users expecting a valid eMail address to be a valid eMail address: https://bugzilla.mozilla.org/show_bug.cgi?id=791069
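To make the mismatch concrete, here is a minimal sketch that runs the regular expression quoted above against a few addresses. The regex is copied verbatim; the helper name whatwg_accepts and the sample addresses are my own illustrations and are not taken from the bug report, so treat this as a rough demonstration rather than a reproduction of that bug.

```python
import re

# The regular expression quoted above, copied verbatim into Python raw strings.
WHATWG_EMAIL = re.compile(
    r"^[a-zA-Z0-9.!#$%&'*+/=?^_`{|}~-]+@[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?"
    r"(?:\.[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?)*$"
)


def whatwg_accepts(address: str) -> bool:
    """Hypothetical helper: True if the whole string matches the quoted regex."""
    return WHATWG_EMAIL.fullmatch(address) is not None


# Illustrative sample addresses (assumptions, not from the bug report).
samples = [
    "user@example.org",                # ordinary address
    '"john doe"@example.org',          # quoted-string local part (RFC 5322)
    "user@[192.0.2.1]",                # domain literal (RFC 5322)
    "j\u00f6rg@example.org",           # non-ASCII local part
    "a..b@example.org",                # consecutive dots, not a valid RFC 5322 dot-atom
    "user@" + "a" * 63 + ".example",   # 63-character label
    "user@" + "a" * 64 + ".example",   # 64-character label
]

for address in samples:
    verdict = "accepted" if whatwg_accepts(address) else "rejected"
    print(f"{verdict}: {address}")
```

The quoted-string and domain-literal forms are syntactically valid under RFC 5322 but fail the regex, while "a..b@example.org" passes even though RFC 5322 does not allow consecutive dots in an unquoted local part.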

If you want to fix this locally so it actually serves its purpose, I've got two patches you can apply: http://matt.wronka.org/stuff/projects/icpp/mozilla/

The second is the more-correct patch, and will also help if you happen to have a non-ASCII local part. It sounds like, even with this applied, Mozilla might still complain if you have a long domain name (more than 63 characters), but I haven't verified this; I've only looked at the comments in the code and in Bugzilla.

Wednesday, March 6 2013

GnuPG and eMail Address Validation
[04:40:40] matt [wronka.org]/Psi.generay It's bothered me for some time that GNU Privacy Guard (gpg or gnupg) rejects valid eMail addresses. It feels like a piece of software that should get eMail validation correct, despite how often others get it wrong.

It turns out that GPG actually lets any non-ASCII character through, ostensibly for PGP compatibility. I patched the validation routines to also allow some cases they were rejecting. This means that although this isn't a 100% to-spec validation routine, it should at least allow all valid cases.

http://matt.wronka.org/stuff/projects/icpp/gnupg/gnupg-1.4.13-emailvalidator.diff

Tuesday, November 9 2010

[15:06:36] matt [wronka.org]/Merch http://matt.wronka.org/stuff/projects/scripts/wav2txt/ Convert audio/x-wav to ASCII text using pocketsphinx. Useful for adding to your mailcap. Very poor recognition (what do you expect for an open grammar?)