Unit Tests are Parity Bits for Code

I was recently asked by a skilled programmer:

I have never understood the rationale for unit tests. It seems like yet more code that can have bugs in it, so you would need unit tests of your unit tests.

I think a good analogy for understanding unit tests is to think of them as parity bits for code. In data communication, parity bits are added to a message to detect transmission errors caused by noise.

Just as with unit tests, you might ask: “Doesn’t adding a parity bit just increase the chance of error? If I’m sending an 8-bit message, won’t an extra bit just increase the chance of error by ~12%?” It is true a parity bit slightly increases the message’s “surface area” for raw errors, but it automatically eliminates half of all 9-bit messages as being incorrect. In practical terms, this means the parity bit can detect every 1-bit error (in fact, any odd number of flipped bits), though it misses errors that flip an even number of bits. That extra parity bit easily pays for itself many times over by detecting damaged messages.
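To make the parity-bit arithmetic concrete, here is a minimal JavaScript sketch assuming even parity on a single 8-bit byte. The helper names (popcount, addParity, looksValid) are illustrative, not from any library. It shows a 1-bit error being caught, a 2-bit error slipping through, and exactly half of the 512 possible 9-bit messages passing the check.

```javascript
// Count the 1 bits in a non-negative integer.
function popcount(n) {
  let count = 0;
  for (; n; n >>>= 1) count += n & 1;
  return count;
}

// Even parity: append one bit so the 9-bit message (8 data bits followed
// by the parity bit) always contains an even number of 1s.
function addParity(byte) {
  return (byte << 1) | (popcount(byte) & 1);
}

// A received 9-bit message passes the check if its 1-count is even.
function looksValid(nineBits) {
  return popcount(nineBits) % 2 === 0;
}

const sent = addParity(0b01001000); // ASCII "H"
console.log(looksValid(sent));               // true: no error
console.log(looksValid(sent ^ 0b000010000)); // false: 1-bit error detected
console.log(looksValid(sent ^ 0b000110000)); // true: 2-bit error slips through

// Exactly half of all 2^9 = 512 possible messages pass the check.
let valid = 0;
for (let m = 0; m < 512; m++) if (looksValid(m)) valid++;
console.log(valid); // 256
```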

Unit tests serve the same function as parity bits. Unit tests are not designed to prove the code correct (extremely difficult), nor to exhaustively test the code for all inputs (combinatorially impractical). A unit test is simply there to detect the most common “1-bit errors” in code caused by “noise.” Of course, unit tests themselves can contain errors, but just like the parity bit, they can detect many times more errors than they might introduce.

So what is the source of “noise” in code? In data communication, noise is caused by physical things like electrical interference, cosmic rays, patchy fog, etc. that produce random errors in messages.

In code, noise is caused by programmers mutating code. Of course programmers are not random mutators, and ideally most changes are benign and beneficial. However, the complexity of computing systems almost guarantees that some changes will cause unintentional, nearly random effects, i.e. code noise.

Code noise can come from external or internal changes to the code being tested. External noise comes from changes to things like library versions, compiler options, platform or architecture. Internal noise takes the form of ill-conceived optimizations, half-baked refactorings, out-of-control search-n-replaces, etc.

In this view of the world, when you check in a piece of code, it is really an act of communication. You are transmitting the code to a programmer in the future (often yourself 6 months from now). During transmission, the code will face many mutations, some of them damaging code noise. To help ensure that the code’s message arrives ungarbled and usable, a unit test is added to automatically detect simple “1-bit” errors.

Parity as a unit test:

function message() { return "Hello World!"; }

function testMessage() { assertEqual( parity(message()), EVEN ); }
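The parity test above is pseudocode. A runnable sketch needs definitions for parity(), assertEqual(), EVEN and ODD; the ones below are hypothetical, with parity() counting set bits across the string’s UTF-16 code units. Under that definition “Hello World!” contains 45 set bits, so the value a real test should pin is whatever the known-good message actually produces, here ODD.

```javascript
// Hypothetical definitions for the pseudocode's names.
const EVEN = "even";
const ODD = "odd";

// Parity of the total number of 1 bits across all UTF-16 code units.
function parity(text) {
  let ones = 0;
  for (let i = 0; i < text.length; i++) {
    for (let c = text.charCodeAt(i); c; c >>>= 1) ones += c & 1;
  }
  return ones % 2 === 0 ? EVEN : ODD;
}

function assertEqual(actual, expected) {
  if (actual !== expected) {
    throw new Error(`expected ${expected}, got ${actual}`);
  }
}

function message() { return "Hello World!"; }

// The expected parity is pinned from the known-good message.
function testMessage() { assertEqual(parity(message()), ODD); }

testMessage(); // passes; flipping any single bit of the message would make it throw
```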

Want more about parity bits and unit testing? Stay tuned for the follow-up: “Unit Testing is for Farmers”

Wanted: Comment Redaction Plugin

Don’t trust code comments over 30 hours old. Inevitably the comment isn’t updated after code changes, resulting in confusion and bugs. In fact, I wish every IDE (Xcode, Eclipse, etc.) had a feature/plugin that would redact comments whenever code is changed, in order to force comment revision.

Here is how it would work. Given some commented code from a game:

 /* Add a fixed bonus */
 score += 100;

Any change to the code would immediately redact the associated comment with X’s:

 /* XXX X XXXXX XXXXX */
 score += time_left * 2;

Alternatively, it could reduce the comment to a Mad Libs-style fill-in-the-blank exercise:

 /* _verb_ a _adjective_ _noun_ */
 score += time_left * 2;

Either way, the coder would be forced to rewrite the comment to match the new code. This would encourage short comments, or better yet, no comments.
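A minimal sketch of the X-redaction rule, assuming the simplest possible pairing: a block comment on its own line describes the code line directly below it, and the old and new versions of the file line up line by line (no insertions or deletions). The function names are hypothetical; no real IDE plugin API is used.

```javascript
// Replace every letter and digit inside block comments with "X",
// keeping the delimiters, spaces and punctuation intact.
function redactComment(line) {
  return line.replace(/\/\*([\s\S]*?)\*\//g,
    (match, body) => "/*" + body.replace(/[A-Za-z0-9]/g, "X") + "*/");
}

// Assumes oldLines and newLines are aligned line for line. A comment-only
// line is redacted when the code line right after it has changed.
function redactStaleComments(oldLines, newLines) {
  return newLines.map((line, i) => {
    const isComment = /^\s*\/\*.*\*\/\s*$/.test(line);
    const codeChanged = i + 1 < newLines.length && newLines[i + 1] !== oldLines[i + 1];
    return isComment && codeChanged ? redactComment(line) : line;
  });
}

const before = [" /* Add a fixed bonus */", " score += 100;"];
const after = [" /* Add a fixed bonus */", " score += time_left * 2;"];
console.log(redactStaleComments(before, after).join("\n"));
```

Unchanged code leaves its comment alone; only a comment whose code moved underneath it is blanked out.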

genericOnError: A generic window.onerror emulator for Safari and Opera

Update: Safari 5.1 added support for onerror, so this post is now mostly superfluous.

By default, most browsers hide unhandled JavaScript errors from end users. Typically, the error is logged to a JavaScript console that can only be viewed via developer tools. This is reasonable behavior for a production web site. However, during development and testing, it is better to receive an immediate, visible notification of any error, since the console is too easily forgotten or ignored.

Unfortunately, there currently isn’t a widely supported, standard way to trap all unhandled exceptions. The HTML standard defines window.onerror, but it is implemented inconsistently (IE, Firefox, Chrome) or not at all (Safari, Opera).

GenericOnError fills this gap with some kludgy hackery: it sniffs the browser and, in Safari and Opera, patches the undocumented Error function to emulate a simplified version of window.onerror. Here’s a test page and the code:

GenericOnError Test Page


function genericOnError(handler) {
    // Chrome, Firefox and IE implement onerror,
    // and it will one day be standardized...
    window.onerror = function (message, url, line) {
        handler("Error: " + message + " " + url + ":" + line);
        // Note: don't return a value, since Chrome and Firefox disagree
        // on what true/false mean. Returning nothing triggers the correct
        // behavior in both (the error is still printed to the console).
    };

    // Safari 5 and Opera 11 don't implement onerror,
    // so intercept the undocumented Error function instead.
    // Caveat: the handler fires whenever an Error is constructed through
    // window.Error, even if that error is later caught.
    var ua = navigator.userAgent;
    if (/Safari|Opera/.test(ua) && !/Chrome/.test(ua)) {
        var originalError = window.Error;
        window.Error = function () {
            if (arguments.length > 0) handler("Error: " + arguments[0]);
            return originalError.apply(this, arguments);
        };
        // Keep `instanceof Error` working for errors created via the wrapper.
        window.Error.prototype = originalError.prototype;
    }
}

genericOnError(function(m) { alert(m); });

throw new Error("test error");

GenericOnError has been tested on the current releases of IE, Firefox, Chrome, Safari and Opera. GenericOnError should only be used for development testing, since it is likely to break catastrophically in future browser releases.
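The browser-sniffing condition in genericOnError is easy to get subtly wrong, since Chrome’s user-agent string also contains “Safari”. Factoring it into a pure function lets it be checked outside a browser. The function name is hypothetical, and the user-agent strings below are abbreviated, illustrative values, not exact real-world ones.

```javascript
// The same rule genericOnError uses: patch Error for Safari and Opera,
// but not Chrome, whose user-agent string also mentions "Safari".
function needsErrorShim(userAgent) {
  return /Safari|Opera/.test(userAgent) && !/Chrome/.test(userAgent);
}

// Abbreviated, illustrative user-agent strings (not exact real values).
const samples = {
  safari: "Mozilla/5.0 (Macintosh) AppleWebKit/533 Version/5.0 Safari/533",
  chrome: "Mozilla/5.0 (Macintosh) AppleWebKit/534 Chrome/10.0 Safari/534",
  opera: "Opera/9.80 (Macintosh) Presto/2.7 Version/11.00",
  firefox: "Mozilla/5.0 (Macintosh) Gecko/20100101 Firefox/4.0",
};

for (const [name, ua] of Object.entries(samples)) {
  console.log(name, needsErrorShim(ua));
}
```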

Safari rejects Cookies with Version/Discard Attributes after Mac OS X 10.6.5 update

A few months ago while helping to debug a cookie character-encoding problem, I randomly suggested trying the Java J2EE javax.servlet.http.Cookie’s setVersion(1) call, based on the RFC 2109/RFC 2965 claims that cookies labeled version 1 would behave differently. This was a foolish idea, since the cookie RFCs have little or nothing to do with the de-facto cookie non-standard.

import javax.servlet.http.Cookie;

Cookie c = new Cookie("Key", "Value");
c.setVersion(1); // mark as an RFC 2109 "version 1" cookie

Unfortunately, what was meant to be a one-line experiment got checked in, and was silently adding the Version and Discard attributes to cookies for months (note: the Discard attribute signifies a session cookie according to RFC 2965). All the common browsers ignore the Version/Discard attributes, so no problems appeared. Some typical output from GAE‘s Jetty server:

Set-Cookie: Key=Value;Version=1;Path=/;Discard

However, the recent Mac OS X 10.6.5 update included a security patch to CFNetwork involving cookies and allowed domains. The patch has caused problems for web developers using local IP addresses. It also appears that this patch silently changed how Safari treats cookies with the Version/Discard attributes: rather than simply ignoring the attributes, Safari now actively rejects such cookies.

Is this a bug in Safari? Or a cookie-validation feature? With no real standard to measure it by, there is no way to tell. Suffice it to say, it is best to avoid calling setVersion(1)!
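One way to catch this kind of silent regression is to check outgoing headers for the RFC 2109/2965-era attributes. The sketch below is hypothetical (it is not part of the servlet API or Jetty) and assumes a single Set-Cookie header value like the Jetty output shown earlier.

```javascript
// Split a Set-Cookie header value on ";" and report any Version or
// Discard attributes, lowercased. The first segment is the Key=Value
// pair itself, so it is skipped.
function legacyCookieAttributes(setCookieValue) {
  return setCookieValue
    .split(";")
    .slice(1)
    .map((part) => part.trim().split("=")[0].toLowerCase())
    .filter((name) => name === "version" || name === "discard");
}

console.log(legacyCookieAttributes("Key=Value;Version=1;Path=/;Discard"));
// → [ 'version', 'discard' ]
console.log(legacyCookieAttributes("Key=Value;Path=/"));
// → []
```

A check like this could run as a unit test against the server’s responses, so a stray setVersion(1) is flagged before any browser update exposes it.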

wfinger: WebFinger and command-line finger combined

WebFinger is a new protocol for mapping email addresses to public profile information. Despite being named after the classic finger protocol, there isn’t a version of the Unix finger command that supports the WebFinger protocol. So to fill this gap, I’ve cobbled together wfinger – the traditional finger command with WebFinger support.

  • Binary: wfinger.zip (Mac OS X 10.6 Universal Binary)
  • Demo CGI Gateway:

    Or: curl http://wfinger.habilis.net/user@example.com

Example (using web-fingerling Blaine Cook):

% ./wfinger romeda@gmail.com

[gmail.com - web finger]

Account: romeda@gmail.com               Name: Blaine Cook
Organization: BT                        Title: Sociotechnologist
Email:                                  Phone:
Address: Belfast, Northern Ireland
Profile: http://www.google.com/profiles/romeda
OpenID: http://www.google.com/profiles/romeda

Links:
        Twitter: http://twitter.com/blaine
           Blog: http://blog.romeda.org/
    del.icio.us: http://delicious.com/lattice
           Yelp: http://blaine.yelp.com
         Flickr: http://www.flickr.com/photos/lattice/
            tel: http://tel:447595925264

Latest Tweet:
   #blogtalk2010 finished, really lots of fun. Thanks to @johnbreslin
   and everyone else here for a great event. :-D

The WebFinger-based output of wfinger is mainly fields extracted from the user’s profile using the hCard micro-format. To add some color, wfinger will also display the user’s latest tweet, if a Twitter account is detected. When WebFinger information can’t be found, wfinger falls back to using the traditional finger code/protocol. Thus, it still works with those who have kept the finger-protocol flame alive throughout the dark ages, like bzs and alexis at The World and Panix.

I also added code to look for a new “https://habilis.net/hfinger” relationship in account XRDs. The “hfinger” stands for HTTP Finger, and hfinger URLs should point to HTTP finger gateways that return text/plain finger output. This allows fingerd-like output to be tunneled via WebFinger resource discovery. You can see this in action by wfingering my account (chuck@habilis.net). This will be useful for people who just want traditional finger output, but are on systems that don’t allow port 79 access.
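Two of the discovery steps described above can be sketched in JavaScript, assuming the 2010-era WebFinger flow in which the domain’s host-meta document supplies an LRDD URI template containing {uri}, and assuming the account XRD has already been fetched and parsed into {rel, href} link objects. The function names are hypothetical and this is not wfinger’s actual code; only the hfinger relation URI comes from this post.

```javascript
// Relation URI from this post, used to advertise an HTTP finger gateway.
const HFINGER_REL = "https://habilis.net/hfinger";

// Fill an LRDD URI template (assumed to contain "{uri}") with the
// account's acct: URI, percent-encoded.
function lrddUrl(template, account) {
  return template.replace("{uri}", encodeURIComponent("acct:" + account));
}

// Return the account's hfinger gateway URL if its XRD advertises one;
// otherwise null, so the caller can use the hCard profile or fall back
// to traditional finger on port 79.
function hfingerUrl(links) {
  const link = links.find((l) => l.rel === HFINGER_REL);
  return link ? link.href : null;
}

console.log(lrddUrl("https://example.com/webfinger?q={uri}", "user@example.com"));
// → https://example.com/webfinger?q=acct%3Auser%40example.com
console.log(hfingerUrl([{ rel: HFINGER_REL, href: "https://example.com/finger/user" }]));
// → https://example.com/finger/user
```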

I hope wfinger will generate some interest in the WebFinger protocol amongst the command-linerati and the grumpy grey-beard sysadmins who run the internet. Share and Enjoy!

HTML in 3D!

Boing Boing recently noted the satirical McSweeney’s piece “Leaping off the Page” by Ben Greenman that proposed a 3D typographic system, 3*TYPE, which would allow simple prose to meet the challenges of the Avatar-inspired 3D revolution. However, where would satire be without farce? So taking things to their natural extreme, I present “HTML in 3D!” which implements the 3*TYPE process for any web page.

HTML in 3D is a bookmarklet and CSS stylesheet that produce an anaglyph stereoscopic 3D effect for common HTML text elements (headers, links, etc.). It should work in most modern browsers (i.e. probably not IE). Put on some anaglyph red-blue 3D glasses and click the link to see this post in headache-inducing 3D:

3D!

How to use the bookmarklet elsewhere:

  • Drag the 3D! link above to your browser’s bookmark bar
  • Load any web page
  • Don anaglyph red-blue 3D glasses
  • Click the 3D! bookmark, and watch the HTML pop!

Thanks to GEKE.NET for the CSS Bookmarklet Maker.
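The post’s stylesheet isn’t reproduced here, so this is only an assumed approach: a common way to fake anaglyph depth in CSS is to give text a red shadow offset one way and a blue shadow offset the other. The selectors, the 2px depth, and the function names are illustrative.

```javascript
// Build a CSS rule giving the selected elements offset red and blue
// text shadows. Swapping the signs of the offsets flips the apparent depth.
function anaglyphCss(selectors, depthPx) {
  const shadow = `${-depthPx}px 0 0 red, ${depthPx}px 0 0 blue`;
  return `${selectors.join(", ")} { text-shadow: ${shadow}; }`;
}

// Bookmarklet body: inject the rule into whatever page is loaded.
function injectAnaglyph(doc) {
  const style = doc.createElement("style");
  style.textContent = anaglyphCss(["h1", "h2", "h3", "a"], 2);
  doc.head.appendChild(style);
}

// Only touch the DOM when running in a browser.
if (typeof document !== "undefined") injectAnaglyph(document);
```

Packaged as a bookmarklet, the same injection runs against whatever page is currently open.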

RSS Feeds for Full Episodes of The Colbert Report and The Daily Show

Recently Comedy Central yanked The Daily Show and The Colbert Report off of Hulu [update: in early 2011, the shows returned to Hulu]. I started watching these shows on Hulu because it provided RSS feeds for the full episodes, while Comedy Central has only ever had segment/clip feeds. Luckily, the shows’ sites have feed-like JSON AJAH pages that are easily massaged into true RSS feeds, so here are substitute feeds. Share and enjoy:

The Daily Show Full Episodes RSS Feed

The Colbert Report Full Episode RSS Feed

The Nightly Show Full Episode RSS Feed

The feeds update every hour, although the shows only appear the morning after their cable broadcast. The Python (and mustache!), shell, and XSL (and sed!) source can be viewed in the full-episode feed directory.
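The feed generator itself is written in Python and XSL; the sketch below only illustrates the JSON-to-RSS step in JavaScript, assuming a hypothetical episode list of the form [{ title, url, airDate }]. The shows’ real JSON field names are not shown in this post.

```javascript
// Escape the characters that are unsafe in XML text and attribute values.
function escapeXml(s) {
  return String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Build a minimal RSS 2.0 document: a channel with the required
// title/link/description, plus one <item> per episode.
function episodesToRss(channelTitle, channelLink, episodes) {
  const items = episodes.map((ep) => [
    "    <item>",
    `      <title>${escapeXml(ep.title)}</title>`,
    `      <link>${escapeXml(ep.url)}</link>`,
    `      <guid>${escapeXml(ep.url)}</guid>`,
    `      <pubDate>${new Date(ep.airDate).toUTCString()}</pubDate>`,
    "    </item>",
  ].join("\n"));
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0">',
    "  <channel>",
    `    <title>${escapeXml(channelTitle)}</title>`,
    `    <link>${escapeXml(channelLink)}</link>`,
    `    <description>${escapeXml(channelTitle)}</description>`,
    ...items,
    "  </channel>",
    "</rss>",
  ].join("\n");
}

console.log(episodesToRss("Example Show", "http://example.com/", [
  { title: "Episode 1", url: "http://example.com/ep1", airDate: "2011-03-01T00:00:00Z" },
]));
```

Using the episode URL as the guid lets feed readers de-duplicate items across the hourly refreshes.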

The Snout of Development

A resting Eurasian Lynx

Eurasian Lynx by Michaelphillipr

I finally got around to converting Lynxlet from Ye Olde CVS repository into Subversion. By default the cvs2svn tool uses the customary trunk/branch/tag naming. I’ve never much liked this naming scheme, in part because “tag” breaks the botanical morphology theme (shouldn’t it be trunk/branch/leaf?).

Since Lynx are carnivores, I decided mammalian anatomy would be more appropriate. So now main development is done on the “snout”, speculative versions are on “tails” and snapshots of individual releases are “paw-prints.” See where Lynxlet’s snout leads it at the Habilis Public Subversion Repositories.

Case Study: Blistering Barnacles! release to Apple’s Web App List

I released a simple iPhone and iPod Touch web app called “Blistering Barnacles!” (BB) — a homage to my favorite Tintin character, Captain Haddock. I submitted BB to Apple’s Web Application Catalog, which produced a small flurry of hits. Here are some of the numbers. In total about 770 unique visitors ran the app, about 460 on the first day when BB was listed high on the front page. There were very few repeat visitors, averaging about 1.08 runs per visitor.


Geographically, the English-speaking countries are at the top. Curiously, Singapore makes a very strong showing, and why so few Australians?


After two days, BB had fallen off the front page and was in 10th place on the Most Popular page and 4th place in the Entertainment category. Interestingly, the split between iPhones and iPod Touches was roughly 60%/40%. I’m surprised there were so many Touches.

I wasn’t particularly surprised by the low numbers, partly because the Adventures of Tintin are not that well known (at least until the movie comes out). Also, web apps are now a quiet backwater of the iPhone ecosystem in comparison to native apps, which are directly accessible from the iPhone/Touch.