November 20, 2009

Mean Streets of Silicon Valley

Computing a mean and variance is ridiculously commonplace in data analysis, yet most programmers have never seen this gem from Knuth's TAoCP (The Art of Computer Programming):

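def online_mean_and_variance(data):
    """Yield (mean, sample variance) of data in a single pass.

    Yields nothing if data has fewer than two items.
    """
    # Start mu and s2 as floats so delta/n is never integer division.
    n, mu, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mu          # deviation from the previous mean
        mu += delta / n         # updated running mean
        s2 += delta * (x - mu)  # running sum of squared deviations
    if n > 1:
        yield (mu, s2 / (n - 1))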
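
The function above yields a single result once the input is exhausted. If you want an estimate after every sample, which is what makes an online algorithm handy for streams, one option is to move the yield inside the loop. This is only a sketch: the name running_mean_and_variance and the choice to yield the count along with the estimates are my assumptions, not part of the original.

```python
def running_mean_and_variance(data):
    """Yield (n, mean, sample variance) after each sample, from the second on.

    Hypothetical variant of the function above: same update step,
    but the yield sits inside the loop.
    """
    n, mu, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mu
        mu += delta / n
        s2 += delta * (x - mu)
        if n > 1:  # the sample variance is undefined for a single point
            yield n, mu, s2 / (n - 1)


# Print the running estimates for a small list of samples.
for n, mean, var in running_mean_and_variance([2.0, 4.0, 4.0, 6.0]):
    print(n, mean, var)
```

Because the state is just three numbers, the same loop works on a file, a socket, or any other iterable that never fits in memory.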

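Unlike the standard two-pass algorithm, this one is online: it reads the data once and keeps only three numbers of state. It also happens to be far more numerically stable than the usual one-pass shortcut of accumulating the sum and the sum of squares.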
If that's not enough, I've given it to you here as a Python generator. Enjoy!
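
To see why stability matters, here is a rough comparison against the one-pass sum-of-squares shortcut. Treat it as a sketch: naive_variance and welford_variance are names I made up for this illustration, both assume at least two data points, and the data set is just a small spread sitting on top of a large offset.

```python
def naive_variance(data):
    """One-pass shortcut: (sum(x^2) - (sum(x))^2 / n) / (n - 1)."""
    n, total, total_sq = 0, 0.0, 0.0
    for x in data:
        n += 1
        total += x
        total_sq += x * x
    return (total_sq - total * total / n) / (n - 1)


def welford_variance(data):
    """Same update step as the generator above, returning only the variance."""
    n, mu, s2 = 0, 0.0, 0.0
    for x in data:
        n += 1
        delta = x - mu
        mu += delta / n
        s2 += delta * (x - mu)
    return s2 / (n - 1)


# Deviations from the mean are -6, -3, 3, 6, so the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
print("naive:  ", naive_variance(data))
print("welford:", welford_variance(data))
```

The shortcut subtracts two numbers of roughly 4e18 whose difference should be 90, so nearly every significant digit cancels in double precision. The online update only ever works with deviations from the running mean, which stay small.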
