Tuesday, 15 December 2009

Everything is a powerlaw. Everything.

Alternate title: How to use Google and Twelve Years of Higher Education to Prove Useless Things.

So this pointed me to this, which reminded me of a similar experiment that I'd done a year ago.  Khan's fine, but I like LOLcats, so instead of the number of "a"s, I counted "ol"s after the first "l".
Here are the interesting things I noticed from this data.

  1. The current distribution of lols is very similar to the distribution 18 months ago when I first considered the problem.  Other than a few wobbles, this is well described by an N_{ol}^{-3.5} power law.
  2. This distribution suggests that each extra pair of characters gets boring by a constant factor.
  3. There is a sharp jump in the current distribution at eleven "ol"s.  Beyond this point, the power law has an identical index. This suggests that during the last 18 months, eleven "ol"s became the new cool, but the boredom factor is still the same.
  4. The falloff in the 2008 data occurs at 50 "ol"s, which corresponds to a jump from 99 to 101 characters. I suspect this shows that something somewhere had a limit of 100 characters.  This isn't visible in the new data, because the tail of the eleven-ol cool boost is sufficient to wash out any such drop.
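
For the skeptical, the 99-to-101 jump is just the arithmetic of an "l" followed by N "ol"s, which is 1 + 2N characters long. Here's a tiny sketch to check it; the helper name `lol_string` is made up for illustration.

```python
def lol_string(n_ol):
    """Build 'l' followed by n_ol copies of 'ol' (n_ol=1 gives 'lol')."""
    return "l" + "ol" * n_ol

# Lengths on either side of the suspected 100-character limit.
for n_ol in (49, 50):
    print(n_ol, len(lol_string(n_ol)))
```

So no number of "ol"s lands exactly on 100 characters; 50 is the first count that goes over.
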
The Khan data linked above follows an N_{a}^{-2.5} power law for the middle section, between the falloff at low N_{a} (probably just typos, and not meme-related), and the drop above 50 (no theories on this one).  Therefore, while adding "a"s, you're less likely to get bored than you are with "ol"s. I don't have a good test meme for larger letter sets, so I can't test whether a three letter suffix/infix would drop off as N^{-4.5} or so.
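
If you want to try this at home, here's a minimal sketch of one way to estimate an index like these: a least-squares straight line through log(count) versus log(N), whose slope is the power-law index. The function name `fit_power_law_index` is hypothetical, the counts below are synthetic numbers generated from an exact N^{-3.5} law (not real search results), and this isn't necessarily how the fits quoted above were made.

```python
import math

def fit_power_law_index(ns, counts):
    """Slope of the least-squares line through (log n, log count)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(c) for c in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Standard simple-regression slope: covariance over variance.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

# Synthetic, made-up counts following N^-3.5 exactly (illustration only).
ns = list(range(2, 40))
counts = [1.0e6 * n ** -3.5 for n in ns]

index = fit_power_law_index(ns, counts)
print(round(index, 2))
```

With real hit counts you'd want to restrict `ns` to the range where the curve is actually straight on a log-log plot, since the wobbles, jumps, and cutoffs described above would otherwise drag the slope around.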

Another interesting test was to look at the quoted phrase "N martini lunch", where the number was written out.
After a quick upswing (I assume while our luncheoneers gain courage from martinis one and two) to the peak at four (a skew point suggesting the proper number of martinis for your lunching), there's a rapid drop with a power law index of -8.  This probably represents the combination of two effects:
  1. The difficulty of drinking that much liquid during a standard lunch.
  2. The unwillingness of people to call something "lunch" when it's clearly devolved into "mid-day binge."