Wednesday, January 11, 2012

From p=0.5 to Arbitrary p

The blog A Neighborhood of Infinity- which is probably the best blog I've ever seen- has a new post discussing exactly what I was going to discuss here, except with a much better technique and with ties to some interesting math. Therefore, I defer to this awesomeness by providing a link: http://blog.sigfpe.com/2012/01/lossless-decompression-and-generation.html

Apparently there are ways of doing this in constant time, but the technique described in the above article draws a connection between probability distributions, Huffman encoding, algorithmic complexity, and entropy.

What these posts had been leading up to was two things- given such a technique like the one mentioned above, we can generate a single number in the distribution describing the probability of getting each number of mutations for in a population (from 0 to p*n with p the size of the population and n the size of the individuals). Then we need that many random numbers to pick out places to mutate. This is much, much more efficient then the naive way of generating a random number per location (which I see used a lot).

The other point I was leading to is that we can get a distribution of 1s in a random number besides 0.5 by using AND, OR, and XOR on evenly distributed bits. This means is we want some distribution with p probability of a bit being a 1, and we start with samples from p=0.5, some fixed combination of samples will give us the desired distribution within any desired accuracy. This can be seen either with numbers in [0, 1] (continuous) or with sets of outcomes a subset of which are considered successes (discrete). The operators in the continuous case are: AND is pp' = p*p', OR is p+p' = p + p' - p*p', XOR is pxp' = p*(1-p') + p'*(1-p) - p*p', and NOT is -p = 1-p. In the discrete case we have the size of a set, n, and the number of successes, k, such that k <= n, so I'm going to write (n, k) has a probability of success k/n. The operators are then: AND is (n, k)(n', k') = (n*n', k*k'), OR is (n, k)+(n', k') = (n*n', k*n' + k'*n - k*k'), XOR is (n, k)x(n', k') = (n*n', k*(n'-k') + k'*(n-k)), and NOT is (n, k) = (n, n-k). There is presumably some way to determine the smallest number of operations starting with (2, 1) and ending with (n, k) such that p - n/k < e for any error e, although nothing comes to mind as how to find it.

No comments:

Post a Comment