Wednesday, October 05, 2011

Balancing Randomness

As part of a project, I'm required to implement various protocols for randomisation of statistical trials. The most basic of these we call "simple": it's the equivalent of flipping a coin every item. This is generally enough for most purposes, although even with a perfectly weighted coin there is still the possibility of a variance between the number of heads and the number of tails. For a clinical trial, for statistical purposes it can be preferred that both heads and tails - in this case, experimental treatments, placebos and/or controls - end up flipping the same number of times, while still requiring each individual result to be completely random.

    from random import choice
    from collections import Counter

    def flip(times):
        return Counter(
            choice(['H','T']) for _ in xrange(times)
        )

    for _ in xrange(5):
        print flip(1000)

    Counter({'H': 506, 'T': 494})
    Counter({'T': 512, 'H': 488})
    Counter({'H': 500, 'T': 500})
    Counter({'H': 521, 'T': 479})
    Counter({'T': 559, 'H': 441})
The result can be evenly balanced but it can also vary wildly. Where the results absolutely must be balanced, then block randomisation can help.

The basic idea behind block randomisation is to "pre-flip" the coin for small ranges (the "blocks") in a way that ensures the range is balanced. Sticking with coin flipping as the example, if we had a block size of 4, there are 6 possible sequences of flips that would give us a balanced result:
    H H T T
    H T H T
    H T T H
    T H H T
    T H T H
    T T H H
So rather than flip a coin for each item, we randomly select one of those sequences, and uses its values for the next 4 items:
    from random import choice
    from collections import Counter
    from itertools import permutations

    def get_block(choices, size):
        return choice(
            list(permutations(choices * (size/len(choices))))
        )
    
    for _ in xrange(5):
        print get_block(['H','T'], 4)
    ('H', 'T', 'H', 'T')
    ('T', 'H', 'H', 'T')
    ('H', 'H', 'T', 'T')
    ('T', 'H', 'T', 'H')
    ('H', 'T', 'T', 'H')
So the first item would be heads, then tails, then head, tails, tails, head, head etc. As long as the sample size is evenly divisible by the block size, the distribution will always be balanced:
    def block_flip(times, block_size):
        assert (times % block_size) == 0
        choices = ['H','T']
        return Counter(
            flip for _ in xrange(0, times, block_size)
            for flip in get_block(choices, block_size)
        )

    for sample_size in [100, 1000, 10000, 100000]:
        print block_flip(sample_size, 8)
    Counter({'H': 50, 'T': 50})
    Counter({'H': 500, 'T': 500})
    Counter({'H': 5000, 'T': 5000})
    Counter({'H': 50000, 'T': 50000})
Perfect!

No comments:

Post a Comment