For example, the study might wish to monitor treatment differences between men and women (one stratification factor), and between two age groups: under 30, or 30 and over (another factor). Two factors with two levels each give us four categories, or strata, to keep track of:
Men <30, Men >=30, Women <30, Women >=30.
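The four categories are simply every combination of the two factors. As a minimal sketch (the names `sexes`, `age_groups` and `categories` are illustrative and not part of the original code), they can be generated rather than typed out by hand:

```python
from itertools import product

# Each factor and its levels; these names are assumptions for illustration.
sexes = ['M', 'F']
age_groups = ['<30', '>=30']

# Every combination of one level per factor is a category to balance.
categories = [sex + age for sex, age in product(sexes, age_groups)]
print(categories)  # ['M<30', 'M>=30', 'F<30', 'F>=30']
```

Adding a third factor would only mean passing one more list to `product`, though the number of categories multiplies quickly.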
What we're trying to achieve is an allocation between treatments that is as balanced as possible within each of those categories. Effectively, we can treat each category as a separate schedule and use block randomisation to assign treatments within it.
    # this builds on the code from the previous post (get_block is defined there)
    from collections import defaultdict

    class Stratified(object):
        def __init__(self, strata, arms, block_size):
            self.strata = strata
            self.arms = arms
            self.block_size = block_size
            self.blocks = dict()              # current block per stratum
            self.record = defaultdict(list)   # allocations made per stratum

        def new_block(self):
            return list(get_block(self.arms, self.block_size))

        def add(self, stratum):
            block = self.blocks.get(stratum, self.new_block())
            # take the first allocation from the block
            # (note: the shortened block is not written back to self.blocks)
            flip, block = block[0], block[1:]
            self.record[stratum].append(flip)

The Stratified class maintains the current block state for each of the categories; the actual scheduling uses the block randomisation functions.
[Please note that this is incorrect. See the follow up article for correction.]
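One way to see the problem: `add` never stores the shortened block back in `self.blocks`, so every call starts from a brand-new block and no balancing takes place within a stratum. The sketch below is one possible fix, not necessarily the one given in the follow-up article. `get_block` here is a stand-in written for this example (the real helper is in the previous post and may differ; this one assumes the block size is a multiple of the number of arms), and `StratifiedFixed` is a hypothetical name.

```python
import random
from collections import Counter, defaultdict


def get_block(arms, block_size):
    """Stand-in for the previous post's helper (an assumption): a shuffled
    block that holds each arm equally often."""
    block = list(arms) * (block_size // len(arms))
    random.shuffle(block)
    return block


class StratifiedFixed(object):
    def __init__(self, strata, arms, block_size):
        self.strata = strata
        self.arms = arms
        self.block_size = block_size
        self.blocks = dict()
        self.record = defaultdict(list)

    def add(self, stratum):
        block = self.blocks.get(stratum)
        if not block:  # no block yet, or the previous one is used up
            block = get_block(self.arms, self.block_size)
        flip, rest = block[0], block[1:]
        self.blocks[stratum] = rest  # keep the remainder for the next call
        self.record[stratum].append(flip)
        return flip


strata = ['M<30', 'M>=30', 'F<30', 'F>=30']
stratrand = StratifiedFixed(strata, ['H', 'T'], 4)
for _ in range(10000):
    stratrand.add(random.choice(strata))

for stratum in strata:
    count = Counter(stratrand.record[stratum])
    print(stratum, count['H'], count['T'], abs(count['H'] - count['T']))
```

With two arms and a block size of 4, the difference within a stratum can never exceed 2 (half a block), however many participants are added.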
To test this, the schedule is fed a stream of participants with randomised characteristics:
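    from collections import Counter
    from random import choice

    strata = ['M<30', 'M>=30', 'F<30', 'F>=30']
    arms = ['H', 'T']
    stratrand = Stratified(strata, arms, 4)

    # feed in 10,000 participants, each with a randomly chosen stratum
    for _ in range(10000):
        stratrand.add(choice(strata))

    # tally the allocations per stratum
    for stratum in strata:
        count = Counter(stratrand.record[stratum])
        count['diff'] = abs(count['H'] - count['T'])
        print(stratum, 'H=%(H)d T=%(T)d Diff: %(diff)d' % count)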
    M<30 H=1241 T=1262 Diff: 21
    M>=30 H=1241 T=1241 Diff: 0
    F<30 H=1278 T=1244 Diff: 34
    F>=30 H=1249 T=1244 Diff: 5

The larger the sample size, the smaller the imbalance tends to be relative to the number of participants. The final protocol, minimisation, is effectively stratified randomisation with a biased coin (I think) to reduce any imbalance even further.