How to parallelize TBSS / randomise (sort of)
Many resources - which I surely have not exhausted - are available on how to make optimal use of TBSS, and in particular on how to reduce the execution time of the lengthy, repetitive `randomise` step.
I confess I may not have tried very hard after my first issues with `randomise_parallel`, and despite being a big fan of Nipype I could not find exactly what I was after.
Nevertheless, my ambition was quite humble at this point: allow `randomise` to use N available CPUs to estimate a set of M contrasts - in parallel - using the traditional threshold-free (TFCE) permutation scheme, instead of losing time running one contrast after the other.
TL;DR
The following method is probably the dumbest way to parallelize `randomise`: it simply splits a `.con` file into several smaller files and returns a set of `randomise` commands ready to run. Instead of running all the contrasts sequentially one after the other, multiple jobs are used, hence dividing the execution time while preserving the numbering of the contrasts as given in the original `.con` file.
The following recipe is implemented in this module.
Concept:
The typical `randomise` command looks like:

```bash
randomise -i <skeletonized_data> -o <output_basename> -m <mean_FA_skeleton_mask> \
  -d <design.mat> -t <design.con> -n <n_permutations> [--T2] [-V]
```
This will estimate every contrast from the `.con` file sequentially, using a single processor, and will generally take a substantial amount of time, multiplied by the number of contrasts.
One dirty way to divide this time consists in splitting the `.con` file into several smaller ones and running multiple instances of `randomise` on them in parallel.
`randomise` names the produced statistical maps after the rank/number of the corresponding contrast in the `.con` file (e.g. `<output_basename>_tfce_corrp_tstat4.nii.gz` for contrast #4 with `--T2`), so it is important that these numbers stay consistent between the initial file and the smaller parts.
For example, with M=16 contrasts and N=6 CPUs, we could make 6 mini-batches of 3, 3, 3, 3, 3 and 1 contrast(s) each, respectively. The first batch (call it `part1.con`) will take the first 3 contrasts, `part2.con` the next 3, and so on. Now, to make sure that the numbers do match between the initial `design.con` file and the small `part*.con` files, the trick consists in adding as many dummy contrasts as necessary before the "real" ones. For instance, `part2.con` will contain 6 contrasts in total: 3 dummy contrasts followed by 3 genuine ones (#4, #5, #6) from `design.con`. Then we use `randomise`'s hidden `--skipTo` option to ignore the first contrasts of a file and start directly from a given one. This conveniently skips the initial dummy contrasts in every `.con` file.
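To make the idea concrete, here is a minimal sketch of that padding logic - my own illustration, not the actual module code:

```python
import math

def split_with_dummies(con, n_parts):
    """Split (name, vector) contrasts into n_parts batches, prepending
    zero-filled dummy contrasts to each batch so that the contrast
    numbering matches the original file. Returns (contrasts, skipTo)
    pairs, where skipTo is the 1-based index of the first real contrast."""
    n_waves = len(con[0][1])
    batch_size = math.ceil(len(con) / n_parts)
    parts = []
    for i in range(0, len(con), batch_size):
        dummies = [('mock%d' % (j + 1), [0] * n_waves) for j in range(i)]
        parts.append((dummies + con[i:i + batch_size], i + 1))
    return parts
```

With 8 contrasts and `n_parts=4`, the last batch comes back with 6 `mock*` dummies, the 2 remaining genuine contrasts, and a skipTo of 7 - exactly the layout of the last `part*.con` file shown further down.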
Splitting the contrast file (aka the `.con` trick):
In short, we can adapt a function from a previous post on building design matrices and contrasts for TBSS in Python to create these mock contrasts as needed. In the following example, we define a set of 8 contrasts over 9 independent variables.
Example:
```python
from tbss import contrasts

con = [('age(+)',   [0, 0,  1,  0, 0, 0, 0,  0,  0]),
       ('age(-)',   [0, 0, -1,  0, 0, 0, 0,  0,  0]),
       ('sex(m>f)', [0, 0,  0,  1, 0, 0, 0,  0,  0]),
       ('sex(f>m)', [0, 0,  0, -1, 0, 0, 0,  0,  0]),
       ('apoe4(+)', [0, 0,  0,  0, 0, 0, 0,  0,  1]),
       ('apoe4(-)', [0, 0,  0,  0, 0, 0, 0,  0, -1]),
       ('sleep(+)', [0, 0,  0,  0, 0, 0, 0,  1,  0]),
       ('sleep(-)', [0, 0,  0,  0, 0, 0, 0, -1,  0])]

contrasts.dump(con, '/tmp/contrasts.con')
```
The contents of the generated file (following the format from that previous post) will be the following:
```
/NumWaves 9
/ContrastName1 age(+)
/ContrastName2 age(-)
/ContrastName3 sex(m>f)
/ContrastName4 sex(f>m)
/ContrastName5 apoe4(+)
/ContrastName6 apoe4(-)
/ContrastName7 sleep(+)
/ContrastName8 sleep(-)
/NumContrasts 8
/Matrix
0 0 1 0 0 0 0 0 0
0 0 -1 0 0 0 0 0 0
0 0 0 1 0 0 0 0 0
0 0 0 -1 0 0 0 0 0
0 0 0 0 0 0 0 0 1
0 0 0 0 0 0 0 0 -1
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 -1 0
```
Now, let us provide a list of filepaths instead of a single one.
```python
fp = ['/tmp/part1.con', '/tmp/part2.con', '/tmp/part3.con', '/tmp/part4.con']
contrasts.dump(con, fp)
```
Since we have M=8 contrasts and N=4 files (for, say, N=4 available processors), each `part*.con` will then contain two of the 8 contrasts, preceded by the right number of dummy contrasts. For example, the contents of the last one will be:
```
/NumWaves 9
/ContrastName1 mock1
/ContrastName2 mock2
/ContrastName3 mock3
/ContrastName4 mock4
/ContrastName5 mock5
/ContrastName6 mock6
/ContrastName7 sleep(+)
/ContrastName8 sleep(-)
/NumContrasts 8
/Matrix
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 1 0
0 0 0 0 0 0 0 -1 0
```
There is also a function called `main_effects` to generate all the contrasts corresponding to the main effects of the variables contained in the design matrix; a usage sketch follows.
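Usage might look something like this - note that the exact signature is my assumption, and the column names are hypothetical:

```python
# Hypothetical sketch - the actual signature of main_effects may differ.
# `columns` would name the design-matrix variables, in design.mat order.
columns = ['grp1', 'grp2', 'age', 'sex', 'edu', 'bmi', 'icv', 'sleep', 'apoe4']
con = contrasts.main_effects(columns)  # assumed: returns (name, vector) pairs
contrasts.dump(con, '/tmp/contrasts.con')
```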
Building the right `randomise` commands:
Say we want the M=8 contrasts to run on N=4 processors. We need to build N `randomise` commands over N contrast files.
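For intuition, the per-batch `--skipTo` offsets fall out of simple arithmetic - a toy calculation, not the library code:

```python
import math

M, N = 8, 4                         # contrasts, contrast files / CPUs
batch = math.ceil(M / N)            # contrasts per part -> 2
skip_to = [i * batch + 1 for i in range(N)]
print(skip_to)                      # [1, 3, 5, 7]
```

These are exactly the `--skipTo` values that appear in the generated commands below.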
The `randomise_parallel` function takes care of all this, as shown hereafter. (Let us assume in the following that all the `<arguments>` are replaced with correct paths to the actual data.)
```python
num_perm = 5000
n_cpus = 4
sleep_interval = 0

commands = contrasts.randomise_parallel('<skeletonized_data>',
                                        '<output_basename>',
                                        '<design.mat>',
                                        '<mean_FA_skeleton_mask>',
                                        '<contrasts.con>',
                                        num_perm=num_perm,
                                        n_cpus=n_cpus,
                                        sleep_interval=sleep_interval)
for e in commands:
    print(e)
```
```
randomise -i <skeletonized_data> -o <output_basename> -m <mean_FA_skeleton_mask> \
  -d <design.mat> -t /tmp/part1.con -n 5000 --T2 --skipTo=1 -V &> /tmp/log1.log
randomise -i <skeletonized_data> -o <output_basename> -m <mean_FA_skeleton_mask> \
  -d <design.mat> -t /tmp/part2.con -n 5000 --T2 --skipTo=3 -V &> /tmp/log2.log
randomise -i <skeletonized_data> -o <output_basename> -m <mean_FA_skeleton_mask> \
  -d <design.mat> -t /tmp/part3.con -n 5000 --T2 --skipTo=5 -V &> /tmp/log3.log
randomise -i <skeletonized_data> -o <output_basename> -m <mean_FA_skeleton_mask> \
  -d <design.mat> -t /tmp/part4.con -n 5000 --T2 --skipTo=7 -V &> /tmp/log4.log
```
Finally, these lines may be saved in a script and passed to a tool like GNU parallel for execution, as in the following example:
Example:
```bash
cat /tmp/script.sh | parallel -j <n_cpus>
```
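And if GNU parallel is not at hand, the commands can be written to that script and/or launched straight from Python; a simple sketch using only the standard library (assuming `commands` from above):

```python
import subprocess

# save the commands to the script consumed by GNU parallel above
with open('/tmp/script.sh', 'w') as f:
    f.write('\n'.join(commands) + '\n')

# or launch all the jobs directly and wait for them to finish
# (bash is required here because of the &> redirections)
procs = [subprocess.Popen(cmd, shell=True, executable='/bin/bash')
         for cmd in commands]
for p in procs:
    p.wait()
```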
Writing this post took about as long as running `randomise` in the background on a dataset of 500 subjects - 12 contrasts with 5000 permutations each, "in parallel" - following this same strategy. Much shorter than the original way.
Please comment if you have a different experience or feedback to share.