PARALLELIZED MIGRATE
====================
Peter Beerli
beerli@fsu.edu
[updated March 2011]

Contents:
This text describes how you can improve the performance of migrate on most modern 
computers with multiple cores or on computer clusters (ad hoc, e.g. the machines in your lab,
or a high-performance computing center). You can parallelize migrate runs 

(I) using a virtual parallel architecture with a message-passing 
interface (MPI) or (II) by hand. The by-hand version works but is cumbersome and 
is no longer recommended because some features cannot be used this way.

The MPI version runs fine on clusters of MacOSX workstations, dedicated clusters of Linux machines, 
and Windows workstations. I use the freely available openmpi package for the compile and runtime
parallel environment. In the past I had problems with earlier versions of MPICH, and because I lack
experience with its successor MPICH2 I do not recommend its use with migrate. I suggest using openmpi
because it is a standard environment on MacOS 10.5+, and easy to install on Linux clusters and 
even Windows (look for openmpi 1.5 binary install). 


I. Message passing interface 
============================

(1) - Download OPENMPI from  http://www.openmpi.org 
    - Install OPENMPI on all machines. If this is too complicated for you, ask a
      sysadmin or other guru to help; the openmpi documentation _is_ helpful.
      [On a computer cluster you will certainly need to talk to the system administrator.]

    - Prepare a file named hosts according to the specs in the openmpi distribution;
      the master node needs to be the first machine mentioned.
      My hosts file looks like this:
      ciguri slots=4
      zork   slots=2
      nagual slots=32
	      
    - Make sure that you can access all machines using ssh (I use openssh)
      without the need to specify a password; see man ssh-keygen and man ssh.
      If you have firewalls installed on your individual systems, you need to allow
      the individual machines to open/request "random" ports on the other machines. 
     
    - LINUX: change into the migrate-3.2.8/src/ directory, run
      configure, and then use "make mpis"; a binary named migrate-n-mpi
      will be created.
      MAC: follow the Linux instructions, or simply use the Mac binaries
      that use the migrateshell.app.
      WINDOWS: use the migrate-n-mpi.exe binary; its compilation is
      trickier and currently I have no Windows compilation instructions.
      If you want to try, use the makefilempi.msvc file in the src directory.
    
(2)  If your machines have no cross-mounted file system,
     you need to make sure that migrate-n-mpi is 
     in the same path e.g. /home/beerli/bin/migrate-n-mpi on 
     EVERY machine.

(3) Try running the following command.
    cd into the example directory (I assume here that you have src and example
    on the same hierarchical level and that the executable is still in src):
    
    mpirun  -np 7 --hostfile hosts ../src/migrate-n-mpi parmfile.testbayes

    [Watch in awe that 6 loci can get analyzed at once.
     The log is not very comprehensive because all 7 processes
     write to the same console. There are 7 processes because there is one
     master node that does only scheduling and maximization; the 6 worker nodes
     do the actual tree rearrangements and the likelihood calculations.
     The number you specify has nothing to do with the number of physical computers:
     openmpi can run several processes on a single CPU.]
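
The process count used above can be worked out mechanically. The following
Python sketch is not part of the migrate distribution. It assumes that the
hostfile uses the slots=N syntax shown in step (1), counts a host without a
slots entry as one slot, and assumes that you want one worker per locus plus
one master. The function names are made up for illustration.

```python
import re


def total_slots(hostfile_text):
    """Sum the slots=N entries of a hostfile.

    Assumption of this sketch: a host line without slots=N counts as 1.
    """
    total = 0
    for line in hostfile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        match = re.search(r"\bslots=(\d+)", line)
        total += int(match.group(1)) if match else 1
    return total


def mpirun_command(n_loci, hostfile="hosts",
                   binary="../src/migrate-n-mpi",
                   parmfile="parmfile.testbayes"):
    """Build the mpirun argument list: one master plus one worker per locus."""
    n_processes = n_loci + 1
    return ["mpirun", "-np", str(n_processes), "--hostfile", hostfile,
            binary, parmfile]


hosts = """ciguri slots=4
zork   slots=2
nagual slots=32
"""

print(total_slots(hosts))             # slots offered by the three hosts
print(" ".join(mpirun_command(6)))    # command line for 6 loci
```

For 6 loci this reproduces the mpirun call shown in step (3). Comparing the
requested process count with the slot total tells you whether processes will
have to share CPUs.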


II. By hand
===========
    This guideline works only for ML analyses and may similarly work for
    Bayes analyses if you use the bayesallfile instead of the sumfile.
    [I reiterate: the marginal likelihood calculations will potentially fail
    with this approach]

(1) Secure as many computers for the analysis as you have loci
    in your dataset.

(2) On one machine prepare a directory with

    - migrate-n

    Run the program once and adjust the run parameters using
    the menu. Use the sumfile option in the (I)nput menu
    and then save the parmfile with the 
    (W)rite parmfile option. Then (Q)uit.
    Edit the created parmfile, check that it contains 
    write-sumfile=YES
    and then change menu=YES to menu=NO

(3) Copy this directory to each machine and name the directories
    e.g. locus1 locus2 .....
    If you use AppleShare, make sure that you also have 
    a separate directory for each locus.
     
(4) Prepare the infiles, one for each locus.
    Copy the infiles into the directories.

(5) Start migrate-n on all machines.

(6) Once all the migrate-n runs have finished, 
    copy all sumfiles onto a single machine;
    it helps if this is your fastest machine
    with lots of RAM. Be careful not to overwrite
    individual files (they all have the same name, "sumfile").
 
(7) Concatenate the sumfiles.

(8) The combined sumfile needs hand editing,
    or you can use the PERL script 
    concat-sumfile.
    If you cannot run the PERL script or 
    want to do it by hand, see the example below.

(9) Make a safe copy of the fixed combined sumfile.

(10) Run migrate-n,
     use option (D)atatype and there (g)enealogy,
     and change other menu items if you want.
    
(11) Voila, a multilocus outfile in a fraction of
     the time the program needs to run on a single machine.
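
The parmfile edit and the per-locus directory setup described above can be
scripted. The following Python sketch is only an illustration and not part of
the migrate distribution. The function names and the demo file names are made
up, and it assumes that each locus directory should contain its data under
the name infile.

```python
import shutil
import tempfile
from pathlib import Path


def set_batch_mode(parmfile_text):
    """Check for write-sumfile=YES and switch menu=YES to menu=NO."""
    if "write-sumfile=YES" not in parmfile_text:
        raise ValueError("write-sumfile=YES not found in parmfile")
    return parmfile_text.replace("menu=YES", "menu=NO")


def prepare_locus_dirs(template_dir, locus_infiles, target_root):
    """Copy template_dir to target_root/locus1, locus2, ... and put one
    data file, renamed to 'infile' (an assumption), into each copy."""
    created = []
    for number, infile in enumerate(locus_infiles, start=1):
        locus_dir = Path(target_root) / f"locus{number}"
        shutil.copytree(template_dir, locus_dir)
        shutil.copy(infile, locus_dir / "infile")
        created.append(locus_dir)
    return created


# Small demonstration in a temporary directory with placeholder files.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    template = root / "template"
    template.mkdir()
    parmfile = template / "parmfile"
    parmfile.write_text(set_batch_mode("menu=YES\nwrite-sumfile=YES\n"))
    data_files = []
    for i in range(1, 4):
        path = root / f"data{i}.txt"
        path.write_text(f"placeholder data for locus {i}\n")
        data_files.append(path)
    locus_dirs = prepare_locus_dirs(template, data_files, root / "run")
    print([d.name for d in locus_dirs])
```

In a real run the template directory would hold migrate-n and the edited
parmfile, and the resulting locus directories would then be copied to the
individual machines.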

=========================================================================
What to edit in a sumfile 

(1) The heading of a sumfile needs the first two comment lines:
# begin genealogy-summary file of migrate 0.9.8 ------
#

(2) The third line needs editing: its first number is the number of loci.
For single-locus data it is 1; change it to the number of loci.
1 3 9 0 1  [before]
4 3 9 0 1  [after, 4 loci]

(3) Search for ####### and you will find lines like the following:
0 0 ####### locus 0, replicate 0 ################
The file starts counting at 0, so this line means locus 1 and replicate 1.
Leave the first occurrence as it is. Go to the end of the file and 
remove
# end genealogy-summary file of migrate 0.9.8 ------

(4) Prepare the next sumfile.x for appending to the master sumfile:
    - Remove everything above 0 0 ####### locus 0, replicate 0 ..... 
    - Change the numbers to   1 0 ####### locus 1, .....
      If you use replicates you need to change the replicates accordingly.
    - Remove the last line [except for the very last sumfile].

(5) Concatenate the above sumfile fragment to the master sumfile.

(6) Go to (4) until done.
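
The editing steps above can be sketched in Python as follows. This is not the
concat-sumfile PERL script and has not been checked against it. It assumes
that every sumfile has exactly three header lines before the first locus
marker, that each input holds a single locus, and that replicate lines only
need their locus number changed. The function name merge_sumfiles is made up,
and the demo input is a heavily shortened sumfile.

```python
import re

# Matches marker lines such as
# "0 0 ####### locus 0, replicate 0 ################"
MARKER = re.compile(r"^\d+( \d+ #+ locus )\d+(, replicate \d+.*)$")


def merge_sumfiles(texts):
    """Merge single-locus sumfile texts, given in locus order."""
    n_loci = len(texts)
    merged = []
    for locus, text in enumerate(texts):
        lines = text.rstrip().splitlines()
        first_marker = next(i for i, l in enumerate(lines) if MARKER.match(l))
        if locus == 0:
            # Keep the header of the first file only; the first number of
            # its third line becomes the number of loci.
            header = lines[:first_marker]
            fields = header[2].split()
            fields[0] = str(n_loci)
            header[2] = " ".join(fields)
            merged.extend(header)
        body = lines[first_marker:]
        # Drop the "# end ..." line from every file except the last one.
        if locus < n_loci - 1 and body[-1].startswith("# end"):
            body = body[:-1]
        for line in body:
            match = MARKER.match(line)
            if match:
                # Renumber the locus in both places on the marker line.
                line = f"{locus}{match.group(1)}{locus}{match.group(2)}"
            merged.append(line)
    return "\n".join(merged) + "\n"


single = """# begin genealogy-summary file of migrate 0.9.8 ------
#
1 3 9 0 1
0 0 ####### locus 0, replicate 0 ################
1 0 0
# end genealogy-summary file of migrate 0.9.8 ------
"""

print(merge_sumfiles([single, single]), end="")
```

It is worth comparing the result for a small dataset with a hand-edited
sumfile before relying on such a script.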
   
======================================================================

Example of a sumfile

# begin genealogy-summary file of migrate 0.9.8 ------
#
1 3 9 0 1
0 0 ####### locus 0, replicate 0 ################    <<<<<<<<<<<<<<change this
1 0 0
                   0                    0                    0
101 0.01224715726902237366 0.24028906596661075978 68
0.01797109426997086853 0.46646854779862101381 88
0.00810026471633800565 0.36704951583234807222 98
0.010000 0.010000 0.010000 32.000000 23.000000 23.000000 29.000000 21.000000 27.000000 
3.53366272929252732415e-03 3.53366273324574875492e-03
5.30077896995009931885e-03 5.30077895638609870171e-03
3.74540337472894320145e-03 3.74540325539807769997e-03
2.61285099904684057037e+03 2.61285112767777718545e+03
1.87798679166376041394e+03 1.87798680725500389599e+03
1.27983305210177809386e+03 1.27983304107036315145e+03
1.61370252198881439654e+03 1.61370251775554697815e+03
2.59250790064689090286e+03 2.59250788575867818508e+03
3.33322424226003886361e+03 3.33322432150516533511e+03
39 1.71782306759650010393e-06
# end genealogy-summary file of migrate 0.9.8 ------





good luck

Peter
<beerli@fsu.edu>




























