mcl(1)                    USER COMMANDS                    mcl(1)



  NAME
      mcl  -  the  Amsterdam  implementation  of the Markov Cluster
      Algorithm

  SYNOPSIS
      mcl <-|fname> [-I f (inflation)] [-o str (fname)] [-scheme  k
      (resource scheme)]

       These options are sufficient in 95 percent of cases or more.
       The first argument must be the name of a file containing a
       graph/matrix in the mcl input format, or a hyphen to read
       from STDIN. With respect to clustering, only the -I and
       -scheme options are relevant; the rest is immaterial.

      As of the first 1.002 release, mcl will accept a very general
      input  format.   Graph  indices no longer need be sequential;
      you can use any set of indices you like, as long as they  are
      in  a  suitable range. Refer to the mcxio(5) section, and use
      mcl -z to find the range from which indices can be picked.

      As of the first 1.003 release, clmformat enables you to map a
      clustering  onto a format better suited for inspection, using
       an index file (the so-called tab file)  mapping  mcl  indices
      onto  descriptive labels.  Read the clmformat manual for more
      information - it is simple to use and the manual is small.

      A mechanism for pipelines is supported (as of the first 1.003
      release).   Refer  to the PIPELINES section for more informa-
      tion.  A prepackaged pipeline for BLAST data  is  present  in
      the form of mclblastline.

      The  full  listing  of  mcl options is shown below, separated
      into parts corresponding  with  functional  aspects  such  as
      clustering,  threading,  verbosity, pruning and resource man-
      agement, automatic output naming, and dumping.   The  -scheme
      parameter  provides  a  single  access  point  to the pruning
      options, and should be sufficient in most cases.  mcl  allows
      comprehensive  tuning  and  access to its internals for those
      who are interested, so it has many options.

      Baseline clustering options
      [-I f (inflation)] [-o  str  (fname)]  [-scheme  k  (resource
      scheme)]

      Additional clustering options
      [-c  f  (centering)]  [-pi  f (pre-inflation)] [-l n (initial
      iteration number)] [-L n (main iteration number)] [-i f (ini-
      tial  inflation)]  [-pp n (input matrix)] [-do str (carry out
      action)]

      Verbosity options
      [-v str (verbosity type on)] [-V str  (verbosity  type  off)]
      [--silent  (very)]  [--verbose  (very)] [-progress k (gauge)]
      [--show (print (small) matrices to screen)]

      Thread options
      [-te k (#expansion threads)] [-ti k (#inflation threads)] [-t
      k   (#threads)]  [--clone  (when  threading  (experimental))]
      [-cloneat n (trigger)]

      Output file name options
      [-o str (fname)] [-ap str (use str as file name prefix)] [-aa
      str  (append  str to suffix)] [-az (show output file name and
      exit)]

      Dump options
      [-di i:j (dump interval)] [-dm k  (dump  modulo)]  [-ds  stem
      (dump  file  stem)]  [-dump  str (type)] [-digits n (printing
      precision)]

      Output options
       [-sort str (sort mode)] [--expand-only (factor  out  computa-
       tion)] [--inflate-first (rather than expand)]

      Info options
      [--jury-charter  (explains  jury)] [--version (show version)]
      [-how-much-ram k  (RAM  upper  bound)]  [-h  (most  important
      options)] [-z (show current settings)] [-az (show output file
      name and exit)] [--show-schemes (show resource schemes)]

      Pruning options
      The following options all  pertain  to  the  various  pruning
      strategies that can be employed by mcl. They are described in
      the PRUNING OPTIONS section, accompanied by a description  of
      the  mcl  pruning  strategy.  If your graphs are huge and you
      have an appetite for tuning, have a look at the following:

      [-p f (cutoff)] [-P n (1/cutoff)] [-S n  (selection  number)]
      [-R  n  (recovery  number)]  [-pct  f  (recover  percentage)]
      [-warn-pct n (prune warn percentage)] [-warn-factor n  (prune
      warn  factor)]  [--dense  (allow  matrices to fill)] [--adapt
      (pruning)] [--rigid (pruning)] [-ae f (adaptive pruning expo-
      nent)]  [-af  f  (adaptive  pruning factor)] [-nx x (x window
      index)] [-ny y (y window index)] [-nj j (jury window  index)]
      [-nw  w  (nr of windows)] [-nl w (nr of iterations)] [--thick
      (expect dense input graph)]

      The first argument of mcl must  be  a  file  name,  but  some
      options  are allowed to appear as the first argument instead.
      These are the options that cause mcl to print out information
      of  some  kind, after which it will gracefully exit. The full
      list of these options is

      -z, -h, --version, --show-settings,  --show-schemes,  --jury-
      charter, -how-much-ram k.

  DESCRIPTION
      mcl  implements the MCL algorithm, short for the Markov clus-
      ter algorithm, a cluster algorithm for  graphs  developed  by
      Stijn  van  Dongen at the Centre for Mathematics and Computer
      Science in Amsterdam, the Netherlands.  The  algorithm  simu-
      lates flow using two simple algebraic operations on matrices.
      The inception of this flow process and the theory  behind  it
      are  described  elsewhere  (see REFERENCES). Frequently asked
      questions are answered in the mclfaq(7) section.  The program
      described  here  is a fast threaded implementation written by
      the algorithm's creator with contributions by several others.
      Anton Enright co-implemented threading; see the HISTORY/CRED-
      ITS section for a complete account.   See  the  APPLICABILITY
      section  for  a  description  of  the type of graph mcl likes
      best, and for a qualitative assessment of its speed.  mcl  is
      accompanied by several other utilities for analyzing cluster-
      ings and performing matrix and graph operations; see the  SEE
      ALSO section.

      The  first  argument is the input file name (see the mcxio(5)
      section for its expected format), or a single hyphen to  read
      from  stdin.  The  rationale for making the name of the input
      file a fixed parameter is that you typically do several  runs
      with  different parameters. In command line mode it is pleas-
      ant if you do not have to skip over  an  immutable  parameter
      all the time.

      The -I f option is the main control, affecting cluster granu-
      larity. Using mcl is as simple as  typing  (assuming  a  file
      proteins contains a matrix/graph in mcl input format)

      mcl proteins -I 2.0

      The  above  will  result  in a clustering written to the file
      named out.proteins.I20s2. It is - of  course  -  possible  to
      explicitly  specify  the name of the output file using the -o
       option. Refer to the -ap option for a  description  of  mcl's
       procedure  in  automatically constructing file names from its
       parameters.

      The mcl input format is described in  the  mcxio(5)  section.
      Clusterings  are also stored as matrices - this is again dis-
      cussed in the mcxio(5) section.  In finding good mcl  parame-
      ter  settings  for a particular domain, or in finding cluster
      structure at different levels of granularity,  one  typically
      runs mcl multiple times for varying values of f (refer to the
      -I option for further information).
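
       Such a sweep can be scripted directly. The following sketch
       only prints the commands it would run (the file name proteins
       is a stand-in; remove echo to execute them):

```shell
# Dry-run sketch of an inflation sweep. 'proteins' is a stand-in
# input file name; remove 'echo' to actually run mcl.
for I in 1.4 2.0 4.0 6.0; do
    echo mcl proteins -I "$I"
done
```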

      mcl expects a nonnegative matrix in the input file, or equiv-
      alently,  a  weighted  (possibly  directed) graph. NOTE - mcl
      interprets the matrix entries or graph edge weights as  simi-
      larities,  and  it likes undirected input graphs best. It can
      handle directed graphs, but any node  pair  (i,j)  for  which
      w(i,j) is much smaller than w(j,i) or vice versa will presum-
      ably have a slightly negative effect on the clusterings  out-
      put by mcl. Many such node pairs will have a distinctly nega-
      tive effect, so try to make your input graphs undirected. How
      your  edge weights are computed may affect mcl's performance.
      In protein clustering, one way to go is to choose the negated
      logarithm of the BLAST probabilities (see REFERENCES).

      mcl's  default  parameters  should  make  it quite fast under
      almost all circumstances. Taking default parameters, mcl  has
      been used to generate good protein clusters on 133k proteins,
      taking 10 minutes running time on a Compaq ES40  system  with
      four  alpha  EV6.7 processors. It has been applied (with good
      results) to graphs with 800k nodes, and if you have the  mem-
      ory  (and  preferably  CPUs  as well) nothing should stop you
      from going further.

      For large graphs, there  are  several  groups  of  parameters
      available  for tuning the mcl computing process, should it be
      necessary. The easiest thing to do is just vary  the  -scheme
      option.  This  triggers  different  settings for the group of
      pruning parameters {-p/-P, -R, -S,  and  -pct}.  The  default
      setting  corresponds  with -scheme 2.  There is an additional
      group of control parameters  {--adapt,  --rigid,  -ae,  -af},
      which may be helpful in speeding up mcl.  When doing multiple
      mcl runs for the same graphs with different -I settings  (for
      obtaining clusterings at different levels of granularity), it
      can be useful to factor out the first bit of computation that
      is  common to all runs, by using the --expand-only option one
      time and then using --inflate-first for each run in the  set.
      Whether  mcl  considers  a  graph large depends mainly on the
      graph connectivity; a highly connected graph on 50,000  nodes
      is  large  to  mcl (so that you might want to tune resources)
      whereas a sparsely connected graph on 500,000  nodes  may  be
      business as usual.

      mcl  is  a memory munger. Its precise appetite depends on the
      resource settings. You can get a rough (and usually much  too
      pessimistic) upper bound for the amount of RAM that is needed
      by using the -how-much-ram option. The corresponding entry in
      this  manual  page  contains the simple formula via which the
      upper bound is computed.

      Two other groups of interest are the  thread-related  options
      (you can specify the number of threads to use) {-t, -te, -ti,
      --clone, -cloneat} and the verbosity-related options  {--ver-
       bose,  --silent, -v, -V}.  The actual settings are shown with
       -z, and for graphs with at most 12 nodes or so you  can  view
       the  MCL  matrix iterands on screen by supplying --show (this
       may help build intuition).

      MCL iterands allow a generic interpretation as clusterings as
      well. The clusterings associated with early iterands may con-
      tain a fair amount of overlap. Refer to the -dump option, the
      mclfaq(7) manual, and the clmimac utility (Interpret Matrices
      As Clusterings).  Use clmimac only if you have a special rea-
      son; the normal usage of mcl is to do multiple runs for vary-
      ing -I parameters and  use  the  clusterings  output  by  mcl
      itself.

      The  creator of this page feels that manual pages are a valu-
      able resource, that online html documentation is also a  good
      thing to have, and that info pages are way way ahead of their
      time. The NOTES section explains how this page was created.

      In the OPTIONS section options are listed in order of  impor-
      tance, with related options grouped together.

  OPTIONS
      -I f (inflation)
         Sets the main inflation value to f. This value is the main
         handle for affecting cluster granularity.  It  is  usually
         chosen  somewhere in the range [1.2-5.0]. -I 5.0 will tend
         to result in fine-grained  clusterings,  and  -I 1.2  will
          tend  to result in very coarse-grained clusterings.   Your
         mileage will vary depending on the characteristics of your
         data.  That  is  why it is a good idea to test the quality
         and  coherency  of  your  clusterings  using  clmdist  and
         clminfo.  This will most likely reveal that certain values
         of -I are simply not right for your data. The clmdist sec-
         tion contains a discussion of how to use the cluster vali-
         dation tools shipped with mcl (see the SEE ALSO  section).

          A  second  option for affecting cluster granularity is the
          -c option.  It may increase granularity.

         With low values for -I, like -I 1.2, you  should  be  pre-
         pared  to  use more resources in order to maintain quality
         of clusterings, i.e. increase the argument to the  -scheme
         option.

      -o str (fname)
         Output the clustering to file named fname.  It is possible
         to send the clustering to stdout by  supplying  -o -.  The
         clustering  is  output  in  the mcl matrix format; see the
         mcxio(5) section for more information on this.

         Look at the -ap option and its siblings for the  automatic
         naming  constructions  employed by mcl if the -o option is
         not used.

      -scheme k (use a preset resource scheme)
         There  are  currently  five  different  resource  schemes,
          indexed  1..5.  Higher schemes result in more expensive
          computations that may be more accurate.  The  default
          scheme  is  2. When mcl is done, it will give a grade (the
          so-called jury synopsis) to  the  appropriateness  of  the
          scheme  used.  A low grade does not necessarily imply that
          the resulting clustering is bad, but it is reason  to  try
          a higher scheme. The grades
         are listed in the PRUNING OPTIONS section  under  the  -nj
         option.

         The PRUNING OPTIONS section contains an elaborate descrip-
         tion of the way  mcl  manages  resources,  should  you  be
         interested.   In case you are worried about the validation
         of the resulting clusterings, the  mclfaq(7)  section  has
         several  entries discussing this issue. The bottom line is
         that you have to compare the  clusterings  resulting  from
         different  schemes  (and  otherwise  identical parameters)
         using utilities such as clmdist, clminfo on the one  hand,
         and your own sound judgment on the other hand.

         If  your  input  graph is extremely dense, with an average
         node degree (i.e. the number of neighbours per node)  that
         is  somewhere  above 500, you may need to filter the input
         graph by removing the nodes of highest  degree  (and  pro-
         jecting  them  back  onto  the resulting clustering after-
         wards) or by using the -pp option.

      --show-schemes (show preset resource schemes)
         Shows the explicit settings to which the different  preset
         schemes correspond.

         The  characteristics  are written in the same format (more
         or less) as the output triggered by -v pruning.

      -c f (centering)
          The larger the value of f, the more weight nodes  give  to
          themselves  relative  to their neighbours, the more expan-
          sion (the spreading of flow through the graph) is opposed,
          and the more fine-grained clusterings tend to be. f should
         be chosen greater than or equal to  1.0.  The  default  is
         f=1.0.  This  option  has a much weaker effect than the -I
         option, but it can be useful depending on your data.

      -do str (carry out action)

      -dont str (omit action)
         These are the actions that can be controlled:

         clm
         log
         show-log
         keep-overlap

         The clm action appends a  short  synopsis  of  performance
         criteria to the output cluster file, as well as a synopsis
         of the granularity characteristics.

         The log action appends several characteristics of the  mcl
         run  to the output cluster file, such as the resource man-
         agement log, parameter settings, timings, and the  command
         line.   Useful  bookkeeping, and it is advisable to always
         use it.

         The show-log action sends the resource management  log  to
         STDOUT.

         The  keep-overlap  action  causes  mcl  to  retain overlap
         should this improbable event occur.  In  theory,  mcl  may
         generate a clustering that contains overlap, although this
         almost never happens in practice, as it requires some par-
         ticular  type of symmetry to be present in the input graph
         (not just any symmetry will do).  Mathematically speaking,
         this  is  a conjecture and not a theorem, but I am willing
         to eat my shoe if it does not hold (for marzipan values of
         shoe).  It  is easy though to construct an input graph for
         which certain mcl settings result in overlap - for example
         a  line graph on an odd number of nodes. The default is to
         remove overlap should it occur.

         This option has more than theoretical use because  mcl  is
         able  to generate clusterings associated with intermediate
         iterands.  For these clusterings, overlap is more  than  a
         theoretical  possibility,  and  will  often  occur. If you
         specify the -L k option, mcl will  output  the  clustering
         associated with the last iterand computed, and it may well
         contain overlap.

         This option has no effect on the clusterings that are out-
         put  when  using -dump cls - the default for those is that
         overlap is not touched, and this default can  not  yet  be
         overridden.

      -v str (verbosity type on)
         See the --verbose option below.

      -V str (verbosity type off)
         See the --verbose option below.

      --silent (very)
         See the --verbose option below.

      --verbose (very)
         These are the different verbosity modes:

         progress
         pruning
         explain
         all

         where  all  means all three previous modes.  --verbose and
         -v all turn them all on, --silent and -V all turn them all
         off.  -v str  and  -V str  turn on/off the single mode str
         (for str equal to one of progress, pruning,  or  explain).
         Each verbosity mode is given its own entry below.

      -v progress
         This  mode causes mcl to emit an ascii gauge for each sin-
         gle matrix multiplication. It uses some default length for
         the gauge, which can be altered by the -progress k option.
         Simply using the latter will also turn on  this  verbosity
          mode.  This mode can quickly give you an idea of how  long
          an  mcl  run  might  take.  If you use threading (see the -t
          option and its friends), this option may slow the  program
          down a little (relative to -V progress, not relative to a
          single-CPU mcl run).

      -v explain
         This mode causes the output of explanatory headers illumi-
         nating the output generated  with  the  pruning  verbosity
         mode.

      -v pruning
         This mode causes output of resource-related quantities. It
         has a separate entry in the PRUNING OPTIONS section.

      -progress k (gauge)
         If k>0 then for each matrix multiplication mcl will  print
         an  ascii  gauge  telling how far it is. The gauge will be
         (in some cases approximately) k characters  long.  If  k<0
         then mcl will emit a gauge that is extended by one charac-
         ter after every |k| vectors computed.  For  large  graphs,
         this option has been known to ease the pain of impatience.
         If k=0 then mcl will print  a  message  only  after  every
         matrix  multiplication,  and not during matrix multiplica-
         tion. This can be useful when you want mcl to be as speedy
          as possible, for example when using parallelized mode  (as
          monitoring progress requires thread  communication).   For
          parallelization (by threading) see the -t option.

      -aa str (append str to suffix)
         See the -ap option below.

      -ap str (use str as file name prefix)
         If  the -o option is not used, mcl will create a file name
         (for writing output to) that should uniquely  characterize
         the important parameters used in the current invocation of
         mcl. The default format is  out.fname.suf,  where  out  is
         simply the literal string out, fname is the first argument
         containing the name of the file (with  the  graph)  to  be
         clustered,  and  where suf is the suffix encoding a set of
         parameters (described further below).

         The -ap str option specifies a prefix to use  rather  than
         out.fname  as sketched above.  However, mcl will interpret
         the character '=', if present in str, as a placeholder for
         the input file name.

         If  the -aa str option is used, mcl will append str to the
         suffix suf created by itself.  You can  use  this  if  you
         need  to  encode  some  extra information in the file name
         suffix.

         The suffix is constructed as follows. The -I f and -scheme
         parameter  are always encoded.  The -pi f, -l k, -i f, and
         -c f options are only encoded if they are used.  Any  real
          argument  f  is  encoded  using exactly one digit after the
          decimal separator (which itself is not written).  The  set-
          ting  -I 3.14  is  thus encoded as I31. The
         -scheme option is encoded using the letter 's', all  other
         options mentioned here are encoded as themselves (stripped
         of the hyphen). For example

         mcl small.mci -I 3 -c 2.5 -pi 0.8 -scheme 5

         results in the file name  out.small.mci.I30s5c25pi08.   If
         you  want  to  know beforehand what file name will be pro-
         duced, use the -az option.
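
          Should other software need to predict this name, the
          encoding can be mimicked in shell. A sketch only; the -az
          option (below) is the authoritative way to obtain it:

```shell
# Reproduce the suffix encoding sketched above: one digit after
# the decimal separator, with the separator itself dropped.
enc() { LC_ALL=C printf '%.1f' "$1" | tr -d '.'; }

# Name for: mcl small.mci -I 3 -c 2.5 -pi 0.8 -scheme 5
name="out.small.mci.I$(enc 3)s5c$(enc 2.5)pi$(enc 0.8)"
echo "$name"   # -> out.small.mci.I30s5c25pi08
```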

      -az (show output file name and exit)
         If mcl automatically constructs a file  name,  it  can  be
          helpful  to  know beforehand what that file name will  be.
         Use -az and mcl will write the file  name  to  STDOUT  and
         exit.  This  can  be  used if mcl is integrated into other
         software for which the automatic creation of  unique  file
         names is convenient.

      -te k (#expansion threads)
         See the -t k option below.

      -ti k (#inflation threads)
         See the -t k option below.

      --clone (when threading)
         See the -t k option below.

      -cloneat n (trigger)
         See the -t k option below.

      -t k (#threads)
         The  -t  options are self-explanatory. Note that threading
         inflation is hardly useful, as inflation is orders of mag-
         nitude  faster than expansion. Also note that threading is
         only useful if you have a multi-processor system.

         The --clone option says to give each thread its  own  copy
         of  the  matrix  being expanded/squared. The latter option
          can be further controlled using  the  -cloneat k   option.
         Copies  are  only made if the source matrix (the one to be
         squared) has on average at least k  positive  entries  per
         vector.  This  option is probably not very useful, because
         without it mcl is a memory munger already.

         When threading, it is best not to  turn  on  pruning  ver-
         bosity  mode if you are letting mcl run unattended, unless
         you want to scrutinize its output later. This  is  because
         it  makes mcl run somewhat slower, although the difference
         is not dramatic.

      -l n (initial iteration number) (small letter ell)
         The number of times mcl will  use  a  different  inflation
         value  before it switches to the (main) inflation given by
         the -I (capital eye) option. The different value is called
         initial  inflation  and  is  tunable using the -i f option
         (default value f=2.0). The default value (to -l) is  zero.
         This  option supplies new ways of affecting cluster granu-
         larity, e.g. by supplying

         mcl proteins -i 1.4 -l 2 -I 4.0

         one lets expansion prevail during  the  first  two  itera-
         tions,  followed by inflation catching up (in a figurative
         way of writing).  This may be useful in certain cases, but
         this  type  of experiment is certainly secondary to simply
         varying -I (capital eye).

      -L n (main iteration number)
         Normally, mcl computes the MCL process until it  has  con-
         verged  fully to a doubly idempotent matrix. The number of
         iterations required is typically somewhere  in  the  range
         10-100.   The  first  few  iterations  generally  take the
         longest time.  The -L option can be used  to  specify  the
         number  of iterations mcl may do at most. When this number
         is reached, mcl will output the clustering associated with
         the iterand last computed.

      -i f (initial inflation)
         The  inflation  value  used during the first n iterations,
         where n is specified by the -l (ell) option.  By  default,
         n=0 and f=2.0.

      -pi f (pre-inflation)
         If  used,  mcl  will apply inflation one time to the input
         graph before entering the main process. This can be useful
         for making the edge weights in a graph either more homoge-
         neous (which may result in less granular  clusterings)  or
         more  nonuniform  (which may result in more granular clus-
         terings).

      -di i:j (dump interval)
      -dump-interval i:j
         Dump during iterations i..j-1. See  the  -dump str  option
         below.

      -dm k (dump i+0..i+k..)
      -dump-modulo k
         Sampling  rate:  select  only these iterations in the dump
         interval.  See the -dump str option below.

      -ds stem (file stem)
      -dump-stem stem
          Set the stem for file names of dumped objects (default
         mcl). See the -dump str option below.

      -dump str (type)
         str can be of the following types.

         ite
         cls
         chr

         Repeated  use  is  allowed.   The  ite  option  writes mcl
         iterands to file. The cls option writes clusterings  asso-
         ciated with mcl iterands to file.

         The  chr  option  says,  for  each  iterand I, to output a
         matrix C with characteristics of I. C has the same  number
         of  columns  as  I. For each column k in C, row entry 0 is
         the diagonal or 'loop' value of column k in I after expan-
         sion  and  pruning,  and  before  inflation and rescaling.
         Entry 1 is the loop value after inflation  and  rescaling.
         Entry  2 is the center of column k (the sum of its entries
         squared) computed  after  expansion  and  before  pruning,
         entry  3  is the maximum value found in that column at the
         same time. Entry 4 is the amount of  mass  kept  for  that
         column after pruning.

         The  -ds  option  sets  the  stem for file names of dumped
         objects (default mcl). The -di and  -dm  options  allow  a
         selection of iterands to be made.

      -digits n (printing precision)
         This has two completely different uses. It sets the number
         of decimals used for  pretty-printing  mcl  iterands  when
         using  the --show option (see below), and it sets the num-
         ber of decimals used for writing the expanded matrix  when
         using the --expand-only option.

      --show (print matrices to screen)
         Print matrices to screen. The number of significant digits
         to be printed can be tuned with  -digits n.  An  80-column
         screen  allows  graphs (matrices) of size up to 12(x12) to
          be printed with three digits of precision (after the decimal point),
         and  of  size up to 14(x14) with two digits. This can give
         you an idea of how mcl operates, and what  the  effect  of
         pruning is.  Use e.g. -S 6 for such a small graph and view
         the MCL matrix iterands with --show.

      -sort str (sort mode)
         str can be one of lex, size, revsize, or none. The default
         is 'revsize', in which the largest clusters come first. If
         the mode is 'size', smallest clusters come first,  if  the
         mode is 'lex', clusters are ordered lexicographically, and
         if the mode is 'none', the order is the same  as  produced
         by the procedure used by mcl to map matrices onto cluster-
         ings.

       --inflate-first (rather than expand)
         Normally, mcl will take the input  graph/matrix,  make  it
         stochastic,  and  start  computing  an  mcl process, where
         expansion  and  inflation  are  alternated.  This   option
         changes  that  to  alternation of inflation and expansion,
         i.e. inflation is the first operator to be  applied.  This
         is  intended  for use with an input matrix that was gener-
         ated with the --expand-only option (see below).  If you do
         multiple  mcl runs for the same graph, then the first step
         will be the same for all runs, namely computing the square
         of  the input matrix. With the pair of --inflate-first and
         --expand-only this bit of computing can be  factored  out.
          NOTE this option assumes that the input matrix is stochas-
          tic (as it will be when generated with --expand-only). The
          --inflate-first  option  renders  useless all options that
          would otherwise affect the input matrix;  precisely  those
          options  do  affect  the  matrix  resulting   from   using
          --expand-only. See the entry below for more information.

      --expand-only (factor out computation)
         This option makes mcl compute just the square of the input
         graph/matrix,  and write it to the file name supplied with
         the -o flag, or to the default file named out.mce. NOTE in
         this  case  the  output  matrix  is  not a clustering. The
         intended use is that the output matrix is  used  as  input
         for mcl with the --inflate-first switch turned on, so that
         multiple mcl runs need not redo the same computation  (the
         first expansion step).

         Note  that  the  -scheme parameters affect the matrix com-
         puted with --expand-only. Other options  that  affect  the
         matrix  resulting  from this option: -pp, -c, and -digits.
         The latter option sets the precision for output in  native
         ascii format.
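
          The two options combine as follows. A dry-run sketch  (the
          file name proteins is a stand-in; echo only prints the
          commands):

```shell
# Factor out the first expansion step once, then reuse the
# expanded matrix for several inflation values.
# Remove 'echo' to actually run mcl.
echo mcl proteins --expand-only -o proteins.mce
for I in 1.4 2.0 4.0; do
    echo mcl proteins.mce --inflate-first -I "$I"
done
```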

      -pp n (input matrix)
         For  each column vector (node) in the input matrix (graph)
         mcl will keep the n entries (outgoing edges) of that  vec-
         tor (node) that have largest weight and remove the rest.

      --jury-charter (explains jury)
         Explains  how  the jury synopsis is computed from the jury
         marks.

      --version (show version)
         Show version.

      -how-much-ram n (RAM upper bound)
         n is interpreted as the number of nodes of an input graph.
         mcl  will print the maximum amount of RAM it needs for its
          computations.  The formula for this number in bytes is:

            2 * c * k * n

            2  :  two matrices are concurrently held in memory.
            c  :  mcl cell size (as shown by -z).
            n  :  graph cardinality (number of nodes).
            k  :  MAX(s, r).
            s  :  select number (-S, -scheme options).
            r  :  recover number (-R, -scheme options).

         This estimate will usually be  too  pessimistic.  It  does
         assume  though  that  the average node degree of the input
         graph does not exceed k. The  -how-much-ram  option  takes
         other  command-line arguments into account (such as -S and
         -R), and it expresses the amount of RAM in megabyte units.
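         As a worked example, the bound is easy to evaluate by
         hand; the cell size of 8 bytes and the select/recover
         defaults below are assumptions standing in for the values
         mcl reports via -z, -S and -R:

```python
def mcl_ram_bound_mb(n_nodes, cell_size=8, select=500, recover=600):
    """Evaluate the bound 2 * c * k * n (in megabytes), k = max(s, r).

    cell_size, select and recover are assumed placeholder values;
    check -z, -S and -R for the settings of an actual invocation.
    """
    k = max(select, recover)
    bytes_needed = 2 * cell_size * k * n_nodes
    return bytes_needed / (1024.0 * 1024.0)

# a graph on 10000 nodes with the assumed defaults
print(round(mcl_ram_bound_mb(10000), 1))
```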

      -h (show help)
         Shows a selection of the most important mcl options.

      --show-settings (show settings)
         A synonym for the -z option.

      -z (show settings)
         Show current settings for tunable parameters.  --show-set-
         tings is a synonym.

  PRUNING OPTIONS
      -p f (cutoff)
      -P n (1/cutoff)
      -S s (selection number)
      -R r (recover number)
      -pct pct (recover percentage)
         After computing a new (column stochastic) matrix vector
         during expansion (which amounts to matrix multiplication,
         i.e. squaring), the vector is successively exposed to different
         pruning  strategies.  The  intent  of pruning is that many
         small entries are removed  while  retaining  much  of  the
         stochastic  mass  of  the  original vector. After pruning,
         vectors are rescaled to be stochastic again. MCL  iterands
         are  theoretically known to be sparse in a weighted sense,
         and this manoeuvre effectively perturbs the MCL  process  a
         little  in  order  to  obtain  matrices that are genuinely
         sparse, thus keeping the computation tractable. An example
         of  monitoring  pruning  can be found in the discussion of
         -v pruning at the end of this section.

         mcl proceeds as follows. First, entries that  are  smaller
         than  cutoff  are  removed,  resulting in a vector with at
         most 1/cutoff entries. The cutoff can be  supplied  either
         by  -p,  or as the inverse value by -P. The latter is more
         intuitive, if your intuition  is  like  mine  (and  the  P
         stands  for  precision or pruning by the way).  The cutoff
         just described is rigid; it is the same for  all  vectors.
         The --adapt option causes the computation of a cutoff that
         depends on a vector's  homogeneity  properties,  and  this
         option may speed up mcl considerably.

         Second,  if the remaining stochastic mass (i.e. the sum of
         all remaining entries) is less than pct/100 and the number
         of  remaining  entries is less than r (as specified by the
         -R flag), mcl will try to regain ground by recovering  the
         largest  discarded entries. The total number of entries is
         not allowed to grow larger than r.  If  recovery  was  not
         necessary,  mcl  tries to prune the vector further down to
         at most s entries (if applicable), as specified by the  -S
         flag.  If  this  results  in  a  vector that satisfies the
         recovery condition then recovery is attempted, exactly  as
         described  above. The latter will not occur of course if r
         <= s.
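         The two-stage procedure above can be sketched as follows.
         This is a simplified model of the description, not mcl's
         actual implementation:

```python
def prune_column(v, cutoff, s, r, pct):
    """Model of mcl's per-column pruning: cutoff, then recovery
    and/or selection as described above.  v maps index -> weight
    and is assumed stochastic (entries sum to 1)."""
    kept = {i: w for i, w in v.items() if w >= cutoff}
    discarded = sorted((w, i) for i, w in v.items() if w < cutoff)

    def wants_recovery(d):
        return sum(d.values()) < pct / 100.0 and len(d) < r

    def recover(d):
        while discarded and len(d) < r:
            w, i = discarded.pop()        # largest discarded entry first
            d[i] = w
        return d

    if wants_recovery(kept):
        kept = recover(kept)
    elif len(kept) > s:
        # selection: prune further down to the s heaviest entries
        ranked = sorted(kept.items(), key=lambda kv: kv[1], reverse=True)
        for i, w in ranked[s:]:
            del kept[i]
            discarded.append((w, i))
        discarded.sort()
        if wants_recovery(kept):          # recovery may kick in again
            kept = recover(kept)

    total = sum(kept.values())
    return {i: w / total for i, w in kept.items()}   # rescale to stochastic
```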

         The default  setting  is  something  like  -P 4000  -S 500
         -R 600.  Check  the  -z flag to be sure. There is a set of
         precomposed settings, which  can  be  triggered  with  the
         -scheme k option. k=2 is the default scheme; higher values
         for k result in costlier and  more  accurate  computations
         (vice  versa  for lower, cheaper, and less accurate).  The
         schemes are listed using the --show-schemes option. It  is
         advisable  to  use  the -scheme option only in interactive
         mode, and to use the explicit expressions when doing batch
         processing. The reason is that there is no guarantee what-
         soever that the schemes will not change between  different
         releases.  This  is  because  the  scheme  options  should
         reflect good general purpose settings, and it  may  become
         apparent that other schemes are better.

         Note  that 'less accurate' or 'more accurate' computations
         may  still  generate  the  same  output  clusterings.  Use
         clmdist   to  compare  output  clusterings  for  different
         resource parameters. Refer to clmdist for a discussion  of
         this issue.

      -warn-pct k (prune warn percentage)
      -warn-factor k (prune warn factor)
         The two options -warn-pct and -warn-factor relate to warn-
         ings that may be triggered once the initial pruning  of  a
         vector is completed. The idea is to issue warnings if ini-
         tial pruning almost completely destroys a computed vector,
         as  this  may be a sign that the pruning parameters should
         be changed. Whether a warning is issued depends on the
         mass remaining after initial pruning: mcl warns if that
         mass is less than warn-pct, or if the number of remaining
         entries is smaller by a factor warn-factor than both the
         number of entries originally computed and the recovery
         number.

         -warn-pct takes an integer between 0 and 100 as parameter,
         -warn-factor takes a real positive number. They default to
         something like 30 and 50.0. If you want to see fewer
         warnings, decrease warn-pct and increase warn-factor. Set
         warn-factor to zero if you want no warnings.
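         The decision can be modelled as below; the zero special
         case follows the sentence above, and the exact comparison
         and tie-breaking inside mcl may differ:

```python
def should_warn(mass_after, n_after, n_computed, recover_num,
                warn_pct=30, warn_factor=50.0):
    """Model of the -warn-pct / -warn-factor test after initial
    pruning.  mass_after is the remaining stochastic mass (0..1),
    n_after the remaining entry count, n_computed the entry count
    before pruning."""
    if warn_factor == 0:
        return False                  # documented: zero disables warnings
    if 100.0 * mass_after < warn_pct:
        return True
    return (n_after * warn_factor < n_computed and
            n_after * warn_factor < recover_num)
```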

      --dense (allow matrices to fill)
         This  renders  all pruning options useless except for one.
         After each expansion step, mcl  will  remove  all  entries
         that are smaller than the threshold specified by -p or -P,
         which acts like a precision in this case.  After  removal,
         the matrix columns are rescaled to be stochastic again.

         If the -p threshold (precision) is zero or very small, the
         --dense option results  in  a  rather  accurate  and  very
         costly  computation  of  the  MCL process. Do not use this
         option for graphs with  more  than  several  thousands  of
         entries,  or  you will have trouble digging your processor
         out of swap.

      --rigid (pruning)
         See the --adapt option below.

      -ae f (adaptive pruning exponent)
         See the --adapt option below.

      -af f (adaptive pruning factor)
         See the --adapt option below.

      --adapt (pruning)
         The default mcl pruning behaviour as described  under  the
         -P  option  is  called rigid pruning (it being the default
         renders the switch --rigid currently useless), referring to
         the  fact  that the first stage of pruning removes entries
         smaller than a fixed  threshold.   The  options  discussed
         here enable the computation of a threshold that depends on
         the  homogeneity  characteristics  of   a   vector.   This
         behaviour is triggered by supplying --adapt.

         The --adapt behaviour only affects the first pruning
         stage, i.e. the computation of the first threshold (see
         the discussion under the -P option). It does not interfere
         with either selection or recovery. It is affected  however
         by the threshold as specified by the -P option. When using
         --adapt, you typically use the -P option as well, and  you
         can and should use a higher value than you would without
         using --adapt.

         All that said, --adapt triggers this  behaviour:  Given  a
         stochastic vector v, its mass center of order two is
         computed: the sum of the squares of its entries. The mass
         center  of  v, call it c, is strongly related to its homo-
         geneity properties (see REFERENCES). The  threshold  T  is
         computed  as  1/f * pow(c, e), where e and f are the argu-
         ments to the -af f and -ae e options  respectively  (check
         -z for the respective defaults). Decreasing either e or f
         makes T larger. Finally, T is
         maxed with the rigid threshold value, which can be altered
         using either -p f or -P n.  The latter is why  you  should
         increase  the  -P parameter n (so that the rigid threshold
         is decreased) once you switch  to  adaptive  pruning.  The
         adaptive  threshold  should be the main factor controlling
         pruning, with the rigid threshold acting  as  a  safeguard
         that does not take over too often.

         This  may  seem  complicated,  but  the rules are actually
         quite simple, and you may just disregard the definition of
         T.  The usefulness of these options will vary. If you want
         to speed up mcl, try it out and add --adapt to  your  set-
         tings.
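         The threshold computation itself reduces to a few lines;
         f and e stand for the -af and -ae settings shown by -z:

```python
def adaptive_threshold(v, f, e, rigid):
    """Adaptive pruning threshold: T = (1/f) * c**e, maxed with
    the rigid threshold, where c is the mass center of order two
    of the stochastic vector v (sum of the squares of its entries)."""
    c = sum(w * w for w in v)
    return max(rigid, (1.0 / f) * c ** e)

# a perfectly homogeneous vector on four entries: c = 0.25
print(adaptive_threshold([0.25, 0.25, 0.25, 0.25], f=2.0, e=1.0, rigid=0.01))
```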

      --thick (expect dense input graph)
         This  option  is somewhat esoteric. It does not affect the
         matrices as computed by mcl, but it  affects  the  way  in
         which they are computed. If the input graph is very dense,
         this may speed up mcl a little.

      -v pruning
         Pruning verbosity mode causes mcl to emit several  statis-
         tics  related  to  the  pruning  process, each of which is
         described below. Use -v explain to get explanatory headers
         in the output as well (or simply use -v all).

         Selection and recovery
         The number of selections and recoveries mcl had to perform
         during each iteration is shown. It also shows  the  number
         of  vectors  for  which  the  mass after final pruning was
         below the fraction defined by the -pct option  as  a  per-
         centage (default probably 90 or 95).

         Initial and pruned vector footprint distributions
         The distribution of the vector footprints (i.e. the number
         of nonzero entries) before and  after  pruning  is  shown.
         This  is assembled in a terse (horrid if you will) format,
         looking as follows (with  some  context  stripped,  noting
         that the data for three expansion steps is shown):

         ----------------------------------------------------
          mass percentages  | distr of vec footprints       |
                  |         |____ expand ___.____ prune ____|
           prune  | final   |e4   e3   e2   |e4  e3   e2    |
         all ny nx|all ny nx|8532c8532c8532c|8532c8532c8532c|
         ---------.---------.---------------.---------.-----.
          98 88 86  98 91 86 _________022456 ___________0234
          98 89 86  98 94 91 _______00245678 ___________0234
          98 90 89  99 95 94 _______00235568 ___________0234
          ...

         This  particular output was generated (and truncated after
         three rounds of expansion and inflation) from clustering a
         protein graph on 9058 nodes with settings -I 1.4, -P 2000,
         -S 500, -R 600, and -pct 95.

         The header entries  8532c85..  indicate  thresholds  going
         from 80000, 50000, 20000, 12500, 8000, all the way down to
         300, 200, and 125. The character 'c'  signifies  the  base
         1.25 (for no apparent reason). The second entry '2' (after
         '0') on the first line signifies that roughly  20  percent
         of  all  the  vectors  had  footprint  (#nonzero  entries)
         between 800 and 1250.  Likewise, 40 percent had  footprint
         between  300  and  500. The '0' entries signify a fraction
         somewhere below 5 percent, and the '@' entries  signify  a
         fraction somewhere above 95 percent.
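         Read as a histogram, each character is one bin. A
         plausible decoding of the character scale follows; the
         precise rounding mcl applies is an assumption here, based
         only on the description above:

```python
def footprint_char(fraction):
    """Map a fraction of vectors in one footprint bin to its
    display character: '0' below 5 percent, '@' above 95 percent,
    otherwise the fraction in units of 10 percent ('2' means
    roughly 20 percent).  The exact rounding is a guess."""
    if fraction < 0.05:
        return '0'
    if fraction > 0.95:
        return '@'
    return str(min(9, int(fraction * 10 + 0.5)))
```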

         Two columns are listed, one for the expansion vector foot-
         prints (i.e. after squaring), and the other for the vector
         footprints  right  after  initial pruning took place (i.e.
         before selection and recovery, after  either  adaptive  or
         rigid pruning).  This may give an idea of the soundness of
         the initial pruning  process  (overly  severe,  or  overly
         mild), and the extent to which you want to apply selection
         and/or recovery.

         Initial and final mass windows
         The mass averages of the pruned vectors  after  the  first
         selection  stage  are  shown, and the mass averages of the
         vectors as finally  computed,  i.e.  after  selection  and
         recovery.  Note that the latter corresponds to a different
         stage than what is shown for  the  vector  footprints,  if
         either  selection  or  recovery  is  turned  on.  For both
         cases, three averages are shown: the average over all vec-
         tors,  the average over the worst x cases, and the average
         over the worst y cases. The mass  averages  are  shown  as
         percentages:  '98' on the first line under the 'prune/all'
         column means that overall 98  percent  of  the  stochastic
         mass of the matrix was kept after pruning.

         This  example  demonstrates  that  many  entries  could be
         removed while retaining much of the stochastic  mass.  The
         effect  of  the recovery (-R) parameter is also clear: the
         final averages are higher than the initial averages, as  a
         result of mcl undoing some overenthusiastic pruning.

         An  average  over  the worst k cases is called a window of
         width k; internally, mcl tracks many  more  such  windows.
         The  result  of  this  can  be seen when using the -do log
         option (which appends a log to the cluster output) or  the
         -do show-log option (which sends the log to STDOUT).  From
         a fixed set of  windows  those  that  are  applicable  are
         tracked,  that  is,  all those windows for which the width
         does not exceed the graph cardinality. The windows in  the
         fixed  set  have respective sizes 1, 2, 5, 10, 20, 50, and
         so on up until 5000000 (which makes 15 windows in all).

      -nx i (x window index)
      -ny j (y window index)
         The options -nx and -ny both take an index  in  the  range
         1..15. The default values for -nx and -ny are respectively
         4 and 7, denoting the fourth and seventh window of respec-
         tive  widths  10  and  100. They are used in the verbosity
         output as described above.

      -nj i (jury window index)
         The -nj denotes a window index in the same way as -nx  and
         -ny  do.  This particular window is used for computing the
         jury marks, which are the three numbers reported by mcl
         when it is done. They are a reminder of the existence of
         pruning and its importance for both speed and accuracy,
         and they are indicative rather than authoritative.

         These  jury  marks are simply the respective mass averages
         in the jury window for the  first  three  iterations.  The
         marks  are  even further simplified and mapped to the jury
         synopsis, which is a single grade expressed as  an  adjec-
         tive.  The grades are, in decreasing order of achievement,
         perfect exceptional  superior  excellent  good  acceptable
         mediocre  poor  bad  lousy  miserable awful wretched atro-
         cious. Doing 'mcl --jury-charter' will tell  you  how  the
         jury marks map onto the jury synopsis.

         The  jury  marks  should  preferably be higher than 70. If
         they are in the vicinity of 80 or 90, mcl is doing fine as
         far  as  pruning  is concerned.  Choose a higher scheme if
         you think them too low. For very dense graphs that do have
         strong cluster structure, the jury marks can sink as low
         as the 30s and 40s, but the clusterings generated by
         mcl may still be good. The marks and the synopsis grade
         the severity of pruning, not cluster quality. Note that
         the jury becomes friendlier when the -nj option is
         increased, and harsher when it is decreased.

      -nw w (nr of windows)
         Normally, mcl will use all windows that have width smaller
         than  the cardinality of the input graph. This option lim-
         its the set of windows to  those  w  windows  of  smallest
         width.  This affects the -do log output.

      -nl l (number of iterations)
         By  default, mcl will log the window mass averages for the
         first ten iterations. This option sets that number to l.
         It affects the -do log output.

  PIPELINES
      In  general, clustering requires several stages; creating the
      matrix, running mcl, and displaying the result.  The  display
      stage  is  supported  by clmformat. The matrix creation stage
      often needs only be done once for a  given  data  collection,
      followed by repeated runs of the other two stages for varying
      inflation values and scheme settings.

      The matrix creation stage can often be split into two further
      stages,  namely parsing a data file in some given format, and
      assembling a matrix from the data bits and  pieces,  such  as
      node  indices  and edge weights or even edge weight contribu-
      tions.  The assembly step can be done by  mcxassemble,  which
      allows a very general input format and customizable behaviour
      in how the bits and pieces should be transformed to the input
      graph.  This leaves the parse stage to be filled in.

      The  mclpipeline script implements a generic and customizable
      pipeline encapsulating the  four  stages  distinguished  here
      (parsing,  assembly,  clustering, display). It is possible to
      let only part of the pipeline be active, and many other  fea-
      tures are supported. The IO mechanism is entirely file based,
      and files are associated with parametrizations via file name
      extensions (a deliberately simple mechanism).

      mclpipeline  requires  a single parse script to be specified.
      It will be plugged into the pipeline and you should be set to
      run.   The  parse  script must satisfy the interface require-
      ments described in the mclpipeline manual page.

      For BLAST input, the mclblastline script provides a dedicated
      interface.  It uses the mcxdeblast script that comes prepack-
      aged with mcl.

  APPLICABILITY
      mcl will work very well for graphs in which the  diameter  of
      the  natural  clusters is not too large. The presence of many
      edges between different clusters is not problematic; as  long
      as  there  is cluster structure, mcl will find it. It is less
      likely to work well for graphs with clusters  (inducing  sub-
      graphs) of large diameter, e.g. grid-like graphs derived from
      Euclidean data. So mcl in its canonical form is certainly not
      fit  for  boundary detection or image segmentation. I experi-
      mented with a modified mcl and image segmentation in the the-
      sis  pointed  to below (see REFERENCES). This was fun and not
      entirely unsuccessful, but not something to be pursued
      further.

      mcl  likes  undirected  input graphs best, and it really dis-
      likes graphs with node pairs (i,j) for  which  an  arc  going
      from  i  to  j  is present and the counter-arc from j to i is
      absent. Try to make your input  graph  undirected.   Further-
      more,  mcl interprets edge weights in graphs as similarities.
      If you are used to working  with  dissimilarities,  you  will
      have  to  convert those to similarities using some conversion
      formula. The most important thing is that you feel  confident
      that the similarities are reasonable, i.e. if X is similar to
      Y with weight 2, and X is similar to Z with weight 200,  then
      this should mean that the similarity of Y (to X) is
      negligible compared with the similarity of Z (to X).
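      Two commonly used conversion formulas are sketched below, as
      assumptions rather than mcl built-ins; the reciprocal keeps
      weight ratios meaningful in the sense just described, while
      the shift does not:

```python
def sim_reciprocal(d, eps=1e-9):
    """Similarity as reciprocal distance; eps guards against
    zero distances."""
    return 1.0 / (d + eps)

def sim_shift(d, d_max):
    """Similarity by subtracting the distance from its maximum;
    simple, but it does not preserve ratios the way the
    reciprocal does."""
    return d_max - d

# Y at distance 100 and Z at distance 1 from X: with the
# reciprocal, Z's similarity dominates Y's, as required above.
print(sim_reciprocal(1.0), sim_reciprocal(100.0))
```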

      mcl is probably not suited for clustering tree  graphs.  This
      is because mcl works best if there are multiple paths between
      different nodes in the natural clusters, but in  tree  graphs
      there  is  only one path between any pair of nodes. Trees are
      too sparse a structure for mcl to work on.

      mcl may well be  suited  for  clustering  lattices.  It  will
      depend on the density characteristics of the lattice, and the
      conditions for success are the same as those  for  clustering
      graphs  in  general:  The  diameter  of  the natural clusters
      should not be too large.  NOTE when clustering a lattice, you
      have  to cluster the underlying undirected graph, and not the
      directed graph that represents the lattice itself. The reason
      is that one has to allow mcl (or any other cluster algorithm)
      to 'look back in time', so to speak.  Clustering  and  direc-
      tionality bite each other (long discussion omitted).

      mcl has a worst-case time complexity O(N*k^2), where N is the
      number of nodes in the graph, and k is the maximum number  of
      neighbours  tracked  during computations. k depends on the -P
      and -S options. If the  -S  option  is  used  (which  is  the
      default  setting)  then k equals the value corresponding with
      this option. Typical values for k are in the range 500..1000.
      The  average  case is much better than the worst case though,
      as cluster structure itself has the effect of  helping  mcl's
      pruning  schemes,  certainly if the diameter of natural clus-
      ters is not large.
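      For a rough sense of scale, the worst-case bound is easy to
      evaluate; the node count below matches the protein graph
      mentioned earlier, and k is taken from the typical -S range:

```python
def worst_case_ops(n_nodes, k):
    """Worst-case work for one expansion step, O(N * k^2): each
    of the N columns holds at most k entries, and each entry is
    combined with a column of at most k entries."""
    return n_nodes * k * k

print(worst_case_ops(9058, 500))   # roughly 2.3 billion operations
```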

  FILES
      There are currently no resource nor configuration files.  The
      mcl matrix format is described in the mcxio(5) section.

  ENVIRONMENT
      mcl currently uses no environment variables.

  DIAGNOSTICS
      If  mcl  issues  a  diagnostic  error, it will most likely be
      because the input matrix could not be parsed successfully.
      mcl  tries  to  be  helpful  in  describing the kind of parse
      error.  The mcl matrix format is described  in  the  mcxio(5)
      section.

  BUGS
      No  known  bugs at this time. Please send bug reports to mcl-
      devel@mdcc.cx.

  AUTHOR
      Stijn van Dongen.

  HISTORY/CREDITS
      The MCL algorithm was conceived in spring 1996 by the present
      author.   The  first implementation of the MCL algorithm fol-
      lowed that spring and summer. It  was  written  in  Perl  and
      proved  the  viability  of  the algorithm. The implementation
      described here began its life in autumn 1997. The first  ver-
      sions  of  the  vital matrix library were designed jointly by
      Stijn van Dongen and Annius Groenink in the period October
      1997  -  May 1999. The efficient matrix-vector multiplication
      routine was written by Annius. This routine is  without  sig-
      nificant  changes  still  one of the cornerstones of this MCL
      implementation.

      Since May 1999 all MCL libraries have seen  much  development
      and redesign by the present author. Matrix-matrix multiplica-
      tion has been rewritten several times to take full  advantage
      of  the  sparseness  properties  of  the  stochastic matrices
      brought forth by the MCL algorithm. This mostly concerns  the
      issue  of pruning - removal of small elements in a stochastic
      column in order to keep matrices sparse.

      Very instructive was  that  around  April  2001  Rob  Koopman
      pointed  out  that  selecting the k largest elements out of a
      collection of n is best done using a min-heap. This  was  the
      key  to  the second major rewrite (now counting three) of the
      MCL pruning schemes, resulting in much faster code, generally
      producing a more accurate computation of the MCL process.

      In May 2001 Anton Enright initiated the parallelization of
      the mcl code and threaded inflation. From this example, Stijn
      threaded expansion. This was great, as the MCL data struc-
      tures and operands (normal matrix multiplication and Hadamard
      multiplication) just beg for parallelization.

      Joost  van  Baal set up the mcl CVS tree and packaged mcl for
      Debian GNU/Linux. He completely autotooled  the  sources,  so
      much so that at first I found it hard to locate them
      amidst bootstrap, aclocal.m4, depcomp, and other beauties.

      Jan van der Steen shared his  elegant  mempool  code.  Philip
      Lijnzaad  gave  useful  comments.  Philip,  Shawn  Hoon, Abel
      Ureta-Vidal, and Martin Mokrejs sent helpful bug reports.

  SEE ALSO
      mcl development is discussed on mcl-devel@lists.mdcc.cx, this
      list is archived at http://lists.mdcc.cx/mcl-devel/.

      mcxio(5) - a description of the mcl matrix format.

      mclfaq(7) - Frequently Asked Questions.

      mcx - an interpreter for a stack language that enables inter-
      action with the mcl matrix libraries. It  can  be  used  both
      from  the command line and interactively, and supports a rich
      set of operations  such  as  transposition,  scaling,  column
      scaling,  multiplication,  Hadamard  powers  and products, et
      cetera. The general aim is to provide handles for simple num-
      ber and matrix arithmetic, and for graph, set, and clustering
      operations. The following is a very simple example of  imple-
      menting and using mcl in this language.

       2.0 .i def                    # define inflation value.
       /small lm                     # load matrix in file 'small'.
       dim id add                    # add identity matrix.
       st .x def                     # make stochastic, bind to x.
       { xpn .i infl vm } .mcl def   # define one mcl iteration.
       20 .x .mcl repeat             # iterate  20 times
       imac                          # interpret matrix as clustering.
       vm                            # view matrix (clustering).

      One of the more interesting possibilities is running mcl
      with more complicated inflation profiles than the
      two-constant approach used in mcl itself.

      Several  other utilities come with mcl, facilitating analysis
      and comparison of different clusterings.

      clmdist - compute the split/join distance between two  parti-
      tions. The split/join distance is better suited for measuring
      partition similarity than the long-known equivalence mismatch
      coefficient.  The  former  measures  the number of node moves
      required to transform one partition into the other, the  lat-
      ter  measures  differences between volumes of edges of unions
      of complete graphs associated with partitions.

      clminfo - compute a performance measure  saying  how  well  a
      clustering captures the edge weights of the input graph. Use-
      ful for comparing different clusterings on  the  same  graph,
      best  used  in  conjunction  with clmdist - because comparing
      clusterings at different levels of granularity  should  some-
      what  change the performance interpretation. The latter issue
      is discussed in the clmdist entry.

      clmmeet - compute the intersection of a set  of  clusterings,
      i.e.  the  largest clustering that is a subclustering of all.
      Useful for measuring the consistency of a  set  of  different
      clusterings at supposedly different levels of granularity (in
      conjunction with clmdist).

      clmformat - display clusters suitable for scrutinizing.

      clmresidue -  extend  a  clustering  of  a  subgraph  onto  a
      clustering of the larger graph.

      clmimac - interpret MCL iterands as clusterings. The cluster-
      ings associated with  early  iterands  may  contain  overlap,
      should you be interested therein.

      mclpipeline - set up a pipeline from the data parsing stage
      through the clustering format/display stage.

      mcxassemble -  assemble  a  matrix/graph  from  partial  edge
      weight  scores.  Useful  intermediate  format to be used when
      transforming application specific  data  into  an  mcl  input
      matrix.

      mcxsubs  -  compute  a submatrix of a given matrix, where row
      and column index sets can be specified as  lists  of  indices
      combined with lists of clusters in a given clustering. Useful
      for inspecting local cluster structure.

      mcxmap - relabel indices in a graph.

      mcl's home at http://micans.org/mcl/.

  REFERENCES
      Stijn van Dongen, Graph Clustering by Flow  Simulation.   PhD
      thesis, University of Utrecht, May 2000.
      http://www.library.uu.nl/digiarchief/dip/diss/1895620/inhoud.htm

      Stijn van Dongen. A cluster algorithm for graphs.   Technical
      Report INS-R0010, National Research Institute for Mathematics
      and Computer Science in the Netherlands, Amsterdam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0010.ps.Z

      Stijn van Dongen. A stochastic uncoupling process for graphs.
      Technical Report INS-R0011, National Research  Institute  for
      Mathematics  and Computer Science in the Netherlands, Amster-
      dam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0011.ps.Z

      Stijn van Dongen. Performance criteria for  graph  clustering
      and  Markov  cluster experiments. Technical Report INS-R0012,
      National Research Institute for Mathematics and Computer Sci-
      ence in the Netherlands, Amsterdam, May 2000.
      http://www.cwi.nl/ftp/CWIreports/INS/INS-R0012.ps.Z

      Enright  A.J.,  Van  Dongen  S.,  Ouzounis C.A.  An efficient
      algorithm for  large-scale  detection  of  protein  families,
      Nucleic Acids Research 30(7):1575-1584 (2002).

  NOTES
      This   page   was   generated   from   ZOEM   manual  macros,
      http://micans.org/zoem. Both html and roff pages can be  cre-
      ated  from  the same source without having to bother with all
      the usual conversion problems, while keeping  some  level  of
      sophistication in the typesetting.



  mcl 1.003, 03-185           4 Jul 2003                     mcl(1)
