package VCP::Help;

%topics = (
########################################################################
'vcp' => <<'TOPIC',
NAME
    vcp - Copy versions of files between repositories and/or RevML

SYNOPSIS
       # interactive mode:

       vcp [vcp_opts]

       # scriptable command line mode:

       vcp [vcp_opts] <source> <dest>

       # getting options from a file:

       vcp vcp:config.vcp

       # help output:

       vcp help
       vcp help [topic]

DESCRIPTION
    "vcp" ('version copy') copies versions of files from one repository
    to another, translating as much metadata as possible along the way.
    This allows you to copy and translate files and their histories
    between revision storage systems.

    Supported source and destination types are "cvs:", "p4:", and
    "revml:".

  Copying Versions
    The general syntax of the vcp command line is:

       vcp [<vcp options>] <source> <dest>

    The three portions of the command line are:

    "<vcp options>"
        Command line options that control the operation of the "vcp"
        command, like "-d" for debugging or "-h" for help. There are
        very few global options; these are covered below. Note that they
        must come before the "<source>" specification.

    "<source>"
        Where to extract versions from, including any command line
        options needed to control what is extracted and how. See the
        next section.

    "<dest>"
        Where to insert versions, including any command line options
        needed to control how files are stored. See the next section.

  Specifying Repositories
    The "<source>" and "<dest>" specifications specify a repository and
    provide any options needed for accessing that repository.

    These specifications may be a simple filename for reading or writing
    RevML files (if the requisite XML handling modules are installed),
    or a full repository specification like "cvs:/home/cvs/root:module"
    or "p4:user:password@server:port://depot/dir".

    When using the long form to access a repository, the "<source>" and
    "<dest>" specifications have several fields delimited by ":" and "@",
    and may have trailing command line options. The full (rarely used)
    syntax is:

       scheme:user(view):password@repository:filespec [<options>]

    where

    "scheme:"
        The repository type ("p4:", "cvs:", "revml:").

    "user", "view", and "password"
        Optional values for authenticating with the repository and
        identifying which view to use. "cvs" does not use "view". For
        "p4", "view" is the client setting (equivalent to setting
        "P4CLIENT" or using "p4"'s "-c" option).

    "repository"
        The repository spec, CVSROOT for CVS or P4PORT for p4.

    "filespec"
        Which versions of what files to move. As much as possible, this
        spec is similar to the native filespecs used by the repository
        indicated by the scheme.

    "<options>"
        Command line options that usually mimic the options provided by
        the underlying repositories' command line tools ("cvs", "p4",
        etc).

    Most of these fields are omitted in practice; only the "scheme"
    field is required, though (in most cases) the "repository" field is
    also needed unless you set the appropriate environment variables
    ("CVSROOT", "P4PORT", etc).

    This is a bit confusing, so here are some example specs:

       cvs:server:/foo
       p4:user@server://depot/foo/...
       p4:user:password@public.perforce.com:1666://depot/foo/...

    Options and formats for individual schemes can be found in the
    relevant help topics, for instance:

       vcp help source::cvs

    Run "vcp help" for a list of topics.

    When reading and writing RevML files, a simple filename will do
    (although the long form may also be used). The special value "-"
    means to read/write stdin and stdout when used as a source or
    destination name, respectively. "-" is assumed if a specification is
    not provided, so these invocations all accomplish the same thing,
    reading and writing RevML:

       vcp
       vcp -
       vcp revml:-
       vcp revml:
       vcp - -
       vcp - revml:-
       vcp - revml:
       vcp revml:- revml:-
       vcp revml: revml:

  "vcp" Options
    All general options to vcp must precede the "<source>".
    Scheme-specific options must be placed immediately after the
    "<source>" or "<dest>" spec and before the next one.

    --debug, -d
        Enables logging of debugging information.

    --help, -h, -?
        These are all equivalent to "vcp help".

    --output-config-file=$filename
        Write the settings (parsed from the UI, the command line, or a
        config file) to a file. Useful for capturing settings or user
        interface output. Does not affect running. Use "-" to emit to
        STDOUT.

        NOTE 1: This does *not* emit an "Options:" section containing
        global options (those listed here). Almost all of these options
        are not useful to emit; we can add an option to force their
        emission if need be.

        NOTE 2: When using the interactive user interface, this option
        takes effect after the last interactive portion and, if vcp goes
        on to run a conversion, before any conversion is run. This
        occurs in addition to any configuration files the user may ask
        the interactive interface to write. This may change in the
        future (for instance, if the interactive dialog includes an
        option to extract and analyze metadata).

    --dont-convert
        Do not run a conversion. Useful when you just want to emit a
        .vcp file.

    --versions
        Emits the version numbers of bundled files.

    --terse, -t
        Suppress verbose explanations when running the interactive UI.
        Has no effect on operation if all settings are read from the
        command line or a .vcp file.

    --quiet, -q
        Suppresses the banner and progress bars.

  Getting help
    (See also Generating HTML Documentation, below).

    There is a slightly different command line format for requesting
    help:

       vcp help [<topic>]

    where "<topic>" is the optional name of a topic. "vcp help" without
    a "<topic>" prints out a list of topics, and "vcp help vcp" emits
    this page.

    All help documents are also available as Unix "man" pages and using
    the "perldoc" command, although the names are slightly different:

       with vcp               via perldoc
       ================       ===========
       vcp help vcp           perldoc vcp
       vcp help source::cvs   perldoc VCP::Source::cvs
       vcp help dest::p4      perldoc VCP::Dest::p4

    "vcp help" is case insensitive; "perldoc" and "man" may or may not
    be, depending on your filesystem. The "man" commands look just like
    the example "perldoc" commands except for the command name. Both
    have the advantage that they use your system's configured pager if
    possible.

  Environment Variables
    The environment is often used to set context for the source and
    destination by way of variables like P4USER, P4CLIENT, CVSROOT, etc.

    VCPDEBUG
        The VCPDEBUG variable acts as if "-d=$VCPDEBUG" were present
        on the command line:

           VCPDEBUG=1

        (see "--debug, -d" for more info). This is useful when VCP is
        embedded in another application, like a makefile or a test
        suite.

SEE ALSO
    VCP::Process, VCP::Newlines, VCP::Source::p4, VCP::Dest::p4,
    VCP::Source::cvs, VCP::Dest::cvs, VCP::Source::revml,
    VCP::Dest::revml. All are also available using "vcp help".

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'vcp usage' => <<'TOPIC',
Usage:
       # interactive mode:

       vcp [vcp_opts]

       # scriptable command line mode:

       vcp [vcp_opts] <source> <dest>

       # getting options from a file:

       vcp vcp:config.vcp

       # help output:

       vcp help
       vcp help [topic]
TOPIC
########################################################################
'source' => <<'TOPIC',
NAME
    VCP::Source - A base class for repository sources

SYNOPSIS
DESCRIPTION
OPTIONS
    --bootstrap
          --bootstrap=pattern

        Forces all files matching the given shell regular expression
        (may use wildcards like "*", "?", and "...") to have their first
        revisions transferred as complete copies instead of deltas. This
        is useful when you want to transfer a revision other than the
        first revision as the first revision in the target repository.
        It is also useful when you want to skip some revisions in the
        target repository (although the Map filter has superseded this
        use).

    --continue
        Tells VCP to continue where it left off from last time. This
        will not detect new branches of already transferred revisions
        (this limitation should be lifted, but results in an expensive
        rescan of metadata), but will detect updates to already
        transferred revisions.

    --rev-root
        Tells VCP to extract files relative to a directory in the source
        repository other than the default directory. Ordinarily, VCP
        looks at the source specification and deduces that the lowest
        complete directory name is the "root" directory for all
        revisions, or the "rev_root".

        For instance, given the specification:

            cvs:foo/bar/...

        the rev_root will be "foo/bar" and the files under bar will be
        extracted with a path relative to bar, so "foo/bar/baz/bat" will
        be extracted with the value "baz/bat".

        They will be inserted in the destination repository relative to
        the rev_root for the destination, so if the destination spec is
        like:

            p4://depot/...

        then "baz/bat" will be inserted in the destination as
        "//depot/baz/bat".

        If there is no target rev_root specified, as in the spec:

            p4:

        then the source's rev_root will be assumed, so "baz/bat" in our
        example would be placed in "//foo/bar/baz/bat".
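
        The path arithmetic described above can be sketched in a few
        lines of Perl. This is only an illustration, not VCP's code:
        deduce_rev_root() and relative_to_root() are hypothetical
        helper names, and the sketch assumes the scheme prefix
        ("cvs:") has already been stripped from the spec and that a
        path segment containing "*", "?", or "..." ends the root.

        ```perl
        use strict;
        use warnings;

        # Hypothetical helper: keep the leading path segments that contain
        # no wildcards. The final segment is never treated as a directory.
        sub deduce_rev_root {
            my ($filespec) = @_;
            my @segments = split m{/+}, $filespec;
            my @root;
            while ( @segments > 1 ) {
                last if $segments[0] =~ /[*?]|\.\.\./;
                push @root, shift @segments;
            }
            return join "/", @root;
        }

        # Hypothetical helper: strip the rev_root prefix from a full path.
        sub relative_to_root {
            my ( $path, $root ) = @_;
            return $path if $root eq "";
            $path =~ s{\A\Q$root\E/}{};
            return $path;
        }

        my $source_root = deduce_rev_root("foo/bar/...");
        my $relative    = relative_to_root( "foo/bar/baz/bat", $source_root );
        my $dest_root   = "//depot";    # rev_root of the spec p4://depot/...
        print "$dest_root/$relative\n";
        ```

        With a destination spec that has no rev_root of its own, the
        source's rev_root would be used in place of $dest_root.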

SUBCLASSING
    This class uses the fields pragma, so you'll need to use base and
    possibly fields in any subclasses. See VCP::Plugin for methods often
    needed in subclasses.

  Subclass utility API
    options_spec
        Adds common VCP::Source options to whatever options VCP::Plugin
        parses.

    dest
        Sets/Gets a reference to the VCP::Dest object. The source uses
        this to call handle_header(), handle_rev(), and handle_end()
        methods.

    continue
        Sets/Gets the CONTINUE field (which the user sets via the
        --continue flag).

    real_source
        Returns the reference to be used when sending revisions to the
        destination.

        Each revision has a pointer to the source that sends it so that
        filters and destinations can call get_source_file().

        Most sources return $self; sources that spool data, such as
        VCP::Source::metadb, need to specify a real source. They do so
        by overloading this method. VCP::Source::revml does not do this,
        as it supplies a get_source_file().

    rev_mode
            my $mode = $self->rev_mode( $filebranch_id, $rev_id );

        Returns FALSE, "base", or "normal" as a function of the
        filebranch and rev_id. Do not queue the revision if this returns
        FALSE (you may also skip any preceding revisions). Queue it only
        as a base revision if it returns "base", and queue it as a full
        revision otherwise.

        Not all base revs will be sent; base revs that have no child
        revs will not be sent.

        Always returns "normal" when not in continue mode.

    queue_rev
        Some revs can't be sent immediately. They get queued. Once
        queued, the revision may not be altered. All revisions must be
        queued before being sent. All revs from the source repository
        should be queued; --continue processing is automatic.
        Placeholders should be inserted for all branches, even empty
        ones.

        This updates last_rev and last_rev_for_filebranch.

        Returns FALSE if the rev cannot be queued, for instance if it's
        already been queued once.

        rev_mode() should be called before creating a rev, or at least
        before queue_rev()ing it in order to see if and in what form the
        rev should be sent.

    queued_rev
            $self->queued_rev( $id );

        Returns a queued rev by id.

        Sources where revs can arrive willy-nilly, like
        VCP::Source::revml, queue up all revs and need to randomly
        access them.

    last_rev_for_filebranch
            $self->last_rev_for_filebranch( $filebranch_id );

        Returns the last revision queued on the indicated filebranch.

    set_last_rev_in_filebranch_previous_id
            $self->set_last_rev_in_filebranch_previous_id( $r );

        If there is a last_rev_for_filebranch for $r->filebranch_id,
        sets its previous_id to point to $r. This is useful for sources
        which scan in most-recent-first order.

    queued_rev_count
        Returns (does not set) the number of revs queued so far.

        Replaces the deprecated function sent_rev_count().

    store_cached_revs
            $self->store_cached_revs;

        For parsers which read history one file at a time and branch in
        rev_id space, like VCP::Source::cvs, it's possible to flush all
        revs to disk after each file is parsed. This method takes the
        last VCP::Rev in each filebranch and stores it to disk, freeing
        memory.

    send_revs
            $self->send_revs;

        Removes and sends all revs accumulated so far. Called
        automatically after scan_metadata().

SUBCLASS OVERLOADS
    These methods should be overridden in any subclasses.

    scan_metadata
        This is called to scan the metadata for the source repository.
        It should call rev_mode() for each revision found (including any
        that need to be concocted to make up for collapsed metadata in
        the source, like VSS or CVS deletes or CVS branch creation) and
        if that returns TRUE, then queue_rev() should be called.

        If rev_mode() returns "base", then the transfer is in --continue
        mode and this rev should be built as or converted to a base
        revision. The easiest way to do this is to build it normally and
        then call $r->base_revify().

        If the metadata source returns metadata from most recent to
        oldest, as do most file history reports, the previous_id() need
        not be set until the next revision in a filebranch is scanned.
        The most recent rev passed to queue_rev() is available by
        calling last_rev(), if the metadata is one branch at a time, and
        the last rev in each filebranch is available by calling
        last_rev_for_filebranch().

        If the metadata is scanned one file or filebranch at a time and
        branches are all created by the time the end of a file's
        metadata arrives, calling store_cached_revs() will flush all
        queued revs from the last_rev() and last_rev_for_filebranch()
        in-memory caches to the disk cache (all other revs are flushed
        as their successors arrive).

        There is no easy way to handle randomly ordered metadata at this
        time; typically a source will accumulate as little as it can in
        memory and queue the rest. See VCP::Source::cvs for an example
        of this.

        Once scan_metadata() is complete, send_revs() will be called
        automatically.

    get_source_file
        REQUIRED OVERLOAD.

        All sources must provide a way for the destination to fetch a
        revision.

    handle_header
        REQUIRED OVERLOAD.

        Subclasses must add all repository-specific info to the $header,
        at least including rep_type and rep_desc.

           $header->{rep_type} = 'p4' ;
           $self->p4( ['info'], \$header->{rep_desc} ) ;

        The subclass must pass the $header on to the dest:

           $self->dest->handle_header( $header )
              if $self->dest;

        This may be called when dest is null to allow the source to
        initialize itself when it won't be scanning the real source. So
        the "if $self->dest" is important.

        That's not the case for copy_revs().

    handle_footer
        Not a required overload, as the footer carries no useful
        information at this time. Overriding methods must call this
        method to pass the $footer on:

           $self->SUPER::handle_footer( $footer ) ;

    parse_time
           $time = $self->parse_time( $timestr ) ;

        Parses "[cc]YY/MM/DD[ HH[:MM[:SS]]]".

        Will add ability to use format strings in future. HH, MM, and SS
        are assumed to be 0 if not present.

        Returns a time suitable for feeding to localtime or gmtime.

        Assumes local system time, so it is no good for parsing times in
        RevML. That is not a common thing to need to do, so the code for
        it lives in VCP::Source::revml.
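
        As a rough illustration of the format described above, here is
        a minimal sketch built on the core Time::Local module. It is
        not VCP's implementation: parse_time_sketch() is a
        hypothetical name, and two-digit years are simply handed to
        Time::Local, which applies its own rolling-century rule (an
        assumption; the text above does not say how VCP treats them).

        ```perl
        use strict;
        use warnings;
        use Time::Local qw( timelocal );

        # Hypothetical stand-in for parse_time(): accepts
        # "[cc]YY/MM/DD[ HH[:MM[:SS]]]" and returns local epoch seconds.
        sub parse_time_sketch {
            my ($timestr) = @_;
            my ( $year, $mon, $mday, $hour, $min, $sec ) = $timestr =~ m{
                \A (\d\d(?:\d\d)?) / (\d\d?) / (\d\d?)
                (?: \s+ (\d\d?) (?: : (\d\d?) (?: : (\d\d?) )? )? )?
                \z
            }x or die "Malformed time string: '$timestr'\n";

            # Missing HH, MM, and SS default to 0.
            $_ = defined $_ ? $_ : 0 for $hour, $min, $sec;

            return timelocal( $sec, $min, $hour, $mday, $mon - 1, $year );
        }

        my $time = parse_time_sketch("2002/03/04 05:06");
        print scalar( localtime $time ), "\n";
        ```

        The result is interpreted in the local time zone, which is why
        a parser like this is unsuitable for timestamps that carry
        their own zone.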

    bootstrap
        Sets (and parses) or gets the bootstrap spec.

        Can be called plain:

           $self->bootstrap( $bootstrap_spec ) ;

        See the command line documentation for the format of
        $bootstrap_spec.

    is_bootstrap_mode
           ... if $self->is_bootstrap_mode( $file ) ;

        Compares the filename passed in against the list of bootstrap
        regular expressions set by "bootstrap".

        The file should be in a format similar to the command line spec
        for whatever repository is passed in, and not relative to
        rev_root, so "//depot/foo/bar" for p4, or "module/foo/bar" for
        cvs.

        This is typically called in the subclass only after looking
        at the revision number to see if it is a first revision (in
        which case the subclass should automatically put it in bootstrap
        mode).

COPYRIGHT
    Copyright 2000, Perforce Software, Inc. All Rights Reserved.

    This module and the VCP package are licensed according to the terms
    given in the file LICENSE accompanying this distribution, a copy of
    which is included in vcp.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>
TOPIC
########################################################################
'filter' => <<'TOPIC',
NAME
    VCP::Filter - A base class for filters

SYNOPSIS
       use VCP::Filter;
       @ISA = qw( VCP::Filter );
       ...

DESCRIPTION
    A VCP::Filter is a VCP::Plugin that is placed between the source and
    the destination and allows the stream of revisions to be altered.

    For instance, the Map: option in vcp files is implemented by
    VCP::Filter::map.

    By default a filter is a pass-through.

SUBCLASSING
    This class uses the fields pragma, so you'll need to use base and
    possibly fields in any subclasses.

    parse_rules_list
        Used in VCP::Filter::*map and VCP::Filter::*edit to parse lists
        of rules where every rule is a set of N "words". The value of N
        is computed from the number of labels passed in and the labels
        are used when printing an error message:

            @rules = $self->parse_rules_list( $options, "Pattern", "Replacement" );

    filter_name
        Returns the StudlyCaps version of the filter name. By default,
        assumes a single word name and uses ucfirst on it. Filters like
        StringEdit should overload this to be more creative and
        typographically appealing (heh).

    sort_keys
           my @output_sort_order = $filter->sort_keys( @input_sort_order );

        Accepts a list of sort keys from the upstream filter and returns
        a list of sort keys representing the order that records will be
        emitted in.

        This is a pass-through by default, but VCP::Filter::sort and
        VCP::Filter::changesets return appropriate values.

    config_file_section_as_string
    last_rev_in_filebranch
        (passthru; see VCP::Dest)

    backfill
        (passthru; see VCP::Dest)

    handle_header
        (passthru)

    rev_count
            $self->SUPER::rev_count( @_ );

        passthru, see VCP::Dest.

    handle_rev
            $self->SUPER::handle_rev( @_ );

        passthru, see VCP::Dest.

    skip_rev
            $self->SUPER::skip_rev( @_ );

        passthru, see VCP::Dest

    handle_footer
            $self->SUPER::handle_footer( @_ );

        passthru, see VCP::Dest

COPYRIGHT
    Copyright 2000, Perforce Software, Inc. All Rights Reserved.

    This module and the VCP package are licensed according to the terms
    given in the file LICENSE accompanying this distribution, a copy of
    which is included in vcp.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>
TOPIC
########################################################################
'defaultfilters' => <<'TOPIC',
NAME
    VCP::DefaultFilters - Class for determining default filters to
    install for a given source and dest.

SYNOPSIS
       require VCP::DefaultFilters;
       my $df = VCP::DefaultFilters->new;
       my @filter_args = $df->create_default_filters( $source, $dest );

DESCRIPTION
    Given references to a vcp source and destination, determines the
    default filters which would be appropriate, builds and returns a
    list of arguments that should look like the portion of @ARGV
    (command line arguments) that specify filters.

COPYRIGHT
    Copyright 2000, Perforce Software, Inc. All Rights Reserved.

    This module and the VCP package are licensed according to the terms
    given in the file LICENSE accompanying this distribution, a copy of
    which is included in vcp.
TOPIC
########################################################################
'dest' => <<'TOPIC',
NAME
    VCP::Dest - A base class for VCP destinations

SYNOPSIS
DESCRIPTION
  SUBCLASS API
    These methods are intended to support subclasses.

    digest
            $self->digest( "/tmp/readers" ) ;

        Returns the Base64 MD5 digest of the named file. Used to compare
        a base rev (which is the revision *before* the first one we want
        to transfer) of a file from the source repo to the existing head
        rev of a dest repo.

        The Base64 version is returned because that's what RevML uses
        and we might want to cross-check with a .revml file when
        debugging.
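
        For reference, a Base64 MD5 digest of a file can be computed
        with the core Digest::MD5 module. The sketch below is not
        taken from VCP; base64_md5_of_file() is a hypothetical name,
        and whether VCP pads the result with "=" is not stated here
        (Digest::MD5 itself returns the 22 character unpadded form).

        ```perl
        use strict;
        use warnings;
        use Digest::MD5;

        # Hypothetical helper: return the unpadded Base64 MD5 digest of
        # the named file, reading it in binary mode.
        sub base64_md5_of_file {
            my ($path) = @_;
            open my $fh, "<", $path or die "$!: $path\n";
            binmode $fh;
            my $digest = Digest::MD5->new->addfile($fh)->b64digest;
            close $fh;
            return $digest;
        }

        # Digest this script itself, just to have a file that exists.
        print base64_md5_of_file($0), "\n";
        ```

        Two files have the same content exactly when their digests
        match (barring MD5 collisions), which is the comparison that
        the base rev check relies on.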

    compare_base_revs
           $self->compare_base_revs( $rev, $work_path ) ;

        Checks out the indicated revision from the destination
        repository and compares it (using digest()) to the file from the
        source repository (as indicated by $work_path). Dies with an
        error message if the base revisions do not match.

        Calls $self->checkout_file( $rev ), which the subclass must
        implement.

    header
        Gets/sets the $header data structure passed to handle_header().

    rev_map
        Returns a reference to the RevMapDB for this backend and
        repository. Creates an empty one if need be.

    head_revs
        Returns a reference to the HeadRevsDB for this backend and
        repository. Creates an empty one if need be.

    main_branch_id
        Returns a reference to the MainBranchIdDB for this backend and
        repository. Creates an empty one if need be.

    files
        Returns a reference to the FilesDB for this backend and
        repository. Creates an empty one if need be.

  SUBCLASS OVERLOADS
    These methods are overloaded by subclasses.

    backfill
           $dest->backfill( $rev ) ;

        Checks the file indicated by VCP::Rev $rev out of the target
        repository if this destination supports backfilling. Currently,
        only the revml and the reporting & debugging destinations do not
        support backfilling.

        The $rev->workpath must be set to the filename the backfill was
        put in.

        This is used when doing an incremental update, where the first
        revision of a file in the update is encoded as a delta from the
        prior version. A digest of the prior version is sent along
        before the first version delta to verify its presence in the
        database.

        So, the source calls backfill(), which returns TRUE on success,
        FALSE if the destination doesn't support backfilling, and dies
        if there's an error in procuring the right revision.

        If FALSE is returned, then the revisions will be sent through
        with no working path, but will have a delta record.

        MUST BE OVERRIDDEN.

    sort_filter
            sub sort_filter {
               my $self = shift;
               my @sort_keys = @_;
               return () if @sort_keys && $sort_keys[0] eq "change_id";
               require VCP::Filter::changesets;
               return ( VCP::Filter::changesets->new(), );
            }

        This is passed a sort specification string and returns any
        filters needed to presort data for this destination. It may
        return the empty list (the default), or one or more instantiated
        filters.

    require_change_id_sort
        Destinations that care about the sort order usually want to use
        the changesets filter, so they can overload the sort filter like
        so:

           sub sort_filters { shift->require_change_id_sort( @_ ) }

    handle_footer
           $dest->handle_footer( $footer ) ;

        Does any cleanup necessary. Not required. Don't call this from
        the override.

    handle_header
           $dest->handle_header( $header ) ;

        Stows $header in $self->header. This should only rarely be
        overridden, since the first call to handle_rev() should output
        any header info.

    rev_count
           $dest->rev_count( $number_of_revs_forthcoming );

        Sent by the last aggregating plugin in the filter chain just
        before the first revision is sent to inform us of the number of
        revs to expect.

    skip_rev
        Sent by filters that discard revisions in line.

    handle_rev
           $dest->handle_rev( $rev ) ;

        Outputs the item referred to by VCP::Rev $rev. If this is the
        first call, then $self->none_seen will be TRUE and any preamble
        should be emitted.

        MUST BE OVERRIDDEN. Don't call this from the override.

    last_rev_in_filebranch
           my $rev_id = $dest->last_rev_in_filebranch(
              $source_repo_id,
              $source_filebranch_id
           );

        Returns the last revision for the file and branch indicated by
        $source_filebranch_id. This is used to support --continue.

        Returns undef if not found.

NOTES
    Several fields are jury rigged for "base revisions": these are fake
    revisions used to start off incremental, non-bootstrap transfers
    with the MD5 digest of the version that must be the last version in
    the target repository. Since these are "faked", they don't contain
    comments or timestamps, so the comment and timestamp fields are
    treated as "" and 0 by the sort routines.

COPYRIGHT
    Copyright 2000, Perforce Software, Inc. All Rights Reserved.

    This module and the VCP package are licensed according to the terms
    given in the file LICENSE accompanying this distribution, a copy of
    which is included in vcp.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>
TOPIC
########################################################################
'filter::logmemsize' => <<'TOPIC',
NAME
    VCP::Filter::logmemsize - development logging filter

DESCRIPTION
    Watches memory size. Only works on Linux for now.

    Not a supported module; API and behavior may change without warning.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::logmemsize usage' => <<'TOPIC',

TOPIC
########################################################################
'filter::logmemsize description' => <<'TOPIC',

Watches memory size. Only works on Linux for now.

Not a supported module; API and behavior may change without
warning.
TOPIC
########################################################################
'filter::labelmap' => <<'TOPIC',
NAME
    VCP::Filter::labelmap - Alter or remove labels from each revision

SYNOPSIS
      ## From the command line:
       vcp <source> labelmap: "rev_$rev_id" "change_$change_id" -- <dest>

      ## In a .vcp file:

        LabelMap:
                foo-...   <<delete>> # remove all labels beginning with foo-
                F...R     <<delete>> # remove all labels beginning with F and ending with R
                v-(...)   V-$1       # use uppercase v prefixes

DESCRIPTION
    Allows labels to be altered or removed using a syntax similar to
    VCP::Filter::map. This is being written for development use so more
    documentation is needed. See VCP::Filter::map for more examples of
    pattern matching (though VCP::Filter::labelmap does not use
    <branch_id> syntax).

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::labelmap usage' => <<'TOPIC',
Usage:
      ## From the command line:
       vcp <source> labelmap: "rev_$rev_id" "change_$change_id" -- <dest>

      ## In a .vcp file:

        LabelMap:
                foo-...   <<delete>> # remove all labels beginning with foo-
                F...R     <<delete>> # remove all labels beginning with F and ending with R
                v-(...)   V-$1       # use uppercase v prefixes
TOPIC
########################################################################
'filter::labelmap description' => <<'TOPIC',

Allows labels to be altered or removed using a syntax
similar to VCP::Filter::map. This is being written for
development use so more documentation is needed. See
VCP::Filter::map for more examples of pattern matching
(though VCP::Filter::labelmap does not use <branch_id>
syntax).

test_script t/61labelmap.t
==========================
TOPIC
########################################################################
'filter::map' => <<'TOPIC',
NAME
    VCP::Filter::map - rewrite name, branch_id or delete revisions

SYNOPSIS
      ## In a .vcp file:

        Map:
                name_glob_1<branch_1> name_out_1<branch_result_1>
                name_glob_2<branch_2> name_out_2<branch_result_2>
                # ... etc ...

      ## From the command line:
       vcp <source> map: name_glob_1<branch_1> name_out_1<branch_result_1> -- <dest>

      ## you may have one or more ( pattern match ) pairs on the command
      ## line, ending with --

      ## the <branch> part of the maps is optional.

DESCRIPTION
    Maps source files, revisions, and branches to destination files and
    branches while copying a repository. This is done by rewriting the
    "name" and "branch_id" of revisions according to a list of rules.

  Rules
    A rule is a pair of expressions specifying a pattern to match
    against each incoming revision's name and branch_id and a
    replacement expression specifying the revision's new name and
    branch_id.

    The list of rules is evaluated top down; the first rule in the list
    that matches is used to generate the new name and branch_id. If no
    rule matches, the implicit default rule is to copy files as is.
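
    The top-down, first-match-wins evaluation can be illustrated with a
    small sketch. It is not VCP's implementation: map_name() and the
    rule list are hypothetical, plain Perl regular expressions stand in
    for the shell-like patterns, and branch_id rewriting is left out
    for brevity.

    ```perl
    use strict;
    use warnings;

    # Hypothetical rule list: each rule pairs a compiled pattern with a
    # code ref that builds the new name from the captured text.
    my @rules = (
        [ qr{\Aold/docs/(.*)\z}s, sub { "manual/$_[0]" } ],
        [ qr{\Aold/(.*)\z}s,      sub { "new/$_[0]" } ],
    );

    # Apply the first matching rule; later rules are never consulted.
    sub map_name {
        my ($name) = @_;
        for my $rule (@rules) {
            my ( $pattern, $build ) = @$rule;
            if ( my @captures = $name =~ $pattern ) {
                return $build->(@captures);
            }
        }
        return $name;    # implicit default rule: copy as is
    }

    print map_name($_), "\n" for "old/docs/a.txt", "old/b.c", "misc/c";
    ```

    Because "old/docs/a.txt" matches both rules, the order of the list
    decides the outcome, so more specific rules belong nearer the top.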

  Patterns and Replacement Expressions
    Patterns and replacements are each composed of two
    subexpressions, the "name_expr" and the "branch_id_expr", like so:

        name_expr<branch_id_expr>

    The "<branch_id_expr>" (including angle brackets) is optional and
    may be forbidden by some sources or destinations that embed the
    concept of a branch in the name_expr. (See VCP::Dest::p4 for an
    example, though this may be changed in the future).

    For now, the symbols "#" and "@" are reserved for future use in all
    expressions and must be escaped using "\", and various shell-like
    wildcards are implemented in pattern expressions.

  Pattern Expressions
    Both the "name_expr" and "branch_id_expr" specify patterns using
    shell-style wildcard syntax, with the extension that parentheses
    are used to extract portions of the match in to numbered variables
    which may be used in the result construction, as in Perl regular
    expressions:

       ?      Matches one character other than "/"
       *      Matches zero or more characters other than "/"
       ...    Matches zero or more characters, including "/"
       (foo)  Matches "foo" and stores it in the next of $1, $2, etc.

    Some example pattern "name_expr"s are:

       Pattern
       name_expr  Matches
       =========  =======
       foo        the top level file "foo"
       foo/bar    the file "foo/bar"
       ...        all files (like a missing name_expr)
       foo/...    all files under "foo/"
       .../bar    all files named "bar" anywhere
       */bar      all files named "bar" one dir down
       ....pm     all files ending in ".pm"
       ?.pm       all top level 4 char files ending in ".pm"
       \?.pm      the top level file "?.pm"
       (*)/...    all files in subdirs, puts the top level dirname in $1

    Unix-style slashes are used, even on operating systems where that
    may not be the preferred local custom. A pattern consisting of the
    empty string is legal and matches everything (NOTE: currently there
    is no way to take advantage of this; quoting is not implemented in
    the forms parser yet. Use "..." instead).

    Relative paths are taken relative to the rev_root indicated in the
    source specification for pattern "name_expr"s (or in the destination
    specification for result "name_expr"s). For now, a relative path is
    a path that does not begin with the character "/", so be aware that
    the pattern "(/)" is relative. This is a limitation of the
    implementation and may change; until it does, don't rely on a
    leading "(" making a path relative, and use multiple rules to match
    multiple absolute paths.

    If no "name_expr" is provided, "..." is assumed and the pattern will
    match on all filenames.

    Some example pattern "branch_id_expr"s are:

        Pattern
        branch_id_expr  Matches files on
        =============   ================
        <>              no branch label
        <...>           all branches (like a missing <branch_id_expr>)
        <foo>           branch "foo"
        <R...>          branches beginning with "R"
        <R(...)>        branches beginning with "R", the other chars in $1

    If no "branch_id_expr" is provided, files on all branches are
    matched. "*" and "..." still match differently in pattern
    "branch_id_expr"s, as in "name_expr" patterns, but this is likely to
    make no difference, as I've not yet seen a branch label with a "/"
    in it. Still, it is wise to avoid "*" in "branch_id_expr" patterns.

    Some example composite patterns are (any $ variables set are given
    in parentheses):

        Pattern            Matches
        =======            =======
        foo<>              top level files named "foo" not on a branch
        (...)<>            all files not on a branch ($1)
        (...)/(...)<>      all files not on a branch ($1,$2)
        ...<R1>            all files on branch "R1"
        .../foo<R...>      all files "foo" on branches beginning with "R"
        (...)/foo<R(...)>  all files "foo" on branches beginning with "R" ($1, $2)

  Escaping
    Null characters and newlines are forbidden in all expressions.

    The characters "#", "@", "[", "]", "{", "}", ">", "<" and "$" must
    be escaped using a "\", as must any wildcard characters meant to be
    taken literally.

    In result expressions, the wildcard characters "*", "?", the
    wildcard trigraph "..." and parentheses must each be escaped with
    a single "\" as well.

    No other characters are to be escaped.
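
    A sketch of how these escapes might look in a "Map:" section (the
    file names are hypothetical and the rules are untested):

        Map:
        #   Pattern        Result
        #   =============  ===============
            what\?.txt     docs/what\?.txt   # literal question mark in both
            user\@host     user_at_host      # literal at sign in the pattern
            cost\$.txt     cost.txt          # literal dollar sign in the pattern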

  Case sensitivity
    By default, all patterns are case sensitive. There is no way to
    override this at present; one will be added.

  Result Expressions
    Result expressions look a lot like pattern expressions except that
    wildcards are not allowed and $1 and "${1}" style variable
    interpolation is.

    To explore result expressions, let's look at converting a set of
    example files between cvs and p4 repositories. The difficulty here
    is that cvs and p4 have differing branching implementations.

    Let's assume our CVS repository has a module named "flibble" with a
    file named "foo/bar" in it. Here is a branch diagram, with the main
    development trunk shown down the left (1.1 through 1.6, etc) and a
    single branch, tagged in CVS with a branch tag of "beta_1", shown
    forking off version 1.5:

         flibble/foo/bar:

             1.1
              |
             ...
              |
             1.5
              | \
              |  \ beta_1
              |   \
             1.6   \
              |    1.5.2.1
             ...    |
                    |
                   1.5.2.2
                    |
                   ...

        NOTE 1: You can use vcp to extract graphical branch diagrams by
        installing AT&T's GraphViz package and the Perl CPAN module
        GraphViz.pm.  Then you can use a command like:

            $ vcp cvs:/var/cvsroot:flibble/foo/bar \
                branch_diagram:foo_bar.png

        to generate a .png file showing something like the above diagram.

    On the other hand, p4 users typically branch files using directory
    names. Here's file "foo/bar" again, with the main trunk held in the
    main depot's //depot/main directory, again with a branch after the
    5th version of the file, but this time, the branch is represented by
    taking a copy of the file in a separate directory:

        //depot/main/foo/bar

             #1
              |
             ...
              |
             #5
              |\
              | \ //depot/beta_1/foo/bar
              |  \
             #6   \
              |   #1
             ...   |
                   |
                  #2
                   |
                  ...
          
        NOTE 2: the p4 command allows users to branch in very crafty and
        creative ways; it does not enforce the semantic of 1 branch per
        directory, and this gives p4 users a lot of power and flexibility.
        It also means that you might need some pretty crafty and creative
        branch maps when converting from p4 to other repositories.

        NOTE 3: that branch looks like a copy, but is actually just a
        metadata entry in the Perforce repository, so it's very low
        overhead in terms of server effort and disk space, usually
        even more so than CVS branches.

        NOTE 4: Using GraphViz (as described in NOTE 1 above), you can
        build a diagram like this using vcp:

            $ vcp p4:perforce.our.com:1666://depot/flibble/foo/bar \
                branch_diagram:foo_bar.png

    A user may or may not choose to label a branch in p4 with something
    called a "branch specification" (see "p4 help branch" for details).
    For this discussion, we'll assume they didn't.

    First, let's look at cvs -> p4 conversion. To do this, we need to
    match the branch tags in the CVS repository and use them to map
    branched files in to a p4 subdirectory. Here's a .vcp file for this:

       ## cvs2p4.vcp

       Source:
       # get all files in the flibble module from cvs
           cvs:/var/cvsroot:flibble/...

       Destination:
       # Put the files in the flibble directory in the main depot of p4
           p4:perforce.our.com:1666://depot/flibble/...

       Map:
       #   Pattern       Result
       #   ============  =======
           (...)<>       main/$1   # main trunk => //depot/flibble/main/...
           (...)<(...)>  $2/$1     # branches   => //depot/flibble/$branch/...

    The "Source:" and "Destination:" fields are just pieces of a normal
    "vcp" command line moved in to "cvs2p4.vcp". The "Map:" field is a
    list of rules composed of pattern, result expression pairs.

    In this example, all of the map expressions are relative paths. The
    patterns are relative to the "Source:" cvs repository's "flibble"
    module. The results are relative to the "Destination:" p4
    repository's "//depot/flibble/" directory.

    The first rule maps all files that have no branch tag in to the p4
    directory "//depot/flibble/main/". The "(...)<>" pattern has two
    parts: a "name" part and a "branch_id" part. The "name" part,
    "(...)", matches all path names and copies them to the $1 variable.
    The "branch_id" part, "<>", matches empty / missing "branch_id"s
    ("vcp"'s name for the CVS branch tag associated with a file on a
    branch). The "main/$1" result retrieves the "name" part stored in
    $1 and prefixes it with "main/" to build the final "name" value.

    The second rule maps all files on branches to an appropriately named
    subdirectory in the p4 destination. The pattern is a lot like the
    first rule's, but has a "branch_id" part that matches all
    "branch_id"s and copies them in to $2. The rule merely uses this
    "branch_id" from $2 instead of the hardcoded "main/" string to
    place the branches in appropriate subdirectories.

    Here's how the versions of our flibble/foo/bar file fare when
    passed through this mapping:

        CVS flibble/...              p4 //depot/flibble/...
        ========================     ======================

        foo/bar#1.1                  main/foo/bar#1
        foo/bar#1.2                  main/foo/bar#2
        ...                          ...
        foo/bar#1.5.2.1              beta_1/foo/bar#1
        foo/bar#1.5.2.2              beta_1/foo/bar#2
        ...                          ...

    It's up to you to be sure there are no branches tagged "main" in
    the CVS repository. Also, no branch specification will be created in
    the target p4 repository (this is a limitation that should be
    fixed).
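
    As a further sketch (the directory name "beta" is hypothetical and
    the rules are untested), a more specific rule can be listed ahead
    of the general branch rule to give one branch a different directory
    name. Because the first matching rule wins, the order matters:

       Map:
       #   Pattern         Result
       #   ==============  =======
           (...)<>         main/$1   # main trunk
           (...)<beta_1>   beta/$1   # beta_1 => //depot/flibble/beta/...
           (...)<(...)>    $2/$1     # all other branches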

  Result Actions: <<delete>> and <<keep>>
    The result expression "<<delete>>" indicates to delete the revision,
    while the result expression "<<keep>>" indicates to pass it through
    unchanged. Because the first matching rule wins, an exception must
    be listed before the more general rule it overrides:

        Map:
        #   Pattern            Result
        #   =================  ==========
            old_stuff/.../*.c  <<keep>>    # Keep these files
            old_stuff/...      <<delete>>  # Delete all other files in old_stuff/

    <<delete>> and <<keep>> may not be embedded in a longer result
    expression; they are standalone tokens.

  The default rule
    There is a default rule

        ...  <<keep>>  ## Default rule: passes everything through as-is

    that is evaluated after all the other rules. Thus, if no other rule
    matches a revision, it is passed through unchanged.
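
    Because explicit rules are tried first, a catch-all rule at the end
    of the list takes the place of the default. A sketch (the directory
    name "src" is hypothetical) that copies one subtree and drops
    everything else:

        Map:
        #   Pattern  Result
        #   =======  ==========
            src/...  <<keep>>    # pass these through unchanged
            ...      <<delete>>  # delete everything else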

  Command Line Parsing
    For large maps or repeated use, the map is best specified in a .vcp
    file. For quick one-offs or scripted situations, however, the map:
    scheme may be used on the command line. In this case, each parameter
    is a "word" (separated by whitespace) and every pair of words is a (
    pattern, result ) pair.

    Because vcp command line parsing is performed incrementally and the
    next filter or destination specifications can look exactly like a
    pattern or result, the special token "--" is used to terminate the
    list of patterns provided on the command line. This may also be the
    last word in the "Map:" section of a .vcp file, but that is
    superfluous. It is an error to use "--" before the last word in a
    .vcp file.
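
    For example, the two rules from cvs2p4.vcp above might be given on
    the command line like this (an untested sketch; the single quotes
    are an assumption about a Unix-like shell, where they keep the
    shell from interpreting "$", "<", ">", "(" and ")"):

        $ vcp cvs:/var/cvsroot:flibble/... \
            map: '(...)<>' 'main/$1' '(...)<(...)>' '$2/$1' -- \
            p4:perforce.our.com:1666://depot/flibble/...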

LIMITATIONS
    There is no way (yet) of telling the mapper to continue processing
    the rules list. We could implement labels like "<label>" to be
    allowed before pattern expressions (but not between pattern and
    result), and we could then implement "<goto label>". And a "<next>"
    could be used to fall through to the next label. All of which is
    wonderful, but I want to gain some real world experience with the
    current system and find a use case for gotos and fallthroughs before
    I implement them. This comment is here to solicit feedback :).

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::map usage' => <<'TOPIC',
Usage:
      ## In a .vcp file:

        Map:
                name_glob_1<branch_1> name_out_1<branch_result_1>
                name_glob_2<branch_2> name_out_2<branch_result_2>
                # ... etc ...

      ## From the command line:
       vcp <source> map: name_glob_1<branch_1> name_out_1<branch_result_1> -- <dest>

      ## you may have one or more ( pattern, result ) pairs on the
      ## command line, ending with --

      ## the <branch> part of the maps is optional.
TOPIC
########################################################################
'filter::map description' => <<'TOPIC',

Maps source files, revisions, and branches to destination
files and branches while copying a repository. This is done
by rewriting the "name" and "branch_id" of revisions
according to a list of rules.

Rules
=====

A rule is a pair of expressions specifying a pattern to
match against each incoming revision's name and branch_id
and a replacement expression specifying the revision's new
name and branch_id.

The list of rules is evaluated top down; the first rule in
the list that matches is used to generate the new name and
branch_id. If no rule matches, the implicit default rule
copies the file as is.

Patterns and Replacement Expressions
====================================

Patterns and replacements are each composed of two
subexpressions, the "name_expr" and the "branch_id_expr",
like so:

    name_expr<branch_id_expr>

The "<branch_id_expr>" (including angle brackets) is
optional and may be forbidden by some sources or
destinations that embed the concept of a branch in the
name_expr. (See VCP::Dest::p4 for an example, though this
may be changed in the future).

For now, the symbols "#" and "@" are reserved for future
use in all expressions and must be escaped using "\", and
various shell-like wildcards are implemented in pattern
expressions.
TOPIC
########################################################################
'filter::dumpdata' => <<'TOPIC',
NAME
    VCP::Filter::dumpdata - development output filter

DESCRIPTION
    Dump all data structures. Requires the module BFD, which is not
    installed automatically. Dumps to the log file.

    Not a supported module; API and behavior may change without warning.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::dumpdata usage' => <<'TOPIC',

TOPIC
########################################################################
'filter::dumpdata description' => <<'TOPIC',

Dump all data structures. Requires the module BFD, which is
not installed automatically. Dumps to the log file.

Not a supported module; API and behavior may change without
warning.
TOPIC
########################################################################
'filter::sort' => <<'TOPIC',
NAME
    VCP::Filter::sort - Sort revs by field, order

SYNOPSIS
      ## From the command line:
       vcp <source> sort: name ascending rev_id ascending -- <dest>

      ## In a .vcp file:

        Sort:
           name     ascending
           rev_id   ascending

DESCRIPTION
    NOTE: this filter is primarily for development and testing; it is
    not designed for large datasets (it can use a lot of RAM if fed
    enough data).

    Useful with the revml: destination to get RevML output in a desired
    order. Otherwise the sorting built in to the change aggregator
    should suffice.

    The default sort spec is "name,rev_id", which is handy for VCP's
    test suite because it puts all revisions in a predictable order so
    the output revml can be compared to the input revml.
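
    A sketch of a complete .vcp file that uses this filter to write
    RevML in a predictable order (the repository location and the
    output file name "flibble.revml" are hypothetical):

        Source:
            cvs:/var/cvsroot:flibble/...

        Destination:
            revml:flibble.revml

        Sort:
            name     ascending
            rev_id   ascending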

    NOTE: this is primarily for development use; not all fields may work
    right. All plain string fields should work right, as well as name,
    rev_id, change_id and their source_... equivalents (which are parsed
    and compared piece-wise) and time and mod_time (which are stored as
    integers internally).

    Plain case sensitive string comparison is used for all fields other
    than those mentioned in the preceding paragraphs.

    This sort may be slow for extremely large data sets; it sorts things
    by comparing revs to each other field by field instead of by
    generating indexes, and VCP::Rev is not designed to be super fast
    when accessing fields one by one. This can be altered if need be.

How rev_id and change_id are sorted
    "change_id" or "rev_id" are split in to segments suitable for
    sorting.

    The splits occur at the following points:

       1. Before and after each substring of consecutive digits
       2. Before and after each substring of consecutive letters
       3. Before and after each non-alpha-numeric character

    The substrings are greedy: each is as long as possible and
    non-alphanumeric characters are discarded. So "11..22aa33" is split
    in to 5 segments: ( 11, "", 22, "aa", 33 ).

    If a segment is numeric, it is left padded with 10 NUL characters.

    This algorithm makes 1.52 be treated like revision 1, minor revision
    52, not like a floating point 1.52. So the following sort order is
    maintained:

       1.0
       1.0b1
       1.0b2
       1.0b10
       1.0c
       1.1
       1.2
       1.10
       1.11
       1.12

    The substring "pre" might be treated specially at some point.

    (At least) the following cases are not handled by this algorithm:

       1. floating point rev_ids: 1.0, 1.1, 1.11, 1.12, 1.2
       2. letters as "prereleases": 1.0a, 1.0b, 1.0, 1.1a, 1.1

LIMITATIONS
    Stores all metadata in RAM.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::sort usage' => <<'TOPIC',
Usage:
      ## From the command line:
       vcp <source> sort: name ascending rev_id ascending -- <dest>

      ## In a .vcp file:

        Sort:
           name     ascending
           rev_id   ascending
TOPIC
########################################################################
'filter::sort description' => <<'TOPIC',

NOTE: this filter is primarily for development and testing;
it is not designed for large datasets (it can use a lot of
RAM if fed enough data).

Useful with the revml: destination to get RevML output in a
desired order. Otherwise the sorting built in to the change
aggregator should suffice.

The default sort spec is "name,rev_id", which is handy for
VCP's test suite because it puts all revisions in a
predictable order so the output revml can be compared to
the input revml.

NOTE: this is primarily for development use; not all fields
may work right. All plain string fields should work right,
as well as name, rev_id, change_id and their source_...
equivalents (which are parsed and compared piece-wise) and
time and mod_time (which are stored as integers
internally).

Plain case sensitive string comparison is used for all
fields other than those mentioned in the preceding
paragraphs.

This sort may be slow for extremely large data sets; it
sorts things by comparing revs to each other field by field
instead of by generating indexes, and VCP::Rev is not
designed to be super fast when accessing fields one by one.
This can be altered if need be.
TOPIC
########################################################################
'filter::csv_trace' => <<'TOPIC',
NAME
    VCP::Filter::csv_trace - development logging filter

DESCRIPTION
    Dumps fields of revisions in CSV format.

    Not a supported module; API and behavior may change without warning.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::csv_trace usage' => <<'TOPIC',

TOPIC
########################################################################
'filter::csv_trace description' => <<'TOPIC',

Dumps fields of revisions in CSV format.

Not a supported module; API and behavior may change without
warning.
TOPIC
########################################################################
'filter::addlabels' => <<'TOPIC',
NAME
    VCP::Filter::addlabels - Add labels to each revision

SYNOPSIS
      ## From the command line:
       vcp <source> addlabels: "rev_$rev_id" "change_$change_id" -- <dest>

      ## In a .vcp file:

        AddLabels:
                rev_$rev_id
                change_$change_id
                # ... etc ...

DESCRIPTION
    Used when you want to track the original rev_id, change_id,
    branch_id, etc. each revision had in the source repository by adding
    a label. Can be used to turn any piece of metadata in to a label.

    Note that the fields

        source_name, source_filebranch_id, source_branch_id,
        source_rev_id, source_change_id

    are set by VCP to be the same value as the corresponding fields
    without the source prefix (except source_filebranch_id, which is
    built from the file name, rooted in the repository, and for cvs
    repositories, the branch number in angle brackets.) These source_*
    fields (intended to be immutable in vcp) should be used to make
    labels rather than their mutable equivalents which may be changed
    via a vcp filter.

    There is no way to add labels only to selected revisions at this
    time, but if you try to add a label for metadata that is undefined
    or empty, it will not be added.
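
    For example, to label each revision with its original, immutable
    source metadata (a sketch; the "orig_" label prefixes are
    hypothetical):

        AddLabels:
                orig_rev_$source_rev_id
                orig_change_$source_change_id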

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::addlabels usage' => <<'TOPIC',
Usage:
      ## From the command line:
       vcp <source> addlabels: "rev_$rev_id" "change_$change_id" -- <dest>

      ## In a .vcp file:

        AddLabels:
                rev_$rev_id
                change_$change_id
                # ... etc ...
TOPIC
########################################################################
'filter::addlabels description' => <<'TOPIC',

Used when you want to track the original rev_id, change_id,
branch_id, etc. each revision had in the source repository
by adding a label. Can be used to turn any piece of
metadata in to a label.

Note that the fields

    source_name, source_filebranch_id, source_branch_id,
    source_rev_id, source_change_id

are set by VCP to be the same value as the corresponding
fields without the source prefix (except
source_filebranch_id, which is built from the file name,
rooted in the repository, and for cvs repositories, the
branch number in angle brackets.) These source_* fields
(intended to be immutable in vcp) should be used to make
labels rather than their mutable equivalents which may be
changed via a vcp filter.

There is no way to add labels only to selected revisions at
this time, but if you try to add a label for metadata that
is undefined or empty, it will not be added.

test_script t/61addlabels.t
===========================
TOPIC
########################################################################
'filter::changesets' => <<'TOPIC',
NAME
    VCP::Filter::changesets - Group revs in to changesets

SYNOPSIS
      ## From the command line:
       vcp <source> changesets: ...options... -- <dest>

      ## In a .vcp file:

        ChangeSets:
           time                     <=60     ## seconds
           user_id                  equal    ## case-sensitive equality
           comment                  equal    ## case-sensitive equality
           source_filebranch_id     notequal ## case-sensitive inequality

DESCRIPTION
    This filter is automatically loaded when there is no sort filter
    loaded (both this and VCP::Filter::sort count as sort filters).

  Sorting by change_id, etc.
    When all revs from the source have change numbers, this filter sorts
    by change_id, branch_id, and name, regardless of the rules set. The
    name sort is case sensitive, though it should not be for Win32. This
    sort by change_id is necessary for sources that supply change_id
    because the order of scanning the revisions is not usually (ever, so
    far :) in change set order.

  Aggregating changes
    If one or more revisions arrive from the source with an empty
    change_id, the rules for this filter establish the conditions that
    determine what revisions may be grouped in to each change.

    In this case, this filter rewrites all change_id fields so that the
    (eventual) destination can use the change_id field to break the
    revisions in to changes. This is sometimes used by non-changeset
    oriented destinations to aggregate "changes" as though a user were
    performing them and to reduce the number of individual operations
    the destination driver must perform (for instance: VCP::Dest::cvs
    prefers to not call cvs commit all the time; cvs commit is slow).

    Revisions are aggregated in to changes using a set of rules that
    determine what revisions may be combined. One rule is implicit in
    the algorithm, the others are explicitly specified as a set of
    defaults that may be altered by the user.

   The Implicit Rule
    The implicit rule is that no change may contain two revisions where
    one is a descendant of another. The algorithm starts with the set of
    revisions that have no parents in this transfer, chooses a set of
    them to be a change according to the explicit conditions, and emits
    it. Only when a revision is emitted does this filter consider its
    offspring for emission. This cannot be changed.

    (EXPERIMENTAL) The only time this implicit rule is not enough is in
    a cloning situation. In CVS and VSS, it is possible to "share" files
    between branches. VSS supports and promotes this model in its user
    interface and documentation while CVS allows it more subtly by
    allowing the same branch to have multiple branch tags. In either
    case, there are multiple branches of a file that are changed
    simultaneously. The CVS source recognizes this (and the VSS source
    may by the time you read this) and chooses a master revision from
    which to "clone" other revisions. These cloned revisions appear on
    the child branch as children of the master revision, not as children
    of the preceding revision on the child branch. This is confusing,
    but it works. In order to prevent this from confusing the
    destinations, however, it can be important to make sure that two
    revisions to a given branch of a given file do not occur in the same
    change; this is the purpose of the explicit rule
    "source_filebranch_id notequal", covered below.

   The Explicit Rules
    Rules may be specified for the ChangeSets filter. If no rules are
    specified, a set of default rules is used. If any rules are
    specified, none of the default rules are used. The default rules are
    explained after rule conditions are explained.

    Each rule is a pair of words: a data field and a condition.

    There are three conditions: "notequal", "equal" and "<=N" (where N
    is a number; note that no spaces are allowed before the number
    unless the spec is quoted somehow):

    equal
        The "equal" condition is valid for all fields and states that
        all revisions in the same change must have identical values for
        the indicated field. So:

            user_id                  equal

        states that all revisions in a change must be submitted by the
        same user.

        All "equal" conditions are used before any other conditions,
        regardless of the order they are specified in, to categorize
        revisions in to prototype changes. Once all revisions have been
        categorized in to prototype changes, the "<=N" and "notequal"
        rules are applied in order to split the change prototypes in to
        as many changes as are needed to satisfy them.

    notequal
        The "notequal" condition is also valid for all fields and
        specifies that no two revisions in a change may have equal
        values for a field. It does not make sense to apply this to time
        fields, and is usually only needed to ensure that two revisions
        to the same file on the same branch do not get bundled in to the
        same change.

    <=N The "<=N" specification is only available for the "time" field.
        It specifies that no gaps larger than N seconds may exist in a
        change.

    The default rules are:

        time                     <=60     ## seconds
        user_id                  equal    ## case-sensitive equality
        comment                  equal    ## case-sensitive equality
        source_filebranch_id     notequal ## case-sensitive inequality

    These rules work as follows:

    The "time <=60" condition sets a maximum allowable difference
    between two revisions; revisions that are more than this number of
    seconds apart are considered to be in different changes.

    The "user_id equal" and "comment equal" conditions assert that two
    revisions must be by the same user and have the same comment in
    order to be in the same change.

    The "source_filebranch_id notequal" condition prevents cloned revs
    of a file from appearing in the same change as each other (see the
    discussion above for more details).
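
    Because specifying any rule discards all of the defaults, a custom
    rule set normally restates the defaults it still needs. A sketch
    that widens the time window to five minutes (the value 300 is only
    an illustration) while keeping the other default conditions:

        ChangeSets:
           time                     <=300    ## seconds
           user_id                  equal
           comment                  equal
           source_filebranch_id     notequal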

ALGORITHM
  handle_rev()
    As revs are received by handle_rev(), they are stored on disk.
    Several RAM-efficient (well, for Perl) data structures are built,
    however, that describe each revision's children and its membership
    in a changeset. Some or all of these structures may be moved to disk
    when we need to handle truly large data sets.

   The ALL_HAVE_CHANGE_IDS statistic
    One statistic that handle_rev() gathers is whether or not all
    revisions arrived with a non-empty change_id field.

   The REV_COUNT statistic
    How many revisions have been received. This is used only for UI
    feedback; primarily it is to forewarn the downstream filter(s) and
    destination of how many revisions will constitute a 100% complete
    transfer.

   The CHANGES list
    As each rev arrives, it is placed in a "protochange" determined
    solely by the revision's fields in the rules list with an "equal"
    condition. Protochanges are likely to have too many revisions in
    them, including revisions that descend from one another and
    revisions that are too far apart in time.

   The CHANGES_BY_KEY index
    The categorization of each revision in to changes is done by forming
    a key string from all the fields in the rules list with the "equal"
    condition. This index maps unique keys to changes.

   The CHILDREN index
    This is an index of all revisions that are direct offspring of a
    revision.

   The PREDECESSOR_COUNT statistic
    Counts the number of parents a revision has that haven't been
    submitted yet. A revision may have a previous_id and, optionally,
    also have a from_id (can't have a from_id without a previous_id,
    however).

   The REVS_BY_CHANGE_ID index
    If all revs do indeed arrive with change_ids, they need to be sorted
    and sent out in order. This index is gathered until the first rev
    with an empty change_id arrives.

   The ROOT_IDS list
    This is a list of the IDs of all revisions that have no parent
    revisions in this transfer. This is used as the starting point for
    send_changes(), below.

   The CHANGES_BY_REV index
    As the large protochanges are split in to smaller ones, the
    resulting CHANGES list is indexed by, among other things, which revs
    are in the change. This is so the algorithms can quickly find what
    change a revision is in when it's time to consider sending that
    revision.

  handle_footer()
    All the real work occurs when handle_footer() is called.
    handle_footer() glances at the change_id statistic gathered by
    handle_rev() and determines whether it can sort by change_id or
    whether it has to perform change aggregation.

    If all revisions arrived with a change_id, handle_footer() calls
    sort_by_change_id_and_send(). If at least one revision did not,
    handle_footer() performs change aggregation by calling
    split_protochanges() and then send_changes().

    Any source or upstream filter may perform change aggregation by
    assigning change_ids to all revisions. VCP::Source::p4 does this. At
    the time of this writing no others do.

    Likewise, a filter like VCP::Filter::stringedit may be used to clear
    out all the change_ids and force change aggregation.

  sort_by_change_id_and_send()
    If all revisions arrived with a change_id, then they will be sorted
    by the values of ( change_id, time, branch_id, name ) and sent on.
    This filter has no provision for ignoring change_id, except that
    the sort is skipped if any revision arrives with an empty
    change_id.

  split_and_send_changes()
    Once all revisions have been placed in to protochanges, a change is
    selected and sent like so:

    1   Get an oldest change with no revs that can't yet be sent. If
        none is found, then select one oldest change and remove any revs
        that can't be sent yet.

    2   Select as many revs as can legally be sent in a change by
        sorting them in to time order and then using the <=N and
        notequal rules to determine if each rev can be sent given the
        revs that have already passed the rules. Delay all other revs
        for a later change.

LIMITATIONS
    This filter does not take the source_repo_id in to account: if
    somehow you are merging multiple repositories in to one and want to
    interleave the commits/submits "properly", ask for advice.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::changesets usage' => <<'TOPIC',
Usage:
      ## From the command line:
       vcp <source> changesets: ...options... -- <dest>

      ## In a .vcp file:

        ChangeSets:
           time                     <=60     ## seconds
           user_id                  equal    ## case-sensitive equality
           comment                  equal    ## case-sensitive equality
           source_filebranch_id     notequal ## case-sensitive inequality
TOPIC
########################################################################
'filter::changesets description' => <<'TOPIC',

This filter is automatically loaded when there is no sort
filter loaded (both this and VCP::Filter::sort count as
sort filters).

Sorting by change_id, etc.
==========================

When all revs from the source have change numbers, this
filter sorts by change_id, branch_id, and name, regardless
of the rules set. The name sort is case sensitive, though
it should not be for Win32. This sort by change_id is
necessary for sources that supply change_id because the
order of scanning the revisions is not usually (ever, so
far :) in change set order.
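
The sort described above can be sketched in Perl roughly as
follows. This is an illustration only, not the filter's own
code: the revision records are hypothetical hashes, and the
numeric comparison of change_id is an assumption.

```perl
use strict;
use warnings;

# Hypothetical revision records; only the three sort fields are shown.
my @revs = (
    { change_id => 2,  branch_id => "main", name => "b.c" },
    { change_id => 10, branch_id => "main", name => "a.c" },
    { change_id => 2,  branch_id => "dev",  name => "z.c" },
    { change_id => 2,  branch_id => "main", name => "B.c" },
);

# Numeric compare on change_id (an assumption), then case-sensitive
# string compares on branch_id and name.
my @sorted = sort {
       $a->{change_id} <=> $b->{change_id}
    || $a->{branch_id} cmp $b->{branch_id}
    || $a->{name}      cmp $b->{name}
} @revs;

print join( " ", @$_{qw( change_id branch_id name )} ), "\n" for @sorted;
```

Because "cmp" is case sensitive, "B.c" sorts before "b.c"
here, which matches the case-sensitivity caveat above.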

Aggregating changes
===================

If one or more revisions arrive from the source with an
empty change_id, the rules for this filter establish the
conditions that determine what revisions may be grouped
into each change.

In this case, this filter rewrites all change_id fields so
that the (eventual) destination can use the change_id field
to break the revisions in to changes. This is sometimes
used by non-changeset oriented destinations to aggregate
"changes" as though a user were performing them and to
reduce the number of individual operations the destination
driver must perform (for instance: VCP::Dest::cvs prefers
to not call cvs commit all the time; cvs commit is slow).

Revisions are aggregated in to changes using a set of rules
that determine what revisions may be combined. One rule is
implicit in the algorithm, the others are explicitly
specified as a set of defaults that may be altered by the
user.

The Implicit Rule
=================

The implicit rule is that no change may contain two
revisions where one is a descendant of another. The
algorithm starts with the set of revisions that have no
parents in this transfer, chooses a set of them to be a
change according to the explicit conditions, and emits it.
Only when a revision is emitted does this filter consider
its offspring for emission. This cannot be changed.
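
A minimal sketch of this emission order, assuming each
revision has at most one parent and ignoring the explicit
conditions entirely (every eligible revision goes in to the
next change). The ids and the %parent_of table are
hypothetical; the real filter also has to consider from_id.

```perl
use strict;
use warnings;

# Hypothetical revs: id => parent id (undef for a root).
my %parent_of = (
    "a#1" => undef,
    "a#2" => "a#1",
    "a#3" => "a#2",
    "b#1" => undef,
    "b#2" => "b#1",
);

# Index each rev's direct offspring and collect the roots.
my ( %children, @ready );
for my $id ( sort keys %parent_of ) {
    my $parent = $parent_of{$id};
    if ( defined $parent ) { push @{ $children{$parent} }, $id }
    else                   { push @ready, $id }
}

# Emit all eligible revs as one change, then make only their
# offspring eligible for the next one.
my @changes;
while (@ready) {
    my @change = @ready;
    push @changes, [@change];
    @ready = map { @{ $children{$_} || [] } } @change;
}

print join( " ", @$_ ), "\n" for @changes;
```

No change in the output contains both a revision and one of
its descendants, since a child only becomes eligible after
its parent has been emitted.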

(EXPERIMENTAL) The only time this implicit rule is not
enough is in a cloning situation. In CVS and VSS, it is
possible to "share" files between branches. VSS supports
and promotes this model in its user interface and
documentation while CVS allows it more subtly by allowing
the same branch to have multiple branch tags. In either
case, there are multiple branches of a file that are
changed simultaneously. The CVS source recognizes this (and
the VSS source may by the time you read this) and chooses a
master revision from which to "clone" other revisions.
These cloned revisions appear on the child branch as
children of the master revision, not as children of the
preceding revision on the child branch. This is confusing,
but it works. In order to prevent this from confusing the
destinations, however, it can be important to make sure
that two revisions to a given branch of a given file do not
occur in the same change; this is the purpose of the
explicit rule "source_filebranch_id notequal", covered
below.

The Explicit Rules
==================

Rules may be specified for the ChangeSets filter. If no
rules are specified, a set of default rules is used. If
any rules are specified, none of the default rules are
used. The default rules are explained after rule conditions
are explained.

Each rule is a pair of words: a data field and a condition.

There are three conditions: "notequal", "equal" and "<=N"
(where N is a number; note that no spaces are allowed
before the number unless the spec is quoted somehow):

equal
=====

The "equal" condition is valid for all fields and states
that all revisions in the same change must have identical
values for the indicated field. So:

    user_id                  equal

states that all revisions in a change must be submitted by
the same user.

All "equal" conditions are applied before any other
conditions, regardless of the order in which they are
specified, to categorize revisions into prototype changes.
Once all revisions have been categorized into prototype
changes, the "<=N" and "notequal" rules are applied in
order to split the prototype changes into as many changes
as are needed to satisfy them.
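
The categorization step can be pictured as building a key
from the "equal" fields, as in this sketch. The revision
records and the "\0" separator are assumptions made for
illustration, not the filter's actual data layout.

```perl
use strict;
use warnings;

# Fields that carry an "equal" condition in the default rules.
my @equal_fields = qw( user_id comment );

# Hypothetical revision records.
my @revs = (
    { id => "a#1", user_id => "ann", comment => "fix bug", time => 100 },
    { id => "b#1", user_id => "ann", comment => "fix bug", time => 130 },
    { id => "c#1", user_id => "bob", comment => "fix bug", time => 110 },
    { id => "a#2", user_id => "ann", comment => "fix bug", time => 9000 },
);

# Build one key per rev from the "equal" fields; revs sharing a key
# land in the same prototype change.  "\0" is an arbitrary separator
# assumed not to occur in the field values.
my %protochanges;
for my $rev (@revs) {
    my $key = join "\0", map { $rev->{$_} } @equal_fields;
    push @{ $protochanges{$key} }, $rev->{id};
}

for my $key ( sort keys %protochanges ) {
    ( my $label = $key ) =~ tr/\0/|/;
    print "$label: @{ $protochanges{$key} }\n";
}
```

Note that a#1 and a#2 share a prototype change here even
though one descends from the other and they are far apart
in time; the later splitting step has to separate them.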

notequal
========

The "notequal" condition is also valid for all fields and
specifies that no two revisions in a change may have equal
values for a field. It does not make sense to apply this to
time fields, and is usually only needed to ensure that two
revisions to the same file on the same branch do not get
bundled in to the same change.
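
One way to picture a "notequal" check, sketched under the
assumption that the revisions of a prototype change are
visited in time order and that a revision failing the check
is simply delayed. The field values shown are invented.

```perl
use strict;
use warnings;

# Hypothetical revs in one prototype change, already in time order.
my @revs = (
    { id => "a#1", source_filebranch_id => "a<main>" },
    { id => "b#1", source_filebranch_id => "b<main>" },
    { id => "a#2", source_filebranch_id => "a<main>" },
);

# Accept a rev only if no already-accepted rev has the same value in
# the "notequal" field; delay the others for a later change.
my ( %seen, @accepted, @delayed );
for my $rev (@revs) {
    if ( $seen{ $rev->{source_filebranch_id} }++ ) {
        push @delayed, $rev->{id};
    }
    else {
        push @accepted, $rev->{id};
    }
}

print "this change: @accepted\n";
print "delayed:     @delayed\n";
```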

<=N
===

The "<=N" specification is only available for the "time"
field. It specifies that no gaps larger than N seconds may
exist in a change.
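
As a rough sketch, assuming "gap" means the difference
between consecutive revisions once they are sorted by time,
a prototype change could be split like this (the times are
made-up values in seconds):

```perl
use strict;
use warnings;

my $max_gap = 60;    # the N in "time <=60"

# Hypothetical rev times (seconds) from one prototype change.
my @times = sort { $a <=> $b } ( 100, 130, 185, 400, 420 );

# Start a new change whenever the gap to the previous rev exceeds N.
my @changes;
for my $t (@times) {
    if ( @changes && $t - $changes[-1][-1] <= $max_gap ) {
        push @{ $changes[-1] }, $t;
    }
    else {
        push @changes, [$t];
    }
}

print join( " ", @$_ ), "\n" for @changes;
```

Under this reading a change may span more than N seconds in
total, as long as no two neighbouring revisions are more
than N seconds apart.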

The default rules are:

    time                     <=60     ## seconds
    user_id                  equal    ## case-sensitive equality
    comment                  equal    ## case-sensitive equality
    source_filebranch_id     notequal ## case-sensitive inequality

These rules have the following effects:

The "time <=60" condition sets a maximum allowable
difference between two revisions; revisions that are more
than this number of seconds apart are considered to be in
different changes.

The "user_id equal" and "comment equal" conditions assert
that two revisions must be by the same user and have the
same comment in order to be in the same change.

The "branched_rev_branch_id equal" condition is a special
case to handle repositories like CVS which don't record
branch creation times. This condition kicks in when a user
creates several branches before changing any files on any
of them; in this case all of the branches get created at
the same time. That leaves odd looking conversions. This
condition also kicks in when multiple CVS branches exist
with no changes on them. In this case, VCP::Source::cvs
groups all of the branch creations after the last "real"
edit. In both cases, the changeset filter splits branch
creations so that only one branch is created per change.

The "branched_rev_branch_id" condition only applies to
revisions branching from one branch in to another.

The "source_filebranch_id notequal" condition prevents
cloned revs of a file from appearing in the same change as
each other (see the discussion above for more details).
TOPIC
########################################################################
'filter::stringedit' => <<'TOPIC',
NAME
    VCP::Filter::stringedit - alter any field character by character

SYNOPSIS
        StringEdit:
            ## Convert illegal p4 characters to ^NN hex escapes and the
            ## p4 wildcard "..." to a safe string.  The "^" is not an illegal
            ## char, it's replaced with an escape to allow us to use it as
            ## an escape character without the (extremely small) risk of
            ## running across a file name that actually uses it.
            ## Order is significant in this ruleset.
            # field(s)    match          replacement
            name,labels    /([\s@#*%^])/    ^%02x
            name,labels    "..."            ^___

        StringEdit:
            ## underscorify each unwanted character to a single "_"
            name,labels    /[\s@#*%^]/  _

        StringEdit:
            ## underscorify each run of unwanted characters to a single "_"
            name,labels    /[\s@#*%^]+/  _

        StringEdit:
            ## prefix labels that don't start with a letter or underscore:
            labels         /^([^a-zA-Z_])/  _%c

DESCRIPTION
    Allows field by field string editing, using Perl regular expressions
    to match characters and substrings and sprintf-like replacement
    strings.

  Rules
    A rule is a triplet of expressions specifying (1) a set of fields to
    match, (2) a pattern to match against those fields' contents
    (matching contents are removed), and (3) a string to replace each of
    the removed bits with.

    NOTE 1: the "match" expression uses perl5 regular expressions, not
    filename wildcards used in most other places in VCP configurations.

    The list of rules is evaluated top down and all rules are applied to
    each string.

    NOTE 2: The all-rules-apply nature of this filter is different from
    the behaviors of the ...Map: filters, which stop after the first
    matching rule. This is because ...Map: filters are rewriting entire
    strings and there can be only one result string, while the
    StringEdit filter may be rewriting pieces of string and multiple
    rewrites may be combined to good effect.

  The Fields List
    A comma separated list of field names. Any field may be edited
    except those that begin with "source_".

  The Match Expression
    For each field, the match expression is run against the field and,
    if it matches, causes all matching portions of string to be
    replaced.

    The match expression is a full perl5 regular expression enclosed in
    /.../ delimiters or a plain string, either of which may be enclosed
    in '' or "" delimiters if inline spaces are needed (rare, we hope).

  The Replacement Expression
    Each match is replaced by one instance of the replacement
    expression, optionally enclosed in single or double quotation marks.

    The replacement expression provides a limited list of C sprintf
    style macros:

        %d      The decimal codes for each character in the match
        %o      The octal codes for each character in the match
        %x      The hex codes for each character in the match

    Any non-letter preceded by a backslash "\" character is replaced by
    itself. Some more or less useful examples:

        \% \\ \" \' \` \{ \} \$ \* \+ \? \1

    If a punctuation character other than a period (.) or slash "/"
    follows a letter macro, it must be escaped using the backslash
    character (this is to reserve room in the spec for postfix modifiers
    like "*", "+", and "?"). So, to put a literal star (*) after a hex
    code, you would do something like "%02x\*".

    The "normal" perl5 letter abbreviations are also allowed:

               \t          tab             (HT, TAB)
               \n          newline         (NL)
               \r          return          (CR)
               \f          form feed       (FF)
               \b          backspace       (BS)
               \a          alarm (bell)    (BEL)
               \e          escape          (ESC)
               \033        octal char      (ESC)
               \x1b        hex char        (ESC)
               \x{263a}    wide hex char   (SMILEY)
               \c[         control char    (ESC)
               \N{name}    named Unicode character

    In addition, the following escape sequences modify what follows:

               \l          lowercase next char
               \u          uppercase next char
               \L          lowercase till \E
               \U          uppercase till \E
               \E          end case modification
               \Q          quote non-word characters till \E

    As shown above, normal sprintf-style options may be included (and
    are recommended), so %02x produces results like "09" (if the match
    was a single TAB character) or "20" (if the match was a SPACE
    character). The dot precision modifiers (".3") are not supported,
    just the leading 0 and the field width specifier.

  Case sensitivity
    By default, all patterns are case sensitive. There is no way to
    override this at present; one will be added.

  Command Line Parsing
    For large stringedits or repeated use, the stringedit is best
    specified in a .vcp file. For quick one-offs or scripted situations,
    however, the stringedit: scheme may be used on the command line. In
    this case, each parameter is a "word" and every triple of words is a
    ( fields, pattern, result ) rule.

    Because vcp command line parsing is performed incrementally and the
    next filter or destination specifications can look exactly like a
    pattern or result, the special token "--" is used to terminate the
    list of patterns if StringEdit: is used on the command line. This
    may also be the last word in the "StringEdit:" section of a .vcp
    file, but that is superfluous. It is an error to use "--" before the
    last word in a .vcp file.

LIMITATIONS
    There is no way (yet) of telling the stringeditor to continue
    processing the rules list. We could implement labels like
    "<<label>>" to be allowed before pattern expressions (but not
    between pattern and result), and we could then implement
    "<<goto label>>". And a "<<next>>" could be used to fall through to
    the next label. All of which is wonderful, but I want to gain some
    real world experience with the current system and find a use case
    for gotos and fallthroughs before I implement them. This comment is
    here to solicit feedback :).

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::stringedit usage' => <<'TOPIC',
Usage:
        StringEdit:
            ## Convert illegal p4 characters to ^NN hex escapes and the
            ## p4 wildcard "..." to a safe string.  The "^" is not an illegal
            ## char, it's replaced with an escape to allow us to use it as
            ## an escape character without the (extremely small) risk of
            ## running across a file name that actually uses it.
            ## Order is significant in this ruleset.
            # field(s)    match          replacement
            name,labels    /([\s@#*%^])/    ^%02x
            name,labels    "..."            ^___

        StringEdit:
            ## underscorify each unwanted character to a single "_"
            name,labels    /[\s@#*%^]/  _

        StringEdit:
            ## underscorify each run of unwanted characters to a single "_"
            name,labels    /[\s@#*%^]+/  _

        StringEdit:
            ## prefix labels that don't start with a letter or underscore:
            labels         /^([^a-zA-Z_])/  _%c
TOPIC
########################################################################
'filter::stringedit description' => <<'TOPIC',

Allows field by field string editing, using Perl regular
expressions to match characters and substrings and
sprintf-like replacement strings.

Rules
=====

A rule is a triplet of expressions specifying (1) a set of
fields to match, (2) a pattern to match against those
fields' contents (matching contents are removed), and (3) a
string to replace each of the removed bits with.

NOTE 1: the "match" expression uses perl5 regular
expressions, not filename wildcards used in most other
places in VCP configurations.

The list of rules is evaluated top down and all rules are
applied to each string.

NOTE 2: The all-rules-apply nature of this filter is
different from the behaviors of the ...Map: filters, which
stop after the first matching rule. This is because ...Map:
filters are rewriting entire strings and there can be only
one result string, while the StringEdit filter may be
rewriting pieces of string and multiple rewrites may be
combined to good effect.
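
The all-rules-apply behavior can be illustrated with plain
Perl substitutions. This sketch uses a hypothetical @rules
list of ( pattern, literal replacement ) pairs and leaves
out field selection and the % macros.

```perl
use strict;
use warnings;

# Two hypothetical rules as ( pattern, literal replacement ) pairs.
my @rules = (
    [ qr/\s/,     "_" ],       # each whitespace char becomes "_"
    [ qr/\.\.\./, "^___" ],    # a literal "..." becomes "^___"
);

sub edit_string {
    my ($string) = @_;

    # Every rule is applied, in order, to the result of the previous
    # one; a match does not stop later rules from running.
    for my $rule (@rules) {
        my ( $pattern, $replacement ) = @$rule;
        $string =~ s/$pattern/$replacement/g;
    }
    return $string;
}

print edit_string("my file...old"), "\n";
```

Both rules leave their mark on the same string, which a
first-match-wins filter would not allow.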

The Fields List
===============

A comma separated list of field names. Any field may be
edited except those that begin with "source_".

The Match Expression
====================

For each field, the match expression is run against the
field and, if it matches, causes all matching portions of
string to be replaced.

The match expression is a full perl5 regular expression
enclosed in /.../ delimiters or a plain string, either of
which may be enclosed in '' or "" delimiters if inline
spaces are needed (rare, we hope).

The Replacement Expression
==========================

Each match is replaced by one instance of the replacement
expression, optionally enclosed in single or double
quotation marks.

The replacement expression provides a limited list of C
sprintf style macros:

    %d      The decimal codes for each character in the match
    %o      The octal codes for each character in the match
    %x      The hex codes for each character in the match

Any non-letter preceded by a backslash "\" character is
replaced by itself. Some more or less useful examples:

    \% \\ \" \' \` \{ \} \$ \* \+ \? \1

If a punctuation character other than a period (.) or slash
"/" follows a letter macro, it must be escaped using the
backslash character (this is to reserve room in the spec
for postfix modifiers like "*", "+", and "?"). So, to put a
literal star (*) after a hex code, you would do something
like "%02x\*".

Reserved for the future: %x* %x{1} %x{1,} %x{,3} %x{1,3}

The "normal" perl5 letter abbreviations are also allowed:

           \t          tab             (HT, TAB)
           \n          newline         (NL)
           \r          return          (CR)
           \f          form feed       (FF)
           \b          backspace       (BS)
           \a          alarm (bell)    (BEL)
           \e          escape          (ESC)
           \033        octal char      (ESC)
           \x1b        hex char        (ESC)
           \x{263a}    wide hex char   (SMILEY)
           \c[         control char    (ESC)
           \N{name}    named Unicode character

In addition, the following escape sequences modify what
follows:

           \l          lowercase next char
           \u          uppercase next char
           \L          lowercase till \E
           \U          uppercase till \E
           \E          end case modification
           \Q          quote non-word characters till \E

As shown above, normal sprintf-style options may be
included (and are recommended), so %02x produces results
like "09" (if the match was a single TAB character) or
"20" (if the match was a SPACE character). The dot
precision modifiers (".3") are not supported, just the
leading 0 and the field width specifier.
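
A sketch of how such a macro might be expanded, assuming
the codes of a multi-character match are simply
concatenated. The expand() helper is hypothetical; it
handles only %d, %o and %x with an optional leading 0 and
width, and ignores the backslash escapes described above.

```perl
use strict;
use warnings;

# Expand a replacement template for one match: each %d, %o or %x
# macro becomes the code of every character in the match, formatted
# with the macro's own leading 0 and width.
sub expand {
    my ( $template, $match ) = @_;
    $template =~ s{%(0?\d*)([dox])}{
        join "", map { sprintf "%$1$2", ord($_) } split //, $match
    }ge;
    return $template;
}

# Apply a rule like "/([\s@#*%^])/  ^%02x" to a sample name.
( my $name = "a b\@c" ) =~ s/([\s\@#*%^])/expand( '^%02x', $1 )/ge;
print "$name\n";
```

The SPACE becomes "^20" and the "@" becomes "^40", so the
sample name comes out as "a^20b^40c".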

Case sensitivity
================

By default, all patterns are case sensitive. There is no
way to override this at present; one will be added.

Command Line Parsing
====================

For large stringedits or repeated use, the stringedit is
best specified in a .vcp file. For quick one-offs or
scripted situations, however, the stringedit: scheme may be
used on the command line. In this case, each parameter is a
"word" and every triple of words is a ( fields, pattern,
result ) rule.

Because vcp command line parsing is performed incrementally
and the next filter or destination specifications can look
exactly like a pattern or result, the special token "--" is
used to terminate the list of patterns if StringEdit: is
used on the command line. This may also be the last word in
the "StringEdit:" section of a .vcp file, but that is
superfluous. It is an error to use "--" before the last
word in a .vcp file.

Test script: t/61stringedit.t
TOPIC
########################################################################
'filter::identity' => <<'TOPIC',
NAME
    VCP::Filter::identity - identity (i.e. no-op)

SYNOPSIS
       vcp <source> identity: <dest>

DESCRIPTION
    A simple passthrough, used for testing to make sure that VCP::Filter
    really is a pass through and that vcp can load filters.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'filter::identity usage' => <<'TOPIC',
Usage:
       vcp <source> identity: <dest>
TOPIC
########################################################################
'filter::identity description' => <<'TOPIC',

A simple passthrough, used for testing to make sure that
VCP::Filter really is a pass through and that vcp can load
filters.

Test script: t/10vcp.t
TOPIC
########################################################################
'dest::perl_data' => <<'TOPIC',
NAME
    VCP::Dest::perl_data - emit metadata to a log file

SYNOPSIS
        vcp ... perl_data:         # to vcp.log
        vcp ... perl_data:-:       # to STDOUT
        vcp ... perl_data:foo.log: # to foo.log

DESCRIPTION
    Dump all data structures to a log file or STDOUT.

    This is intended to be used when reproducing bugs to capture a
    metadata stream that can be copy-pasted-tweaked in to a t/99*.t test
    program.

    Not a supported module; API and behavior may change without warning.

    See source code and test suites for how to capture data structures
    in scalars, arrays and hashes.

AUTHOR
    Barrie Slaymaker <barries@slaysys.com>

COPYRIGHT
    Copyright (c) 2000, 2001, 2002 Perforce Software, Inc. All rights
    reserved.

    See VCP::License ("vcp help license") for the terms of use.
TOPIC
########################################################################
'dest::perl_data usage' => <<'TOPIC',
Usage:
        vcp ... perl_data:         # to vcp.log
        vcp ... perl_data:-:       # to STDOUT
        vcp ... perl_data:foo.log: # to foo.log
TOPIC
########################################################################
'dest::perl_data description' => <<'TOPIC',

Dump all data structures to a log file or STDOUT.

This is intended to be used when reproducing bugs to
capture a metadata stream that can be copy-pasted-tweaked
in to a t/99*.t test program.

Not a supported module; API and behavior may change without
warning.

See source code and test suites for how to capture data
structures in scalars, arrays and hashes.
TOPIC
