# # # add_file "docs/Regexp-Details.html" # content [292e1d632b1d603a8f6bdf8cf165954653b40511] # # add_file "docs/Regexp-Summary.html" # content [3bd9b164c47cfebf0b14099e9ca05f3be4d2cb11] # # add_file "docs/Regexps.html" # content [815bd739ce0d1f2d8d11a5e170d0802c305147e5] # # patch "INSTALL" # from [83edc001e560d3122afd0e58dfe34b5682c7ee57] # to [240d5caa067b94a749936660cba6302378484413] # # patch "NEWS" # from [0dee337ea44af422a00dfacf962d088d5a5cb0c0] # to [7d6bb127f7ecd2dc50c01b9ecbf5a4d85841b0c2] # # patch "UPGRADE" # from [dd53acd07dfebc6c40834f89af70696a6e6656a2] # to [b7e014bfa58db309cb29c06816070168e602fd14] # # patch "docs/Additional-Lua-Functions.html" # from [285e2b1228c364b9a93ae04192664a18fe95a8e7] # to [846e1a5b8af06f43d8d8356df8ec75a7e91a6972] # # patch "docs/Automation.html" # from [aeef8408071b1a6a89c6030e08903f40fddfe1c9] # to [356f171884468556f00bb7a1cf101d1a52e8d86c] # # patch "docs/Branching-and-Merging.html" # from [e3acb9bc720c229d0f49419237fb1938979aabc8] # to [2f0bd679bd2111d317fc4bbea458a889d1e655c2] # # patch "docs/Certificate.html" # from [c5e3af1fe0b79b52af89a16404c5f0351ef56fda] # to [2565b478678bf30855d226b656a5abe73791ff6a] # # patch "docs/Database.html" # from [143826efd443bd9e9a0d3b4c19700e4a560e4e10] # to [c6e7c8ae101dfccc96542336f0696e5e03d4480c] # # patch "docs/Default-hooks.html" # from [1e0a8b65e1e8eb0bf4cdc6cdc53afe6a2c382875] # to [a3abe4ba183ea064970139d1e081b21c499f3d74] # # patch "docs/General-Index.html" # from [a11b038433091e40b0854bbaa127d19b76348dc8] # to [d9860f42e76b6146e214ff8bd608285a2ba6f70a] # # patch "docs/Hook-Reference.html" # from [2dc893bf620477f318604f8c52863207ee23dafb] # to [7a412358366c360d81a2e428b6b824cccaecb272] # # patch "docs/Hooks.html" # from [ea0cfac9c17a82b5e6c81c581b1fca4175e8c79f] # to [673c5a499a3c72d3ef132f2470ae863c62d57ea4] # # patch "docs/Informative.html" # from [468bbecb358334b5ff81068ae5c000756c8ad89e] # to [ffe3b1df0339c4769f834001b639319d04ab2ea4] # # patch 
"docs/Key-and-Cert-Trust.html" # from [4e03afc439a3c78f282345064f1491799a309e57] # to [87061cac1bac7fc663f3b53fa7d2635f072e8083] # # patch "docs/Mark_002dMerge.html" # from [170d0082dea175d046bd02afac38a47877f4eeab] # to [7b95b1f2f7b6c7414d5523663e761de124a09529] # # patch "docs/Network-Service-Revisited.html" # from [e23c224b329abe7bc8bc7d9730306a6b1721f678] # to [2e8694d09a1dd96b0bf8d23b88887c651db4790d] # # patch "docs/Other-Transports.html" # from [0ed42ec35fac43d4952525eb0b1b48af8a38aed8] # to [12d427aa02fa9c04751b31f614488da0dff04d51] # # patch "docs/Packet-I_002fO.html" # from [7d822161a943e69f8067fc45e30602f382ee4f2d] # to [44d9a934e80c37a1bdd826fd86f9b5218563833f] # # patch "docs/RCS.html" # from [eb5d2079dc0b1773126a6ba6af83107711b64d00] # to [8e283147bdb9cd6822a2b262d73fef95d0ebf94b] # # patch "docs/Rebuilding-ancestry.html" # from [9da65a984054c62eca74ebb2a49b4da5108e77c8] # to [6424f9076b7d8ea298d9d616b003948e024624f4] # # patch "docs/Reserved-Certs.html" # from [7b2a14f984ee84b8740915b9480b72c750758579] # to [d6a34981e320fd16d5a567b0dc2d5b3970490790] # # patch "docs/Reserved-Files.html" # from [7810aa5ad950172eeb011bfda7407b13a1a42562] # to [4b48d549c9e557f0a6de1f98adfba40bcc04baea] # # patch "docs/Selectors.html" # from [9a60f992b48d07bbada4ce3c695f8ee65062268a] # to [e9adcb321fca68051ebe37e1f1d18894e0db656b] # # patch "docs/Special-Topics.html" # from [1d7f09e4100112085eb74a0dcdfd758df31cd628] # to [d54cb05bd92502bfe2a5c20a63e9c143f3709b36] # # patch "docs/Versions-of-files.html" # from [4f859f5d77dce0db8caa488ed73f3e2d8bc473a3] # to [ef657adad28e050daad568c6a40e338c8e7c2049] # # patch "docs/Workspace.html" # from [ebe5af40895414d95ec558500a321add67c6a97e] # to [40b4543d5606e6701dace2e33f41aac6786bfd8c] # # patch "docs/index.html" # from [db741d09103ba746b51e21fd18c683210dbdf819] # to [617ce735f037a9cd24f680774150561b890a2faa] # # patch "monotone.html" # from [187fc9e45bca3a1388b1854fb98036c92cd2b109] # to [0c51e9b6862604f9724e8aadf48028c579da628f] 
# # patch "monotone.pdf" # from [76db467cd3e846e2f84fca406c3285c313bd032d] # to [de109abbbb027f488eb8b286ed2cdc4a85737d37] # ============================================================ --- docs/Regexp-Details.html 292e1d632b1d603a8f6bdf8cf165954653b40511 +++ docs/Regexp-Details.html 292e1d632b1d603a8f6bdf8cf165954653b40511 @@ -0,0 +1,2152 @@ + + +Regexp Details - monotone documentation + + + + + + + + + + + +
+



+
+ +

7.5.2 Regexp Details

+ +

The syntax and semantics of PCRE regular expressions, as used in +Monotone, are described in detail below. Regular expressions in +general are covered in a number of books, some of which have copious +examples. Jeffrey Friedl's “Mastering Regular Expressions,” +published by O'Reilly, covers regular expressions in great detail. +This description is intended as reference material. + +

Characters and Metacharacters
+ +

A regular expression is a pattern that is matched against a subject +string from left to right. Most characters stand for themselves in a +pattern, and match the corresponding characters in the subject. As a +trivial example, the pattern + +

+         The quick brown fox
+
+ +

matches a portion of a subject string that is identical to +itself. When caseless matching is specified, letters are matched +independently of case. + +
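As a rough illustration of literal and caseless matching, the sketch below uses Python's `re` module. That module is not PCRE, but it is assumed here to treat plain ASCII literals and case-insensitive matching the same way. The subject strings are invented for the example.

```python
import re

pattern = "The quick brown fox"

# A literal pattern matches the identical portion of the subject.
exact = re.search(pattern, "Look: The quick brown fox jumps")

# Case matters by default, so this subject does not match...
wrong_case = re.search(pattern, "the QUICK brown FOX")

# ...unless caseless matching is requested.
caseless = re.search(pattern, "the QUICK brown FOX", re.IGNORECASE)

print(exact.group())     # The quick brown fox
print(wrong_case)        # None
print(caseless.group())  # the QUICK brown FOX
```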

The power of regular expressions comes from the ability to include +alternatives and repetitions in the pattern. These are encoded in the +pattern by the use of metacharacters, which do not stand for +themselves but instead are interpreted in some special way. + +

There are two different sets of metacharacters: those that are +recognized anywhere in the pattern except within square brackets, and +those that are recognized within square brackets. Outside square +brackets, the metacharacters are as follows: + +

+
\
general escape character with several uses +
^
assert start of string (or line, in multiline mode) +
$
assert end of string (or line, in multiline mode) +
.
match any character except newline (by default) +
[
start character class definition +
|
start of alternative branch +
(
start subpattern +
)
end subpattern +
?
extends the meaning of `(' + also 0 or 1 quantifier + also quantifier minimizer +
*
0 or more quantifier +
+
1 or more quantifier + also “possessive quantifier” +
{
start min/max quantifier +
+ +

Part of a pattern that is in square brackets is called a "character +class". In a character class the only metacharacters are: + +

+
\
general escape character +
^
negate the class, but only if it is the first character +
-
indicates character range +
[
POSIX character class (only if followed by POSIX + syntax) +
]
terminates the character class +
+ +

The following sections describe the use of each of the metacharacters. + +

Backslash
+ +

The backslash character has several uses. Firstly, if it is followed +by a non-alphanumeric character, it takes away any special meaning +that character may have. This use of backslash as an escape character +applies both inside and outside character classes. + +

For example, if you want to match a `*' character, you write +`\*' in the pattern. This escaping action applies whether or not +the following character would otherwise be interpreted as a +metacharacter, so it is always safe to precede a non-alphanumeric with +backslash to specify that it stands for itself. In particular, if you +want to match a backslash, you write `\\'. + +
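The escaping rule can be tried out with a short sketch. It uses Python's `re` module rather than PCRE. The two are assumed to agree on backslash followed by a non-alphanumeric character, and the subject strings are made up.

```python
import re

# `\*` matches a literal asterisk; an unescaped `*` would be a quantifier.
star = re.search(r"\*", "2 * 3")

# Escaping a non-alphanumeric that is not a metacharacter is harmless.
at_sign = re.search(r"\@", "user@example")

# `\\` in the pattern matches a single backslash in the subject.
backslash = re.search(r"\\", "C:\\temp")

print(star.start(), at_sign.start(), backslash.start())  # 2 4 2
```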

If a pattern is compiled with the `(?x)' option, whitespace in +the pattern (other than in a character class) and characters between a +`#' outside a character class and the next newline are +ignored. An escaping backslash can be used to include a whitespace or +`#' character as part of the pattern. + +

If you want to remove the special meaning from a sequence of +characters, you can do so by putting them between `\Q' and +`\E'. The `\Q...\E' sequence is recognized both inside and +outside character classes. + +

Non-printing Characters
+ +

A second use of backslash provides a way of encoding non-printing characters +in patterns in a visible manner. There is no restriction on the appearance of +non-printing characters, apart from the binary zero that terminates a pattern, +but when a pattern is being prepared by text editing, it is usually easier to +use one of the following escape sequences than the binary character it +represents: + +

+
\a
alarm, that is, the BEL character (hex 07) +
\cx
"control-x", where x is any character +
\e
escape (hex 1B) +
\f
formfeed (hex 0C) +
\n
linefeed (hex 0A) +
\r
carriage return (hex 0D) +
\t
tab (hex 09) +
\ddd
character with octal code ddd, or backreference +
\xhh
character with hex code hh +
\x{hhh...}
character with hex code hhh... +
+ +

The precise effect of `\cx' is as follows: if x is a lower +case letter, it is converted to upper case. Then bit 6 of the +character (hex 40) is inverted. Thus `\cz' becomes hex 1A (the +<SUB> control character, in ASCII), but `\c{' becomes hex 3B +(`;'), and `\c;' becomes hex 7B (`{'). + + + +
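The arithmetic behind `\cx' can be checked with a few lines of Python. The helper name `control_code` is hypothetical and only models the rule as stated above; it does not call any regular expression library.

```python
def control_code(x: str) -> int:
    # Lower case ASCII letters are converted to upper case first.
    if "a" <= x <= "z":
        x = x.upper()
    # Then bit 6 (hex 40) of the character code is inverted.
    return ord(x) ^ 0x40

print(hex(control_code("z")))  # 0x1a
print(hex(control_code("{")))  # 0x3b
print(hex(control_code(";")))  # 0x7b
```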

After `\x', from zero to two hexadecimal digits are read (letters +can be in upper or lower case). Any number of hexadecimal digits may +appear between `\x{' and `}', but the value of the +character code must be less than 256 in non-UTF-8 mode, and less than +2^31 in UTF-8 mode. That is, the maximum value in hexadecimal is +7FFFFFFF. Note that this is bigger than the largest Unicode code +point, which is 10FFFF. + +

If characters other than hexadecimal digits appear between `\x{' +and `}', or if there is no terminating `}', this form of +escape is not recognized. Instead, the initial `\x' will be +interpreted as a basic hexadecimal escape, with no following digits, +giving a character whose value is zero. + +

Characters whose value is less than 256 can be defined by either of +the two syntaxes for `\x'. There is no difference in the way they +are handled. For example, `\xdc' is exactly the same as +`\x{dc}'. + +

After `\0' up to two further octal digits are read. If there are +fewer than two digits, just those that are present are used. Thus the +sequence `\0\x\07' specifies two binary zeros followed by a +<BEL> character (octal 007). Make sure you supply two digits after +the initial zero if the pattern character that follows is itself an +octal digit. + +

The handling of a backslash followed by a digit other than 0 is +complicated. Outside a character class, PCRE reads it and any +following digits as a decimal number. If the number is less than 10, +or if there have been at least that many previous capturing left +parentheses in the expression, the entire sequence is taken as a +back reference. A description of how this works is given later, +following the discussion of parenthesized subpatterns. + +

Inside a character class, or if the decimal number is greater than 9 +and there have not been that many capturing subpatterns, PCRE re-reads +up to three octal digits following the backslash, and uses them to +generate a data character. Any subsequent digits stand for +themselves. In non-UTF-8 mode, the value of a character specified in +octal must be less than `\400'. In UTF-8 mode, values up to +`\777' are permitted. For example: + +

+
\040
is another way of writing a space +
\40
is the same, provided there are fewer than 40 + previous capturing subpatterns +
\7
is always a back reference +
\11
might be a back reference, or another way of + writing a tab +
\011
is always a tab +
\0113
is a tab followed by the character `3' +
\113
might be a back reference, otherwise the + character with octal code 113 +
\377
might be a back reference, otherwise + the byte consisting entirely of 1 bits +
\81
is either a back reference, or a binary zero + followed by the two characters `8' and `1' +
+ +

Note that octal values of 100 or greater must not be introduced by a +leading zero, because no more than three octal digits are ever read. + +

All the sequences that define a single character value can be used +both inside and outside character classes. In addition, inside a +character class, the sequence `\b' is interpreted as the <BS> +character (hex 08), and the sequences `\R' and `\X' are +interpreted as the characters `R' and `X', +respectively. Outside a character class, these sequences have +different meanings (see below). + +

Absolute and Relative Back References
+ +

The sequence `\g' followed by an unsigned or a negative number, +optionally enclosed in braces, is an absolute or relative back +reference. A named back reference can be coded as +`\g{name}'. Back references are discussed later, following the +discussion of parenthesized subpatterns. + +

Generic Character Types
+ +

Another use of backslash is for specifying generic character types. The +following are always recognized: + +

+
\d
any decimal digit +
\D
any character that is not a decimal digit +
\h
any horizontal whitespace character +
\H
any character that is not a horizontal whitespace character +
\s
any whitespace character +
\S
any character that is not a whitespace character +
\v
any vertical whitespace character +
\V
any character that is not a vertical whitespace character +
\w
any “word” character +
\W
any “non-word” character +
+ +

Each pair of escape sequences partitions the complete set of +characters into two disjoint sets. Any given character matches one, +and only one, of each pair. + +

These character type sequences can appear both inside and outside +character classes. They each match one character of the appropriate +type. If the current matching point is at the end of the subject +string, all of them fail, since there is no character to match. + +

For compatibility with Perl, `\s' does not match the <VT> +character (code 11). This makes it different from the POSIX +“space” class. The `\s' characters are <TAB> (9), <LF> +(10), <FF> (12), <CR> (13), and <SPACE> (32). + +

In UTF-8 mode, characters with values greater than 128 never match +`\d', `\s', or `\w', and always match `\D', +`\S', and `\W'. These sequences retain their original +meanings from before UTF-8 support was available, mainly for +efficiency reasons. + +

The sequences `\h', `\H', `\v', and `\V' are Perl +5.10 features. In contrast to the other sequences, these do match +certain high-valued codepoints in UTF-8 mode. The horizontal space +characters are: + +

+
U+0009
Horizontal tab +
U+0020
Space +
U+00A0
Non-break space +
U+1680
Ogham space mark +
U+180E
Mongolian vowel separator +
U+2000
En quad +
U+2001
Em quad +
U+2002
En space +
U+2003
Em space +
U+2004
Three-per-em space +
U+2005
Four-per-em space +
U+2006
Six-per-em space +
U+2007
Figure space +
U+2008
Punctuation space +
U+2009
Thin space +
U+200A
Hair space +
U+202F
Narrow no-break space +
U+205F
Medium mathematical space +
U+3000
Ideographic space +
+ +

The vertical space characters are: + +

+
U+000A
Linefeed +
U+000B
Vertical tab +
U+000C
Formfeed +
U+000D
Carriage return +
U+0085
Next line +
U+2028
Line separator +
U+2029
Paragraph separator +
+ +

A “word” character is an underscore or any character less than 256 +that is a letter or digit. The definition of letters and digits is +that used for the “C” locale. + +

Newline Conventions
+ +

PCRE supports five different conventions for indicating line breaks in +strings: a single CR (carriage return) character, a single LF +(linefeed) character, the two-character sequence CRLF, any of the +three preceding, or any Unicode newline sequence. The default is to +match any Unicode newline sequence. It is possible to override the +default newline convention by starting a pattern string with one of +the following five sequences: + +

+
(*CR)
carriage return +
(*LF)
linefeed +
(*CRLF)
carriage return, followed by linefeed +
(*ANYCRLF)
any of the three above +
(*ANY)
all Unicode newline sequences +
+ +

For example, the pattern + +

+         (*CR)a.b
+
+ +

changes the convention to CR. That pattern matches `a\nb' because +LF is no longer a newline. Note that these special settings, which are +not Perl-compatible, are recognized only at the very start of a +pattern, and that they must be in upper case. If more than one of them +is present, the last one is used. + +

The newline convention does not affect what the `\R' escape +sequence matches. By default, this is any Unicode newline sequence, +for Perl compatibility. However, this can be changed; see the +description of `\R' below. A change of `\R' setting can be +combined with a change of newline convention. + +

Newline Sequences
+ +

Outside a character class, by default, the escape sequence `\R' matches +any Unicode newline sequence. This is a Perl 5.10 feature. In +non-UTF-8 mode `\R' is equivalent to the following: + +

+         (?>\r\n|\n|\x0b|\f|\r|\x85)
+
+ +

This is an example of an "atomic group", details of which are given +below. This particular group matches either the two-character +sequence <CR> followed by <LF>, or one of the single +characters <LF> (linefeed, U+000A), <VT> (vertical tab, +U+000B), <FF> (formfeed, U+000C), <CR> (carriage +return, U+000D), or <NEL> (next line, U+0085). The +two-character sequence is treated as a single unit that cannot be +split. In UTF-8 mode, two additional characters whose codepoints are +greater than 255 are added: <LS> (line separator, U+2028) +and <PS> (paragraph separator, U+2029). + +
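The single-byte alternatives listed above can be approximated in Python, whose `re` module has no `\R' escape. This sketch is an assumption-laden stand-in: it writes the alternation out by hand and omits the atomic group. That is harmless when the alternation is the whole pattern, but inside a larger pattern backtracking could split a CRLF pair.

```python
import re

# The alternatives from the non-UTF-8 definition, with CRLF listed first
# so that it is consumed as one unit when the pattern is used on its own.
NEWLINE = re.compile(r"\r\n|\n|\x0b|\f|\r|\x85")

subject = "one\r\ntwo\nthree\x0bfour\x85five"
parts = NEWLINE.split(subject)
print(parts)  # ['one', 'two', 'three', 'four', 'five']
```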

It is possible to change the meaning of `\R' by starting a +pattern string with one of the following sequences: + +

+
(*BSR_ANYCRLF)
<CR>, <LF>, or <CR><LF> only +
(*BSR_UNICODE)
any Unicode newline sequence (the default) +
+ +

Note that these special settings, which are not Perl-compatible, are +recognized only at the very start of a pattern, and that they must be +in upper case. If more than one of them is present, the last one is +used. They can be combined with a change of newline convention, for +example, a pattern can start with: + +

+         (*ANY)(*BSR_ANYCRLF)
+
+ +

Inside a character class, `\R' matches the letter `R'. + +

Unicode Character Properties
+ +

Three additional escape sequences match characters with specific +Unicode properties. When not in UTF-8 mode, these sequences are of +course limited to testing characters whose codepoints are less than +256, but they do work in this mode. The extra escape sequences are: + +

+
\p{xx}
a character with the xx property +
\P{xx}
a character without the xx property +
\X
an extended Unicode sequence +
+ +

The property names represented by xx above are limited to the +Unicode script names, the general category properties, and `Any', +which matches any character (including newline). Other properties such +as `InMusicalSymbols' are not currently supported by PCRE. Note +that `\P{Any}' does not match any characters, so always causes +a match failure. + +

Sets of Unicode characters are defined as belonging to certain +scripts. A character from one of these sets can be matched using a +script name. For example: + +

+         \p{Greek}
+         \P{Han}
+
+ +

Those that are not part of an identified script are lumped together as +“Common.” The current list of scripts is: + +

Arabic, +Armenian, +Balinese, +Bengali, +Bopomofo, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Cherokee, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Greek, +Gujarati, +Gurmukhi, +Han, +Hangul, +Hanunoo, +Hebrew, +Hiragana, +Inherited, +Kannada, +Katakana, +Kharoshthi, +Khmer, +Lao, +Latin, +Limbu, +Linear_B, +Malayalam, +Mongolian, +Myanmar, +New_Tai_Lue, +Nko, +Ogham, +Old_Italic, +Old_Persian, +Oriya, +Osmanya, +Phags_Pa, +Phoenician, +Runic, +Shavian, +Sinhala, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tamil, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Ugaritic, +Yi. + +

Each character has exactly one general category property, specified by a +two-letter abbreviation. For compatibility with Perl, negation can be specified +by including a circumflex between the opening brace and the property name. For +example, `\p{^Lu}' is the same as `\P{Lu}'. + +

If only one letter is specified with `\p' or `\P', it +includes all the general category properties that start with that +letter. In this case, in the absence of negation, the curly brackets +in the escape sequence are optional; these two examples have the same +effect: + +

+         \p{L}
+         \pL
+
+ +

The following general category property codes are supported: + +

+
C
Other +
Cc
Control +
Cf
Format +
Cn
Unassigned +
Co
Private use +
Cs
Surrogate + +
L
Letter +
Ll
Lower case letter +
Lm
Modifier letter +
Lo
Other letter +
Lt
Title case letter +
Lu
Upper case letter + +
M
Mark +
Mc
Spacing mark +
Me
Enclosing mark +
Mn
Non-spacing mark + +
N
Number +
Nd
Decimal number +
Nl
Letter number +
No
Other number + +
P
Punctuation +
Pc
Connector punctuation +
Pd
Dash punctuation +
Pe
Close punctuation +
Pf
Final punctuation +
Pi
Initial punctuation +
Po
Other punctuation +
Ps
Open punctuation + +
S
Symbol +
Sc
Currency symbol +
Sk
Modifier symbol +
Sm
Mathematical symbol +
So
Other symbol + +
Z
Separator +
Zl
Line separator +
Zp
Paragraph separator +
Zs
Space separator +
+ +

The special property `L&' is also supported: it matches a +character that has the `Lu', `Ll', or `Lt' property, in +other words, a letter that is not classified as a modifier or +“other.” + +
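To see which general category code a given character carries, Python's standard `unicodedata` module can be consulted. This is only a lookup aid, not PCRE itself. The helper `is_l_and` is a hypothetical name that mirrors the description of `L&' above.

```python
import unicodedata

# Two-letter general category codes for a few ASCII characters.
codes = {ch: unicodedata.category(ch) for ch in "Aa7 $_+"}
print(codes)

def is_l_and(ch: str) -> bool:
    # Mirrors the description of L&: the category is Lu, Ll, or Lt.
    return unicodedata.category(ch) in ("Lu", "Ll", "Lt")

print(is_l_and("\u01c5"))  # True: U+01C5 is a title case letter (Lt)
print(is_l_and("\u02b0"))  # False: U+02B0 is a modifier letter (Lm)
```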

The `Cs' (Surrogate) property applies only to characters in the +range U+D800 to U+DFFF. Such characters are not valid in +UTF-8 strings (see RFC 3629) and so cannot be tested by PCRE. + +

The long synonyms for these properties that Perl supports (such as +`\p{Letter}') are not supported by PCRE, nor is it permitted to +prefix any of these properties with `Is'. + +

No character that is in the Unicode table has the `Cn' +(unassigned) property. Instead, this property is assumed for any code +point that is not in the Unicode table. + +

Specifying caseless matching does not affect these escape sequences. For +example, `\p{Lu}' always matches only upper case letters. + +

The `\X' escape matches any number of Unicode characters that +form an extended Unicode sequence. `\X' is equivalent to + +

+         (?>\PM\pM*)
+
+ +

That is, it matches a character without the “mark” property, +followed by zero or more characters with the “mark” property, and +treats the sequence as an atomic group (see below). Characters with +the “mark” property are typically accents that affect the preceding +character. None of them have codepoints less than 256, so in non-UTF-8 +mode `\X' matches any one character. + +
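The grouping that `\X' performs can be imitated without a regular expression engine. The sketch below is a simplified model under stated assumptions: the function name `extended_sequences` is hypothetical, and a mark at the very start of the text is emitted on its own, whereas `\X' would not match there at all.

```python
import unicodedata

def extended_sequences(text):
    # Attach each character with a "mark" category (Mc, Me, Mn)
    # to the character that precedes it.
    out = []
    for ch in text:
        is_mark = unicodedata.category(ch).startswith("M")
        if is_mark and out:
            out[-1] += ch
        else:
            out.append(ch)
    return out

# "e" followed by U+0301 (combining acute accent), then "a".
print(len(extended_sequences("e\u0301a")))  # 2
```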

Matching characters by Unicode property is not fast, because PCRE has +to search a structure that contains data for over fifteen thousand +characters. That is why the traditional escape sequences such as +`\d' and `\w' do not use Unicode properties in PCRE. + +

Resetting the Match Start
+ +

The escape sequence `\K', which is a Perl 5.10 feature, causes +any previously matched characters not to be included in the final +matched sequence. For example, the pattern: + +

+         foo\Kbar
+
+ +

matches `foobar', but reports that it has matched +`bar'. This feature is similar to a lookbehind assertion +(described below). However, in this case, the part of the subject +before the real match does not have to be of fixed length, as +lookbehind assertions do. The use of `\K' does not interfere with the +setting of captured substrings. For example, when the pattern + +

+         (foo)\Kbar
+
+ +

matches `foobar', the first substring is still set to `foo'. + +
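For comparison, Python's `re` module does not support `\K', so the closest equivalent there is the lookbehind assertion that the text contrasts it with. This sketch only shows the reported match for the `foobar' example. It works because `foo' has a fixed length, which is exactly the restriction `\K' avoids.

```python
import re

# Lookbehind stand-in for foo\Kbar: "foo" must precede the match
# but is not included in it.
m = re.search(r"(?<=foo)bar", "foobar")
print(m.group(), m.start())  # bar 3
```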

Simple Assertions
+ +

The final use of backslash is for certain simple assertions. An +assertion specifies a condition that has to be met at a particular +point in a match, without consuming any characters from the subject +string. The use of subpatterns for more complicated assertions is +described below. The backslashed assertions are: + +

+
\b
matches at a word boundary +
\B
matches when not at a word boundary +
\A
matches at the start of the subject +
\Z
matches at the end of the subject + also matches before a newline at the end of the subject +
\z
matches only at the end of the subject +
\G
matches at the first matching position in the subject +
+ +

These assertions may not appear in character classes (but note that +`\b' has a different meaning, namely the backspace character, +inside a character class). + +

A word boundary is a position in the subject string where the current +character and the previous character do not both match `\w' or +`\W' (i.e. one matches `\w' and the other matches +`\W'), or the start or end of the string if the first or last +character matches `\w', respectively. + +
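A small sketch of the word boundary rule, again using Python's `re` module. Its `\b' and `\B' are assumed to behave like PCRE's for ASCII subjects, and the subject string is invented.

```python
import re

subject = "cat concat cat."

# \bcat\b needs a boundary on both sides, so the "cat" inside
# "concat" is skipped.
whole_words = [m.start() for m in re.finditer(r"\bcat\b", subject)]

# \Bcat\b needs a non-boundary before "cat", so it finds only that one.
inside_word = [m.start() for m in re.finditer(r"\Bcat\b", subject)]

print(whole_words)  # [0, 11]
print(inside_word)  # [7]
```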

The `\A', `\Z', and `\z' assertions differ from the +traditional circumflex and dollar (described in the next section) in +that they only ever match at the very start and end of the subject +string, whatever options are set. Thus, they are independent of +multiline mode. The difference between `\Z' and `\z' is that +`\Z' matches before a newline at the end of the string as well as +at the very end, whereas `\z' matches only at the end. + +

The `\G' assertion is true only when the current matching +position is at the start point of the match. As used in Monotone, +`\G' is always equal to `\A'. + +

Circumflex and Dollar
+ +

Outside a character class, in the default matching mode, the +circumflex character, `^', is an assertion that is true only if +the current matching point is at the start of the subject string. +Inside a character class, circumflex has an entirely different meaning +(see below). + +

Circumflex need not be the first character of the pattern if a number +of alternatives are involved, but it should be the first thing in each +alternative in which it appears if the pattern is ever to match that +branch. If all possible alternatives start with a circumflex, that is, +if the pattern is constrained to match only at the start of the +subject, it is said to be an “anchored” pattern. (There are also +other constructs that can cause a pattern to be anchored.) + +

A dollar character, `$', is an assertion that is true only if the +current matching point is at the end of the subject string, or +immediately before a newline at the end of the string (by +default). Dollar need not be the last character of the pattern if a +number of alternatives are involved, but it should be the last item in +any branch in which it appears. Dollar has no special meaning in a +character class. + +

The meanings of the circumflex and dollar characters are changed if +the `(?m)' option is set. When this is the case, a circumflex +matches immediately after internal newlines as well as at the start of +the subject string. It does not match after a newline that ends the +string. A dollar matches before any newlines in the string, as well as +at the very end, when `(?m)' is set. When newline is specified as +the two-character sequence <CR><LF>, isolated <CR> and +<LF> characters do not indicate newlines. + +

For example, the pattern `^abc$' matches the subject string +`def\nabc' (where `\n' represents a newline) in multiline +mode, but not otherwise. Consequently, patterns that are anchored in +single line mode because all branches start with ^ are not anchored in +multiline mode. + +
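The `^abc$' example can be reproduced as follows. The sketch uses Python's `re` module, which is assumed to share the default and `(?m)' behaviour of circumflex and dollar described here when the newline is a linefeed.

```python
import re

subject = "def\nabc"

# Default mode: ^ matches only at the start of the subject.
single = re.search(r"^abc$", subject)

# Multiline mode: ^ also matches just after the internal newline.
multi = re.search(r"(?m)^abc$", subject)

# By default, $ also matches just before a newline that ends the subject.
trailing = re.search(r"abc$", "abc\n")

print(single)            # None
print(multi.start())     # 4
print(trailing.group())  # abc
```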

Note that the sequences `\A', `\Z', and `\z' can be +used to match the start and end of the subject in both modes, and if +all branches of a pattern start with `\A' it is always anchored. + +

Full Stop (Period, Dot)
+ +

Outside a character class, a dot in the pattern matches any one +character in the subject string except (by default) a character that +signifies the end of a line. In UTF-8 mode, the matched character may +be more than one byte long. + +

When a line ending is defined as a single character, dot never matches +that character; when the two-character sequence <CR><LF> is +used, dot does not match <CR> if it is immediately followed by +<LF>, but otherwise it matches all characters (including isolated +<CR>s and <LF>s). When any Unicode line endings are being +recognized, dot does not match <CR> or <LF> or any of the +other line ending characters. + +

The behaviour of dot with regard to newlines can be changed. If the +`(?s)' option is set, a dot matches any one character, without +exception. If the two-character sequence <CR><LF> is present +in the subject string, it takes two dots to match it. + +

The handling of dot is entirely independent of the handling of circumflex and +dollar, the only relationship being that they both involve newlines. Dot has no +special meaning in a character class. + +
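The effect of `(?s)' on dot can be sketched like this. Python's `re` module is used as a stand-in; note that it treats only linefeed as a line ending, which corresponds to just one of the newline conventions discussed above.

```python
import re

# Without (?s), dot refuses to match the linefeed.
plain = re.search(r"a.b", "a\nb")

# With (?s), dot matches any one character.
dotall = re.search(r"(?s)a.b", "a\nb")

# CR LF is two characters, so it takes two dots to cross it.
one_dot = re.search(r"(?s)a.b", "a\r\nb")
two_dots = re.search(r"(?s)a..b", "a\r\nb")

print(plain, one_dot)               # None None
print(dotall.end(), two_dots.end()) # 3 4
```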

Matching a Single Byte
+ +

Outside a character class, the escape sequence `\C' matches any +one byte, both in and out of UTF-8 mode. Unlike a dot, it always +matches any line-ending characters. The feature is provided in Perl in +order to match individual bytes in UTF-8 mode. Because it breaks up +UTF-8 characters into individual bytes, what remains in the string may +be a malformed UTF-8 string. For this reason, the `\C' escape +sequence is best avoided. + +

PCRE does not allow `\C' to appear in lookbehind assertions +(described below), because in UTF-8 mode this would make it impossible +to calculate the length of the lookbehind. + +

Square Brackets and Character Classes
+ +

An opening square bracket introduces a character class, terminated by +a closing square bracket. A closing square bracket on its own is not +special. If a closing square bracket is required as a member of the +class, it should be the first data character in the class (after an +initial circumflex, if present) or escaped with a backslash. + +

A character class matches a single character in the subject. In UTF-8 +mode, the character may occupy more than one byte. A matched character +must be in the set of characters defined by the class, unless the +first character in the class definition is a circumflex, in which case +the subject character must not be in the set defined by the class. If +a circumflex is actually required as a member of the class, ensure it +is not the first character, or escape it with a backslash. + +

For example, the character class `[aeiou]' matches any lower case +vowel, while `[^aeiou]' matches any character that is not a lower +case vowel. Note that a circumflex is just a convenient notation for +specifying the characters that are in the class by enumerating those +that are not. A class that starts with a circumflex is not an +assertion: it still consumes a character from the subject string, and +therefore it fails if the current pointer is at the end of the string. + +
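A brief sketch of the two vowel classes, using Python's `re` module (assumed to treat these simple classes as PCRE does). The last search shows that a negated class still has to consume a character.

```python
import re

vowels = re.findall(r"[aeiou]", "Monotone")
others = re.findall(r"[^aeiou]", "Monotone")

# Nothing follows "x", so the negated class has no character to consume.
at_end = re.search(r"x[^aeiou]", "x")

print(vowels)  # ['o', 'o', 'o', 'e']
print(others)  # ['M', 'n', 't', 'n']
print(at_end)  # None
```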

In UTF-8 mode, characters with values greater than 255 can be included +in a class as a literal string of bytes, or by using the `\x{' +escaping mechanism. + +

When caseless matching is set, any letters in a class represent both +their upper case and lower case versions, so for example, a caseless +`[aeiou]' matches `A' as well as `a', and a caseless `[^aeiou]' +does not match `A', whereas a caseful version would. In UTF-8 mode, +PCRE always understands the concept of case for characters whose +values are less than 128, so caseless matching is always possible. For +characters with higher values, the concept of case is supported if +PCRE is compiled with Unicode property support, but not otherwise. If +you want to use caseless matching for characters 128 and above, you +must ensure that PCRE is compiled with Unicode property support as +well as with UTF-8 support. + +

Characters that might indicate line breaks are never treated in any +special way when matching character classes, whatever line-ending +sequence is in use, and whatever setting of the `(?s)' and +`(?m)' options is used. A class such as `[^a]' always +matches one of these characters. + +

The minus (hyphen) character can be used to specify a range of +characters in a character class. For example, `[d-m]' matches any +letter between `d' and `m', inclusive. If a minus character +is required in a class, it must be escaped with a backslash or appear +in a position where it cannot be interpreted as indicating a range, +typically as the first or last character in the class. + +

It is not possible to have the literal character `]' as the end +character of a range. A pattern such as `[W-]46]' is interpreted +as a class of two characters (`W' and `-') followed by a +literal string `46]', so it would match `W46]' or +`-46]'. However, if the `]' is escaped with a backslash it +is interpreted as the end of range, so `[W-\]46]' is interpreted +as a class containing a range followed by two other characters. The +octal or hexadecimal representation of `]' can also be used to +end a range. + +
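
The two readings of `]' described above can be checked with a short sketch. It uses Python's re module purely as a stand-in, on the assumption that it parses these two classes the same way PCRE does; the variable names are illustrative only.

```python
import re

# "[W-]46]": the class holds only W and -, and is followed by the literal text "46]".
two_char_class = re.compile(r"[W-]46]")

# "[W-\]46]": the escaped ] ends the range W..], and 4 and 6 are extra members.
range_class = re.compile(r"[W-\]46]")

print(bool(two_char_class.fullmatch("W46]")))
print(bool(two_char_class.fullmatch("-46]")))
print(bool(range_class.fullmatch("X")))   # X lies between W and ]
print(bool(range_class.fullmatch("-")))   # the hyphen is not a member here
```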

Ranges operate in the collating sequence of character values. They can +also be used for characters specified numerically, for example +`[\000-\037]'. In UTF-8 mode, ranges can include characters whose +values are greater than 255, for example `[\x{100}-\x{2ff}]'. + +

If a range that includes letters is used when caseless matching is +set, it matches the letters in either case. For example, `[W-c]' +is equivalent to `[][\\^_`wxyzabc]', matched caselessly. + +

The character types `\d', `\D', `\p', `\P', +`\s', `\S', `\w', and `\W' may also appear in a +character class, and add the characters that they match to the +class. For example, `[\dABCDEF]' matches any hexadecimal digit. A +circumflex can conveniently be used with the upper case character +types to specify a more restricted set of characters than the matching +lower case type. For example, the class `[^\W_]' matches any +letter or digit, but not underscore. + +
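
As a rough illustration of character types inside a class, the following sketch tries the two classes from the paragraph above. It assumes Python's re module as a stand-in for PCRE, with the re.ASCII flag chosen here so that `\d' and `\W' stay limited to ASCII, which is closer to PCRE outside UTF-8 mode.

```python
import re

# A digit or one of the upper case letters A-F.
hex_digit = re.compile(r"[\dABCDEF]", re.ASCII)

# "Not a non-word character and not underscore", i.e. a letter or digit.
letter_or_digit = re.compile(r"[^\W_]", re.ASCII)

for sample in ("7", "F", "G", "_"):
    print(sample, bool(hex_digit.fullmatch(sample)),
          bool(letter_or_digit.fullmatch(sample)))
```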

The only metacharacters that are recognized in character classes are +backslash, hyphen (only where it can be interpreted as specifying a +range), circumflex (only at the start), opening square bracket (only +when it can be interpreted as introducing a POSIX class name—see the +next section), and the terminating closing square bracket. However, +escaping other non-alphanumeric characters does no harm. + +

POSIX Character Classes
+ +

Perl supports the POSIX notation for character classes. This uses +names enclosed by `[:' and `:]' within the enclosing square +brackets. PCRE also supports this notation. For example, + +

+         [01[:alpha:]%]
+
+ +

matches `0', `1', any alphabetic character, or `%'. The +supported class names are + +

+
alnum
letters and digits +
alpha
letters +
ascii
character codes 0 – 127 +
blank
space or tab only +
cntrl
control characters +
digit
decimal digits (same as `\d') +
graph
printing characters, excluding space +
lower
lower case letters +
print
printing characters, including space +
punct
printing characters, excluding letters and digits +
space
white space (not quite the same as `\s') +
upper
upper case letters +
word
“word” characters (same as `\w') +
xdigit
hexadecimal digits +
+ +

The “space” characters are <HT> (9), <LF> (10), <VT> +(11), <FF> (12), <CR> (13), and space (32). Notice that this +list includes the <VT> character (code 11). This makes "space" +different to `\s', which does not include <VT> (for Perl +compatibility). + +

The name “word” is a Perl extension, and “blank” is a GNU +extension from Perl 5.8. Another Perl extension is negation, which is +indicated by a `^' character after the colon. For example, + +

+         [12[:^digit:]]
+
+ +

matches `1', `2', or any non-digit. PCRE (and Perl) also +recognize the POSIX syntax `[.ch.]' and `[=ch=]' +where ch is a “collating element,” but these are not +supported, and an error is given if they are encountered. + +

In UTF-8 mode, characters with values greater than 128 do not match +any of the POSIX character classes. + +

Vertical Bar
+ +

Vertical bar characters are used to separate alternative patterns. For +example, the pattern + +

+         gilbert|sullivan
+
+ +

matches either `gilbert' or `sullivan'. Any number of +alternatives may appear, and an empty alternative is permitted +(matching the empty string). The matching process tries each +alternative in turn, from left to right, and the first one that +succeeds is used. If the alternatives are within a subpattern (defined +below), "succeeds" means matching the rest of the main pattern as well +as the alternative in the subpattern. + +
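
The left-to-right rule matters when one alternative is a prefix of another. A minimal sketch, assuming Python's re module behaves like PCRE for plain alternation (the names are illustrative):

```python
import re

composer = re.compile(r"gilbert|sullivan")

# Alternatives are tried left to right, so "a" wins even though "ab" is longer.
first_wins = re.match(r"a|ab", "ab")

print(bool(composer.fullmatch("sullivan")))
print(first_wins.group())
```

Putting the longer alternative first (`ab|a') is the usual way to prefer the longer match.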

Internal Option Setting
+ +

The behavior of the matching engine can be adjusted from within the +pattern by a sequence of option letters enclosed between `(?' and +`)'. The option letters are + +

+
i
Caseless: characters in one case match the corresponding + characters in other cases as well. +
m
Multiline: `^' and `$' match at newlines + as well as at beginning and end of string. +
s
Dotall: dot matches any character, including newline characters. +
x
Extended syntax: unescaped white space is ignored and embedded + comments are possible. +
J
Dupnames: names for capturing subpattern need not be unique. +
U
Ungreedy: quantifiers match as few times as possible by default. +
X
Extra: for forward compatibility, give an error if any escape sequence +with no defined meaning appears. +
+ +

For example, `(?im)' sets caseless, multiline matching. It is +also possible to unset these options by preceding the letters with a +hyphen, and a combined setting and unsetting such as `(?im-sx)' +is also permitted. (This would set the caseless and multiline options +while unsetting the dotall and extended-syntax options.) If a letter +appears both before and after the hyphen, the option is unset. The +lowercase option letters are Perl-compatible; the uppercase ones are +PCRE only. + +

When an option change occurs at top level (that is, not inside +subpattern parentheses), the change applies to the remainder of the +pattern that follows. An option change within a subpattern (see below +for a description of subpatterns) affects only that part of the +current pattern that follows it, so + +

+         (a(?i)b)c
+
+ +

matches `abc' and `aBc' and no other strings. By this +means, options can be made to have different settings in different +parts of the pattern. Any changes made in one alternative do carry on +into subsequent branches within the same subpattern. For example, + +

+         (a(?i)b|c)
+
+ +

matches `ab', `aB', `c', and `C', even though when +matching `C' the first branch is abandoned before the option +setting. This is because option settings take effect when +the pattern is parsed, not while it is being matched; otherwise the +result would depend on which branches the matcher had tried. + +

Note: Unlike these options, the similar, PCRE-specific option +sequences that start with `(*' may appear only at the very +beginning of the pattern. Details of these sequences are given in the +section entitled “Newline sequences,” above. + +

Subpatterns
+ +

Subpatterns are delimited by parentheses (round brackets), which can +be nested. Turning part of a pattern into a subpattern does two +things: + +

    +
  1. It localizes a set of alternatives. For example, the pattern + +
         
    +              cat(aract|erpillar|)
    +
    + +

    matches one of the words `cat', `cataract', or +`caterpillar'. Without the parentheses, it would match +`cataract', `erpillar' or an empty string. + +

  2. It sets up the subpattern as a capturing subpattern. As used in +Monotone this only means that during matching, the portion of the +subject string that matched the subpattern is available for back +references. Captured subpatterns are, for instance, not available to +callers of regex.search. Opening parentheses are counted from +left to right (starting from 1) to obtain numbers for the capturing +subpatterns. + +

    For example, if the string `the red king' is matched against the pattern + +

         
    +              the ((red|white) (king|queen))
    +
    + +

    the captured substrings are `red king', `red', and +`king', and are numbered 1, 2, and 3, respectively. +

+ +
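
Both effects of parentheses can be seen in a small sketch. It assumes Python's re module as a stand-in; as noted above, Monotone itself does not hand captured substrings back to callers, so the group numbers shown here only illustrate how they are counted.

```python
import re

# Grouping localizes the alternatives: "cat" plus one of three endings.
grouped = re.compile(r"cat(aract|erpillar|)")
# Without parentheses the alternatives are "cataract", "erpillar", and "".
ungrouped = re.compile(r"cataract|erpillar|")

# Opening parentheses are counted from left to right, starting at 1.
match = re.fullmatch(r"the ((red|white) (king|queen))", "the red king")

print(bool(grouped.fullmatch("caterpillar")))
print(bool(ungrouped.fullmatch("erpillar")))
print(match.groups())
```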

The fact that plain parentheses fulfil two functions is not always +helpful. There are often times when a grouping subpattern is required +without a capturing requirement. If an opening parenthesis is followed +by a question mark and a colon, the subpattern does not do any +capturing, and is not counted when computing the number of any +subsequent capturing subpatterns. For example, if the string `the +white queen' is matched against the pattern + +

+         the ((?:red|white) (king|queen))
+
+ +

the captured substrings are `white queen' and `queen', and +are numbered 1 and 2. The maximum number of capturing subpatterns is +65535. + +
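
The renumbering caused by `(?:' can be checked with a short sketch, again assuming Python's re module as a stand-in for PCRE:

```python
import re

# The (?:...) group still groups the alternatives but takes no number.
match = re.fullmatch(r"the ((?:red|white) (king|queen))", "the white queen")

print(match.groups())
```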

As a convenient shorthand, if any option settings are required at the +start of a non-capturing subpattern, the option letters may appear +between the `?' and the `:'. Thus the two patterns + +

+         (?i:saturday|sunday)
+         (?:(?i)saturday|sunday)
+
+ +

match exactly the same set of strings. Because alternative branches +are tried from left to right, and options are not reset until the end +of the subpattern is reached, an option setting in one branch does +affect subsequent branches, so the above patterns match `SUNDAY' +as well as `Saturday'. + +
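
A sketch of the shorthand form follows, assuming Python's re module as a stand-in. Only the `(?i:' spelling is shown, because recent Python versions reject an option setting such as `(?i)' that is not at the very start of the pattern; that restriction belongs to Python, not to PCRE.

```python
import re

# The option applies to every branch inside the group.
weekend = re.compile(r"(?i:saturday|sunday)")

# Outside the group, matching is caseful again.
partial = re.compile(r"(?i:sat)urday")

print(bool(weekend.fullmatch("SUNDAY")))
print(bool(weekend.fullmatch("Saturday")))
print(bool(partial.fullmatch("SATurday")))
print(bool(partial.fullmatch("SATURDAY")))
```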

Duplicate Subpattern Numbers
+ +

Perl 5.10 introduced a feature whereby each alternative in a +subpattern uses the same numbers for its capturing parentheses. Such a +subpattern starts with `(?|' and is itself a non-capturing +subpattern. For example, consider this pattern: + +

+         (?|(Sat)ur|(Sun))day
+
+ +

Because the two alternatives are inside a `(?|' group, both sets +of capturing parentheses are numbered one. Thus, when the pattern +matches, you can look at captured substring number one, whichever +alternative matched. This construct is useful when you want to capture +part, but not all, of one of a number of alternatives. Inside a +`(?|' group, parentheses are numbered as usual, but the number is +reset at the start of each branch. The numbers of any capturing +buffers that follow the subpattern start after the highest number used +in any branch. The following example is taken from the Perl +documentation. The numbers underneath show in which buffer the +captured content will be stored. + +

+  # before  ---------------branch-reset----------- after
+  / ( a )  (?| x ( y ) z | (p (q) r) | (t) u (v) ) ( z ) /x
+  # 1            2         2  3        2     3     4
+
+ +

A backreference or a recursive call to a numbered subpattern always +refers to the first one in the pattern with the given number. + +

An alternative approach to using this “branch reset” feature is to +use duplicate named subpatterns, as described in the next section. + +

Named Subpatterns
+ +

Identifying capturing parentheses by number is simple, but it can be +very hard to keep track of the numbers in complicated regular +expressions. Furthermore, if an expression is modified, the numbers +may change. To help with this difficulty, PCRE supports the naming of +subpatterns. This feature was not added to Perl until release +5.10. Python had the feature earlier, and PCRE introduced it at +release 4.0, using the Python syntax. PCRE now supports both the Perl +and the Python syntax. + +

In PCRE, a subpattern can be named in one of three ways: +`(?<name>...)' or `(?'name'...)' as in Perl, or +`(?P<name>...)' as in Python. References to capturing +parentheses from other parts of the pattern, such as backreferences, +recursion, and conditions, can be made by name as well as by number. + +

Names consist of up to 32 alphanumeric characters and +underscores. Named capturing parentheses are still allocated numbers +as well as names, exactly as if the names were not present. + +
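
The following sketch shows that a named group keeps its number. It uses the Python syntax `(?P<name>...)' because the example runs under Python's re module, which is only a stand-in here; the group names `year' and `month' are made up for the illustration.

```python
import re

date = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})")
match = date.fullmatch("2008-11")

# The same substring is reachable by name and by number.
print(match.group("year"), match.group(1))
print(dict(date.groupindex))
```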

By default, a name must be unique within a pattern, but it is possible +to relax this constraint by setting the `(?J)' option. This can +be useful for patterns where only one instance of the named +parentheses can match. Suppose you want to match the name of a +weekday, either as a 3-letter abbreviation or as the full name, and in +both cases you want to extract the abbreviation. This pattern +(ignoring the line breaks) does the job: + +

+         (?Jx)
+         (?<DN>Mon|Fri|Sun)(?:day)?|
+         (?<DN>Tue)(?:sday)?|
+         (?<DN>Wed)(?:nesday)?|
+         (?<DN>Thu)(?:rsday)?|
+         (?<DN>Sat)(?:urday)?
+
+ +

There are five capturing substrings, but only one is ever set after a +match. (An alternative way of solving this problem is to use a +“branch reset” subpattern, as described in the previous section.) + +

Repetition
+ +

Repetition is specified by quantifiers, which can follow any of +the following items: + +

+ +

The general repetition quantifier specifies a minimum and maximum +number of permitted matches, by giving the two numbers in curly +brackets (braces), separated by a comma. The numbers must be less than +65536, and the first must be less than or equal to the second. For +example: + +

+         z{2,4}
+
+ +

matches `zz', `zzz', or `zzzz'. A closing brace on its +own is not a special character. If the second number is omitted, but +the comma is present, there is no upper limit; if the second number +and the comma are both omitted, the quantifier specifies an exact +number of required matches. Thus + +

+         [aeiou]{3,}
+
+ +

matches at least 3 successive vowels, but may match many more, while + +

+         \d{8}
+
+ +

matches exactly 8 digits. An opening curly bracket that appears in a +position where a quantifier is not allowed, or one that does not match +the syntax of a quantifier, is taken as a literal character. For +example, `{,6}' is not a quantifier, but a literal string of four +characters. + +
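
The three quantifier forms above can be tried with a short sketch that assumes Python's re module as a stand-in. The `{,6}' case is deliberately left out: Python treats it as a quantifier with a minimum of zero, so it does not reproduce the PCRE behaviour described here.

```python
import re

two_to_four = re.compile(r"z{2,4}")       # minimum 2, maximum 4
three_plus = re.compile(r"[aeiou]{3,}")   # minimum 3, no upper limit
eight_digits = re.compile(r"\d{8}")       # exactly 8

for sample in ("z", "zz", "zzzz", "zzzzz"):
    print(sample, bool(two_to_four.fullmatch(sample)))
print(bool(three_plus.fullmatch("aeiouaeiou")))
print(bool(eight_digits.fullmatch("12345678")))
```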

In UTF-8 mode, quantifiers apply to UTF-8 characters rather than to +individual bytes. Thus, for example, `\x{100}{2}' matches two +UTF-8 characters, each of which is represented by a two-byte +sequence. Similarly, `\X{3}' matches three Unicode extended +sequences, each of which may be several bytes long (and they may be of +different lengths). + +

The quantifier `{0}' is permitted, causing the expression to +behave as if the previous item and the quantifier were not present. + +

For convenience, the three most common quantifiers have +single-character abbreviations: + +

+
*
is equivalent to {0,} +
+
is equivalent to {1,} +
?
is equivalent to {0,1} +
+ +

It is possible to construct infinite loops by following a subpattern that can +match no characters with a quantifier that has no upper limit, for example: + +

+         (a?)*
+
+ +

Earlier versions of Perl and PCRE used to give an error at compile +time for such patterns. However, because there are cases where this +can be useful, such patterns are now accepted, but if any repetition +of the subpattern does in fact match no characters, the loop is +forcibly broken. + +

By default, the quantifiers are greedy, that is, they match as +much as possible (up to the maximum number of permitted times), +without causing the rest of the pattern to fail. The classic example +of where this gives problems is in trying to match comments in C +programs. These appear between `/*' and `*/', and within the +comment, individual `*' and `/' characters may appear. An +attempt to match C comments by applying the pattern + +

+         /\*.*\*/
+
+ +

to the string + +

+         /* first comment */  not comment  /* second comment */
+
+ +

fails, because it matches the entire string owing to the greediness of +the `.*' item. + +

However, if a quantifier is followed by a question mark, it ceases to +be greedy, and instead matches the minimum number of times possible, +so the pattern + +

+         /\*.*?\*/
+
+ +

does the right thing with the C comments. The meaning of the various +quantifiers is not otherwise changed, just the preferred number of +matches. Do not confuse this use of question mark with its use as a +quantifier in its own right. Because it has two uses, it can sometimes +appear doubled, as in + +

+         \d??\d
+
+ +

which matches one digit by preference, but can match two if that is the only +way the rest of the pattern matches. + +
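
The difference between the greedy and ungreedy forms can be seen by running both comment patterns over the sample string. This is a sketch that assumes Python's re module as a stand-in for PCRE:

```python
import re

text = "/* first comment */  not comment  /* second comment */"

greedy = re.search(r"/\*.*\*/", text).group()
lazy = re.search(r"/\*.*?\*/", text).group()

print(greedy == text)   # the greedy form runs to the last "*/"
print(lazy)             # the ungreedy form stops at the first "*/"

# Doubled question mark: an optional digit that prefers to match nothing.
print(re.match(r"\d??\d", "42").group())
print(re.fullmatch(r"\d??\d", "42").group())
```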

If the `(?U)' option is set (an option that is not available in +Perl), the quantifiers are not greedy by default, but individual ones +can be made greedy by following them with a question mark. In other +words, it inverts the default behaviour. + +

When a parenthesized subpattern is quantified with a minimum repeat count that +is greater than 1 or with a limited maximum, more memory is required for the +compiled pattern, in proportion to the size of the minimum or maximum. + +

If a pattern starts with `.*' or `.{0,}' and the +`(?s)' option is set, thus allowing the dot to match newlines, +the pattern is implicitly anchored, because whatever follows will be +tried against every character position in the subject string, so there +is no point in retrying the overall match at any position after the +first. PCRE normally treats such a pattern as though it were preceded +by `\A'. + +

In cases where it is known that the subject string contains no +newlines, it is worth setting `(?s)' in order to obtain this +optimization, or alternatively using `^' or `\A' to indicate +anchoring explicitly. + +

However, there is one situation where the optimization cannot be +used. When `.*' is inside capturing parentheses that are the subject of +a backreference elsewhere in the pattern, a match at the start may +fail where a later one succeeds. Consider, for example: + +

+         (.*)abc\1
+
+ +

If the subject is `xyz123abc123' the match point is the fourth +character. For this reason, such a pattern is not implicitly anchored. + +
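
A quick way to see why the match cannot start at the first character is to ask where it does start. This sketch assumes Python's re module as a stand-in; offsets are zero-based, so the fourth character is index 3.

```python
import re

match = re.search(r"(.*)abc\1", "xyz123abc123")

# Starting at offsets 0, 1, or 2 fails because the text after "abc"
# cannot repeat what was captured before it.
print(match.start())
print(match.group(1))
```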

When a capturing subpattern is repeated, the value captured is the +substring that matched the final iteration. For example, after + +

+         (tweedle[dume]{3}\s*)+
+
+ +

has matched `tweedledum tweedledee' the value of the captured +substring is `tweedledee'. However, if there are nested capturing +subpatterns, the corresponding captured values may have been set in +previous iterations. For example, after + +

+         (a|(b))+
+
+ +

matches `aba' the value of the second captured substring is `b'. + +
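
Both statements about repeated capturing subpatterns can be checked with a sketch that assumes Python's re module as a stand-in for PCRE:

```python
import re

# The outer group is overwritten on every iteration; the last one wins.
tweedle = re.match(r"(tweedle[dume]{3}\s*)+", "tweedledum tweedledee")
print(tweedle.group(1))

# The inner group only took part in the second iteration ("b"), and its
# value is still there after the third iteration matched "a".
nested = re.match(r"(a|(b))+", "aba")
print(nested.groups())
```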

Atomic Grouping and Possessive Quantifiers
+ +

With both maximizing (greedy) and minimizing (ungreedy or +lazy) repetition, failure of what follows normally causes the +repeated item to be re-evaluated to see if a different number of +repeats allows the rest of the pattern to match. Sometimes it is +useful to prevent this, either to change the nature of the match, or +to cause it to fail earlier than it otherwise might, when the author of +the pattern knows there is no point in carrying on. + +

Consider, for example, the pattern `\d+foo' when applied to the +subject line + +

+         123456bar
+
+ +

After matching all 6 digits and then failing to match `foo', the +normal action of the matcher is to try again with only 5 digits +matching the `\d+' item, and then with 4, and so on, before +ultimately failing. Atomic grouping (a term taken from Jeffrey +Friedl's book) provides the means for specifying that once a +subpattern has matched, it is not to be re-evaluated in this way. + +

If we use atomic grouping for the previous example, the matcher gives +up immediately on failing to match `foo' the first time. The +notation is a kind of special parenthesis, starting with `(?>' as in +this example: + +

+         (?>\d+)foo
+
+ +

This kind of parenthesis “locks up” the part of the pattern it +contains once it has matched, and a failure further into the pattern +is prevented from backtracking into it. Backtracking past it to +previous items, however, works as normal. Atomic grouping subpatterns +are not capturing subpatterns. + +

An alternative description is that a subpattern of this type matches +the string of characters that an identical standalone pattern would +match, if anchored at the current point in the subject string. + +

Simple cases such as the above example can be thought of as a +maximizing repeat that must swallow everything it can. So, while both +`\d+' and `\d+?' are prepared to adjust the number of digits +they match in order to make the rest of the pattern match, +`(?>\d+)' can only match an entire sequence of digits. + +
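
The effect of "swallowing everything" can be sketched in Python, with one loud caveat: the `(?>' syntax is only accepted by newer Python versions, so this example does not use it. Instead it substitutes a well-known emulation, a lookahead that captures the digits followed by a back reference that consumes them, which relies on lookaheads not being re-entered once they have matched. That emulation is an assumption of this sketch, not something PCRE or Monotone requires.

```python
import re

# Ordinary repeat: \d+ gives back one digit so the final "5" can match.
backtracking = re.compile(r"\d+5")

# Atomic-like repeat: the lookahead grabs the whole run of digits and \1
# consumes exactly that run, so nothing is left over for the final "5".
atomic_like = re.compile(r"(?=(\d+))\1(?:5)")

print(backtracking.search("12345").group())
print(atomic_like.search("12345"))
```

With a suffix that is not a digit, such as `foo', both forms match the same text; the difference only shows when the repeat would have to give characters back.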

Atomic groups in general can of course contain arbitrarily complicated +subpatterns, and can be nested. However, when the subpattern for an +atomic group is just a single repeated item, as in the example above, +a simpler notation, called a possessive quantifier, can be +used. This consists of an additional `+' character following a +quantifier. Using this notation, the previous example can be rewritten +as + +

+         \d++foo
+
+ +

Note that a possessive quantifier can be used with an entire group, for +example: + +

+         (abc|xyz){2,3}+
+
+ +

Possessive quantifiers are always greedy; the setting of the +`(?U)' option is ignored. They are a convenient notation for the +simpler forms of atomic group. However, there is no difference in the +meaning of a possessive quantifier and the equivalent atomic group, +though there may be a performance difference; possessive quantifiers +should be slightly faster. + +

The possessive quantifier syntax is an extension to the Perl 5.8 +syntax. Jeffrey Friedl originated the idea (and the name) in the +first edition of his book. Mike McCloskey liked it, so implemented it +when he built Sun's Java package, and PCRE copied it from there. It +ultimately found its way into Perl at release 5.10. + +

PCRE has an optimization that automatically “possessifies” certain +simple pattern constructs. For example, the sequence `A+B' is +treated as `A++B' because there is no point in backtracking into +a sequence of `A's when `B' must follow. + +

When a pattern contains an unlimited repeat inside a subpattern that +can itself be repeated an unlimited number of times, the use of an +atomic group is the only way to avoid some failing matches taking a +very long time indeed. The pattern + +

+         (\D+|<\d+>)*[!?]
+
+ +

matches an unlimited number of substrings that either consist of +non-digits, or digits enclosed in `<>', followed by either +`!' or `?'. When it matches, it runs quickly. However, if it +is applied to + +

+         aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
+
+ +

it takes a long time before reporting failure. This is because the +string can be divided between the internal `\D+' repeat and the +external `*' repeat in a large number of ways, and all have to be +tried. (The example uses `[!?]' rather than a single character at +the end, because both PCRE and Perl have an optimization that allows +for fast failure when a single character is used. They remember the +last single character that is required for a match, and fail early if +it is not present in the string.) If the pattern is changed so that it +uses an atomic group, like this: + +

+         ((?>\D+)|<\d+>)*[!?]
+
+ +

sequences of non-digits cannot be broken, and failure happens quickly. + +

Back References
+ +

Outside a character class, a backslash followed by a digit greater +than 0 (and possibly further digits) is a back reference to a +capturing subpattern earlier (that is, to its left) in the pattern, +provided there have been that many previous capturing left +parentheses. + +

However, if the decimal number following the backslash is less than +10, it is always taken as a back reference, and causes an error only +if there are not that many capturing left parentheses in the entire +pattern. In other words, the parentheses that are referenced need not +be to the left of the reference for numbers less than 10. A “forward +back reference” of this type can make sense when a repetition is +involved and the subpattern to the right has participated in an +earlier iteration. + +

It is not possible to have a numerical “forward back reference” to a +subpattern whose number is 10 or more using this syntax because a +sequence such as `\50' is interpreted as a character defined in +octal. See the subsection entitled “Non-printing characters” above +for further details of the handling of digits following a +backslash. There is no such problem when named parentheses are used. A +back reference to any subpattern is possible using named parentheses +(see below). + +

Another way of avoiding the ambiguity inherent in the use of digits +following a backslash is to use the `\g' escape sequence, which +is a feature introduced in Perl 5.10. This escape must be followed by +an unsigned number or a negative number, optionally enclosed in +braces. These examples are all identical: + +

+         (ring), \1
+         (ring), \g1
+         (ring), \g{1}
+
+ +

An unsigned number specifies an absolute reference without the +ambiguity that is present in the older syntax. It is also useful when +literal digits follow the reference. A negative number is a relative +reference. Consider this example: + +

+         (abc(def)ghi)\g{-1}
+
+ +

The sequence `\g{-1}' is a reference to the most recently +started capturing subpattern before `\g', that is, it is +equivalent to `\2'. Similarly, `\g{-2}' would be +equivalent to `\1'. The use of relative references can be helpful +in long patterns, and also in patterns that are created by joining +together fragments that contain references within themselves. + +

A back reference matches whatever actually matched the capturing +subpattern in the current subject string, rather than anything +matching the subpattern itself (see “Subpatterns as subroutines” below +for a way of doing that). So the pattern + +

+         (sens|respons)e and \1ibility
+
+ +

matches `sense and sensibility' and `response and +responsibility', but not `sense and responsibility'. If caseful +matching is in force at the time of the back reference, the case of +letters is relevant. For example, + +

+         ((?i)rah)\s+\1
+
+ +

matches `rah rah' and `RAH RAH', but not `RAH rah', +even though the original capturing subpattern is matched caselessly. + +
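
The point that a back reference repeats the matched text, not the subpattern, can be sketched with the first example above. Python's re module is assumed as a stand-in; the caseless `rah' example is not reproduced because Python places extra restrictions on where `(?i)' may appear.

```python
import re

pair = re.compile(r"(sens|respons)e and \1ibility")

for phrase in ("sense and sensibility",
               "response and responsibility",
               "sense and responsibility"):
    print(phrase, bool(pair.fullmatch(phrase)))
```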

There are several different ways of writing back references to named +subpatterns. The .NET syntax `\k{name}' and the Perl syntax +`\k<name>' or `\k'name'' are supported, as is the Python +syntax `(?P=name)'. Perl 5.10's unified back reference syntax, in which +`\g' can be used for both numeric and named references, is also +supported. We could rewrite the above example in any of the following +ways: + +

+         (?<p1>(?i)rah)\s+\k<p1>
+         (?'p1'(?i)rah)\s+\k{p1}
+         (?P<p1>(?i)rah)\s+(?P=p1)
+         (?<p1>(?i)rah)\s+\g{p1}
+
+ +

A subpattern that is referenced by name may appear in the pattern +before or after the reference. + +

There may be more than one back reference to the same subpattern. If a +subpattern has not actually been used in a particular match, any back +references to it always fail. For example, the pattern + +

+         (a|(bc))\2
+
+ +

always fails if it starts to match `a' rather than +`bc'. Because there may be many capturing parentheses in a +pattern, all digits following the backslash are taken as part of a +potential back reference number. If the pattern continues with a digit +character, some delimiter must be used to terminate the back +reference. If the `(?x)' option is set, this can be whitespace. +Otherwise an empty comment (see “Comments” below) can be used. + +
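
A back reference to a group that took no part in the match can be tried as follows. This is a sketch assuming Python's re module, which happens to agree with PCRE on this point:

```python
import re

unset_ref = re.compile(r"(a|(bc))\2")

# Group 2 is never set when the first alternative matches, so \2 fails.
print(unset_ref.match("a"))
print(unset_ref.match("aa"))

# When "bc" matches, group 2 is set and \2 must repeat it.
print(unset_ref.match("bcbc").group())
```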

A back reference that occurs inside the parentheses to which it refers +fails when the subpattern is first used, so, for example, `(a\1)' +never matches. However, such references can be useful inside repeated +subpatterns. For example, the pattern + +

+         (a|b\1)+
+
+ +

matches any number of `a's and also `aba', `ababbaa' +etc. At each iteration of the subpattern, the back reference matches +the character string corresponding to the previous iteration. In order +for this to work, the pattern must be such that the first iteration +does not need to match the back reference. This can be done using +alternation, as in the example above, or by a quantifier with a +minimum of zero. + +

Assertions
+ +

An assertion is a test on the characters following or preceding the +current matching point that does not actually consume any +characters. The simple assertions coded as `\b', `\B', +`\A', `\G', `\Z', `\z', `^' and `$' are +described above. + +

More complicated assertions are coded as subpatterns. There are two +kinds: those that look ahead of the current position in the subject +string, and those that look behind it. An assertion subpattern is +matched in the normal way, except that it does not cause the current +matching position to be changed. + +

Assertion subpatterns are not capturing subpatterns, and may not be +repeated, because it makes no sense to assert the same thing several +times. If any kind of assertion contains capturing subpatterns within +it, these are counted for the purposes of numbering the capturing +subpatterns in the whole pattern. However, substring capturing is +carried out only for positive assertions, because it does not make +sense for negative assertions. + +

Lookahead Assertions
+ +

Lookahead assertions start with `(?=' for positive assertions and +`(?!' for negative assertions. For example, + +

+         \w+(?=;)
+
+ +

matches a word followed by a semicolon, but does not include the semicolon in +the match, and + +

+         foo(?!bar)
+
+ +

matches any occurrence of `foo' that is not followed by +`bar'. Note that the apparently similar pattern + +

+         (?!foo)bar
+
+ +

does not find an occurrence of `bar' that is preceded by +something other than `foo'; it finds any occurrence of `bar' +whatsoever, because the assertion `(?!foo)' is always true when +the next three characters are `bar'. A lookbehind assertion is +needed to achieve the other effect. + +

If you want to force a matching failure at some point in a pattern, +the most convenient way to do it is with `(?!)' because an empty +string always matches, so an assertion that requires there not to be +an empty string must always fail. + +
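
The lookahead cases above, including the `(?!foo)bar' pitfall and the always-failing `(?!)', can be checked with a short sketch that assumes Python's re module as a stand-in for PCRE:

```python
import re

# The semicolon is required but is not part of the match.
print(re.search(r"\w+(?=;)", "foo;").group())

# "foo" not followed by "bar".
print(re.search(r"foo(?!bar)", "foobar"))
print(re.search(r"foo(?!bar)", "foobaz").group())

# The pitfall: the assertion looks ahead at "bar", which is never "foo",
# so "bar" is found even though "foo" precedes it.
print(re.search(r"(?!foo)bar", "foobar").start())

# An empty negative lookahead can never succeed.
print(re.search(r"(?!)", "anything"))
```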

Lookbehind Assertions
+ +

Lookbehind assertions start with `(?<=' for positive assertions +and `(?<!' for negative assertions. For example, + +

+         (?<!foo)bar
+
+ +

matches an occurrence of `bar' that is not preceded by +`foo'. The contents of a lookbehind assertion are restricted such +that all the strings it matches must have a fixed length. However, if +there are several top-level alternatives, they do not all have to have +the same fixed length. Thus + +

+         (?<=bullock|donkey)
+
+ +

is permitted, but + +

+         (?<!dogs?|cats?)
+
+ +

causes an error at compile time. Branches that match different length +strings are permitted only at the top level of a lookbehind +assertion. This is an extension compared with Perl (at least for 5.8), +which requires all branches to match the same length of string. An +assertion such as + +

+         (?<=ab(c|de))
+
+ +

is not permitted, because its single top-level branch can match two different +lengths, but it is acceptable if rewritten to use two top-level branches: + +

+         (?<=abc|abde)
+
+ +

In some cases, the Perl 5.10 escape sequence `\K' (see above) can +be used instead of a lookbehind assertion; it is not restricted to a +fixed length. + +

The implementation of lookbehind assertions is, for each alternative, +to temporarily move the current position back by the fixed length and +then try to match. If there are insufficient characters before the +current position, the assertion fails. + +
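
A minimal sketch of lookbehind behaviour follows, assuming Python's re module as a stand-in. Python is stricter than PCRE here: it requires every alternative in a lookbehind to have the same length, so the example uses two three-character branches rather than `bullock|donkey'.

```python
import re

# "bar" not preceded by "foo".
print(re.search(r"(?<!foo)bar", "foobar"))
print(re.search(r"(?<!foo)bar", "bazbar").start())

# Top-level alternatives inside a positive lookbehind.
print(re.search(r"(?<=abc|xyz)\d", "xyz7").group())

# Too few characters before the current position: the assertion fails.
print(re.search(r"(?<=abc)d", "bcd"))
```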

PCRE does not allow the `\C' escape (which matches a single byte +in UTF-8 mode) to appear in lookbehind assertions, because it makes it +impossible to calculate the length of the lookbehind. The `\X' +and `\R' escapes, which can match different numbers of bytes, are +also not permitted. + +

Possessive quantifiers can be used in conjunction with lookbehind +assertions to specify efficient matching at the end of the subject +string. Consider a simple pattern such as + +

+         abcd$
+
+ +

when applied to a long string that does not match. Because matching +proceeds from left to right, PCRE will look for each `a' in the +subject and then see if what follows matches the rest of the +pattern. If the pattern is specified as + +

+         ^.*abcd$
+
+ +

the initial `.*' matches the entire string at first, but when this fails +(because there is no following `a'), it backtracks to match all +but the last character, then all but the last two characters, and so +on. Once again the search for `a' covers the entire string, from +right to left, so we are no better off. However, if the pattern is +written as + +

+         ^.*+(?<=abcd)
+
+ +

there can be no backtracking for the `.*+' item; it can match +only the entire string. The subsequent lookbehind assertion does a +single test on the last four characters. If it fails, the match fails +immediately. For long strings, this approach makes a significant +difference to the processing time. + +

Using multiple assertions
+ +

Several assertions (of any sort) may occur in succession. For example, + +

+         (?<=\d{3})(?<!999)foo
+
+ +

matches `foo' preceded by three digits that are not +`999'. Notice that each of the assertions is applied +independently at the same point in the subject string. First there is +a check that the previous three characters are all digits, and then +there is a check that the same three characters are not `999'. +This pattern does not match `foo' preceded by six +characters, the first three of which are digits and the last three of which +are not `999'. For example, it doesn't match `123abcfoo'. A +pattern to do that is + +

+         (?<=\d{3}...)(?<!999)foo
+
+ +

This time the first assertion looks at the preceding six characters, +checking that the first three are digits, and then the second +assertion checks that the preceding three characters are not +`999'. + +
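The two patterns above can be tried out with a minimal sketch. It assumes Python's standard `re` module as a stand-in for PCRE (Monotone itself uses PCRE, not Python); both engines accept these fixed-length lookbehinds. The helper name `found` is illustrative only.

```python
import re

# Both assertions are tested at the same point: just before "foo".
three_digits_not_999 = re.compile(r"(?<=\d{3})(?<!999)foo")

# Here the first assertion looks back six characters, the second only three.
digits_then_three_chars = re.compile(r"(?<=\d{3}...)(?<!999)foo")

def found(pattern, subject):
    """Return True if the compiled pattern matches anywhere in subject."""
    return pattern.search(subject) is not None

print(found(three_digits_not_999, "123foo"))
print(found(three_digits_not_999, "999foo"))
print(found(three_digits_not_999, "123abcfoo"))
print(found(digits_then_three_chars, "123abcfoo"))
print(found(digits_then_three_chars, "123999foo"))
```

Only the first and fourth calls should report a match, which is the behaviour described in the text.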

Assertions can be nested in any combination. For example, + +

+         (?<=(?<!foo)bar)baz
+
+ +

matches an occurrence of `baz' that is preceded by `bar' +which in turn is not preceded by `foo', while + +

+         (?<=\d{3}(?!999)...)foo
+
+ +

is another pattern that matches `foo' preceded by three digits +and any three characters that are not `999'. + +
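As a rough illustration of nesting, the sketch below runs both nested patterns through Python's standard `re` module, used here only as an assumed stand-in for PCRE. The variable names are hypothetical.

```python
import re

# "baz" preceded by "bar", where that "bar" is not itself preceded by "foo".
bar_not_after_foo = re.compile(r"(?<=(?<!foo)bar)baz")

# A negative lookahead nested inside a lookbehind: three digits, then three
# characters that are not "999", then "foo".
digits_then_not_999 = re.compile(r"(?<=\d{3}(?!999)...)foo")

print(bar_not_after_foo.search("xxxbarbaz") is not None)
print(bar_not_after_foo.search("foobarbaz") is not None)
print(digits_then_not_999.search("123abcfoo") is not None)
print(digits_then_not_999.search("123999foo") is not None)
```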

Conditional Subpatterns
+ +

It is possible to cause the matching process to obey a subpattern +conditionally or to choose between two alternative subpatterns, +depending on the result of an assertion, or whether a previous +capturing subpattern matched or not. The two possible forms of +conditional subpattern are + +

+         (?(condition)yes-pattern)
+         (?(condition)yes-pattern|no-pattern)
+
+ +

If the condition is satisfied, the yes-pattern is used; +otherwise the no-pattern (if present) is used. If there are more +than two alternatives in the subpattern, a compile-time error occurs. + +

There are four kinds of condition: references to subpatterns, +references to recursion, a pseudo-condition called `DEFINE', and +assertions. + +

Checking for a used subpattern by number
+ +

If the text between the parentheses consists of a sequence of digits, +the condition is true if the capturing subpattern of that number has +previously matched. An alternative notation is to precede the digits +with a plus or minus sign. In this case, the subpattern number is +relative rather than absolute. The most recently opened parentheses +can be referenced by `(?(-1)', the next most recent by +`(?(-2)', and so on. In looping constructs it can also make sense +to refer to subsequent groups with constructs such as `(?(+2)'. + +

Consider the following pattern, which contains non-significant white +space to make it more readable and to divide it into three parts for +ease of discussion (assume a preceding `(?x)'): + +

+         ( \( )?    [^()]+    (?(1) \) )
+
+ +

The first part matches an optional opening parenthesis, and if that +character is present, sets it as the first captured substring. The +second part matches one or more characters that are not +parentheses. The third part is a conditional subpattern that tests +whether the first set of parentheses matched or not. If they did, that +is, if the subject started with an opening parenthesis, the condition is +true, and so the yes-pattern is executed and a closing parenthesis is +required. Otherwise, since no-pattern is not present, the subpattern +matches nothing. In other words, this pattern matches a sequence of +non-parentheses, optionally enclosed in parentheses. + +

If you were embedding this pattern in a larger one, you could use a +relative reference: + +

+         ...other stuff... ( \( )?    [^()]+    (?(-1) \) ) ...
+
+ +

This makes the fragment independent of the parentheses in the larger pattern. + +
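The three-part pattern with the absolute reference `(?(1)' can be exercised with a short sketch. It assumes Python's standard `re` module, which accepts the same numbered-condition syntax but not the relative `(?(-1)' form; `re.VERBOSE` plays the role of `(?x)'. The function name is illustrative.

```python
import re

OPTIONALLY_PARENTHESIZED = re.compile(
    r"""
    ( \( )?      # part 1: optional opening parenthesis, captured as group 1
    [^()]+       # part 2: one or more characters that are not parentheses
    (?(1) \) )   # part 3: a closing parenthesis is required only if group 1 matched
    """,
    re.VERBOSE,
)

def is_optionally_parenthesized(subject):
    """Return True if the whole subject has the form text or (text)."""
    return OPTIONALLY_PARENTHESIZED.fullmatch(subject) is not None

print(is_optionally_parenthesized("(abc)"))
print(is_optionally_parenthesized("abc"))
print(is_optionally_parenthesized("(abc"))
```

Matching the whole subject (rather than searching inside it) makes the effect of the condition easy to see: an opening parenthesis without its partner is rejected.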

Checking for a used subpattern by name
+ +

Perl uses the syntax `(?(<name>)...)' or `(?('name')...)' to +test for a used subpattern by name. For compatibility with earlier +versions of PCRE, which had this facility before Perl, the syntax +`(?(name)...)' is also recognized. However, there is a possible +ambiguity with this syntax, because subpattern names may consist +entirely of digits. PCRE looks first for a named subpattern; if it +cannot find one and the name consists entirely of digits, PCRE looks +for a subpattern of that number, which must be greater than +zero. Using subpattern names that consist entirely of digits is not +recommended. + +

Rewriting the above example to use a named subpattern gives this: + +

+         (?<OPEN> \( )?    [^()]+    (?(<OPEN>) \) )
+
+ +
Checking for pattern recursion
+ +

If the condition is the string `(R)', and there is no subpattern +with the name `R', the condition is true if a recursive call to +the whole pattern or any subpattern has been made. If digits or a name +preceded by ampersand follow the letter `R', for example: + +

+         (?(R3)...) or (?(R&name)...)
+
+ +

the condition is true if the most recent recursion is into the +subpattern whose number or name is given. This condition does not +check the entire recursion stack. + +

At “top level,” all these recursion test conditions are false. Recursive +patterns are described below. + +

Defining subpatterns for use by reference only
+ +

If the condition is the string `(DEFINE)', and there is no +subpattern with the name `DEFINE', the condition is always +false. In this case, there may be only one alternative in the +subpattern. It is always skipped if control reaches this point in the +pattern; the idea of DEFINE is that it can be used to define +subroutines that can be referenced from elsewhere. (The use of +subroutines is described below.) For example, a pattern to match an +IPv4 address could be written like this (ignore whitespace and line +breaks): + +

+         (?(DEFINE) (?<byte> 2[0-4]\d | 25[0-5] | 1\d\d | [1-9]?\d) )
+         \b (?&byte) (\.(?&byte)){3} \b
+
+ +

The first part of the pattern is a DEFINE group inside which another +group named "byte" is defined. This matches an individual component of +an IPv4 address (a number less than 256). When matching takes place, +this part of the pattern is skipped because DEFINE acts like a false +condition. + +

The rest of the pattern uses references to the named group to match +the four dot-separated components of an IPv4 address, insisting on a +word boundary at each end. + +
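Engines without DEFINE can get a similar effect by building the pattern from a reusable string. The sketch below is such an emulation, assuming Python's standard `re` module (which has neither `(?(DEFINE)' nor `(?&name)'); it is not how PCRE processes the pattern above, and the names `BYTE`, `IPV4` and `find_ipv4` are hypothetical.

```python
import re

# One address component (0-255), using the same alternatives as the
# "byte" group in the DEFINE example.
BYTE = r"(?:2[0-4]\d|25[0-5]|1\d\d|[1-9]?\d)"

# Four dot-separated components with a word boundary at each end.
IPV4 = re.compile(rf"\b{BYTE}(?:\.{BYTE}){{3}}\b")

def find_ipv4(text):
    """Return the first IPv4-looking substring of text, or None."""
    match = IPV4.search(text)
    return match.group(0) if match else None

print(find_ipv4("host 192.168.0.1 is up"))
print(find_ipv4("256.1.1.1"))
```

String interpolation copies the component's text into each place it is used, whereas DEFINE keeps a single definition inside the pattern itself.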

Assertion conditions
+ +

If the condition is not in any of the above formats, it must be an +assertion. This may be a positive or negative lookahead or lookbehind +assertion. Consider this pattern, again containing non-significant +white space, and with the two alternatives on the second line: + +

+         (?(?=[^a-z]*[a-z])
+         \d{2}-[a-z]{3}-\d{2}  |  \d{2}-\d{2}-\d{2} )
+
+ +

The condition is a positive lookahead assertion that matches an +optional sequence of non-letters followed by a letter. In other words, +it tests for the presence of at least one letter in the subject. If a +letter is found, the subject is matched against the first alternative; +otherwise it is matched against the second. This pattern matches +strings in one of the two forms `dd-aaa-dd' or +`dd-dd-dd', where aaa are letters and +dd are digits. + +

Comments
+ +

The sequence `(?#' marks the start of a comment that continues up +to the next closing parenthesis. Nested parentheses are not +permitted. The characters that make up a comment play no part in the +pattern matching at all. + +

If the `(?x)' option is set, an unescaped `#' character +outside a character class introduces a comment that continues to +immediately after the next newline in the pattern. + +

Recursive Patterns
+ +

Consider the problem of matching a string in parentheses, allowing for +unlimited nested parentheses. Without the use of recursion, the best +that can be done is to use a pattern that matches up to some fixed +depth of nesting. It is not possible to handle an arbitrary nesting +depth. + +

PCRE supports special syntax for recursion of the entire pattern, and +also for individual subpattern recursion. After its introduction in +PCRE and Python, this kind of recursion was introduced into Perl at +release 5.10. + +

A special item that consists of `(?' followed by a number greater +than zero and a closing parenthesis is a recursive call of the +subpattern of the given number, provided that it occurs inside that +subpattern. (If not, it is a subroutine call, which is described in +the next section.) The special item `(?R)' or `(?0)' is a +recursive call of the entire regular expression. + +

In PCRE (like Python, but unlike Perl), a recursive subpattern call is +always treated as an atomic group. That is, once it has matched some +of the subject string, it is never re-entered, even if it contains +untried alternatives and there is a subsequent matching failure. + +

This PCRE pattern solves the nested parentheses problem (whitespace is +insignificant): + +

+         \( ( (?>[^()]+) | (?R) )* \)
+
+ +

First it matches an opening parenthesis. Then it matches any number of +substrings which can either be a sequence of non-parentheses, or a +recursive match of the pattern itself (that is, a correctly +parenthesized substring). Finally there is a closing parenthesis. + +

If this were part of a larger pattern, you would not want to recurse +the entire pattern, so instead you could use this: + +

+         ( \( ( (?>[^()]+) | (?1) )* \) )
+
+ +

We have put the pattern into parentheses, and caused the recursion to +refer to them instead of the whole pattern. + +
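To make the recursion concrete, here is a hand-written function that accepts the same strings as the recursive pattern. It is only an illustrative sketch in Python: it does not use a regular expression engine at all (Python's standard `re` module has no `(?R)'), and the name `match_balanced` is hypothetical.

```python
def match_balanced(subject, start=0):
    """Return the index just past a balanced parenthesized group that begins
    at subject[start], or None if there is no such group."""
    if start >= len(subject) or subject[start] != "(":
        return None                  # the opening parenthesis is missing
    pos = start + 1
    while pos < len(subject):
        ch = subject[pos]
        if ch == ")":
            return pos + 1           # the closing parenthesis ends the group
        if ch == "(":
            # A nested group: this call plays the role of (?R) or (?1).
            end = match_balanced(subject, pos)
            if end is None:
                return None
            pos = end
        else:
            pos += 1                 # one more non-parenthesis character
    return None                      # input ended before the closing parenthesis

print(match_balanced("(ab(cd)ef)"))
print(match_balanced("(aaa()"))
```

The function never revisits a character it has already consumed, which loosely corresponds to the atomic grouping in the pattern.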

In a larger pattern, keeping track of parenthesis numbers can be +tricky. This is made easier by the use of relative references. (A Perl +5.10 feature.) Instead of `(?1)' in the pattern above you can +write `(?-2)' to refer to the second most recently opened +parentheses preceding the recursion. In other words, a negative number +counts capturing parentheses leftwards from the point at which it is +encountered. + +

It is also possible to refer to subsequently opened parentheses, by +writing references such as `(?+2)'. However, these cannot be +recursive because the reference is not inside the parentheses that are +referenced. They are always subroutine calls, as described in the next +section. + +

An alternative approach is to use named parentheses instead. The Perl +syntax for this is `(?&name)'; PCRE's earlier syntax +`(?P>name)' is also supported. We could rewrite the above example +as follows: + +

+         (?<pn> \( ( (?>[^()]+) | (?&pn) )* \) )
+
+ +

If there is more than one subpattern with the same name, the earliest +one is used. + +

This particular example pattern that we have been looking at contains +nested unlimited repeats, and so the use of atomic grouping for +matching strings of non-parentheses is important when applying the +pattern to strings that do not match. For example, when this pattern +is applied to + +

+         (aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa()
+
+ +

it fails quickly. However, if atomic grouping is not used, the match +runs for a very long time indeed because there are so many different +ways the `+' and `*' repeats can carve up the subject, and +all have to be tested before failure can be reported. + +

At the end of a match, the values set for any capturing subpatterns +are those from the outermost level of the recursion at which the +subpattern value is set. If the pattern above is matched against + +

+         (ab(cd)ef)
+
+ +

the value for the capturing parentheses is `ef', which is the +last value taken on at the top level. If additional parentheses are +added, giving + +

+         \( ( ( (?>[^()]+) | (?R) )* ) \)
+            ^                        ^
+
+ +

the string they capture is `ab(cd)ef', the contents of the top +level parentheses. + +

Do not confuse the `(?R)' item with the condition `(?(R)', +which tests for recursion. Consider this pattern, which matches text +in angle brackets, allowing for arbitrary nesting. Only digits are +allowed in nested brackets (that is, when recursing), whereas any +characters are permitted at the outer level. + +

+         < (?: (?(R) \d++  | [^<>]*+) | (?R)) * >
+
+ +

In this pattern, `(?(R)' is the start of a conditional +subpattern, with two different alternatives for the recursive and +non-recursive cases. The `(?R)' item is the actual recursive +call. + +

Subpatterns as Subroutines
+ +

If the syntax for a recursive subpattern reference (either by number +or by name) is used outside the parentheses to which it refers, it +operates like a subroutine in a programming language. The called +subpattern may be defined before or after the reference. A numbered +reference can be absolute or relative, as in these examples: + +

+         (...(absolute)...)...(?2)...
+         (...(relative)...)...(?-1)...
+         (...(?+1)...(relative)...
+
+ +

An earlier example pointed out that the pattern + +

+         (sens|respons)e and \1ibility
+
+ +

matches `sense and sensibility' and `response and +responsibility', but not `sense and responsibility'. If instead +the pattern + +

+         (sens|respons)e and (?1)ibility
+
+ +

is used, it does match `sense and responsibility' as well as the +other two strings. Another example is given in the discussion of +DEFINE above. + +
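The difference between a backreference and a subroutine call can be sketched as follows. This assumes Python's standard `re` module, which supports `\1' but not `(?1)'; the subroutine call is therefore emulated by repeating the group's text, and the names `ALTERNATIVES`, `with_backreference` and `with_repeated_group` are illustrative.

```python
import re

ALTERNATIVES = r"sens|respons"

# \1 must match the same text that group 1 actually matched.
with_backreference = re.compile(rf"({ALTERNATIVES})e and \1ibility")

# Repeating the alternatives re-runs the subpattern, so either word may
# follow; this emulates what (?1) does in PCRE.
with_repeated_group = re.compile(rf"({ALTERNATIVES})e and (?:{ALTERNATIVES})ibility")

for subject in ("sense and sensibility",
                "response and responsibility",
                "sense and responsibility"):
    print(subject,
          with_backreference.fullmatch(subject) is not None,
          with_repeated_group.fullmatch(subject) is not None)
```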

Like recursive subpatterns, a subroutine call is always treated as an +atomic group. That is, once it has matched some of the subject string, +it is never re-entered, even if it contains untried alternatives and +there is a subsequent matching failure. + +

When a subpattern is used as a subroutine, processing options such as +case-independence are fixed when the subpattern is defined. They +cannot be changed for different calls. For example, consider this +pattern: + +

+         (abc)(?i:(?-1))
+
+ +

It matches `abcabc'. It does not match `abcABC' because the +change of processing option does not affect the called subpattern. + +

Backtracking Control
+ +

Perl 5.10 introduced a number of special backtracking control +verbs, which are described in the Perl documentation as +“experimental and subject to change or removal in a future version of +Perl.” It goes on to say: “Their usage in production code should be +noted to avoid problems during upgrades.” The same remarks apply to +the PCRE features described in this section. + +

The new verbs make use of what was previously invalid syntax: an +opening parenthesis followed by an asterisk. In Perl, they are +generally of the form `(*VERB:ARG)' but PCRE does not support the +use of arguments, so its general form is just `(*VERB)'. Any +number of these verbs may occur in a pattern. There are two kinds: + +

Verbs that act immediately
+ +

The following verbs act as soon as they are encountered: + +

+
(*ACCEPT)
+This verb causes the match to end successfully, skipping the remainder +of the pattern. When inside a recursion, only the innermost pattern is +ended immediately. PCRE differs from Perl in what happens if the +`(*ACCEPT)' is inside capturing parentheses. In Perl, the data so +far is captured: in PCRE no data is captured. For example: + +
     
+              A(A|B(*ACCEPT)|C)D
+
+ +

This matches `AB', `AAD', or `ACD', but when it matches +`AB', no data is captured. + +

(*FAIL) or (*F)
+This verb causes the match to fail, forcing backtracking to occur. It +is equivalent to `(?!)' but easier to read. It is not clear +whether there is any use for this without the ability to execute code +in the middle of the pattern (which Perl has but PCRE in Monotone does +not). +
+ +
Verbs that act after backtracking
+ +

The following verbs do nothing when they are encountered. Matching +continues with what follows, but if there is no subsequent match, a +failure is forced. The verbs differ in exactly what kind of failure +occurs. + +

+
(*COMMIT)
+This verb causes the whole match to fail outright if the rest of the +pattern does not match. Even if the pattern is unanchored, no further +attempts to find a match by advancing the start point take place. Once +(*COMMIT) has been passed, the regular expression engine is +committed to finding a match at the current starting point, or not at +all. For example: + +
     
+              a+(*COMMIT)b
+
+ +

This matches `xxaab' but not `aacaab'. It can be thought of +as a kind of dynamic anchor, or “I've started, so I must finish.” + +

(*PRUNE)
+This verb causes the match to fail at the current position if the rest +of the pattern does not match. If the pattern is unanchored, the +normal “bump-along” advance to the next starting character then +happens. Backtracking can occur as usual to the left of +(*PRUNE), or when matching to the right of (*PRUNE), but +if there is no match to the right, backtracking cannot cross +(*PRUNE). In simple cases, the use of (*PRUNE) is just +an alternative to an atomic group or possessive quantifier, but there +are some uses of (*PRUNE) that cannot be expressed in any other +way. + +
(*SKIP)
+This verb is like (*PRUNE), except that if the pattern is +unanchored, the “bump-along” advance is not to the next character, but +to the position in the subject where (*SKIP) was +encountered. (*SKIP) signifies that whatever text was matched +leading up to it cannot be part of a successful match. Consider: + +
     
+              a+(*SKIP)b
+
+ +

If the subject is `aaaac...', after the first match attempt fails +(starting at the first character in the string), the starting point +skips on to start the next attempt at `c'. Note that a possessive +quantifier does not have the same effect in this example; although it +would suppress backtracking during the first match attempt, the second +attempt would start at the second character instead of skipping on to +`c'. + +

(*THEN)
+This verb causes a skip to the next alternation if the rest of the +pattern does not match. That is, it cancels pending backtracking, but +only within the current alternation. Its name comes from the +observation that it can be used for a pattern-based if-then-else +block: + +
     
+              ( COND1 (*THEN) FOO 
+              | COND2 (*THEN) BAR 
+              | COND3 (*THEN) BAZ ) ...
+
+ +

If the `COND1' pattern matches, `FOO' is tried (and possibly +further items after the end of the group if `FOO' succeeds); on +failure the matcher skips to the second alternative and tries +`COND2', without backtracking into COND1. If (*THEN) is used +outside of any alternation, it acts exactly like (*PRUNE). +

+ + + ============================================================ --- docs/Regexp-Summary.html 3bd9b164c47cfebf0b14099e9ca05f3be4d2cb11 +++ docs/Regexp-Summary.html 3bd9b164c47cfebf0b14099e9ca05f3be4d2cb11 @@ -0,0 +1,438 @@ + + +Regexp Summary - monotone documentation + + + + + + + + + + + +
+

+ +Next: , +Up: Regexps +


+
+ +

7.5.1 Regexp Syntax Summary

+ +

This is a quick-reference summary of the regular expression syntax +used in Monotone. + +

Quoting
+ +
+
\x
where x is non-alphanumeric is a literal x +
\Q...\E
treat enclosed characters as literal +
+ +
Characters
+ +
+
\a
alarm, that is, the BEL character (hex 07) +
\cx
“control-x”, where x is any character +
\e
escape (hex 1B) +
\f
formfeed (hex 0C) +
\n
newline (hex 0A) +
\r
carriage return (hex 0D) +
\t
tab (hex 09) +
\ddd
character with octal code ddd, or backreference +
\xhh
character with hex code hh +
\x{hhh...}
character with hex code hhh... +
+ +
Character Types
+ +
+
.
any character except newline; + in dotall mode, any character whatsoever +
\C
one byte, even in UTF-8 mode (best avoided) +
\d
a decimal digit +
\D
a character that is not a decimal digit +
\h
a horizontal whitespace character +
\H
a character that is not a horizontal whitespace character +
\p{xx}
a character with the xx property +
\P{xx}
a character without the xx property +
\R
a newline sequence +
\s
a whitespace character +
\S
a character that is not a whitespace character +
\v
a vertical whitespace character +
\V
a character that is not a vertical whitespace character +
\w
a “word” character +
\W
a “non-word” character +
\X
an extended Unicode sequence +
+ +

`\d', `\D', `\s', `\S', `\w', and `\W' +recognize only ASCII characters. + +

General category property codes for `\p' and `\P'
+ +
+
C
Other +
Cc
Control +
Cf
Format +
Cn
Unassigned +
Co
Private use +
Cs
Surrogate + +
L
Letter +
Ll
Lower case letter +
Lm
Modifier letter +
Lo
Other letter +
Lt
Title case letter +
Lu
Upper case letter +
L&
Ll, Lu, or Lt + +
M
Mark +
Mc
Spacing mark +
Me
Enclosing mark +
Mn
Non-spacing mark + +
N
Number +
Nd
Decimal number +
Nl
Letter number +
No
Other number + +
P
Punctuation +
Pc
Connector punctuation +
Pd
Dash punctuation +
Pe
Close punctuation +
Pf
Final punctuation +
Pi
Initial punctuation +
Po
Other punctuation +
Ps
Open punctuation + +
S
Symbol +
Sc
Currency symbol +
Sk
Modifier symbol +
Sm
Mathematical symbol +
So
Other symbol + +
Z
Separator +
Zl
Line separator +
Zp
Paragraph separator +
Zs
Space separator +
+ +
Script names for `\p' and `\P'
+ +

Arabic, +Armenian, +Balinese, +Bengali, +Bopomofo, +Braille, +Buginese, +Buhid, +Canadian_Aboriginal, +Cherokee, +Common, +Coptic, +Cuneiform, +Cypriot, +Cyrillic, +Deseret, +Devanagari, +Ethiopic, +Georgian, +Glagolitic, +Gothic, +Greek, +Gujarati, +Gurmukhi, +Han, +Hangul, +Hanunoo, +Hebrew, +Hiragana, +Inherited, +Kannada, +Katakana, +Kharoshthi, +Khmer, +Lao, +Latin, +Limbu, +Linear_B, +Malayalam, +Mongolian, +Myanmar, +New_Tai_Lue, +Nko, +Ogham, +Old_Italic, +Old_Persian, +Oriya, +Osmanya, +Phags_Pa, +Phoenician, +Runic, +Shavian, +Sinhala, +Syloti_Nagri, +Syriac, +Tagalog, +Tagbanwa, +Tai_Le, +Tamil, +Telugu, +Thaana, +Thai, +Tibetan, +Tifinagh, +Ugaritic, +Yi. + +

Character Classes
+ +
+
[...]
positive character class +
[^...]
negative character class +
[x-y]
range (can be used for hex characters) +
[[:xxx:]]
positive POSIX named set +
[[:^xxx:]]
negative POSIX named set + +
alnum
alphanumeric +
alpha
alphabetic +
ascii
0-127 +
blank
space or tab +
cntrl
control character +
digit
decimal digit +
graph
printing, excluding space +
lower
lower case letter +
print
printing, including space +
punct
printing, excluding alphanumeric +
space
whitespace +
upper
upper case letter +
word
same as `\w' +
xdigit
hexadecimal digit +
+ +

In PCRE, POSIX character set names recognize only ASCII +characters. You can use `\Q...\E' inside a character class. + +

Quantifiers
+ +
+
?
0 or 1, greedy +
?+
0 or 1, possessive +
??
0 or 1, lazy +
*
0 or more, greedy +
*+
0 or more, possessive +
*?
0 or more, lazy +
+
1 or more, greedy +
++
1 or more, possessive +
+?
1 or more, lazy +
{n}
exactly n +
{n,m}
at least n, no more than m, greedy +
{n,m}+
at least n, no more than m, possessive +
{n,m}?
at least n, no more than m, lazy +
{n,}
n or more, greedy +
{n,}+
n or more, possessive +
{n,}?
n or more, lazy +
+ +
Anchors and Simple Assertions
+ +
+
\b
word boundary +
\B
not a word boundary +
^
start of subject + also after internal newline in multiline mode +
\A
start of subject +
$
end of subject + also before newline at end of subject + also before internal newline in multiline mode +
\Z
end of subject + also before newline at end of subject +
\z
end of subject +
\G
first matching position in subject +
+ +
Match Point Reset
+ +
+
\K
reset start of match +
+ +
Alternation
+ +
+
expr|expr|expr...
+ +
Capturing
+ +
+
(...)
capturing group +
(?<name>...)
named capturing group (like Perl) +
(?'name'...)
named capturing group (like Perl) +
(?P<name>...)
named capturing group (like Python) +
(?:...)
non-capturing group +
(?|...)
non-capturing group; reset group numbers for + capturing groups in each alternative +
+ +
Atomic Groups
+ +
+
(?>...)
atomic, non-capturing group +
+ +
Comment
+ +
+
(?#....)
comment (not nestable) +
+ +
Option Setting
+ +
+
(?i)
caseless +
(?J)
allow duplicate names +
(?m)
multiline +
(?s)
single line (dotall) +
(?U)
default ungreedy (lazy) +
(?x)
extended (ignore white space) +
(?-...)
unset option(s) +
+ +
Lookahead and Lookbehind Assertions
+ +
+
(?=...)
positive look ahead +
(?!...)
negative look ahead +
(?<=...)
positive look behind +
(?<!...)
negative look behind +
+ +

Each top-level branch of a look behind must be of a fixed length. + +

Backreferences
+ +
+
\n
reference by number (can be ambiguous) +
\gn
reference by number +
\g{n}
reference by number +
\g{-n}
relative reference by number +
\k<name>
reference by name (like Perl) +
\k'name'
reference by name (like Perl) +
\g{name}
reference by name (like Perl) +
\k{name}
reference by name (like .NET) +
(?P=name)
reference by name (like Python) +
+ +
Subroutine References (possibly recursive)
+ +
+
(?R)
recurse whole pattern +
(?n)
call subpattern by absolute number +
(?+n)
call subpattern by relative number +
(?-n)
call subpattern by relative number +
(?&name)
call subpattern by name (like Perl) +
(?P>name)
call subpattern by name (like Python) +
+ +
Conditional Patterns
+ +
+
(?(condition)yes-pattern)
(?(condition)yes-pattern|no-pattern) +
(?(n)...
absolute reference condition +
(?(+n)...
relative reference condition +
(?(-n)...
relative reference condition +
(?(<name>)...
named reference condition (like Perl) +
(?('name')...
named reference condition (like Perl) +
(?(name)...
named reference condition (PCRE only) +
(?(R)...
overall recursion condition +
(?(Rn)...
specific group recursion condition +
(?(R&name)...
specific recursion condition +
(?(DEFINE)...
define subpattern for reference +
(?(assert)...
assertion condition +
+ +
Backtracking Control
+ +

The following act as soon as they are reached: + +

+
(*ACCEPT)
force successful match +
(*FAIL)
force backtrack; synonym `(*F)' +
+ +

The following act only when a subsequent match failure causes a backtrack to +reach them. They all force a match failure, but they differ in what happens +afterwards. Those that advance the start-of-match point do so only if the +pattern is not anchored. + +

+
(*COMMIT)
overall failure, no advance of starting point +
(*PRUNE)
advance to next starting character +
(*SKIP)
advance start to current matching position +
(*THEN)
local failure, backtrack to next alternation +
+ +
Newline Conventions
+ +

These are recognized only at the very start of the pattern or after a +`(*BSR_...)' option. + +

+
(*CR)
(*LF)
(*CRLF)
(*ANYCRLF)
(*ANY)
+ +
What `\R' Matches
+ +

These are recognized only at the very start of the pattern or after a +`(*...)' option that sets the newline convention. + +

+
(*BSR_ANYCRLF)
(*BSR_UNICODE)
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ============================================================ --- docs/Regexps.html 815bd739ce0d1f2d8d11a5e170d0802c305147e5 +++ docs/Regexps.html 815bd739ce0d1f2d8d11a5e170d0802c305147e5 @@ -0,0 +1,79 @@ + + +Regexps - monotone documentation + + + + + + + + + + + +
+

+ +Previous: Mark-Merge, +Up: Special Topics +


+
+ +

7.5 Regular Expression Syntax

+ +

Monotone expects user-provided regular expressions in +.mtn-ignore files and as the result of the +get_encloser_pattern Lua hook (for the --show-encloser +option to diff). User-written Lua hooks may also use the +function regex.search as they see fit. All these regular +expressions should be written with the same syntax, which is that +expected by the Perl-Compatible Regular Expression library (PCRE). + +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + ============================================================ --- INSTALL 83edc001e560d3122afd0e58dfe34b5682c7ee57 +++ INSTALL 240d5caa067b94a749936660cba6302378484413 @@ -16,7 +16,9 @@ 1. prerequisites: - automake. - gettext. - a supported C++ compiler: g++ 3.2 or later. - - an installed copy of boost 1.33.0 or later. + - boost 1.33.0 or later: either an installed copy or an extracted + tarball of its unbuilt sources somewhere in the file system are + supported. - zlib 1.1.4 or later. - libiconv if the iconv() function is missing. @@ -25,19 +27,18 @@ 1. prerequisites: apt-get install autoconf apt-get install automake apt-get install gettext - apt-get install libboost-regex-dev apt-get install libboost-dev apt-get install libz-dev apt-get install g++ on fedora: - apt-get install autoconf - apt-get install automake - apt-get install gettext - apt-get install boost-devel - apt-get install libz-devel - apt-get install g++ + yum install autoconf + yum install automake + yum install gettext + yum install boost-devel + yum install zlib-devel + yum install gcc-c++ on Windows (incomplete): @@ -49,46 +50,54 @@ 1. prerequisites: build some of these from source. if your package repository does not contain the libraries, see: - http://gcc.gnu.org for g++ - http://www.boost.org for boost + http://gcc.gnu.org/ for g++ + http://www.boost.org/ for boost -1.1 building boost: +1.1 using boost in the build process: - many people have reported difficulty building boost. the main - problem is that boost builds with an unorthodox build tool called - "bjam" which must, itself, be built or installed before boost can be - built. the bjam sources are contained within the boost distribution, - but somewhat hidden. 
there are instructions on - http://www.boost.org/more/getting_started.html, but we have - assembled this abbreviated bourne shell sequence for advanced users - who do not need all the preamble: + monotone uses the boost libraries in multiple parts of its code. + fortunately, it only uses the so-called header-only libraries: these + can be used very easily from other projects, as there is no need to + build them by hand prior usage. - wget http://internap.dl.sourceforge.net/sourceforge/boost/boost_1_33_1.tar.gz - tar -xzf boost_1_33_1.tar.gz - cd boost_1_33_1 - (cd tools/build/jam_src && ./build.sh) - BJAM=`find tools/build/jam_src/ -name bjam -a -type f` - $BJAM "-sBUILD=release single speed static" - for i in `find bin -type d -a -name \*.a`; - do for j in `find $i -type f -a -name \*.a`; - do mv $j libs/`basename $i`; - done; - done - ranlib libs/*.a + therefore you can use an installed version of boost if shipped with your + distribution but, if you do not want to mess with the Boost.Build build + system (which is hard to deal with for beginners), you can simply use an + extracted copy of the boost sources. the two procedures are detailed + below: - if this completes successfully, you will have a selection of boost - libraries in boost_1_33_1/libs and boost headers in - boost_1_33_1/boost. you can then either copy the .a files to your - standard library path and the directory "boost_1_33_1/boost" to your - standard include path, or you can pass additional configuration - options to your monotone configure build, such as: + * if your system already has the boost development libraries installed, + you must tell the compiler where to find them. their location will + usually be somewhere under /usr/include. try the following command: - ./configure CPPFLAGS="-Iboost_1_33_1" LDFLAGS="-Lboost_1_33_1/libs" + ls -d /usr/include/boost* - monotone does not use all of boost -- for instance, people often - have trouble building boost.python, which we do not use. 
you don't - need to build any libraries that we don't use! + if the command shows a single directory named 'boost', you do not have + to take any extra steps. configure will automatically find the + necessary files. instead, if the command shows a directory name of the + form boost_1_33_1, boost-1.33.1 or similar, you will have to pass that + to the configure script. do so as follows: + ./configure CPPFLAGS="-I/usr/include/boost-1.33.1" + + if no directories are shown, look for prebuilt boost packages for your + system and install them. if there aren't any, resort to the procedure + described in the following point. + + * if you do not have boost already installed, and you cannot easily + install it from prebuilt packages, fetch a copy of the boost sources + from their site (see previous section) and unpack them somewhere in + your system -- for example, your home directory. once done, tell the + configure script where the files are: + + ./configure CPPFLAGS="-I${HOME}/boost-1.33.1" + + it is important to note that, once monotone is built, you can get rid of + all the boost sources or boost development packages from your system. + the required header-only libraries will have been built into the final + binary, which will not rely on any binary boost library. in some sense, + you can think of it as static linkage. + 2. configuring monotone: * if there is no ./configure script in your source tree you'll need @@ -111,19 +120,6 @@ 2. configuring monotone: results. however, you can force IPv6 detection by saying 'yes' or completely disable it using 'no'. - --enable-static-boost[=prefix] - - this will attempt to link a "mostly static" version of monotone - using the .a files supplied with your installation of - boost. the resulting binary will be larger but more portable - than a normal (dynamic) link. - - you can optionally pass a prefix to the option, which will be - used to look for the static libraries; otherwise a list of - predefined directories will be used. 
for example: - - ./configure --enable-static-boost=/usr/local/boost - --disable-nls build a version of monotone without support for local message @@ -135,20 +131,6 @@ 2. configuring monotone: this will disable large file support from the builtin sqlite, to achieve maximum binary compatibility with old systems. - BOOST_SUFFIX=string - - this variable, to be set on configure's command line, can be used - to specify a special string suffix to be appended to boost library - names. many Linux distributions provide symlinks to hide this - suffix, but others do not. therefore, you need to pass it to the - configure script through this variable for correct detection of - boost. for example: - - ./configure BOOST_SUFFIX=-gcc - - note that, sometimes, the configure script will be able to guess - the correct suffix by itself. - --enable-pch this will enable precompiled headers, which should improve compile ============================================================ --- NEWS 0dee337ea44af422a00dfacf962d088d5a5cb0c0 +++ NEWS 7d6bb127f7ecd2dc50c01b9ecbf5a4d85841b0c2 @@ -1,3 +1,100 @@ +Fri Oct 25 22:35:33 UTC 2007 + + 0.37 release. + + Changes + + - mtn db kill_rev_locally now checks for an existing workspace + before the revision is killed and tries to apply the changes + of this particular revision back to the workspace to allow + easy re-committing afterwards + + - the "--brief" switch for mtn annotate has been renamed to + "--revs-only" for clarity + + - mtn help now lists the commands (and their aliases) available + within a group, so its easier to get an overview which commands + are available at all + + - the "MTN_MERGE=diffutils" merger (provided by std_hooks.lua) + was improved. It now accepts a MTN_MERGE_DIFFUTILS environment + variable which can be used to control its behaviour + through comma-separated "key[=value]" entries. 
Currently + supported entries are "partial" for doing a partial + batch/non-modal 3-way merge conflict "resolution" which uses + embedded content conflict markers and "diff3opts=[...]" and + "sdiffopts=[...]" for passing arbitrary options to the used + "diff3" and "sdiff" tools. When used in combination with "mtn + merge_into_workspace" this way one especially can achieve a + CVS/SVN style non-modal workspace-based merging. + + - There is a new revision selector: "p:REV" selects the + parent(s) of revision REV. For example, if a revision has + one parent, + + mtn diff -r p:REV -r REV + + will show the changes made in that revision. + + - Monotone now uses the Perl-Compatible Regular Expression + (PCRE) library for all regular expressions, instead of the + boost::regex library. This means that external Boost + libraries are no longer required to build or use Monotone. + If building from source, you will still need the Boost headers + available somewhere. See INSTALL for details. + + PCRE's syntax for regular expressions is a superset of + boost::regex's syntax; it is unlikely that any existing + .mtn-ignore files or other user uses of regexps will break. + The manual now contains detailed documentation of the regexp + syntax, borrowed from PCRE itself. + + - the format of "mtn automate inventory" has changed to basic_io. + This fixes a couple of corner cases where the old format + returned wrong information and introduces new capabilities like + restricted output, recognized attribute changes, and more. + For a complete overview on the new format, please take a look + in the appropriate manual section. + + Bugs fixed + + - mtn automate heads called without a branch argument now properly + returns the head revisions of the workspace's branch if called + over mtn automate stdio + + - mtn commit no longer crashes if it creates a revision whose + roster already exists, i.e. 
was left behind by the command + `mtn db kill_rev_locally REV` (savannah #18990) + + Documentation changes + + - the documentation of the "--revs-only" (formerly "--brief") + switch for the annotate command didn't match its actual + behavior, this has been fixed + + - documentation for the "ssh_agent_add" command was missing + and has been added + + Other + + - contrib/usher.cc has been removed. Please use the + net.venge.monotone.contrib.usher branch instead. + + Internal + + - Update SQLite to 3.4.1. + + - Update Lua to 5.1.2 plus latest bug fixes. + + - Update Botan to 1.5.10. + + - Internal use of regular expressions has been almost eliminated. + (Regular expressions are still used for .mtn-ignore and the + --show-encloser feature of mtn diff, and are still available to + Lua hooks.) + + + Fri Aug 3 06:08:36 UTC 2007 0.36 release. @@ -260,11 +357,11 @@ Wed Feb 28 22:02:43 UTC 2007 - update will detect directories with unversioned files before attempting to drop them and will refuse to run rather than - corrupting the workspace. such unversioned files must be + corrupting the workspace. such unversioned files must be removed manually. - - the character set and line separator conversion hooks - (get_system_linesep, get_charset_conv and get_linesep_conv) + - the character set and line separator conversion hooks + (get_system_linesep, get_charset_conv and get_linesep_conv) have been removed. Similar functionality (probably based on file type attributes) is planned and will be added in a future release. @@ -338,7 +435,7 @@ Wed Dec 27 09:57:48 UTC 2006 - "mtn serve" no longer takes patterns on the command line. Use the permissions hooks instead. 
- - the name of the option that denoted the revision from which + - the name of the option that denoted the revision from which "mtn log" should start logging was renamed from "--revision" to "--from" @@ -374,7 +471,7 @@ Wed Dec 27 09:57:48 UTC 2006 - "mtn automate content_diff" - - "mtn automate get_file_of" (same as get_file, but expects + - "mtn automate get_file_of" (same as get_file, but expects a file path and optionally a revision) - "mtn import" command @@ -521,7 +618,7 @@ Sun Sep 17 12:27:08 PDT 2006 faster, but also 'mtn commit', 'mtn update', and other commands, which were spending most of their time in this code. - + - The format used in the database to store the roster cache was rewritten. This makes initial pull approximately twice as fast, and somewhat improves the speed of restricted log, @@ -560,7 +657,7 @@ Sun Sep 17 12:27:08 PDT 2006 - The output of 'mtn annotate' and 'mtn annotate --brief' has been switched. The more human-readable output is now the - default. + default. - 'mtn pluck' now gives an error message if the requested operation would have no effect. @@ -671,7 +768,7 @@ Sun Aug 20 15:58:08 PDT 2006 - If, during an update, two files both had conflicts, which, when resolved, resulting the two files becoming identical, the - update would error out. This has been fixed. + update would error out. This has been fixed. - If _MTN/log exists and does not end in a newline, we now add a newline before using the log message. This removes a problem @@ -797,7 +894,7 @@ Sat Jun 17 14:43:12 PDT 2006 for more details. Minor new features: - + - Selectors now support escaping, e.g., b:foo\/bar can be used to refer to a branch with name "foo/bar" (normally / is a metacharacter that separates multiple selectors). @@ -814,8 +911,8 @@ Sat Jun 17 14:43:12 PDT 2006 - Bug in select() loop fixed, server should no longer pause in processing other clients while busy with one, but multiplex fairly. 
- - The database has a new write buffer which gives significant - speed improvements in initial pulls by cancelling redundant + - The database has a new write buffer which gives significant + speed improvements in initial pulls by cancelling redundant database writes. - There's been a fair bit of performance tuning all around. @@ -871,7 +968,7 @@ Sat Apr 8 19:33:35 PDT 2006 individually. Major changes since 0.25: - + - The most user-visible change is that the default name of the monotone binary has changed to 'mtn'. So, for example, you would now run 'mtn checkout', 'mtn diff', 'mtn commit', @@ -904,7 +1001,7 @@ Sat Apr 8 19:33:35 PDT 2006 It's mostly useful to know that when someone says something about 'roster-enabled monotone' or the like, they're referring to this body of new code. - + This change has a number of consequences: - The textual format for revisions and manifests changed. There is no conceptual change, they still contain the same @@ -1023,7 +1120,7 @@ Sat Apr 8 19:33:35 PDT 2006 - Translations updated, and 3 new translations added (de, it, sv). - + Reliability considerations: - This new codebase has received much less testing under real @@ -1056,7 +1153,7 @@ Wed Mar 29 05:20:10 PST 2006 chance! Major changes since 0.26pre2: - + - The name of the monotone binary has changed to 'mtn'. - Similarly, the name of the bookkeeping directory in workspaces has changed from 'MT' to '_MTN' (if you have an @@ -1117,7 +1214,7 @@ Wed Mar 29 05:20:10 PST 2006 - Monotone's "dumb server" support (repo distribution over HTTP/FTP/SFTP etc.) has been ported to 0.26, a first command line version written, etc. - - The 'usher' netsync proxy used for hosting many databases on + - The 'usher' netsync proxy used for hosting many databases on a single machine has received significant cleanups, and the 'webhost' project to provide a simple interface to shared monotone hosting providers has received even more work. 
@@ -1221,7 +1318,7 @@ Sun Jan 8 01:08:56 PST 2006 Let's say that again in caps: THIS CODE IS PROBABLY BUGGY, DO NOT USE IT IN PRODUCTION UNLESS YOU WANT TO BE A DAREDEVIL. - + However, testing of this version with real databases is a good idea, and we'd very much appreciate hearing about your experiences. @@ -1843,10 +1940,10 @@ Sun Nov 7 14:06:03 EST 2004 0.15 release. major changes. - - overhauled the internal representation of changes. see + - overhauled the internal representation of changes. see README.changesets for details - - fixed bugs in merkle trie synchronization code - - fixed echoing and progress UI bugs + - fixed bugs in merkle trie synchronization code + - fixed echoing and progress UI bugs (helps when using in emacs) - upgraded cryptopp to 5.2.1 - fixed bug 8715, diff hunk coordinate reporting @@ -1869,7 +1966,7 @@ Sat Jul 31 15:38:02 EDT 2004 - several critical rename-merging bugs fixed - renames vs. deletes - renames vs. deltas - - parallel renames + - parallel renames - bugs fixed from savannah bug tracker: - 9223 argv overflow - 9075 empty commits @@ -1889,7 +1986,7 @@ Thu May 20 22:26:27 EDT 2004 - remove (file|manifest) in several commands - "list missing" command - - fixed bugs: + - fixed bugs: - (critical) empty data netsync crash - mkstemp, platform lua - runtime error reporting chatter @@ -1916,7 +2013,7 @@ Sun Mar 28 12:41:07 EST 2004 Sun Mar 28 12:41:07 EST 2004 - 0.11 release. bug fixes and optimizations. + 0.11 release. bug fixes and optimizations. NOTE: this release expands the sqlite page size. 
YOU WILL NEED to dump existing databases before upgrading and reload it @@ -1932,23 +2029,23 @@ Sun Mar 28 12:41:07 EST 2004 - fixed bugs: - aliasing bug on debian (-O2 now works) - - netsync ppc portability / checksums + - netsync ppc portability / checksums - sha1 whitespace bug - netsync broken formatter - broken symlink handling - - merger execution pessimism + - merger execution pessimism - LCA bitset calculation pessimism - static object initialization order - CVS single-version import - CVS first-version changelog - CVS branch inference and topology - - cryptographic SSE2 paths enabled on linux/x86. - - builds against boost 1.31.0. + - cryptographic SSE2 paths enabled on linux/x86. + - builds against boost 1.31.0. - removed boost::socket - - removed documentation about old networking system. - - "officially" deprecated old networking system. + - removed documentation about old networking system. + - "officially" deprecated old networking system. - enable building with system-local libraries. - - upgraded bundled sqlite. + - upgraded bundled sqlite. - changed sqlite page size from 1k -> 8k Mon Mar 1 00:32:07 EST 2004 @@ -2025,7 +2122,7 @@ Mon Aug 25 18:00:37 EDT 2003 cleaned up command line processing. expanded testsuite. improved user-friendly error reporting. -Fri Aug 8 10:20:01 EDT 2003 +Fri Aug 8 10:20:01 EDT 2003 0.2 release. database compatibility broken. dropped many library dependencies. hand-reimplemented xdelta, parts of @@ -2035,6 +2132,6 @@ Fri Aug 8 10:20:01 EDT 2003 scalability tests against real world CVS archives show performance gap with CVS closing, but still present. 
-Sun Apr 6 20:20:42 EDT 2003 +Sun Apr 6 20:20:42 EDT 2003 initial release ============================================================ --- UPGRADE dd53acd07dfebc6c40834f89af70696a6e6656a2 +++ UPGRADE b7e014bfa58db309cb29c06816070168e602fd14 @@ -1,4 +1,4 @@ -upgrading monotone to 0.36 +upgrading monotone to 0.37 ========================== How to read this file: ============================================================ --- docs/Additional-Lua-Functions.html 285e2b1228c364b9a93ae04192664a18fe95a8e7 +++ docs/Additional-Lua-Functions.html 846e1a5b8af06f43d8d8356df8ec75a7e91a6972 @@ -36,7 +36,11 @@ hook writers. hook writers.
-
existonpath(possible_command)
+
alias_command(original, alias)
+This function adds a new alias for a monotone command. A call to this function would +normally be placed directly in the monotonerc file, rather than in a hook function. + +
existonpath(possible_command)
This function receives a string containing the name of an external program and returns 0 if it exists on path and is executable, -1 otherwise. @@ -46,56 +50,56 @@ for “xxdiff.exe”. program name. In the previous example, existonpath would search for “xxdiff.exe”. -
get_confdir()
+
get_confdir()
Returns the path to the configuration directory, either implied or given with --confdir. -
get_ostype()
+
get_ostype()
Returns the operating system flavor as a string. -
guess_binary_file_contents(filespec)
+
guess_binary_file_contents(filespec)
Returns true if the file appears to be binary, i.e. contains one or more of the following characters:
          0x00 thru 0x06
           0x0E thru 0x1a
           0x1c thru 0x1f
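The byte ranges listed above can be expressed as a small membership test. The following is a minimal Python sketch, not monotone's implementation: the name `looks_binary` is hypothetical, and the sketch works on bytes that have already been read, since the text here does not say how much of the file monotone examines.

```python
# Byte values that the documentation lists as marking a file as binary:
# 0x00 thru 0x06, 0x0E thru 0x1a, and 0x1c thru 0x1f.
BINARY_BYTES = frozenset(
    list(range(0x00, 0x07)) + list(range(0x0E, 0x1B)) + list(range(0x1C, 0x20))
)


def looks_binary(data: bytes) -> bool:
    """Return True if any byte of data falls in one of the listed ranges.

    Hypothetical helper for illustration only; it mirrors the documented
    ranges, not monotone's actual code.
    """
    return any(byte in BINARY_BYTES for byte in data)
```

Note that tab, newline, carriage return (0x09, 0x0A, 0x0D) and escape (0x1b) fall outside the listed ranges, so ordinary text and terminal escape sequences are not flagged by this check.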
      
-
include(scriptfile)
+
include(scriptfile)
This function tries to load and execute the script contained in scriptfile. It returns true for success and false if there is an error. -
includedir(scriptpath)
+
includedir(scriptpath)
This function loads and executes, in alphabetical order, all the scripts contained in the directory scriptpath. If one of the scripts has an error, the function doesn't process the remaining scripts and immediately returns false. -
includedirpattern(scriptpath, pattern)
+
includedirpattern(scriptpath, pattern)
This function loads and executes, in alphabetical order, all the scripts contained in the directory scriptpath that match the given pattern. If one of the scripts has an error, the function doesn't process the remaining scripts and immediately returns false. -
is_executable(filespec)
+
is_executable(filespec)
This function returns true if the file is executable, false otherwise. On Windows this function always returns false. -
kill(pid [, signal])
+
kill(pid [, signal])
This function calls the kill() C library function on POSIX systems and TerminateProcess on Win32 (in that case pid is the process handle). If the optional signal parameter is missing, SIGTERM will be used. Returns 0 on success, -1 on error. -
make_executable(filespec)
+
make_executable(filespec)
This function marks the named file as executable. On Windows it has no effect. -
match(glob, string)
+
match(glob, string)
Returns true if glob matches string, false otherwise. -
mkstemp(template)
+
mkstemp(template)
Like its C library counterpart, mkstemp creates a unique name and returns a file descriptor for the newly created file. The value of template should be a pointer to a character buffer loaded @@ -117,7 +121,16 @@ For the definition of temp_file()< file in the standard TMP/TEMP directories. For the definition of temp_file(), see Default hooks. -
parse_basic_io(data)
+
mtn_automate( ... )
+The mtn_automate Lua function calls the Monotone +automate command passed in its arguments. The result of the +call is a pair consisting of a boolean return code, indicating whether +the call was successful or not, and a string being the stdout +output from the automate command. This function is not for use +in ordinary Lua hooks, but rather for Lua based commands as defined by +the Lua function register_command. + +
parse_basic_io(data)
Parse the string data, which should be in basic_io format. It returns nil if it can't parse the string; otherwise it returns a table. This will be a list of all statements, with each entry being a table having a "name" element that is @@ -133,17 +146,23 @@ the arguments.

The output table will be:

          {
-             1 = { name = "thingy", args = { 1 = "foo", 2 = "bar" } },
-             2 = { name = "thingy", args = { 1 = "baz" } },
-             3 = { name = "spork", args = { } },
-             4 = { name = "frob", args = { 1 = "oops" } }
+             1 = { name = "thingy", values = { 1 = "foo", 2 = "bar" } },
+             2 = { name = "thingy", values = { 1 = "baz" } },
+             3 = { name = "spork", values = { } },
+             4 = { name = "frob", values = { 1 = "oops" } }
           }
      
-
regex.search(regexp, string)
+
regex.search(regexp, string)
Returns true if a match for regexp is found in str, return -false otherwise. +false otherwise. See Regexps, for the syntax of regexp. -
server_request_sync(what, address, include, exclude)
+
register_command(name, params, abstract, description, function)
Add a command named name to the user command group in monotone. This function is +normally called directly from a monotonerc file rather than a hook. When the user issues the +registered command, monotone will call the supplied Lua function. That function would +then normally use mtn_automate() calls to service the request. +
server_request_sync(what, address, include, exclude)
Initiate a netsync connection to the server at address, with the given include and exclude patterns, of type sync, push, or pull, as given by the what argument. @@ -151,10 +170,10 @@ command, this function has no effect.

When called by a monotone instance which is not running the serve command, this function has no effect. -

sleep(seconds)
+
sleep(seconds)
Makes the calling process sleep for the specified number of seconds. -
spawn(executable [, args ...])
+
spawn(executable [, args ...])
Starts the named executable with the given arguments. Returns the process PID on POSIX systems, the process handle on Win32 or -1 if there was an error. @@ -166,17 +185,17 @@ in a standardized way. option. execute() builds on spawn() and wait() in a standardized way. -
spawn_pipe(executable [, args ...])
+
spawn_pipe(executable [, args ...])
Like spawn(), but returns three values, where the first two are the subprocess' standard input and standard output, and the last is the process PID on POSIX systems, the process handle on Win32 or -1 if there was an error. -
spawn_redirected(infile, outfile, errfile, executable [, args ...])
+
spawn_redirected(infile, outfile, errfile, executable [, args ...])
Like spawn(), but with standard input, standard output and standard error redirected to the given files. -
wait(pid)
+
wait(pid)
Wait until the process with given PID (process handle on Win32) exits. Returns two values: a result value and the exit code of the waited-for process. ============================================================ --- docs/Automation.html aeef8408071b1a6a89c6030e08903f40fddfe1c9 +++ docs/Automation.html 356f171884468556f00bb7a1cf101d1a52e8d86c @@ -40,7 +40,7 @@ messages. messages.
-
mtn automate interface_version
+
mtn automate interface_version
Arguments:
None. @@ -69,7 +69,7 @@ None.
-
mtn automate heads [branch]
+
mtn automate heads [branch]
Arguments:
One branch name, branch. If none is given, the current default branch is used. @@ -96,7 +96,7 @@ If the given branch contains no members
-
mtn automate ancestors rev1 [rev2 [...]]
+
mtn automate ancestors rev1 [rev2 [...]]
Arguments:
One or more revision IDs, rev1, rev2, etc. @@ -129,7 +129,7 @@ an error message to stderr, and exits wi
-
mtn automate common_ancestors rev1 [rev2 [...]]
+
mtn automate common_ancestors rev1 [rev2 [...]]
Arguments:
One or more revision IDs, rev1, rev2, etc. @@ -162,7 +162,7 @@ an error message to stderr, and exits wi
-
mtn automate parents rev
+
mtn automate parents rev
Arguments:
One revision ID, rev. @@ -192,7 +192,7 @@ stdout, prints an error message to stder
-
mtn automate descendents rev1 [rev2 [...]]
+
mtn automate descendents rev1 [rev2 [...]]
Arguments:
One or more revision IDs, rev1, rev2, etc. @@ -225,7 +225,7 @@ an error message to stderr, and exits wi
-
mtn automate children rev
+
mtn automate children rev
Arguments:
One revision ID, rev. @@ -255,7 +255,7 @@ stdout, prints an error message to stder
-
mtn automate graph
+
mtn automate graph
Arguments:
None. @@ -292,7 +292,7 @@ None.
-
mtn automate erase_ancestors [rev1 [rev2 [...]]]
+
mtn automate erase_ancestors [rev1 [rev2 [...]]]
Arguments:
One or more revision IDs, rev1, rev2, etc. @@ -326,7 +326,7 @@ an error message to stderr, and exits wi
-
mtn automate toposort [rev1 [rev2 [...]]]
+
mtn automate toposort [rev1 [rev2 [...]]]
Arguments:
One or more revision IDs, rev1, rev2, etc. @@ -358,7 +358,7 @@ an error message to stderr, and exits wi
-
mtn automate ancestry_difference new [old1 [old2 [...]]]
+
mtn automate ancestry_difference new [old1 [old2 [...]]]
Arguments:
A “new” revision ID new, followed by zero or more “old” @@ -395,7 +395,7 @@ an error message to stderr, and exits wi
-
mtn automate leaves
+
mtn automate leaves
Arguments:
None. @@ -427,7 +427,7 @@ None.
-
mtn automate roots
+
mtn automate roots
Arguments:
None. @@ -454,7 +454,7 @@ None.
-
mtn automate branches
+
mtn automate branches
Arguments:
None. @@ -481,7 +481,7 @@ None.
-
mtn automate tags [branch_pattern]
+
mtn automate tags [branch_pattern]
Arguments:
A branch pattern (optional). @@ -549,7 +549,7 @@ specified.
-
mtn automate select selector
+
mtn automate select selector
Arguments:
One selector (or combined selector). @@ -577,7 +577,7 @@ None.
-
mtn automate identify path
+
mtn automate identify path
Arguments:
A file path. @@ -604,227 +604,326 @@ marker.
-
mtn automate inventory
+
mtn automate inventory [files...]
Arguments:
-None. +One or more file paths (optional). If present, only show inventory for the given +files or directories (and their sub-directories). You can use --depth +and --exclude to control what is selected through this restriction. -
Added in:
-1.0 +
Changes:
+
    +
  • 6.0 – converted to basic_io format (restriction support, various fixes) +
  • 1.0 – initial, line-based format +
+
Purpose:
Prints the inventory of every file found in the workspace or its -associated base manifest. Each unique path is listed on a line prefixed by -three status characters and two numeric values used for identifying -renames. +associated base and revision manifests. Each unique path is +listed in a basic_io stanza. Stanzas are separated by blank lines.
Sample output:
-All basic status codes: +All basic status cases: +
          
+              path "added"
+          new_type "file"
+           fs_type "file"
+            status "added" "known"
+           changes "content"
+          
+              path "dropped"
+          old_type "file"
+           fs_type "none"
+            status "dropped"
+          
+             path "ignored~"
+          fs_type "file"
+           status "ignored"
+          
+              path "missing"
+          old_type "file"
+          new_type "file"
+           fs_type "none"
+            status "missing"
+          
+              path "original"
+          old_type "file"
+          new_path "renamed"
+           fs_type "none"
+            status "rename_source"
+          
+              path "patched"
+          old_type "file"
+          new_type "file"
+           fs_type "file"
+            status "known"
+           changes "content"
+          
+              path "renamed"
+          new_type "file"
+          old_path "original"
+           fs_type "file"
+            status "rename_target" "known"
+          
+              path "unchanged"
+          old_type "file"
+          new_type "file"
+           fs_type "file"
+            status "known"
+          
+             path "unknown"
+          fs_type "file"
+           status "unknown"
+     
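Stanzas like the ones above can be read with a few lines of code. The following is a minimal Python sketch under stated assumptions: `parse_inventory` is a hypothetical helper, not part of monotone; it assumes every value is a double-quoted string in which a backslash escapes `"` and `\`, and it does not handle values that contain a newline.

```python
import re

# One double-quoted value. Assumption: a backslash escapes the next character.
_STRING = re.compile(r'"((?:[^"\\]|\\.)*)"')


def parse_inventory(text):
    """Parse inventory output into a list of dicts, one dict per stanza.

    Each dict maps a line's key (path, status, ...) to its list of values.
    Hypothetical helper for illustration only.
    """
    stanzas = []
    current = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current stanza, if one is open.
            if current:
                stanzas.append(current)
                current = {}
            continue
        key, _, rest = line.partition(" ")
        # Collect every quoted value and undo the backslash escapes.
        current[key] = [re.sub(r"\\(.)", r"\1", v) for v in _STRING.findall(rest)]
    if current:
        stanzas.append(current)
    return stanzas
```

Keeping every value as a list means that `status`, which may carry several entries such as "added" and "known", needs no special case. Because the sketch looks lines up by key, it does not depend on the order of lines within a stanza.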
-
          
-            M 0 0 missing
-           AP 0 0 added
-          D   0 0 dropped
-          R   1 0 renamed-from-this
-           R  0 1 renamed-to-this
-            P 0 0 patched
-              0 0 unchanged
-            U 0 0 unknown
-            I 0 0 ignored
+          

Two files swapped in both the revision manifest and the workspace: +

          
+              path "original"
+          old_type "file"
+          new_path "unchanged"
+          new_type "file"
+          old_path "unchanged"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+          
+              path "unchanged"
+          old_type "file"
+          new_path "original"
+          new_type "file"
+          old_path "original"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
      
-

Two files swapped: - -

          
-          RR  1 2 unchanged
-          RR  2 1 original
+          

Recorded in the revision manifest that two files were swapped, but +they were not actually swapped in the workspace. Thus they both appear +as patched: +

          
+              path "original"
+          old_type "file"
+          new_path "unchanged"
+          new_type "file"
+          old_path "unchanged"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+           changes "content"
+          
+              path "unchanged"
+          old_type "file"
+          new_path "original"
+          new_type "file"
+          old_path "original"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+           changes "content"
+          
      
-

Recorded with monotone that two files were swapped, but they were not -actually swapped in the filesystem. Thus they both appear as patched: +

Rename (in the manifest and the workspace) foo to bar; +add (in the manifest and the workspace) new file foo: +

          
+              path "foo"
+          old_type "file"
+          new_path "bar"
+          new_type "file"
+           fs_type "file"
+            status "rename_source" "added" "known"
+          
+              path "bar"
+          new_type "file"
+          old_path "foo"
+           fs_type "file"
+            status "rename_target" "known"
+     
-
          
-          RRP 1 2 unchanged
-          RRP 2 1 original
+          

Rotated files foo -> bar -> baz -> foo (in +the manifest and the workspace): +

          
+              path "foo"
+          old_type "file"
+          new_path "bar"
+          new_type "file"
+          old_path "baz"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+          
+              path "bar"
+          old_type "file"
+          new_path "baz"
+          new_type "file"
+          old_path "foo"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+          
+              path "baz"
+          old_type "file"
+          new_path "foo"
+          new_type "file"
+          old_path "bar"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
      
-

Rename foo to bar; add new file foo: +

Recorded in the revision manifest the rotation of files foo -> +bar -> baz -> foo, but the actual files in the +workspace were not moved, so monotone interprets all files as having +been patched: +

          
+              path "foo"
+          old_type "file"
+          new_path "bar"
+          new_type "file"
+          old_path "baz"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+           changes "content"
+          
+              path "bar"
+          old_type "file"
+          new_path "baz"
+          new_type "file"
+          old_path "foo"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+           changes "content"
+          
+              path "baz"
+          old_type "file"
+          new_path "foo"
+          new_type "file"
+          old_path "bar"
+           fs_type "file"
+            status "rename_source" "rename_target" "known"
+           changes "content"
+     
-
          
-          RAP 1 0 foo
-           R  0 1 bar
+          

Dropped from the manifest but not removed in the workspace and thus +unknown: +

          
+              path "dropped"
+          old_type "file"
+           fs_type "file"
+            status "dropped" "unknown"
      
-

Rotated files foo -> bar -> baz -> foo: +

Added in the manifest but not in the workspace, and thus missing: +

          
+              path "added"
+          new_type "file"
+           fs_type "none"
+            status "added" "missing"
+     
-
          
-          RR  1 3 foo
-          RR  2 1 bar
-          RR  3 2 baz
+          

Recorded a rename in the manifest, but not moved in the workspace, +and thus unknown source and missing target: +

          
+              path "original"
+          old_type "file"
+          new_path "renamed"
+           fs_type "file"
+            status "rename_source" "unknown"
+          
+              path "renamed"
+          new_type "file"
+          old_path "original"
+           fs_type "none"
+            status "rename_target" "missing"
      
-

Recorded the rotation of files foo -> bar -> baz -> -foo, but the actual files in the workspace were not -moved, so monotone interprets all files as having been patched: +

Moved in the workspace but no rename recorded in the manifest, and +thus missing source and unknown target: +

          
+              path "original"
+          old_type "file"
+          new_type "file"
+           fs_type "none"
+            status "missing"
+          
+             path "renamed"
+          fs_type "file"
+           status "unknown"
+     
-
          
-          RRP 1 3 foo
-          RRP 2 1 bar
-          RRP 3 2 baz
+          

Renamed in the manifest and the workspace and patched: +

          
+              path "original"
+          old_type "file"
+          new_path "renamed"
+           fs_type "none"
+            status "rename_source"
+          
+              path "renamed"
+          new_type "file"
+          old_path "original"
+           fs_type "file"
+            status "rename_target" "known"
+           changes "content"
      
-

Dropped but not removed and thus unknown: +

Output format:
+Each path is printed in one basic_io stanza. Stanzas are separated by +a blank line. Each stanza starts with a path line, and contains +up to seven lines. The order of the lines is not important, and may +change in future revisions, except that the first line will always be +path. -
          
-          D U 0 0 dropped
-     
+
+
path
the file or directory path, relative to the workspace root. The file +either exists in the workspace, or is listed in the base or revision +manifest. -

Added a non-existent file which is thus missing: +

old_type
gives the type of the file in the base manifest. “type” is +either file, directory, or none. +old_type is output only if it is different from the type in +the revision manifest or workspace. -
          
-           AM 0 0 added
-     
+
new_type
the type of the file in the revision manifest. new_type is +output only if it is different from the type in the base manifest or +workspace. -

Recorded a rename, but not moved in the filesystem, and thus unknown -source and missing target: +

fs_type
the type of the file in the workspace (also called the filesystem). +fs_type is always output. -
          
-          R U 1 0 original
-           RM 0 1 renamed
-     
+
old_path
the old path for the file, if it has been renamed in the revision +manifest. -

Moved in the filesystem but no rename recorded, and thus missing source -and unknown target: +

new_path
the new path for the file, if it has been renamed in the revision +manifest. -
          
-            M 0 0 original
-            U 0 0 renamed
-     
+
status
status is always output. Its value is one or more of: -

Renamed and patched: +

+
rename_source
the old name of a file that has been renamed. -
          
-          R   1 0 original
-           RP 0 1 renamed
-     
+
rename_target
the new name of a file that has been renamed. -
Output format:
-Each path is printed on its own line, prefixed by three status -characters described below. The status is followed by a single space and -two numbers, each separated by a single space, used for identifying renames. -The numbers are followed by a single space and then the pathname, which -includes the rest of the line. Directory paths are identified as ending with -the "/" character, file paths do not end in this character. +
added
the file is new in the revision manifest (not in the base +manifest). -

The three status characters are as follows. +

dropped
the file is deleted in the revision manifest and the workspace. -
          
-          column 1 pre-state
-                ' ' the path was unchanged in the pre-state
-                'D' the path was deleted from the pre-state
-                'R' the path was renamed from the pre-state name
-          column 2 post-state
-                ' ' the path was unchanged in the post-state
-                'R' the path was renamed to the post-state name
-                'A' the path was added to the post-state
-          column 3 file-state
-                ' ' the file is known and unchanged from the current manifest version
-                'P' the file is patched to a new version
-                'U' the file is unknown and not included in the current manifest
-                'I' the file is ignored and not included in the current manifest
-                'M' the file is missing but is included in the current manifest
-     
+
missing
the file is deleted in the workspace but not the revision manifest. -

Note that out of the 45 possible status code combinations, only 26 are valid, -detailed below. +

ignored
the file is ignored by monotone. -
          
-          '   ' unchanged
-          '  P' patched (contents changed)
-          '  U' unknown (exists on the filesystem but not tracked)
-          '  I' ignored (exists on the filesystem but excluded by Lua hook)
-          '  M' missing (exists in the manifest but not on the filesystem)
-          
-          ' A ' added (invalid, add should have associated patch)
-          ' AP' added and patched
-          ' AU' added but unknown (invalid)
-          ' AI' added but ignored (invalid, added files are no longer ignored)
-          ' AM' added but missing from the filesystem
-          
-          ' R ' rename target
-          ' RP' rename target and patched
-          ' RU' rename target but unknown (invalid)
-          ' RI' rename target but ignored (invalid)
-          ' RM' rename target but missing from the filesystem
-          
-          'D  ' dropped
-          'D P' dropped and patched (invalid)
-          'D U' dropped and unknown (still exists on the filesystem)
-          'D I' dropped and ignored
-          'D M' dropped and missing (invalid)
-          
-          'DA ' dropped and added (invalid, add should have associated patch)
-          'DAP' dropped and added and patched
-          'DAU' dropped and added but unknown (invalid)
-          'DAI' dropped and added but ignored (invalid, added files are no longer ignored)
-          'DAM' dropped and added but missing from the filesystem
-          
-          'DR ' dropped and rename target
-          'DRP' dropped and rename target and patched
-          'DRU' dropped and rename target but unknown (invalid)
-          'DRI' dropped and rename target but ignored (invalid)
-          'DRM' dropped and rename target but missing from the filesystem
-          
-          'R  ' rename source
-          'R P' rename source and patched (invalid)
-          'R U' rename source and unknown (still exists on the filesystem)
-          'R I' rename source and ignored
-          'R M' rename source and missing (invalid)
-          
-          'RA ' rename source and added (invalid, add should have associated patch)
-          'RAP' rename source and added and patched
-          'RAU' rename source and added but unknown (invalid)
-          'RAI' rename source and added but ignored (invalid, added files are no longer ignored)
-          'RAM' rename source and added but missing from the filesystem
-          
-          'RR ' rename source and target
-          'RRP' rename source and target and target patched
-          'RRU' rename source and target and target unknown (invalid)
-          'RRI' rename source and target and target ignored (invalid)
-          'RRM' rename source and target and target missing
-     
+
known
the file exists in the workspace, and in the revision manifest. -

The two numbers are used to match up the pre-state and post-state of a -rename. Imagine a situation where there are two renames. -automate inventory will print something like: +

unknown
the file exists in the workspace, but not in the revision manifest. -
          
-          R   1 0 a
-          R   2 0 b
-           R  0 2 c
-           R  0 1 d
-     
+
invalid
the file exists in the workspace and revision manifest, but with +different types (one is a directory, the other a file). + +
-

Here the status characters tell us that a and b were -renamed, and we can tell that one was renamed to c and one was -renamed to d, but we can't tell which was renamed to which. To -do that, we have to refer to the numbers. The numbers do not themselves -mean anything; their only purpose is to let you match up the two -“ends” of a rename. The 1 in the left column by a means that -a was the source of a rename, and the 1 in the right column by -d means that d was the target of that same rename. -Similarly, the two 2's tell us that b was renamed to c. +

changes
+
+
content
the file contents have changed. -

There are two columns of numbers because the same file can -simultaneously be the target and source of a rename. The number '0' is -used as a meaningless placeholder in all cases where a file is not a -source or target of a rename. Any non-zero number that occurs at all -will occur exactly once in the first column and exactly once in the -second column. +

attrs
the file attributes have changed. -

Full support for versioned directories is not yet complete and the -inventory will only list entries for renamed or dropped -directories. +

+
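The stanza layout described above (one `path` line first, then further key lines, with a blank line between stanzas) can be read with a small parser. This is a sketch only, not monotone's own basic_io reader. It assumes every value sits on a single line as a double-quoted string with backslash escapes, and the names `parse_stanzas` and `SAMPLE` are made up for the illustration.

```python
import re

# One token: a "quoted string" (allowing \" and \\ escapes),
# a [hex id], or a bare symbol such as a key name.
_TOKEN = re.compile(r'"((?:[^"\\]|\\.)*)"|\[([0-9a-f]*)\]|(\S+)')


def _unescape(value):
    """Turn \\" and \\\\ escape pairs back into the plain character."""
    return re.sub(r'\\(.)', r'\1', value)


def parse_stanzas(text):
    """Split basic_io text into stanzas; each maps key -> list of values."""
    stanzas = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current stanza.
            if current:
                stanzas.append(current)
                current = {}
            continue
        tokens = []
        for match in _TOKEN.finditer(line):
            quoted, hexid, bare = match.groups()
            if quoted is not None:
                tokens.append(_unescape(quoted))
            elif hexid is not None:
                tokens.append(hexid)
            else:
                tokens.append(bare)
        # The first token is the key; the rest are its values.
        current[tokens[0]] = tokens[1:]
    if current:
        stanzas.append(current)
    return stanzas


SAMPLE = '''\
    path "original"
old_type "file"
new_path "renamed"
 fs_type "none"
  status "rename_source"

    path "renamed"
new_type "file"
old_path "original"
 fs_type "file"
  status "rename_target" "known"
 changes "content"
'''

for stanza in parse_stanzas(SAMPLE):
    print(stanza["path"][0], stanza["status"])
```

Looking keys up by name rather than by position matches the note above that the order of lines within a stanza may change in future revisions.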

Error conditions:
When executed from outside of a workspace directory, prints an error @@ -832,7 +931,7 @@ message to stderr, and exits with status
-
mtn automate certs id
+
mtn automate certs id
Arguments:
A revision ID id, for which any certificates will be printed. @@ -844,12 +943,11 @@ the following values are provided: Prints all certificates associated with the given revision ID. Each certificate is contained in a basic IO stanza. For each certificate, the following values are provided: - -
          
+
          
           'key'
                 a string indicating the key used to sign this certificate.
           'signature'
-                a string indicating the status of the signature. Possible 
+                a string indicating the status of the signature. Possible
                 values of this string are:
                       'ok'        : the signature is correct
                       'bad'       : the signature is invalid
@@ -906,7 +1004,7 @@ or invalid prints an error message to st
 
      
-
mtn automate stdio
+
mtn automate stdio
Arguments:
none @@ -989,7 +1087,7 @@ whatever error message would have been g
-
mtn automate get_revision
mtn automate get_revision id
+
mtn automate get_revision
mtn automate get_revision id
Arguments:
Specifying the optional id argument outputs the changeset @@ -1085,7 +1183,7 @@ to stderr and exits with status 1.
-
mtn automate get_base_revision_id
+
mtn automate get_base_revision_id
Arguments:
None. @@ -1111,7 +1209,7 @@ message to stderr, and exits with status
-
mtn automate get_current_revision_id
+
mtn automate get_current_revision_id
Arguments:
None. @@ -1138,7 +1236,7 @@ message to stderr, and exits with status
-
mtn automate get_manifest_of
mtn automate get_manifest_of revid
+
mtn automate get_manifest_of
mtn automate get_manifest_of revid
Arguments:
Specifying the optional revid argument outputs the manifest for the @@ -1275,7 +1373,7 @@ message to stderr and exits with status
-
mtn automate get_attributes path
+
mtn automate get_attributes path
Arguments:
The argument path determines which path's attributes should be printed. @@ -1350,7 +1448,7 @@ message to stderr and exits with status
-
mtn automate set_attribute path key value
+
mtn automate set_attribute path key value
Arguments:
A path, an attribute key and an attribute value. @@ -1372,7 +1470,7 @@ message to stderr and exits with status
-
mtn automate drop_attribute path [key]
+
mtn automate drop_attribute path [key]
Arguments:
A path and an attribute key (optional). @@ -1396,7 +1494,7 @@ status 1.
-
mtn automate content_diff [--revision=id1 [--revision=id2]] [files ...]
+
mtn automate content_diff [--revision=id1 [--revision=id2]] [files ...]
Arguments:
One or more file arguments restrict the diff output to these files, @@ -1449,7 +1547,7 @@ restrictions can't be applied, the comma
-
mtn automate get_file id
+
mtn automate get_file id
Arguments:
The id argument specifies the file hash of the file to be output. @@ -1478,7 +1576,7 @@ to stderr and exits with status 1.
-
mtn automate get_file_of filename [--revision=id]
+
mtn automate get_file_of filename [--revision=id]
Arguments:
The filename argument specifies the filename of the file to be output. @@ -1511,7 +1609,7 @@ with status 1.
-
mtn automate get_option option
+
mtn automate get_option option
Arguments:
The option argument specifies the option name of the option to be output. @@ -1536,7 +1634,7 @@ with status 1.
-
mtn automate keys
+
mtn automate keys
Arguments:
None. @@ -1578,7 +1676,7 @@ None.
-
mtn automate packet_for_rdata id
+
mtn automate packet_for_rdata id
Arguments:
The id specifies the revision to output an rdata packet for. @@ -1610,7 +1708,7 @@ and exits with status 1.
-
mtn automate packet_for_certs id
+
mtn automate packet_for_certs id
Arguments:
The id specifies the revision to output cert packets for. @@ -1668,7 +1766,7 @@ and exits with status 1.
-
mtn automate packet_for_fdata id
+
mtn automate packet_for_fdata id
Arguments:
The id specifies the file to output an fdata packet for. @@ -1698,7 +1796,7 @@ and exits with status 1.
-
mtn automate packet_for_fdelta from-id to-id
+
mtn automate packet_for_fdelta from-id to-id
Arguments:
The from-id specifies the file to use as the base of the delta, @@ -1730,7 +1828,7 @@ message to stderr and exits with status
-
mtn automate get_content_changed id file
+
mtn automate get_content_changed id file
Arguments:
The id specifies a revision ID, from which content change calculations will be based. @@ -1762,7 +1860,7 @@ message to stderr and exits with status
-
mtn automate get_corresponding_path source_id file target_id
+
mtn automate get_corresponding_path source_id file target_id
Arguments:
The source_id specifies a revision ID in which file is currently extant. @@ -1800,7 +1898,7 @@ 1. Note that file not existin
-
mtn automate db_get domain name
+
mtn automate db_get domain name
Arguments:
The domain and name specify the database variable @@ -1813,8 +1911,8 @@ Read a database variable, see also Vars.
Sample output:
-
           
-          off.net 
+
          
+          off.net
      

Output format:
@@ -1827,7 +1925,7 @@ status 1.
-
mtn automate db_put domain name value
+
mtn automate db_put domain name value
Arguments:
The domain and name specify the database variable @@ -1840,8 +1938,8 @@ Change a database variable, see also Vars.
Sample usage:
-
           
-          mtn automate db_set database default-server off.net 
+
          
+          mtn automate db_set database default-server off.net
      

Output format:
@@ -1852,7 +1950,7 @@ None.
-
mtn automate put_file [base-id] contents
+
mtn automate put_file [base-id] contents
Arguments:
The optional base-id specifies a file-id on which the contents are @@ -1868,8 +1966,8 @@ See also aut automate stdio.
Sample output:
-
           
-          70a0f283898a18815a83df37c902e5f1492e9aa2 
+
          
+          70a0f283898a18815a83df37c902e5f1492e9aa2
      

Output format:
@@ -1881,7 +1979,7 @@ exits with status 1.
-
mtn automate put_revision revision-data
+
mtn automate put_revision revision-data
Arguments:
revision-data is the new revision. See example below. Note that @@ -1913,8 +2011,8 @@ Workspace-less commit. Normally used via content [5bf1fd927dfb8679496a2e6cf00cbe50c1c87145]
Sample output:
-
           
-          4c2c1d846fa561601254200918fba1fd71e6795d 
+
          
+          4c2c1d846fa561601254200918fba1fd71e6795d
      

Output format:
@@ -1928,7 +2026,7 @@ this fact, but otherwise works as normal
-
mtn automate cert revision name value
+
mtn automate cert revision name value
Arguments:
revision is an existing revision, name is the certificate name @@ -1942,8 +2040,8 @@ cert with a specific private key, use --key.
Sample usage:
-
           
-          mtn automate cert 4c2c1d846fa561601254200918fba1fd71e6795d author address@hidden 
+
          
+          mtn automate cert 4c2c1d846fa561601254200918fba1fd71e6795d author address@hidden
      

Output format:
============================================================ --- docs/Branching-and-Merging.html e3acb9bc720c229d0f49419237fb1938979aabc8 +++ docs/Branching-and-Merging.html 2f0bd679bd2111d317fc4bbea458a889d1e655c2 @@ -110,5 +110,26 @@ should be able to handle it. Whatever arrangement of branches you come up with, though, monotone should be able to handle it. +

If you are unsure of the name of a branch, you can list all branches using +the ls branches command. This is very useful, but if you create +a lot of branches then the list can become very long and unwieldy. To help +with this, monotone has the suspend command, which partially hides +revisions/branches you are no longer using. Further commits on hidden branches +will automatically unhide the branches. + +

For example, if Beth is now finished with the muffins branch, she can stop +it from cluttering the list of branches by suspending the last revision in +that branch: + +

     $ mtn ls branches
+     jp.co.juicebot.jb7
+     jp.co.juicebot.jb7.muffins
+     $ mtn heads
+     mtn: branch 'jp.co.juicebot.jb7.muffins' is currently merged:
+     4e48e2c9a3d2ca8a708cb0cc545700544efb5021 address@hidden 2007-07-08T02:17:37
+     $ mtn suspend 4e48e2c9a3d2ca8a708cb0cc545700544efb5021
+     $ mtn ls branches
+     jp.co.juicebot.jb7
+
============================================================ --- docs/Certificate.html c5e3af1fe0b79b52af89a16404c5f0351ef56fda +++ docs/Certificate.html 2565b478678bf30855d226b656a5abe73791ff6a @@ -35,7 +35,7 @@ Up: 5.6 Certificate
-
mtn cert id certname
mtn cert id certname certval
+
mtn cert id certname
mtn cert id certname certval
These commands create a new certificate with name certname, for a revision with version id. The id argument can be a selector using certs already on the revision, such as h:branchname. @@ -43,18 +43,24 @@ Otherwise the certificate value is read

If certval is provided, it is the value of the certificate. Otherwise the certificate value is read from stdin. -

mtn approve id
+
mtn approve id
This command is a synonym for mtn cert id branch branchname where branchname is the current branch name (either deduced from the workspace or from the --branch option). -
mtn comment id
mtn comment id comment
+
mtn comment id
mtn comment id comment
These commands are synonyms for mtn cert id comment comment. If comment is not provided, it is read from stdin. -
mtn tag id tagname
+
mtn suspend id
+This command is a synonym for mtn cert id suspend +branchname where branchname is the current branch name +(either deduced from the workspace or from the --branch +option). + +
mtn tag id tagname
This command associates the symbolic name tagname with the revision id, so that symbolic name can later be used in Selectors for specifying revisions for commands like @@ -63,7 +69,7 @@ revision id, so that symbolic

This command is a synonym for mtn cert id tag tagname. -

mtn testresult id 0
mtn testresult id 1
+
mtn testresult id 0
mtn testresult id 1
These commands are synonyms for mtn cert id testresult 0 or mtn cert id testresult 1. ============================================================ --- docs/Database.html 143826efd443bd9e9a0d3b4c19700e4a560e4e10 +++ docs/Database.html c6e7c8ae101dfccc96542336f0696e5e03d4480c @@ -35,33 +35,33 @@ Up: 5.8 Database
-
mtn set domain name value
+
mtn set domain name value
Associates the value value to name in domain domain. See Vars for more information. -
mtn unset domain name
+
mtn unset domain name
Deletes any value associated with name in domain. See Vars for more information. -
mtn db init --db=dbfile
+
mtn db init --db=dbfile
This command initializes a new monotone database at dbfile. -
mtn db info --db=dbfile
+
mtn db info --db=dbfile
This command prints information about the monotone database dbfile, including its schema version and various table size statistics. -
mtn db version --db=dbfile
+
mtn db version --db=dbfile
This command prints out just the schema version of the monotone database dbfile. -
mtn db dump --db=dbfile
+
mtn db dump --db=dbfile
This command dumps an SQL statement representing the entire state of dbfile to the standard output stream. It is a very low-level command, and produces the most “recoverable” dumps of your database possible. It is sometimes also useful when migrating databases between variants of the underlying SQLite database format. -
mtn db load --db=dbfile
+
mtn db load --db=dbfile
This command applies a raw SQL statement, read from the standard input stream, to the database dbfile. It is most useful when loading a database dumped with the dump command. @@ -70,7 +70,7 @@ database is included in the dum database is included in the dump, so you should not try to init your database before a load. -
mtn db migrate --db=dbfile
+
mtn db migrate --db=dbfile
This command attempts to migrate the database dbfile to the newest schema known by the version of monotone you are currently running. If the migration fails, no changes should be made to the @@ -80,7 +80,7 @@ during migration. a copy of it before migrating, in case there is an untrapped error during migration. -
mtn db check --db=dbfile
+
mtn db check --db=dbfile
Monotone always works hard to verify the data it creates and accesses. For instance, if you have hard drive problems that corrupt data in monotone's database, and you attempt to retrieve this data, then @@ -210,11 +210,17 @@ and revision is correct.

This command also verifies that the sha1 hash of every file, manifest, and revision is correct. -

mtn db kill_rev_locally id
+
mtn db kill_rev_locally id
This command “kills”, i.e., deletes, a given revision, as well as any certs attached to it. It has an ugly name because it is a dangerous command; it permanently and irrevocably deletes historical information -from your database. There are a number of caveats: +from your database. If you execute this command in a workspace whose +parent revision is the one you are about to delete, the killed revision +is re-applied to this workspace, which makes it easy for you to fix +a problem and commit again later on. For this to work, the +workspace must not have any changes or missing files. + +

There are a number of other caveats with this command:

  • It can only be applied to revisions that have no descendants. If you want to kill a revision that has descendants, you must kill all of the @@ -237,7 +243,7 @@ work you can extract id's dat work you can extract id's data.
-
mtn db kill_branch_certs_locally branch
+
mtn db kill_branch_certs_locally branch
This command “kills” a branch by deleting all branch certs with that branch name. You should consider carefully whether you want to use it, because it can irrevocably delete important information. It does not @@ -251,7 +257,7 @@ certificates locally. you sync, unless the owners of those databases also delete those certificates locally. -
mtn db kill_tag_locally tag
+
mtn db kill_tag_locally tag
This command “kills” a tag by deleting all tag certs with that tag name. You should consider carefully whether you want to use it, because it can irrevocably delete important information. It does not modify or @@ -263,7 +269,7 @@ certificates locally. sync, unless the owners of those databases also delete those certificates locally. -
mtn db execute sql-statement
+
mtn db execute sql-statement
This is a debugging command which executes sql-statement against your database, and prints any results of the expression in a tabular form. It can be used to investigate the state of your database, or ============================================================ --- docs/Default-hooks.html 1e0a8b65e1e8eb0bf4cdc6cdc53afe6a2c382875 +++ docs/Default-hooks.html a3abe4ba183ea064970139d1e081b21c499f3d74 @@ -59,7 +59,7 @@ end return file, name end -function execute(path, ...) +function execute(path, ...) local pid local ret = -1 pid = spawn(path, unpack(arg)) @@ -67,11 +67,20 @@ end return ret end +function execute_redirected(stdin, stdout, stderr, path, ...) + local pid + local ret = -1 + io.flush(); + pid = spawn_redirected(stdin, stdout, stderr, path, unpack(arg)) + if (pid ~= -1) then ret, pid = wait(pid) end + return ret +end + -- Wrapper around execute to let user confirm in the case where a subprocess -- returns immediately -- This is needed to work around some brokenness with some merge tools -- (e.g. on OS X) -function execute_confirm(path, ...) +function execute_confirm(path, ...) 
ret = execute(path, unpack(arg)) if (ret ~= 0) @@ -94,30 +103,30 @@ end attr_init_functions = {} end -attr_init_functions["mtn:execute"] = +attr_init_functions["mtn:execute"] = function(filename) - if (is_executable(filename)) then - return "true" - else - return nil - end + if (is_executable(filename)) then + return "true" + else + return nil + end end -attr_init_functions["mtn:manual_merge"] = +attr_init_functions["mtn:manual_merge"] = function(filename) - if (binary_file(filename)) then + if (binary_file(filename)) then return "true" -- binary files must be merged manually - else + else return nil - end + end end if (attr_functions == nil) then attr_functions = {} end -attr_functions["mtn:execute"] = - function(filename, value) +attr_functions["mtn:execute"] = + function(filename, value) if (value == "true") then make_executable(filename) end @@ -149,17 +158,30 @@ function ignore_file(name) io.close(ignfile) end end + + local warn_reported_file = false for i, line in pairs(ignored_files) do - local pcallstatus, result = pcall(function() return regex.search(line, name) end) - if pcallstatus == true then - -- no error from the regex.search call - if result == true then return true end - else - -- regex.search had a problem, warn the user their .mtn-ignore file syntax is wrong - io.stderr:write("WARNING: the line '" .. line .. "' in your .mtn-ignore file caused error '" .. result .. "'" - .. " while matching filename '" .. name .. "'.\nignoring this regex for all remaining files.\n") - table.remove(ignored_files, i) + if (line ~= nil) then + local pcallstatus, result = pcall(function() + return regex.search(line, name) + end) + if pcallstatus == true then + -- no error from the regex.search call + if result == true then return true end + else + -- regex.search had a problem, warn the user their + -- .mtn-ignore file syntax is wrong + if not warn_reported_file then + io.stderr:write("mtn: warning: while matching file '" + .. name .. 
"':\n") + warn_reported_file = true + end + io.stderr:write(".mtn-ignore:" .. i .. ": warning: " .. result + .. "\n\t- skipping this regex for " + .. "all remaining files.\n") + ignored_files[i] = nil + end end end @@ -231,7 +253,7 @@ function binary_file(name) if string.find(lowname, pat) then return false end end - -- unknown - read file and use the guess-binary + -- unknown - read file and use the guess-binary -- monotone built-in function return guess_binary_file_contents(name) end @@ -242,7 +264,7 @@ function get_encloser_pattern(name) function get_encloser_pattern(name) -- texinfo has special sectioning commands if (string.find(name, "%.texi$")) then - -- sectioning commands in texinfo: @node, @chapter, @top, + -- sectioning commands in texinfo: @node, @chapter, @top, -- @((sub)?sub)?section, @unnumbered(((sub)?sub)?sec)?, -- @appendix(((sub)?sub)?sec)?, @(|major|chap|sub(sub)?)heading return ("^@(" @@ -316,7 +338,7 @@ function edit_comment(basetext, user_log if (tmp == nil) then os.remove(tname); return nil end local res = "" local line = tmp:read() - while(line ~= nil) do + while(line ~= nil) do if (not string.find(line, "^MTN:")) then res = res .. line .. "\n" end @@ -397,11 +419,18 @@ mergers = {} -- `merger' variable or the MTN_MERGE environment variable. mergers = {} +-- This merger is designed to fail if there are any conflicts without trying to resolve them +mergers.fail = { + cmd = function (tbl) return false end, + available = function () return true end, + wanted = function () return true end +} + mergers.meld = { cmd = function (tbl) io.write (string.format("\nWARNING: 'meld' was choosen to perform external 3-way merge.\n".. "You should merge all changes to *CENTER* file due to limitation of program\n".. 
- "arguments.\n\n")) + "arguments.\n\n")) local path = "meld" local ret = execute(path, tbl.lfile, tbl.afile, tbl.rfile) if (ret ~= 0) then @@ -500,48 +529,114 @@ mergers.rcsmerge = { wanted = function () return os.getenv("MTN_RCSMERGE") ~= nil end } +-- GNU diffutils based merging mergers.diffutils = { - cmd = function (tbl) - local ret = execute( - "diff3", - "--merge", - "--label", string.format("%s [left]", tbl.left_path ), - "--label", string.format("%s [ancestor]", tbl.anc_path ), - "--label", string.format("%s [right]", tbl.right_path), - tbl.lfile, - tbl.afile, - tbl.rfile - ) - if (ret ~= 0) then - io.write(gettext("Error running GNU diffutils 3-way difference tool 'diff3'\n")) - return false - end - local ret = execute( - "sdiff", - "--diff-program=diff", - "--suppress-common-lines", - "--minimal", - "--output", tbl.outfile, - tbl.lfile, - tbl.rfile - ) - if (ret == 2) then - io.write(gettext("Error running GNU diffutils 2-two merging tool 'sdiff'\n")) - return false - end - return tbl.outfile - end, - available = - function () - return program_exists_in_path("diff3") and - program_exists_in_path("sdiff"); - end, - wanted = - function () - return true - end -} + -- merge procedure execution + cmd = function (tbl) + -- parse options + local option = {} + option.partial = false + option.diff3opts = "" + option.sdiffopts = "" + local options = os.getenv("MTN_MERGE_DIFFUTILS") + if options ~= nil then + for spec in string.gmatch(options, "%s*(%w[^,]*)%s*,?") do + local name, value = string.match(spec, "^(%w+)=([^,]*)") + if name == nil then + name = spec + value = true + end + if type(option[name]) == "nil" then + io.write("mtn: " .. string.format(gettext("invalid \"diffutils\" merger option \"%s\""), name) .. 
"\n") + return false + end + option[name] = value + end + end + -- determine the diff3(1) command + local diff3 = { + "diff3", + "--merge", + "--label", string.format("%s [left]", tbl.left_path ), + "--label", string.format("%s [ancestor]", tbl.anc_path ), + "--label", string.format("%s [right]", tbl.right_path), + } + if option.diff3opts ~= "" then + for opt in string.gmatch(option.diff3opts, "%s*([^%s]+)%s*") do + table.insert(diff3, opt) + end + end + table.insert(diff3, string.gsub(tbl.lfile, "\\", "/") .. "") + table.insert(diff3, string.gsub(tbl.afile, "\\", "/") .. "") + table.insert(diff3, string.gsub(tbl.rfile, "\\", "/") .. "") + + -- dispatch according to major operation mode + if option.partial then + -- partial batch/non-modal 3-way merge "resolution": + -- simply merge content with help of conflict markers + io.write("mtn: " .. gettext("3-way merge via GNU diffutils, resolving conflicts via conflict markers") .. "\n") + local ret = execute_redirected("", string.gsub(tbl.outfile, "\\", "/"), "", unpack(diff3)) + if ret == 2 then + io.write("mtn: " .. gettext("error running GNU diffutils 3-way difference/merge tool \"diff3\"") .. "\n") + return false + end + return tbl.outfile + else + -- real interactive/modal 3/2-way merge resolution: + -- display 3-way merge conflict and perform 2-way merge resolution + io.write("mtn: " .. gettext("3-way merge via GNU diffutils, resolving conflicts via interactive prompt") .. "\n") + + -- display 3-way merge conflict (batch) + io.write("\n") + io.write("mtn: " .. gettext("---- CONFLICT SUMMARY ------------------------------------------------") .. "\n") + local ret = execute(unpack(diff3)) + if ret == 2 then + io.write("mtn: " .. gettext("error running GNU diffutils 3-way difference/merge tool \"diff3\"") .. "\n") + return false + end + + -- perform 2-way merge resolution (interactive) + io.write("\n") + io.write("mtn: " .. gettext("---- CONFLICT RESOLUTION ---------------------------------------------") .. 
"\n") + local sdiff = { + "sdiff", + "--diff-program=diff", + "--suppress-common-lines", + "--minimal", + "--output=" .. string.gsub(tbl.outfile, "\\", "/") + } + if option.sdiffopts ~= "" then + for opt in string.gmatch(option.sdiffopts, "%s*([^%s]+)%s*") do + table.insert(sdiff, opt) + end + end + table.insert(sdiff, string.gsub(tbl.lfile, "\\", "/") .. "") + table.insert(sdiff, string.gsub(tbl.rfile, "\\", "/") .. "") + local ret = execute(unpack(sdiff)) + if ret == 2 then + io.write("mtn: " .. gettext("error running GNU diffutils 2-way merging tool \"sdiff\"") .. "\n") + return false + end + return tbl.outfile + end + end, + + -- merge procedure availability check + available = function () + -- make sure the GNU diffutils tools are available + return program_exists_in_path("diff3") and + program_exists_in_path("sdiff") and + program_exists_in_path("diff"); + end, + + -- merge procedure request check + wanted = function () + -- assume it is requested (if it is available at all) + return true + end +} + mergers.emacs = { cmd = function (tbl) local emacs @@ -551,15 +646,21 @@ mergers.emacs = { emacs = "emacs" end local elisp = "(ediff-merge-files-with-ancestor \"%s\" \"%s\" \"%s\" nil \"%s\")" - local ret = execute(emacs, "--eval", - string.format(elisp, tbl.lfile, tbl.rfile, tbl.afile, tbl.outfile)) + -- Converting backslashes is necessary on Win32 MinGW; emacs + -- lisp string syntax says '\' is an escape. 
+ local ret = execute(emacs, "--eval", + string.format(elisp, + string.gsub (tbl.lfile, "\\", "/"), + string.gsub (tbl.rfile, "\\", "/"), + string.gsub (tbl.afile, "\\", "/"), + string.gsub (tbl.outfile, "\\", "/"))) if (ret ~= 0) then io.write(string.format(gettext("Error running merger '%s'\n"), emacs)) return false end return tbl.outfile end, - available = + available = function () return program_exists_in_path("xemacs") or program_exists_in_path("emacs") @@ -579,12 +680,12 @@ mergers.xxdiff = { mergers.xxdiff = { cmd = function (tbl) local path = "xxdiff" - local ret = execute(path, + local ret = execute(path, "--title1", tbl.left_path, "--title2", tbl.right_path, "--title3", tbl.merged_path, - tbl.lfile, tbl.afile, tbl.rfile, - "--merge", + tbl.lfile, tbl.afile, tbl.rfile, + "--merge", "--merged-filename", tbl.outfile, "--exit-with-merge-status") if (ret ~= 0) then @@ -600,12 +701,12 @@ mergers.kdiff3 = { mergers.kdiff3 = { cmd = function (tbl) local path = "kdiff3" - local ret = execute(path, + local ret = execute(path, "--L1", tbl.anc_path, "--L2", tbl.left_path, "--L3", tbl.right_path, - tbl.afile, tbl.lfile, tbl.rfile, - "--merge", + tbl.afile, tbl.lfile, tbl.rfile, + "--merge", "--o", tbl.outfile) if (ret ~= 0) then io.write(string.format(gettext("Error running merger '%s'\n"), path)) @@ -637,8 +738,8 @@ function write_to_temporary_file(data, n function write_to_temporary_file(data, namehint) tmp, filename = temp_file(namehint) - if (tmp == nil) then - return nil + if (tmp == nil) then + return nil end; tmp:write(data) io.close(tmp) @@ -662,7 +763,7 @@ function read_contents_of_file(filename, end function read_contents_of_file(filename, mode) - tmp = io.open(filename, mode) + tmp = io.open(filename, mode) if (tmp == nil) then return nil end @@ -708,39 +809,39 @@ end end end -function merge3 (anc_path, left_path, right_path, merged_path, ancestor, left, right) +function merge3 (anc_path, left_path, right_path, merged_path, ancestor, left, right) local ret 
= nil local tbl = {} - - tbl.anc_path = anc_path - tbl.left_path = left_path - tbl.right_path = right_path - tbl.merged_path = merged_path - tbl.afile = nil - tbl.lfile = nil - tbl.rfile = nil - tbl.outfile = nil - tbl.meld_exists = false + tbl.anc_path = anc_path + tbl.left_path = left_path + tbl.right_path = right_path + + tbl.merged_path = merged_path + tbl.afile = nil + tbl.lfile = nil + tbl.rfile = nil + tbl.outfile = nil + tbl.meld_exists = false tbl.lfile = write_to_temporary_file (left, "left") tbl.afile = write_to_temporary_file (ancestor, "ancestor") tbl.rfile = write_to_temporary_file (right, "right") tbl.outfile = write_to_temporary_file ("", "merged") - - if tbl.lfile ~= nil and tbl.rfile ~= nil and tbl.afile ~= nil and tbl.outfile ~= nil - then + + if tbl.lfile ~= nil and tbl.rfile ~= nil and tbl.afile ~= nil and tbl.outfile ~= nil + then local cmd,mkey = get_preferred_merge3_command (tbl) - if cmd ~=nil - then - io.write (string.format(gettext("executing external 3-way merge command\n"))) + if cmd ~=nil + then + io.write ("mtn: " .. string.format(gettext("executing external 3-way merge via \"%s\" merger\n"), mkey)) ret = cmd (tbl) if not ret then ret = nil else ret = read_contents_of_file (ret, "r") - if string.len (ret) == 0 - then - ret = nil + if string.len (ret) == 0 + then + ret = nil end end else @@ -748,23 +849,23 @@ function merge3 (anc_path, left_path, ri io.write (string.format("The possible commands for the "..mkey.." merger aren't available.\n".. "You may want to check that $MTN_MERGE or the lua variable `merger' is set\n".. "to something available. If you want to use vim or emacs, you can also\n".. - "set $EDITOR to something appropriate")) + "set $EDITOR to something appropriate.\n")) else io.write (string.format("No external 3-way merge command found.\n".. "You may want to check that $EDITOR is set to an editor that supports 3-way\n".. "merge, set this explicitly in your get_preferred_merge3_command hook,\n".. 
- "or add a 3-way merge program to your path.\n\n")) + "or add a 3-way merge program to your path.\n")) end end end - + os.remove (tbl.lfile) os.remove (tbl.rfile) os.remove (tbl.afile) os.remove (tbl.outfile) - + return ret -end +end -- expansion of values used in selector completion @@ -800,7 +901,7 @@ function expand_selector(str) then return ("d:" .. dtstr) end - + return nil end @@ -813,35 +914,35 @@ function expand_date(str) return (str) end - -- "now" + -- "now" if str == "now" then local t = os.time(os.date('!*t')) return os.date("%FT%T", t) end - + -- today don't uses the time # for xgettext's sake, an extra quote if str == "today" then local t = os.time(os.date('!*t')) return os.date("%F", t) end - + -- "yesterday", the source of all hangovers if str == "yesterday" then local t = os.time(os.date('!*t')) return os.date("%F", t - 86400) end - + -- "CVS style" relative dates such as "3 weeks ago" - local trans = { - minute = 60; - hour = 3600; - day = 86400; - week = 604800; - month = 2678400; - year = 31536000 + local trans = { + minute = 60; + hour = 3600; + day = 86400; + week = 604800; + month = 2678400; + year = 31536000 } local pos, len, n, type = string.find(str, "(%d+) ([minutehordaywk]+)s? 
ago") if trans[type] ~= nil @@ -850,11 +951,11 @@ function expand_date(str) if trans[type] <= 3600 then return os.date("%FT%T", t - (n * trans[type])) - else + else return os.date("%F", t - (n * trans[type])) end end - + return nil end @@ -959,8 +1060,8 @@ function get_netsync_connect_command(uri local argv = nil - if uri["scheme"] == "ssh" - and uri["host"] + if uri["scheme"] == "ssh" + and uri["host"] and uri["path"] then argv = { "ssh" } @@ -973,7 +1074,7 @@ function get_netsync_connect_command(uri table.insert(argv, uri["port"]) end - -- ssh://host/~/dir/file.mtn or + -- ssh://host/~/dir/file.mtn or -- ssh://host/~user/dir/file.mtn should be home-relative if string.find(uri["path"], "^/~") then uri["path"] = string.sub(uri["path"], 2) @@ -981,33 +1082,61 @@ function get_netsync_connect_command(uri table.insert(argv, uri["host"]) end - + if uri["scheme"] == "file" and uri["path"] then argv = { } end - if argv then + if uri["scheme"] == "ssh+ux" + and uri["host"] + and uri["path"] then - table.insert(argv, get_mtn_command(uri["host"])) + argv = { "ssh" } + if uri["user"] then + table.insert(argv, "-l") + table.insert(argv, uri["user"]) + end + if uri["port"] then + table.insert(argv, "-p") + table.insert(argv, uri["port"]) + end - if args["debug"] then - table.insert(argv, "--debug") - else - table.insert(argv, "--quiet") + -- ssh://host/~/dir/file.mtn or + -- ssh://host/~user/dir/file.mtn should be home-relative + if string.find(uri["path"], "^/~") then + uri["path"] = string.sub(uri["path"], 2) end - table.insert(argv, "--db") - table.insert(argv, uri["path"]) - table.insert(argv, "serve") - table.insert(argv, "--stdio") - table.insert(argv, "--no-transport-auth") + table.insert(argv, uri["host"]) + table.insert(argv, get_remote_unix_socket_command(uri["host"])) + table.insert(argv, "-") + table.insert(argv, "UNIX-CONNECT:" .. 
uri["path"]) + else + -- start remote monotone process + if argv then + table.insert(argv, get_mtn_command(uri["host"])) + + if args["debug"] then + table.insert(argv, "--debug") + else + table.insert(argv, "--quiet") + end + + table.insert(argv, "--db") + table.insert(argv, uri["path"]) + table.insert(argv, "serve") + table.insert(argv, "--stdio") + table.insert(argv, "--no-transport-auth") + + end end return argv end function use_transport_auth(uri) - if uri["scheme"] == "ssh" + if uri["scheme"] == "ssh" + or uri["scheme"] == "ssh+ux" or uri["scheme"] == "file" then return false else @@ -1018,6 +1147,74 @@ end function get_mtn_command(host) return "mtn" end + +function get_remote_unix_socket_command(host) + return "socat" +end + +-- Netsync notifiers are tables containing 5 functions: +-- start, revision_received, cert_received, pubkey_received and end +-- Those functions take exactly the same arguments as the corresponding +-- note_netsync functions, but return a different kind of value, a tuple +-- composed of a return code and a value to be returned back to monotone. +-- The codes are strings: +-- "continue" and "stop" +-- When the code "continue" is returned and there's another notifier, the +-- second value is ignored and the next notifier is called. Otherwise, +-- the second value is returned immediately. +netsync_notifiers = {} + +function _note_netsync_helper(f,...) + local s = "continue" + local v = nil + for _,n in pairs(netsync_notifiers) do + if n[f] then + s,v = n[f](...) + end + if s ~= "continue" then + break + end + end + return v +end +function note_netsync_start(...) + return _note_netsync_helper("start",...) +end +function note_netsync_revision_received(...) + return _note_netsync_helper("revision_received",...) +end +function note_netsync_cert_received(...) + return _note_netsync_helper("cert_received",...) +end +function note_netsync_pubkey_received(...) + return _note_netsync_helper("pubkey_received",...) +end +function note_netsync_end(...) 
+ return _note_netsync_helper("end",...) +end + +function add_netsync_notifier(notifier, precedence) + if type(notifier) ~= "table" or type(precedence) ~= "number" then + return false, "Invalid type" + end + if netsync_notifiers[precedence] then + return false, "Precedence already taken" + end + local warning = nil + for n,f in pairs(notifier) do + if type(n) ~= "string" or n ~= "start" + and n ~= "revision_received" + and n ~= "cert_received" + and n ~= "pubkey_received" + and n ~= "end" then + warning = "Unknown item found in notifier table" + elseif type(f) ~= "function" then + return false, "Value for notifier item "..n.." isn't a function" + end + end + netsync_notifiers[precedence] = notifier + return true, warning +end
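The notifier dispatch added above can be illustrated with a minimal standalone sketch. It does not load monotone's hooks: `notifiers`, `add_notifier` and `dispatch` are hypothetical, simplified stand-ins for `netsync_notifiers`, `add_netsync_notifier` and `_note_netsync_helper`. One deliberate difference, which is an assumption of this sketch and not the behaviour of the patch: the precedence keys are sorted before dispatch, because the patch iterates with `pairs`, whose traversal order Lua does not guarantee.

```lua
-- Standalone sketch; in monotone the real table and functions live in the
-- default hooks. All names below are hypothetical stand-ins.
local notifiers = {}

-- Register a notifier table under a numeric precedence
-- (simplified from add_netsync_notifier; no per-item validation).
local function add_notifier(notifier, precedence)
  if type(notifier) ~= "table" or type(precedence) ~= "number" then
    return false, "Invalid type"
  end
  if notifiers[precedence] then
    return false, "Precedence already taken"
  end
  notifiers[precedence] = notifier
  return true
end

-- Call item `name` on each notifier in ascending precedence order.
-- Sorting is an assumption of this sketch, not part of the patch.
local function dispatch(name, ...)
  local order = {}
  for p in pairs(notifiers) do
    order[#order + 1] = p
  end
  table.sort(order)

  local status, value = "continue", nil
  for _, p in ipairs(order) do
    local f = notifiers[p][name]
    if f then
      status, value = f(...)
    end
    -- Any status other than "continue" ends the chain.
    if status ~= "continue" then
      break
    end
  end
  return value
end

local seen = {}

-- Logs the session and lets later notifiers run.
assert(add_notifier({
  start = function(session_id)
    seen[#seen + 1] = "logger:" .. session_id
    return "continue", nil
  end
}, 10))

-- Stops the chain and supplies the value handed back to the caller.
assert(add_notifier({
  start = function(session_id)
    seen[#seen + 1] = "gate:" .. session_id
    return "stop", "handled"
  end
}, 20))

-- Never reached, because the notifier at precedence 20 returns "stop".
assert(add_notifier({
  start = function(session_id)
    seen[#seen + 1] = "never:" .. session_id
    return "continue", nil
  end
}, 30))

local result = dispatch("start", 42)
print(result, table.concat(seen, ","))
```

In this sketch the third notifier is skipped because the second one returns "stop", and registering a second table under an already used precedence is refused, as in the patch.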
============================================================ --- docs/General-Index.html a11b038433091e40b0854bbaa127d19b76348dc8 +++ docs/General-Index.html d9860f42e76b6146e214ff8bd608285a2ba6f70a @@ -32,39 +32,40 @@ Up: General Index ============================================================ --- docs/Hook-Reference.html 2dc893bf620477f318604f8c52863207ee23dafb +++ docs/Hook-Reference.html 7a412358366c360d81a2e428b6b824cccaecb272 @@ -75,8 +75,8 @@ functions exposing functionality not ava functions exposing functionality not available with standard Lua. ============================================================ --- docs/Hooks.html ea0cfac9c17a82b5e6c81c581b1fca4175e8c79f +++ docs/Hooks.html 673c5a499a3c72d3ef132f2470ae863c62d57ea4 @@ -46,7 +46,7 @@ are taken. are taken.
-
note_commit (new_id, revision, certs)
+
note_commit (new_id, revision, certs)
Called by monotone after the version new_id is committed. The second parameter, revision is the text of the revision, what would be given by mtn automate get_revision new_id. The third @@ -61,7 +61,7 @@ should not perform any security-critical commit-notification systems such as mailing lists or news services. It should not perform any security-critical operations. -
note_netsync_start (session_id, my_role, sync_type,
remote_host, remote_keyname, includes, excludes) +
note_netsync_start (session_id, my_role, sync_type,
remote_host, remote_keyname, includes, excludes)

Called by monotone before any other of the netsync notification hooks are called. The session_id helps keep track of the current netsync @@ -92,7 +92,7 @@ The include and exclude patterns used by

-
note_netsync_revision_received (new_id, revision, certs, session_id)
+
note_netsync_revision_received (new_id, revision, certs, session_id)
Called by monotone after the revision new_id is received through netsync. revision is the text of the revision, what would be given by mtn automate get_revision new_id. certs is a @@ -104,7 +104,7 @@ tracking, you can ignore that variable e note_netsync_end. If you're not interested in that type of tracking, you can ignore that variable entirely. -
note_netsync_cert_received (rev_id, key, name, value, session_id)
+
note_netsync_cert_received (rev_id, key, name, value, session_id)
Called by monotone after a cert is received through netsync, if the revision that the cert is attached to was not also received in the same netsync operation. rev_id is the revision id that the cert is attached to, @@ -115,14 +115,14 @@ tracking, you can ignore that variable e note_netsync_end. If you're not interested in that type of tracking, you can ignore that variable entirely. -
note_netsync_pubkey_received (keyname, session_id)
+
note_netsync_pubkey_received (keyname, session_id)
Called by monotone after a pubkey is received through netsync. keyname is the name of the key received. There is no default definition for this hook. session_id is used together with note_netsync_start and note_netsync_end. If you're not interested in that type of tracking, you can ignore that variable entirely. -
note_netsync_end (session_id, status,
bytes_in, bytes_out, certs_in, certs_out, +
note_netsync_end (session_id, status,
bytes_in, bytes_out, certs_in, certs_out, revs_in, revs_out, keys_in, keys_out)

Called by monotone after all the other netsync notification hooks have @@ -170,7 +170,7 @@ data was transferred. have been transferred, xx2 means no data was transferred, and xx0 means all data was transferred. -

note_mtn_startup (...)
+
note_mtn_startup (...)
Called by monotone when it is first started, this hook was added so that usage of monotone could be monitored for user interface testing. Note that by default, no monitoring occurs. The arguments to the hook @@ -195,7 +195,7 @@ prompted for. prompted for.
-
get_branch_key (branchname)
+
get_branch_key (branchname)
Returns a string which is the name of an rsa private key used to sign certificates in a particular branch branchname. There is no default definition for this hook. The command-line option @@ -205,7 +205,7 @@ to use the unique private key. --key=keyname option; monotone will guess that you want to use the unique private key. -
get_netsync_key(server, include, exclude)
+
get_netsync_key(server, include, exclude)
Returns a string which is the name of the key to use to authenticate the given netsync connection. When called by the serve command, server is the address monotone is listening on, include is @@ -215,7 +215,7 @@ hook function. --key=keyname overrides any value returned from this hook function. -
get_passphrase (keypair_id)
+
get_passphrase (keypair_id)
Returns a string which is the passphrase used to encrypt the private half of keypair_id in your database, using the arc4 symmetric cipher. keypair_id is a Lua string containing the label that you @@ -224,7 +224,7 @@ a passphrase each time it needs to use a this hook is not defined or returns false, monotone will prompt you for a passphrase each time it needs to use a private key. -
get_author (branchname, keypair_id)
+
get_author (branchname, keypair_id)
Returns a string which is used as a value for automatically generated author certificates when you commit changes to branchname with the keypair identity keypair_id. Generally @@ -250,7 +250,7 @@ definitions might be: return keypair_id end -
edit_comment (commentary, user_log_message)
+
edit_comment (commentary, user_log_message)
Returns a log entry for a given set of changes, described in commentary. The commentary is identical to the output of mtn status. This hook is intended to interface with @@ -267,7 +267,7 @@ the system up for another edit/commit cy

For the default definition of this hook, see Default hooks. -

persist_phrase_ok ()
+
persist_phrase_ok ()
Returns true if you want monotone to remember the passphrase of a private key for the duration of a single command, or false if you want monotone to prompt you for a passphrase for each certificate @@ -280,7 +280,7 @@ probably want this hook to return return true end -
use_inodeprints ()
+
use_inodeprints ()
Returns true if you want monotone to automatically enable Inodeprints support in all workspaces. Only affects working copies created after you modify the hook. @@ -290,7 +290,7 @@ copies created after you modify the hook return false end -
ignore_file (filename)
+
ignore_file (filename)
Returns true if filename should be ignored while adding, dropping, or moving files. Otherwise returns false. This is most important when performing recursive actions on directories, which @@ -303,7 +303,7 @@ default definition of this hook, see Default hooks. -
ignore_branch (branchname)
+
ignore_branch (branchname)
Returns true if branchname should be ignored while listing branches. Otherwise returns false. This hook has no default definition, therefore the default behavior is to list all branches. @@ -320,7 +320,7 @@ changed. changed.
-
get_netsync_read_permitted (branch, identity)
+
get_netsync_read_permitted (branch, identity)
Returns true if a peer authenticated as key identity should be allowed to read from your database certs, revisions, manifests, and files associated with branch; otherwise false. @@ -362,7 +362,7 @@ key fingerprints of each key in your dat key fingerprints of each key in your database, as key ID strings are “convenience names”, not security tokens. -
get_netsync_write_permitted (identity)
+
get_netsync_write_permitted (identity)
Returns true if a peer authenticated as key identity should be allowed to write into your database certs, revisions, manifests, and files; otherwise false. The default definition of this hook reads a file @@ -401,7 +401,7 @@ a TCP socket. a TCP socket.
-
get_netsync_connect_command (uri, args)
+
get_netsync_connect_command (uri, args)
Returns a table describing a command to run to connect to the specified host. The uri argument is a table containing between 0 and 7 components: @@ -480,7 +480,7 @@ components: return argv end -
use_transport_auth (uri)
+
use_transport_auth (uri)
Returns a boolean indicating whether monotone should use transport authentication mechanisms when communicating with uri. If this hook fails, the return value is assumed to be true. The form of @@ -506,7 +506,7 @@ authentication assumptions. end end -
get_mtn_command(host)
+
get_mtn_command(host)
Returns a string containing the monotone command to be executed on host when communicating over ssh. The host argument is a string containing the name of the host to which @@ -538,7 +538,7 @@ valid revisions, according to their own valid revisions, according to their own preferences and purposes.
-
get_revision_cert_trust (signers, id, name, val)
+
get_revision_cert_trust (signers, id, name, val)
Returns whether or not you trust the assertion name=value on a given revision id, given a valid signature from all the keys in signers. The signers @@ -584,7 +584,7 @@ the revision has been approved by an ext the revision has been approved by an extra “reviewer” who used the approve command. -
accept_testresult_change (old_results, new_results)
+
accept_testresult_change (old_results, new_results)
This hook is used by the update algorithm to determine whether a change in test results between update source and update target is acceptable. The hook is called with two tables, each of which maps a @@ -622,7 +622,7 @@ customisation of the way file difference customisation of the way file differences are shown.
-
get_encloser_pattern (file_path)
+
get_encloser_pattern (file_path)
Called for each file when diff is given the --show-encloser option (and not the --external option). file_path is the pathname of the @@ -635,9 +635,10 @@ hook; and if you send it to the monotone regular expressions that match their particular syntax. If you have a better regular expression for some language, you can add it to this hook; and if you send it to the monotone developers, we will likely -make it to the default for that language. +make it the default for that language. See Regexps, for the +regular expression syntax. -
external_diff (file_path, old_data, new_data, is_binary,
diff_args, old_rev, new_rev) +
external_diff (file_path, old_data, new_data, is_binary,
diff_args, old_rev, new_rev)

Called for each file when diff is given the --external option. file_path is the pathname of the @@ -672,7 +673,7 @@ you have a tool specific to certain file

-
merge3 (ancestor_path, left_path, right_path, merged_path, ancestor_text, left_text, right_text)
+
merge3 (ancestor_path, left_path, right_path, merged_path, ancestor_text, left_text, right_text)
This hook is called to resolve merges that monotone could not resolve automatically. The actual ancestor, left, and right contents of the file are passed in the ancestor_text, left_text, and @@ -695,7 +696,7 @@ local system. For details, see the code local system. For details, see the code in Default hooks.

-

get_preferred_merge3_command(tbl)
+
get_preferred_merge3_command(tbl)
Returns the results of running an external merge on three strings. tbl wraps up the various arguments for each merge command and is always provided by merge3. If there is a particular editor @@ -715,7 +716,7 @@ expand them to their full form.

For more detail on the use of selectors, see Selectors.

-
expand_selector (str)
+
expand_selector (str)
Attempts to expand str as a selector. Expansion generally means providing a type prefix for the selector, such as a: for authors or d: for dates. This hook is called once for each element of a @@ -723,7 +724,7 @@ evaluation of the selector. For the defa evaluation of the selector. For the default definition of this hook, see Default hooks. -
expand_date (str)
+
expand_date (str)
Attempts to expand str as a date expression. Expansion means recognizing and interpreting special words such as yesterday or 6 months ago and converting them into well formed date @@ -746,7 +747,7 @@ according to its attributes when the wor according to its attributes when the workspace is changed.
-
attr_functions [attribute] (filename, value)
+
attr_functions [attribute] (filename, value)
This is not a hook function, but a table of hook functions. Each entry in the table attr_functions, at table entry attribute, is a function taking a file name filename @@ -769,7 +770,7 @@ attribute. Its definition is: end end -
attr_init_functions [attribute] (filename)
+
attr_init_functions [attribute] (filename)
This is not a hook function, but a table of hook functions. Each entry in the table attr_init_functions, at table entry attribute, is a function taking a file (or @@ -809,7 +810,7 @@ allow a client to validate or reject cer allow a client to validate or reject certain behaviors.
-
validate_commit_message (message, revision_text, branchname)
+
validate_commit_message (message, revision_text, branchname)
This hook is called after the user has entered his/her commit message. message is the commit message that the user has entered and revision_text is the full text of the changes for this revision, ============================================================ --- docs/Informative.html 468bbecb358334b5ff81068ae5c000756c8ad89e +++ docs/Informative.html ffe3b1df0339c4769f834001b639319d04ab2ea4 @@ -96,17 +96,16 @@ where those files are changed.

If one or more files are given, the command will only log the revisions where those files are changed. -

mtn annotate file
mtn annotate [--revision=id] [--brief] file
-Dumps an annotated copy of the file to stdout. In the absence of the ---brief flag, each line of the file -is translated to <revision id>: <line> in the output, where <revision id> -is the revision in which that line of the file was last edited. +
mtn annotate file
mtn annotate [--revision=id] [--revs-only] file
+Dumps an annotated copy of the file to stdout. The output is in the form +<short revision id>.. by <author> <date>: <line> Only the first 8 +characters of the revision id are displayed, the author cert value is +truncated at the first @ or space character and the date field +is truncated to remove the time of day. -

If --brief is specified, the output is in the form -<short revision id>.. by <author> <date>: <line> -Only the first 8 characters of the revision id are displayed, the -author cert value is truncated at the first @ or space -character and the date field is truncated to remove the time of day. +

If --revs-only is specified, each line of the file is +translated to <revision id>: <line> in the output, where <revision id> +is the revision in which that line of the file was last edited.

mtn complete file partial-id
mtn complete [--brief] key partial-id
mtn complete [--brief] revision partial-id
These commands print out all known completions of a partial sha1 @@ -182,7 +181,8 @@ adjust the expression used with the Lua hunk for a line that matches a regular expression. The default regular expression is correct for many programming languages. You can adjust the expression used with the Lua hook -get_encloser_pattern; Hooks. +get_encloser_pattern; Hooks. For the regular expression +syntax, see Regexps.

--unified requests the “unified diff” format, the default. --context requests the “context diff” format (analogous to ============================================================ --- docs/Key-and-Cert-Trust.html 4e03afc439a3c78f282345064f1491799a309e57 +++ docs/Key-and-Cert-Trust.html 87061cac1bac7fc663f3b53fa7d2635f072e8083 @@ -69,7 +69,31 @@ those keys. Monotone would trust a cert on that revision with that value signed by those keys. -

mtn ssh_agent_export filename
+
mtn ssh_agent_add
+This command will add your monotone keys to your current ssh-agent session. +You will be asked for the passphrase for each of your monotone private keys +and they will be added to the ssh-agent. Once this is done you should be able +to type ssh-add -l and see your monotone key listed. When you +subsequently use these keys through monotone it will use ssh-agent for signing +without asking you for your passphrase. + +

This command is mainly for use in a session script as monotone will automatically +add your keys to ssh-agent on first use if it is available. For example the +following two examples are equivalent: + +

          $ mtn ssh_agent_add
+          enter passphrase for key ID address@hidden:
+          $ mtn ci -m"Changed foo to bar"
+          $ mtn push -k address@hidden
+     
+
          $ mtn ci -m"Changed foo to bar"
+          enter passphrase for key ID address@hidden:
+          $ mtn push -k address@hidden
+     
+

In the second example, monotone automatically added the key to ssh-agent, so +the passphrase does not need to be entered again during the push. + +

mtn ssh_agent_export filename
This command will export your private key in a format that ssh-agent can read (PKCS8, PEM). You will be asked for your current key's password and a new password to encrypt the key with. The key will be printed to @@ -87,7 +111,7 @@ will cache the key for you. Enter passphrase for /home/user/.ssh/id_monotone: Identity added: /home/user/.ssh/id_monotone (/home/user/.ssh/id_monotone) $ mtn ci -m"Changed foo to bar" - $ mtn push + $ mtn push -k address@hidden

You can also use the --ssh-sign option to control whether ssh-agent will be used for signing. If set to yes, ssh-agent will be used to sign. If your ============================================================ --- docs/Mark_002dMerge.html 170d0082dea175d046bd02afac38a47877f4eeab +++ docs/Mark_002dMerge.html 7b95b1f2f7b6c7414d5523663e761de124a09529 @@ -7,6 +7,7 @@ +