<html>
<head>
<title>pcre2syntax specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2syntax man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a>
<li><a name="TOC2" href="#SEC2">QUOTING</a>
<li><a name="TOC3" href="#SEC3">ESCAPED CHARACTERS</a>
<li><a name="TOC4" href="#SEC4">CHARACTER TYPES</a>
<li><a name="TOC5" href="#SEC5">GENERAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC6" href="#SEC6">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a>
<li><a name="TOC7" href="#SEC7">BINARY PROPERTIES FOR \p AND \P</a>
<li><a name="TOC8" href="#SEC8">SCRIPT MATCHING WITH \p AND \P</a>
<li><a name="TOC9" href="#SEC9">THE BIDI_CLASS PROPERTY FOR \p AND \P</a>
<li><a name="TOC10" href="#SEC10">CHARACTER CLASSES</a>
<li><a name="TOC11" href="#SEC11">QUANTIFIERS</a>
<li><a name="TOC12" href="#SEC12">ANCHORS AND SIMPLE ASSERTIONS</a>
<li><a name="TOC13" href="#SEC13">REPORTED MATCH POINT SETTING</a>
<li><a name="TOC14" href="#SEC14">ALTERNATION</a>
<li><a name="TOC15" href="#SEC15">CAPTURING</a>
<li><a name="TOC16" href="#SEC16">ATOMIC GROUPS</a>
<li><a name="TOC17" href="#SEC17">COMMENT</a>
<li><a name="TOC18" href="#SEC18">OPTION SETTING</a>
<li><a name="TOC19" href="#SEC19">NEWLINE CONVENTION</a>
<li><a name="TOC20" href="#SEC20">WHAT \R MATCHES</a>
<li><a name="TOC21" href="#SEC21">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a>
<li><a name="TOC22" href="#SEC22">NON-ATOMIC LOOKAROUND ASSERTIONS</a>
<li><a name="TOC23" href="#SEC23">SCRIPT RUNS</a>
<li><a name="TOC24" href="#SEC24">BACKREFERENCES</a>
<li><a name="TOC25" href="#SEC25">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a>
<li><a name="TOC26" href="#SEC26">CONDITIONAL PATTERNS</a>
<li><a name="TOC27" href="#SEC27">BACKTRACKING CONTROL</a>
<li><a name="TOC28" href="#SEC28">CALLOUTS</a>
<li><a name="TOC29" href="#SEC29">SEE ALSO</a>
<li><a name="TOC30" href="#SEC30">AUTHOR</a>
<li><a name="TOC31" href="#SEC31">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY</a><br>
<P>
The full syntax and semantics of the regular expressions that are supported by
PCRE2 are described in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation. This document contains a quick-reference summary of the syntax.
</P>
<br><a name="SEC2" href="#TOC1">QUOTING</a><br>
<P>
<pre>
  \x         where x is non-alphanumeric is a literal x
  \Q...\E    treat enclosed characters as literal
</PRE>
</P>
<br><a name="SEC3" href="#TOC1">ESCAPED CHARACTERS</a><br>
<P>
This table applies to ASCII and Unicode environments. An unrecognized escape
sequence causes an error.
<pre>
  \a         alarm, that is, the BEL character (hex 07)
  \cx        "control-x", where x is any ASCII printing character
  \e         escape (hex 1B)
  \f         form feed (hex 0C)
  \n         newline (hex 0A)
  \r         carriage return (hex 0D)
  \t         tab (hex 09)
  \0dd       character with octal code 0dd
  \ddd       character with octal code ddd, or backreference
  \o{ddd..}  character with octal code ddd..
  \N{U+hh..} character with Unicode code point hh.. (Unicode mode only)
  \xhh       character with hex code hh
  \x{hh..}   character with hex code hh..
</pre>
If PCRE2_ALT_BSUX or PCRE2_EXTRA_ALT_BSUX is set ("ALT_BSUX mode"), the
following are also recognized:
<pre>
  \U         the character "U"
  \uhhhh     character with hex code hhhh
  \u{hh..}   character with hex code hh.. but only for EXTRA_ALT_BSUX
</pre>
When \x is not followed by {, from zero to two hexadecimal digits are read,
but in ALT_BSUX mode \x must be followed by two hexadecimal digits to be
recognized as a hexadecimal escape; otherwise it matches a literal "x".
Likewise, if \u (in ALT_BSUX mode) is not followed by four hexadecimal digits
or (in EXTRA_ALT_BSUX mode) a sequence of hex digits in curly brackets, it
matches a literal "u".
</P>
<P>
Note that \0dd is always an octal code. The treatment of backslash followed by
a non-zero digit is complicated; for details see the section
<a href="pcre2pattern.html#digitsafterbackslash">"Non-printing characters"</a>
in the
<a href="pcre2pattern.html"><b>pcre2pattern</b></a>
documentation, where details of escape processing in EBCDIC environments are
also given. \N{U+hh..} is synonymous with \x{hh..} in PCRE2 but is not
supported in EBCDIC environments. Note that \N not followed by an opening
curly bracket has a different meaning (see below).
</P>
<br><a name="SEC4" href="#TOC1">CHARACTER TYPES</a><br>
<P>
<pre>
  .          any character except newline;
               in dotall mode, any character whatsoever
  \C         one code unit, even in UTF mode (best avoided)
  \d         a decimal digit
  \D         a character that is not a decimal digit
  \h         a horizontal white space character
  \H         a character that is not a horizontal white space character
  \N         a character that is not a newline
  \p{<i>xx</i>}     a character with the <i>xx</i> property
  \P{<i>xx</i>}     a character without the <i>xx</i> property
  \R         a newline sequence
  \s         a white space character
  \S         a character that is not a white space character
  \v         a vertical white space character
  \V         a character that is not a vertical white space character
  \w         a "word" character
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
</pre>
\C is dangerous because it may leave the current matching point in the middle
of a UTF-8 or UTF-16 character. The application can lock out the use of \C by
setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
with the use of \C permanently disabled.
</P>
<P>
By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
happening, \s and \w may also match characters with code points in the range
128-255. If the PCRE2_UCP option is set, the behaviour of these escape
sequences is changed to use Unicode properties and they match many more
characters.
</P>
<P>
Property descriptions in \p and \P are matched caselessly; hyphens,
underscores, and white space are ignored, in accordance with Unicode's "loose
matching" rules.
</P>
<br><a name="SEC5" href="#TOC1">GENERAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  C          Other
  Cc         Control
  Cf         Format
  Cn         Unassigned
  Co         Private use
  Cs         Surrogate

  L          Letter
  Ll         Lower case letter
  Lm         Modifier letter
  Lo         Other letter
  Lt         Title case letter
  Lu         Upper case letter
  Lc         Ll, Lu, or Lt
  L&amp;         Ll, Lu, or Lt

  M          Mark
  Mc         Spacing mark
  Me         Enclosing mark
  Mn         Non-spacing mark

  N          Number
  Nd         Decimal number
  Nl         Letter number
  No         Other number

  P          Punctuation
  Pc         Connector punctuation
  Pd         Dash punctuation
  Pe         Close punctuation
  Pf         Final punctuation
  Pi         Initial punctuation
  Po         Other punctuation
  Ps         Open punctuation

  S          Symbol
  Sc         Currency symbol
  Sk         Modifier symbol
  Sm         Mathematical symbol
  So         Other symbol

  Z          Separator
  Zl         Line separator
  Zp         Paragraph separator
  Zs         Space separator
</PRE>
</P>
<br><a name="SEC6" href="#TOC1">PCRE2 SPECIAL CATEGORY PROPERTIES FOR \p and \P</a><br>
<P>
<pre>
  Xan        Alphanumeric: union of properties L and N
  Xps        POSIX space: property Z or tab, NL, VT, FF, CR
  Xsp        Perl space: property Z or tab, NL, VT, FF, CR
  Xuc        Universally-named character: one that can be
               represented by a Universal Character Name
  Xwd        Perl word: property Xan or underscore
</pre>
Perl and POSIX space are now the same. Perl added VT to its space character set
at release 5.18.
</P>
<br><a name="SEC7" href="#TOC1">BINARY PROPERTIES FOR \p AND \P</a><br>
<P>
Unicode defines a number of binary properties, that is, properties whose only
values are true or false. You can obtain a list of those that are recognized by
\p and \P, along with their abbreviations, by running this command:
<pre>
  pcre2test -LP
</PRE>
</P>
<br><a name="SEC8" href="#TOC1">SCRIPT MATCHING WITH \p AND \P</a><br>
<P>
Many script names and their 4-letter abbreviations are recognized in
\p{sc:...} or \p{scx:...} items, or on their own with \p (and also \P of
course). You can obtain a list of these scripts by running this command:
<pre>
  pcre2test -LS
</PRE>
</P>
<br><a name="SEC9" href="#TOC1">THE BIDI_CLASS PROPERTY FOR \p AND \P</a><br>
<P>
<pre>
  \p{Bidi_Class:&lt;class&gt;}   matches a character with the given class
  \p{BC:&lt;class&gt;}           matches a character with the given class
</pre>
The recognized classes are:
<pre>
  AL          Arabic letter
  AN          Arabic number
  B           paragraph separator
  BN          boundary neutral
  CS          common separator
  EN          European number
  ES          European separator
  ET          European terminator
  FSI         first strong isolate
  L           left-to-right
  LRE         left-to-right embedding
  LRI         left-to-right isolate
  LRO         left-to-right override
  NSM         non-spacing mark
  ON          other neutral
  PDF         pop directional format
  PDI         pop directional isolate
  R           right-to-left
  RLE         right-to-left embedding
  RLI         right-to-left isolate
  RLO         right-to-left override
  S           segment separator
  WS          white space
</PRE>
</P>
<br><a name="SEC10" href="#TOC1">CHARACTER CLASSES</a><br>
<P>
<pre>
  [...]       positive character class
  [^...]      negative character class
  [x-y]       range (can be used for hex characters)
  [[:xxx:]]   positive POSIX named set
  [[:^xxx:]]  negative POSIX named set

  alnum       alphanumeric
  alpha       alphabetic
  ascii       0-127
  blank       space or tab
  cntrl       control character
  digit       decimal digit
  graph       printing, excluding space
  lower       lower case letter
  print       printing, including space
  punct       printing, excluding alphanumeric
  space       white space
  upper       upper case letter
  word        same as \w
  xdigit      hexadecimal digit
</pre>
In PCRE2, POSIX character set names recognize only ASCII characters by default,
but some of them use Unicode properties if PCRE2_UCP is set. You can use
\Q...\E inside a character class.
</P>
<br><a name="SEC11" href="#TOC1">QUANTIFIERS</a><br>
<P>
<pre>
  ?           0 or 1, greedy
  ?+          0 or 1, possessive
  ??          0 or 1, lazy
  *           0 or more, greedy
  *+          0 or more, possessive
  *?          0 or more, lazy
  +           1 or more, greedy
  ++          1 or more, possessive
  +?          1 or more, lazy
  {n}         exactly n
  {n,m}       at least n, no more than m, greedy
  {n,m}+      at least n, no more than m, possessive
  {n,m}?      at least n, no more than m, lazy
  {n,}        n or more, greedy
  {n,}+       n or more, possessive
  {n,}?       n or more, lazy
</PRE>
</P>
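<P>
The greedy and lazy rows of the table can be compared with a short test. This
is a sketch only: it uses Python's standard <b>re</b> module rather than PCRE2,
on the assumption that both treat these particular quantifiers alike, and the
subject strings are invented for illustration. The possessive forms are left
out because older Python releases do not accept them.
</P>

```python
import re

subject = '"one" and "two"'

# Greedy: .* takes as much as it can, then gives back only what it must,
# so the match runs to the last double quote in the subject.
greedy = re.search(r'".*"', subject).group(0)

# Lazy: .*? takes as little as it can, so it stops at the next double quote.
lazy = re.search(r'".*?"', subject).group(0)

# {n,m} is greedy by default; {n,m}? is its lazy form.
greedy_range = re.search(r"a{2,3}", "aaaa").group(0)
lazy_range = re.search(r"a{2,3}?", "aaaa").group(0)

print(greedy)        # "one" and "two"
print(lazy)          # "one"
print(greedy_range)  # aaa
print(lazy_range)    # aa
```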
<br><a name="SEC12" href="#TOC1">ANCHORS AND SIMPLE ASSERTIONS</a><br>
<P>
<pre>
  \b          word boundary
  \B          not a word boundary
  ^           start of subject
                also after an internal newline in multiline mode
                (after any newline if PCRE2_ALT_CIRCUMFLEX is set)
  \A          start of subject
  $           end of subject
                also before newline at end of subject
                also before internal newline in multiline mode
  \Z          end of subject
                also before newline at end of subject
  \z          end of subject
  \G          first matching position in subject
</PRE>
</P>
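<P>
The effect of multiline mode on ^, and the behaviour of $ and \b, can be
checked as follows. This is an illustrative sketch using Python's standard
<b>re</b> module, not PCRE2; it is assumed that the two agree for the cases
shown, and the subject strings are made up. Python's \Z corresponds to PCRE2's
\z rather than to PCRE2's \Z, so neither escape is used here.
</P>

```python
import re

subject = "first\nsecond\n"

# By default, ^ matches only at the very start of the subject.
default_starts = re.findall(r"^\w+", subject)

# In multiline mode, ^ also matches after each internal newline.
multiline_starts = re.findall(r"(?m)^\w+", subject)

# $ also matches before a newline that is the last character of the subject.
dollar_before_final_newline = re.search(r"second$", subject) is not None

# \b matches only at a word boundary, so the "cat" inside "concat" is skipped.
whole_words = re.findall(r"\bcat\b", "cat concat cat.")

print(default_starts)               # ['first']
print(multiline_starts)             # ['first', 'second']
print(dollar_before_final_newline)  # True
print(whole_words)                  # ['cat', 'cat']
```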
<br><a name="SEC13" href="#TOC1">REPORTED MATCH POINT SETTING</a><br>
<P>
<pre>
  \K          set reported start of match
</pre>
From release 10.38 \K is not permitted by default in lookaround assertions,
for compatibility with Perl. However, if the PCRE2_EXTRA_ALLOW_LOOKAROUND_BSK
option is set, the previous behaviour is re-enabled. When this option is set,
\K is honoured in positive assertions, but ignored in negative ones.
</P>
<br><a name="SEC14" href="#TOC1">ALTERNATION</a><br>
<P>
<pre>
  expr|expr|expr...
</PRE>
</P>
<br><a name="SEC15" href="#TOC1">CAPTURING</a><br>
<P>
<pre>
  (...)           capture group
  (?&lt;name&gt;...)    named capture group (Perl)
  (?'name'...)    named capture group (Perl)
  (?P&lt;name&gt;...)   named capture group (Python)
  (?:...)         non-capture group
  (?|...)         non-capture group; reset group numbers for
                   capture groups in each alternative
</pre>
In non-UTF modes, names may contain underscores and ASCII letters and digits;
in UTF modes, any Unicode letters and Unicode decimal digits are permitted. In
both cases, a name must not start with a digit.
</P>
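<P>
A short test shows how named and non-capture groups affect group numbering.
This sketch uses Python's standard <b>re</b> module, not PCRE2; of the three
named-group spellings in the table it accepts only (?P&lt;name&gt;...). The
pattern and the group names <i>year</i> and <i>day</i> are hypothetical.
</P>

```python
import re

# Two named groups and one non-capture group. Only capture groups are
# numbered, so "day" is group 2 even though it is the third parenthesis.
date = re.compile(r"(?P<year>\d{4})-(?:\d{2})-(?P<day>\d{2})")

m = date.fullmatch("2022-01-12")
year = m.group("year")
day = m.group("day")
second_group = m.group(2)
numbered = m.groups()

print(year)          # 2022
print(day)           # 12
print(second_group)  # 12
print(numbered)      # ('2022', '12')
```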
<br><a name="SEC16" href="#TOC1">ATOMIC GROUPS</a><br>
<P>
<pre>
  (?&gt;...)         atomic non-capture group
  (*atomic:...)   atomic non-capture group
</PRE>
</P>
<br><a name="SEC17" href="#TOC1">COMMENT</a><br>
<P>
<pre>
  (?#....)        comment (not nestable)
</PRE>
</P>
<br><a name="SEC18" href="#TOC1">OPTION SETTING</a><br>
<P>
Changes of these options within a group are automatically cancelled at the end
of the group.
<pre>
  (?i)            caseless
  (?J)            allow duplicate named groups
  (?m)            multiline
  (?n)            no auto capture
  (?s)            single line (dotall)
  (?U)            default ungreedy (lazy)
  (?x)            extended: ignore white space except in classes
  (?xx)           as (?x) but also ignore space and tab in classes
  (?-...)         unset option(s)
  (?^)            unset imnsx options
</pre>
Unsetting x or xx unsets both. Several options may be set at once, and a
mixture of setting and unsetting such as (?i-x) is allowed, but there may be
only one hyphen. Setting (but not unsetting) is allowed after (?^, for example
(?^in). An option setting may appear at the start of a non-capture group, for
example (?i:...).
</P>
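<P>
The scoping of an option set at the start of a non-capture group can be seen
in a short test. This is a sketch using Python's standard <b>re</b> module, not
PCRE2. It is assumed that (?i:...) and a leading (?x) behave alike in both;
other forms in the table, such as (?J), (?U), (?xx) and (?^), are not accepted
by Python and are not shown. The patterns are invented for illustration.
</P>

```python
import re

# (?i:...) makes only the enclosed group caseless; the option is cancelled
# at the end of the group, so " World" is still matched casefully.
scoped = re.compile(r"(?i:hello) World")
inside_is_caseless = scoped.fullmatch("HELLO World") is not None
outside_is_caseful = scoped.fullmatch("HELLO WORLD") is None

# (?x) makes white space in the pattern insignificant outside classes.
extended = re.compile(r"(?x) \d+ \s* kg")
matches_without_space = extended.fullmatch("12kg") is not None
matches_with_space = extended.fullmatch("12 kg") is not None

print(inside_is_caseless, outside_is_caseful)     # True True
print(matches_without_space, matches_with_space)  # True True
```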
<P>
The following are recognized only at the very start of a pattern or after one
of the newline or \R options with similar syntax. More than one of them may
appear. For the first three, d is a decimal number.
<pre>
  (*LIMIT_DEPTH=d) set the backtracking limit to d
  (*LIMIT_HEAP=d)  set the heap size limit to d * 1024 bytes
  (*LIMIT_MATCH=d) set the match limit to d
  (*NOTEMPTY)      set PCRE2_NOTEMPTY when matching
  (*NOTEMPTY_ATSTART) set PCRE2_NOTEMPTY_ATSTART when matching
  (*NO_AUTO_POSSESS) no auto-possessification (PCRE2_NO_AUTO_POSSESS)
  (*NO_DOTSTAR_ANCHOR) no .* anchoring (PCRE2_NO_DOTSTAR_ANCHOR)
  (*NO_JIT)       disable JIT optimization
  (*NO_START_OPT) no start-match optimization (PCRE2_NO_START_OPTIMIZE)
  (*UTF)          set appropriate UTF mode for the library in use
  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
</pre>
Note that LIMIT_DEPTH, LIMIT_HEAP, and LIMIT_MATCH can only reduce the value of
the limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>,
not increase them. LIMIT_RECURSION is an obsolete synonym for LIMIT_DEPTH. The
application can lock out the use of (*UTF) and (*UCP) by setting the
PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at compile time.
</P>
<br><a name="SEC19" href="#TOC1">NEWLINE CONVENTION</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*CR)           carriage return only
  (*LF)           linefeed only
  (*CRLF)         carriage return followed by linefeed
  (*ANYCRLF)      all three of the above
  (*ANY)          any Unicode newline sequence
  (*NUL)          the NUL character (binary zero)
</PRE>
</P>
<br><a name="SEC20" href="#TOC1">WHAT \R MATCHES</a><br>
<P>
These are recognized only at the very start of the pattern or after option
settings with a similar syntax.
<pre>
  (*BSR_ANYCRLF)  CR, LF, or CRLF
  (*BSR_UNICODE)  any Unicode newline sequence
</PRE>
</P>
<br><a name="SEC21" href="#TOC1">LOOKAHEAD AND LOOKBEHIND ASSERTIONS</a><br>
<P>
<pre>
  (?=...)                     )
  (*pla:...)                  ) positive lookahead
  (*positive_lookahead:...)   )

  (?!...)                     )
  (*nla:...)                  ) negative lookahead
  (*negative_lookahead:...)   )

  (?&lt;=...)                    )
  (*plb:...)                  ) positive lookbehind
  (*positive_lookbehind:...)  )

  (?&lt;!...)                    )
  (*nlb:...)                  ) negative lookbehind
  (*negative_lookbehind:...)  )
</pre>
Each top-level branch of a lookbehind must be of a fixed length.
</P>
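<P>
The four kinds of assertion can be tried out with the short symbolic forms.
This sketch uses Python's standard <b>re</b> module, not PCRE2, and assumes
that the two agree for these fixed-length cases; the alphabetic synonyms such
as (*pla:...) are specific to PCRE2 and are not shown. The subject strings are
invented for illustration.
</P>

```python
import re

# Positive lookahead: digits that are followed by "px"; "px" is not consumed.
px_values = re.findall(r"\d+(?=px)", "10px 20em 30px")

# Negative lookahead: a whole digit run that is not followed by "px".
# The \d alternative stops the match from settling for a shorter run.
non_px_values = re.findall(r"\b\d+(?!\d|px)", "10px 20em 30px")

# Positive lookbehind: digits that are preceded by "$".
dollar_amounts = re.findall(r"(?<=\$)\d+", "$15 and 27")

# Negative lookbehind: a digit run that is not preceded by "$".
plain_numbers = re.findall(r"(?<!\$)\b\d+", "$15 and 27")

print(px_values)       # ['10', '30']
print(non_px_values)   # ['20']
print(dollar_amounts)  # ['15']
print(plain_numbers)   # ['27']
```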
<br><a name="SEC22" href="#TOC1">NON-ATOMIC LOOKAROUND ASSERTIONS</a><br>
<P>
These assertions are specific to PCRE2 and are not Perl-compatible.
<pre>
  (?*...)                                )
  (*napla:...)                           ) synonyms
  (*non_atomic_positive_lookahead:...)   )

  (?&lt;*...)                               )
  (*naplb:...)                           ) synonyms
  (*non_atomic_positive_lookbehind:...)  )
</PRE>
</P>
<br><a name="SEC23" href="#TOC1">SCRIPT RUNS</a><br>
<P>
<pre>
  (*script_run:...)           ) script run, can be backtracked into
  (*sr:...)                   )

  (*atomic_script_run:...)    ) atomic script run
  (*asr:...)                  )
</PRE>
</P>
<br><a name="SEC24" href="#TOC1">BACKREFERENCES</a><br>
<P>
<pre>
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
  \g+n            relative reference by number (PCRE2 extension)
  \g-n            relative reference by number
  \g{+n}          relative reference by number (PCRE2 extension)
  \g{-n}          relative reference by number
  \k&lt;name&gt;        reference by name (Perl)
  \k'name'        reference by name (Perl)
  \g{name}        reference by name (Perl)
  \k{name}        reference by name (.NET)
  (?P=name)       reference by name (Python)
</PRE>
</P>
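<P>
A backreference matches the text that the group actually matched, not the
group's pattern. The sketch below shows a numbered and a named reference using
Python's standard <b>re</b> module, not PCRE2; of the forms in the table, only
\n and (?P=name) are accepted by Python. The group name <i>quote</i> and the
subject strings are hypothetical.
</P>

```python
import re

# \1 must match the same text that group 1 matched, so this finds a
# doubled word.
doubled = re.search(r"\b(\w+) \1\b", "it was the the best").group(0)

# (?P=quote) must match the same quote character that opened the string,
# so the apostrophe inside the double-quoted text does not end the match.
quoted = re.search(
    r"""(?P<quote>['"]).*?(?P=quote)""", "say \"it's\" now"
).group(0)

print(doubled)  # the the
print(quoted)   # "it's"
```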
<br><a name="SEC25" href="#TOC1">SUBROUTINE REFERENCES (POSSIBLY RECURSIVE)</a><br>
<P>
<pre>
  (?R)            recurse whole pattern
  (?n)            call subroutine by absolute number
  (?+n)           call subroutine by relative number
  (?-n)           call subroutine by relative number
  (?&amp;name)        call subroutine by name (Perl)
  (?P&gt;name)       call subroutine by name (Python)
  \g&lt;name&gt;        call subroutine by name (Oniguruma)
  \g'name'        call subroutine by name (Oniguruma)
  \g&lt;n&gt;           call subroutine by absolute number (Oniguruma)
  \g'n'           call subroutine by absolute number (Oniguruma)
  \g&lt;+n&gt;          call subroutine by relative number (PCRE2 extension)
  \g'+n'          call subroutine by relative number (PCRE2 extension)
  \g&lt;-n&gt;          call subroutine by relative number (PCRE2 extension)
  \g'-n'          call subroutine by relative number (PCRE2 extension)
</PRE>
</P>
<br><a name="SEC26" href="#TOC1">CONDITIONAL PATTERNS</a><br>
<P>
<pre>
  (?(condition)yes-pattern)
  (?(condition)yes-pattern|no-pattern)

  (?(n)               absolute reference condition
  (?(+n)              relative reference condition
  (?(-n)              relative reference condition
  (?(&lt;name&gt;)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
  (?(Rn)              specific numbered group recursion condition
  (?(R&amp;name)          specific named group recursion condition
  (?(DEFINE)          define groups for reference
  (?(VERSION[&gt;]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
</pre>
Note the ambiguity of (?(R) and (?(Rn) which might be named reference
conditions or recursion tests. Such a condition is interpreted as a reference
condition if the relevant named group exists.
</P>
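<P>
An absolute reference condition tests whether a numbered group has taken part
in the match so far. The sketch below uses Python's standard <b>re</b> module,
not PCRE2; Python accepts only the (?(n)...) and (?(name)...) forms, and it is
assumed that (?(1)...) behaves alike in both. The pattern is invented for
illustration.
</P>

```python
import re

# Group 1 is an optional "(". The condition (?(1)\)) requires a closing ")"
# only if group 1 took part in the match; no no-pattern is given, so
# nothing extra is required when the group is unset.
balanced = re.compile(r"(\()?\d+(?(1)\))")

results = {
    s: balanced.fullmatch(s) is not None
    for s in ["42", "(42)", "(42", "42)"]
}

print(results)
# {'42': True, '(42)': True, '(42': False, '42)': False}
```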
<br><a name="SEC27" href="#TOC1">BACKTRACKING CONTROL</a><br>
<P>
All backtracking control verbs may be in the form (*VERB:NAME). For (*MARK) the
name is mandatory, for the others it is optional. (*SKIP) changes its behaviour
if :NAME is present. The others just set a name for passing back to the caller,
but this is not a name that (*SKIP) can see. The following act immediately they
are reached:
<pre>
  (*ACCEPT)       force successful match
  (*FAIL)         force backtrack; synonym (*F)
  (*MARK:NAME)    set name to be passed back; synonym (*:NAME)
</pre>
The following act only when a subsequent match failure causes a backtrack to
reach them. They all force a match failure, but they differ in what happens
afterwards. Those that advance the start-of-match point do so only if the
pattern is not anchored.
<pre>
  (*COMMIT)       overall failure, no advance of starting point
  (*PRUNE)        advance to next starting character
  (*SKIP)         advance to current matching position
  (*SKIP:NAME)    advance to position corresponding to an earlier
                  (*MARK:NAME); if not found, the (*SKIP) is ignored
  (*THEN)         local failure, backtrack to next alternation
</pre>
The effect of one of these verbs in a group called as a subroutine is confined
to the subroutine call.
</P>
<br><a name="SEC28" href="#TOC1">CALLOUTS</a><br>
<P>
<pre>
  (?C)            callout (assumed number 0)
  (?Cn)           callout with numerical data n
  (?C"text")      callout with string data
</pre>
The allowed string delimiters are ` ' " ^ % # $ (which are the same for the
start and the end), and the starting delimiter { matched with the ending
delimiter }. To encode the ending delimiter within the string, double it.
</P>
<br><a name="SEC29" href="#TOC1">SEE ALSO</a><br>
<P>
<b>pcre2pattern</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
<b>pcre2matching</b>(3), <b>pcre2</b>(3).
</P>
<br><a name="SEC30" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
Retired from University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC31" href="#TOC1">REVISION</a><br>
<P>
Last updated: 12 January 2022
<br>
Copyright © 1997-2022 University of Cambridge.
<br>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>