<html>
<head>
<title>pcre2convert specification</title>
</head>
<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
<h1>pcre2convert man page</h1>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>
<p>
This page is part of the PCRE2 HTML documentation. It was generated
automatically from the original man page. If there is any nonsense in it,
please consult the man page, in case the conversion went wrong.
<br>
<ul>
<li><a name="TOC1" href="#SEC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a>
<li><a name="TOC2" href="#SEC2">THE CONVERT CONTEXT</a>
<li><a name="TOC3" href="#SEC3">THE CONVERSION FUNCTION</a>
<li><a name="TOC4" href="#SEC4">CONVERTING GLOBS</a>
<li><a name="TOC5" href="#SEC5">CONVERTING POSIX PATTERNS</a>
<li><a name="TOC6" href="#SEC6">AUTHOR</a>
<li><a name="TOC7" href="#SEC7">REVISION</a>
</ul>
<br><a name="SEC1" href="#TOC1">EXPERIMENTAL PATTERN CONVERSION FUNCTIONS</a><br>
<P>
This document describes a set of functions that can be used to convert
"foreign" patterns into PCRE2 regular expressions. This facility is currently
experimental, and may be changed in future releases. Two kinds of pattern,
globs and POSIX patterns, are supported.
</P>
<br><a name="SEC2" href="#TOC1">THE CONVERT CONTEXT</a><br>
<P>
<b>pcre2_convert_context *pcre2_convert_context_create(</b>
<b>  pcre2_general_context *<i>gcontext</i>);</b>
<br>
<br>
<b>pcre2_convert_context *pcre2_convert_context_copy(</b>
<b>  pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_convert_context_free(pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_escape(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>escape_char</i>);</b>
<br>
<br>
<b>int pcre2_set_glob_separator(pcre2_convert_context *<i>cvcontext</i>,</b>
<b>  uint32_t <i>separator_char</i>);</b>
<br>
<br>
A convert context is used to hold parameters that affect the way that pattern
conversion works. Like all PCRE2 contexts, you need to use a context only if
you want to override the defaults. There are the usual create, copy, and free
functions. If custom memory management functions are set in a general context
that is passed to <b>pcre2_convert_context_create()</b>, they are used for all
memory management within the conversion functions.
</P>
<P>
There are only two parameters in the convert context at present. Both apply
only to glob conversions. The escape character defaults to grave accent under
Windows, otherwise backslash. It can be set to zero, meaning no escape
character, or to any punctuation character with a code point less than 256.
The separator character defaults to backslash under Windows, otherwise forward
slash. It can be set to forward slash, backslash, or dot.
</P>
<P>
The two setting functions return zero on success, or PCRE2_ERROR_BADDATA if
their second argument is invalid.
</P>
<br><a name="SEC3" href="#TOC1">THE CONVERSION FUNCTION</a><br>
<P>
<b>int pcre2_pattern_convert(PCRE2_SPTR <i>pattern</i>, PCRE2_SIZE <i>length</i>,</b>
<b>  uint32_t <i>options</i>, PCRE2_UCHAR **<i>buffer</i>,</b>
<b>  PCRE2_SIZE *<i>blength</i>, pcre2_convert_context *<i>cvcontext</i>);</b>
<br>
<br>
<b>void pcre2_converted_pattern_free(PCRE2_UCHAR *<i>converted_pattern</i>);</b>
<br>
<br>
The first two arguments of <b>pcre2_pattern_convert()</b> define the foreign
pattern that is to be converted. The length may be given as
PCRE2_ZERO_TERMINATED. The <b>options</b> argument defines how the pattern is to
be processed. If the input is UTF, the PCRE2_CONVERT_UTF option should be set.
PCRE2_CONVERT_NO_UTF_CHECK may also be set if you are sure the input is valid.
One or more of the glob options, or one of the following POSIX options must be
set to define the type of conversion that is required:
<pre>
  PCRE2_CONVERT_GLOB
  PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR
  PCRE2_CONVERT_GLOB_NO_STARSTAR
  PCRE2_CONVERT_POSIX_BASIC
  PCRE2_CONVERT_POSIX_EXTENDED
</pre>
Details of the conversions are given below. The <b>buffer</b> and <b>blength</b>
arguments define how the output is handled:
</P>
<P>
If <b>buffer</b> is NULL, the function just returns the length of the converted
pattern via <b>blength</b>. This is one less than the length of buffer needed,
because a terminating zero is always added to the output.
</P>
<P>
If <b>buffer</b> points to a NULL pointer, an output buffer is obtained using
the allocator in the context or <b>malloc()</b> if no context is supplied. A
pointer to this buffer is placed in the variable to which <b>buffer</b> points.
When no longer needed the output buffer must be freed by calling
<b>pcre2_converted_pattern_free()</b>. If this function is called with a NULL
argument, it returns immediately without doing anything.
</P>
<P>
If <b>buffer</b> points to a non-NULL pointer, <b>blength</b> must be set to the
actual length of the buffer provided (in code units).
</P>
<P>
In all cases, after successful conversion, the variable pointed to by
<b>blength</b> is updated to the length actually used (in code units), excluding
the terminating zero that is always added.
</P>
<P>
If an error occurs, the length (via <b>blength</b>) is set to the offset
within the input pattern where the error was detected. Only gross syntax errors
are caught; there are plenty of errors that will get passed on for
<b>pcre2_compile()</b> to discover.
</P>
<P>
The return from <b>pcre2_pattern_convert()</b> is zero on success or a non-zero
PCRE2 error code. Note that PCRE2 error codes may be positive or negative:
<b>pcre2_compile()</b> uses mostly positive codes and <b>pcre2_match()</b>
negative ones; <b>pcre2_pattern_convert()</b> uses existing codes of both kinds.
A textual error message can be obtained by calling
<b>pcre2_get_error_message()</b>.
</P>
<br><a name="SEC4" href="#TOC1">CONVERTING GLOBS</a><br>
<P>
Globs are used to match file names, and consequently have the concept of a
"path separator", which defaults to backslash under Windows and forward slash
otherwise. If PCRE2_CONVERT_GLOB is set, the wildcards * and ? are not
permitted to match separator characters, but the double-star (**) feature
(which does match separators) is supported.
</P>
<P>
PCRE2_CONVERT_GLOB_NO_WILD_SEPARATOR matches globs with wildcards allowed to
match separator characters. PCRE2_CONVERT_GLOB_NO_STARSTAR matches globs with
the double-star feature disabled. These options may be given together.
</P>
<br><a name="SEC5" href="#TOC1">CONVERTING POSIX PATTERNS</a><br>
<P>
POSIX defines two kinds of regular expression pattern: basic and extended.
These can be processed by setting PCRE2_CONVERT_POSIX_BASIC or
PCRE2_CONVERT_POSIX_EXTENDED, respectively.
</P>
<P>
In POSIX patterns, backslash is not special in a character class. Unmatched
closing parentheses are treated as literals.
</P>
<P>
In basic patterns, ? + | {} and () must be escaped to be recognized
as metacharacters outside a character class. If the first character in the
pattern is * it is treated as a literal. ^ is a metacharacter only at the start
of a branch.
</P>
<P>
In extended patterns, a backslash not in a character class always
makes the next character literal, whatever it is. There are no backreferences.
</P>
<P>
Note: POSIX mandates that the longest possible match at the first matching
position must be found. This is not what <b>pcre2_match()</b> does; it yields
the first match that is found. An application can use <b>pcre2_dfa_match()</b>
to find the longest match, but that does not support backreferences (but then
neither do POSIX extended patterns).
</P>
<br><a name="SEC6" href="#TOC1">AUTHOR</a><br>
<P>
Philip Hazel
<br>
University Computing Service
<br>
Cambridge, England.
<br>
</P>
<br><a name="SEC7" href="#TOC1">REVISION</a><br>
<P>
Last updated: 28 June 2018
<br>
Copyright © 1997-2018 University of Cambridge.
<br>
</P>
<p>
Return to the <a href="index.html">PCRE2 index page</a>.
</p>