
RegexPattern Class

A class that holds regular expression patterns.

The Compile static method compiles the specified regular expression and creates a RegexPattern object. That RegexPattern object is then used to create objects of the RegexMatcher class, the regular expression engine that matches the pattern against an arbitrary input string.

You can create multiple RegexMatcher objects from a single RegexPattern object. Each RegexMatcher object shares the same regular expression pattern.
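As a minimal sketch of this sharing, the following compiles one pattern and creates two RegexMatcher objects from it. It uses only the Compile, Matcher, Find, and Group calls shown under Typical usage; the variable names and input strings are illustrative.

```
/* Compile the pattern once */
var p = RegexPattern.Compile("[0-9]+");
/* Create two matchers that share the compiled pattern */
var m1 = p.Matcher("order 12");
var m2 = p.Matcher("invoice 345");

/* Search the first input string */
while (m1.Find()) {
    print(m1.Group(), "\n");
}
/* Search the second input string */
while (m2.Find()) {
    print(m2.Group(), "\n");
}
```

The expectation is that each matcher holds its own input string and search position, so the first loop finds 12 and the second finds 345 without the pattern being compiled a second time.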

Default properties and ValueType

The default property is Pattern. The ValueType specification is invalid.

Unicode mode, which handles a wider range of Unicode strings, has been added.

About Unicode mode

If you specify RegexPattern.Unicode as an argument of the Compile method or the Matches method, the regular expression engine is created in Unicode mode.

In Unicode mode, regular expressions and input strings are treated as the UString type, so Unicode characters can be used.

Indexes that represent character positions (Start, End, and so on of the RegexMatcher class) are counted in characters in Unicode mode and in bytes when not in Unicode mode.

Typical usage

  1. Compile the regular expression with the Compile static method to create a RegexPattern object.
  2. Call the Matcher method of the RegexPattern object created in step 1 with the input string to create a RegexMatcher object for that pattern and input.
  3. Perform the match with the Matches method, Find method, and so on of the RegexMatcher object created in step 2.

/* Compile the regular expression */
var p = RegexPattern.Compile("Biz/([a-zA-Z]+)");
/* Set the input string and create the regular expression engine */
var m = p.Matcher("Biz/Browser, Biz/Designer");
 
/* Search for partial matches */
while (m.Find()) {
    /* Show the whole matched substring */
    print(m.Group());
    /* Show the first capture group */
    print(" [", m.Group(1) , "]", "\n");
}
/* Test whether the entire input string matches */
print("Matches:", m.Matches(), "\n");
 
/* Test whether the input string matches from the beginning */
print("LookingAt:", m.LookingAt(), "\n");
 
----- Execution result -----
Biz/Browser [ Browser ]
Biz/Designer [ Designer ]
Matches: 0
LookingAt: 1

Regular expression syntax summary

You can use Perl-like syntax for regular expressions.

  • PCRE (Perl Compatible Regular Expressions) is used in the regular expression class.

Metacharacters

\ Escapes the metacharacter that immediately follows
^ Matches the beginning of the string. In multi-line mode, matches the beginning of each line.
. Matches any single character (except a line break).
$ Matches the end of the string. In multi-line mode, matches the end of each line.
| Alternation
() Grouping
[] Character class
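As a small sketch of alternation inside a group, the pattern below accepts either of two product names after the literal prefix. The input string is illustrative; the calls are the same Compile, Matcher, Find, and Group calls used under Typical usage.

```
/* Either "Browser" or "Designer" may follow "Biz/" */
var p = RegexPattern.Compile("Biz/(Browser|Designer)");
var m = p.Matcher("Biz/Browser, Biz/Mobile, Biz/Designer");

while (m.Find()) {
    /* Show the whole match and the alternative that was chosen */
    print(m.Group(), m.Group(1), "\n");
}
```

Under PCRE semantics the search finds Biz/Browser and Biz/Designer and skips Biz/Mobile, because neither alternative matches "Mobile".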

Metacharacters that can be used in character classes

\ Escapes the metacharacter that immediately follows
^ Negates the class, but only when it is the first character
- Character range ※

※ Biz/Browser regular expressions use UTF-8 as the character encoding of their internal data. A character range is therefore evaluated by UTF-8 character codes, and this is always the case regardless of whether Unicode mode is specified. Be careful when a range includes characters such as kanji.

Binary character

\t Tab
\n New line
\r Carriage return
\f Form feed
\a Alarm (bell)
\e Escape
\033 Octal character
\x1B Hexadecimal character
\c[ Control character
\E Ends quoting of regular expression operators started with \Q
\Q Treats all special characters up to \E as normal characters
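The effect of \Q and \E can be sketched with a pattern that contains an operator character. The sketch assumes that a backslash inside a script string literal must itself be doubled to reach the regular expression engine, in the same way that "\n" in the typical usage example is interpreted as a newline.

```
/* Assumption: "\\Q" and "\\E" reach the engine as \Q and \E */
var quoted = RegexPattern.Compile("\\Q1+1=2\\E");
/* The same text without quoting: "+" acts as a quantifier */
var plain = RegexPattern.Compile("1+1=2");

var m1 = quoted.Matcher("1+1=2, 11=2");
while (m1.Find()) {
    print(m1.Group(), "\n");
}
var m2 = plain.Matcher("1+1=2, 11=2");
while (m2.Find()) {
    print(m2.Group(), "\n");
}
```

Under PCRE semantics the quoted pattern finds the literal text 1+1=2, while the unquoted pattern reads "1+" as one or more 1 characters and therefore finds 11=2 instead.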

General character

\w Matches “word” characters (letters, digits, “_”)
\W Matches non-word characters
\s Matches whitespace characters
\S Matches non-whitespace characters
\d Matches decimal digits
\D Matches non-digit characters
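A short sketch combining these character types: the pattern captures a word followed by whitespace and a number. The doubled backslashes assume that the script string literal consumes one level of escaping, and the input string is illustrative.

```
/* A word, then whitespace, then a run of digits */
var p = RegexPattern.Compile("(\\w+)\\s+(\\d+)");
var m = p.Matcher("width 640, height 480");

while (m.Find()) {
    /* Group(1) is the word, Group(2) is the number */
    print(m.Group(1), m.Group(2), "\n");
}
```

Under PCRE semantics the two matches yield the pairs width and 640, then height and 480.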

Position specifier

\b Matches word boundaries
\B Matches at any position that is not a word boundary
\A Matches only at the beginning of the string
\Z Matches only at the end of the string or just before the newline at the end
\z Matches only at the end of the string
\G Matches the search start position
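Word boundaries can be sketched with a pattern that accepts only whole words ending in "ing". As before, the doubled backslashes are an assumption about string literal escaping, and the input string is illustrative.

```
/* A whole word that ends in "ing" */
var p = RegexPattern.Compile("\\b(\\w+)ing\\b");
var m = p.Matcher("sing singer ringing");

while (m.Find()) {
    /* Show the whole word and the part before "ing" */
    print(m.Group(), m.Group(1), "\n");
}
```

Under PCRE semantics this finds sing and ringing. The word singer is skipped because the "ing" inside it is followed by another word character, so the trailing \b cannot match there.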

Character class [:class:]

alnum Alphanumeric characters
alpha Alphabetic characters
ascii Character codes 0 - 127
blank Space or tab
cntrl Control characters
digit Decimal digits
graph Printable characters, excluding space
lower Lowercase letters
print Printable characters, including space
punct Punctuation characters
space Whitespace characters
upper Uppercase letters
word “Word” characters (letters, digits, “_”)
xdigit Hexadecimal digits
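In PCRE, a [:class:] name is only recognized inside a bracketed character class, so it is written with two pairs of brackets. A minimal sketch with an illustrative input string:

```
/* One uppercase letter followed by one or more lowercase letters */
var p = RegexPattern.Compile("[[:upper:]][[:lower:]]+");
var m = p.Matcher("Biz/Browser and Biz/Designer");

while (m.Find()) {
    print(m.Group(), "\n");
}
```

Under PCRE semantics the matches are Biz, Browser, Biz, and Designer. The word "and" is skipped because it has no uppercase initial.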

Quantifiers

* Matches zero or more iterations
+ Matches one or more iterations
? Matches zero or one iteration
{n} Matches exactly n iterations
{n,} Matches at least n iterations
{n,m} Matches between n and m iterations
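A bounded quantifier can be sketched as follows, using an illustrative input string and the same calls as the typical usage example.

```
/* Runs of two or three digits */
var p = RegexPattern.Compile("[0-9]{2,3}");
var m = p.Matcher("1 22 333 4444");

while (m.Find()) {
    print(m.Group(), "\n");
}
```

Under PCRE semantics the matches are 22, 333, and 444. The single digit 1 is too short, and in 4444 the quantifier greedily takes three digits, which leaves one digit that cannot match on its own.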
