Regular Expressions
Regular expressions are a powerful tool for describing patterns in text. They are used by several OBITools commands, such as `obigrep`, `obiannotate`, and `obiscript`.
Single characters

| Pattern | Description |
|---|---|
| `.` | any character, possibly including newline (flag `s=true`) |
| `[xyz]` | character class |
| `[^xyz]` | negated character class |
| `[[:alpha:]]` | ASCII character class |
| `[[:^alpha:]]` | negated ASCII character class |

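The sketch below tries these single-character patterns with Go's standard `regexp` package. It assumes that OBITools interprets patterns with that package's RE2 syntax, which the tables on this page mirror; the helper `matchesAnywhere` and the test strings are made up for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools uses Go's regexp (RE2) syntax, so these patterns are
// checked directly with Go's standard library.

// matchesAnywhere reports whether pattern matches somewhere in s.
// It is an illustrative helper, not part of OBITools.
func matchesAnywhere(pattern, s string) bool {
	return regexp.MustCompile(pattern).MatchString(s)
}

func main() {
	// "." skips a newline unless the s flag is set.
	fmt.Println(matchesAnywhere(`^a.c$`, "a\nc"))     // false
	fmt.Println(matchesAnywhere(`(?s)^a.c$`, "a\nc")) // true

	// [ACGT] matches one nucleotide; [^ACGT] matches any other character.
	fmt.Println(matchesAnywhere(`^[ACGT]+$`, "ACGTTGCA")) // true
	fmt.Println(matchesAnywhere(`[^ACGT]`, "ACGNTGCA"))   // true: finds the N

	// ASCII classes are written inside a bracket expression.
	fmt.Println(matchesAnywhere(`^[[:alpha:]]+$`, "Bacteria")) // true
	fmt.Println(matchesAnywhere(`[[:^alpha:]]`, "Bacteria_1")) // true: finds "_"
}
```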
Composites

| Pattern | Description |
|---|---|
| `xy` | x followed by y |
| `x\|y` | x or y (prefer x) |

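A minimal Go sketch of concatenation and of the "prefer x" rule for alternation, assuming OBITools relies on Go's `regexp` package. Under Go's default leftmost-first semantics, `AC|ACG` stops at `AC`; calling `Longest()` on the compiled expression switches to leftmost-longest matching. The helpers `firstMatch` and `longestMatch` are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools follows Go's regexp (RE2) matching rules; the helpers
// below are illustrative and not part of OBITools.

// firstMatch uses Go's default leftmost-first semantics, under which
// x|y tries x before y.
func firstMatch(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// longestMatch switches the same pattern to leftmost-longest semantics.
func longestMatch(pattern, s string) string {
	re := regexp.MustCompile(pattern)
	re.Longest()
	return re.FindString(s)
}

func main() {
	// Concatenation: each character must follow the previous one.
	fmt.Println(firstMatch(`GAATTC`, "TTGAATTCAA")) // GAATTC

	// Alternation prefers its left branch, even when the right one is longer.
	fmt.Println(firstMatch(`AC|ACG`, "ACGT"))   // AC
	fmt.Println(longestMatch(`AC|ACG`, "ACGT")) // ACG
}
```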
Repetitions

| Pattern | Description |
|---|---|
| `x*` | zero or more x, prefer more |
| `x+` | one or more x, prefer more |
| `x?` | zero or one x, prefer one |
| `x{n,m}` | n or n+1 or … or m x, prefer more |
| `x{n,}` | n or more x, prefer more |
| `x{n}` | exactly n x |
| `x*?` | zero or more x, prefer fewer |
| `x+?` | one or more x, prefer fewer |
| `x??` | zero or one x, prefer zero |
| `x{n,m}?` | n or n+1 or … or m x, prefer fewer |
| `x{n,}?` | n or more x, prefer fewer |
| `x{n}?` | exactly n x |

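Greedy and lazy quantifiers differ only in which match they prefer when several are possible. The following Go sketch, which assumes OBITools uses the RE2 syntax of Go's `regexp` package, contrasts them on a short sequence and uses `{n,}` to find homopolymer runs; `leftmost`, `homopolymerRe`, and `runsOfFourOrMore` are illustrative names.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools follows Go's regexp (RE2) syntax, so the patterns are
// tried directly with Go's standard library here.

// leftmost returns the leftmost match of pattern in s ("" if none).
func leftmost(pattern, s string) string {
	return regexp.MustCompile(pattern).FindString(s)
}

// homopolymerRe matches a run of four or more identical nucleotides. RE2 has
// no backreferences, so each base is listed explicitly.
var homopolymerRe = regexp.MustCompile(`A{4,}|C{4,}|G{4,}|T{4,}`)

// runsOfFourOrMore lists every non-overlapping homopolymer run in s.
func runsOfFourOrMore(s string) []string {
	return homopolymerRe.FindAllString(s, -1)
}

func main() {
	s := "CAAAAG"
	fmt.Println(leftmost(`A+`, s))      // AAAA (greedy: prefer more)
	fmt.Println(leftmost(`A+?`, s))     // A (lazy: prefer fewer)
	fmt.Println(leftmost(`A{2,3}`, s))  // AAA
	fmt.Println(leftmost(`A{2,3}?`, s)) // AA
	// A lazy quantifier still grows when the rest of the pattern needs it.
	fmt.Println(leftmost(`CA*?G`, s)) // CAAAAG

	fmt.Println(runsOfFourOrMore("ACCCCGTTTTTA")) // [CCCC TTTTT]
}
```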
Grouping

| Pattern | Description |
|---|---|
| `(re)` | numbered capturing group (submatch) |
| `(?P<name>re)` | named & numbered capturing group (submatch) |
| `(?<name>re)` | named & numbered capturing group (submatch) |
| `(?:re)` | non-capturing group |
| `(?flags)` | set flags within current group; non-capturing |
| `(?flags:re)` | set flags during re; non-capturing |

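To show numbered groups, named groups, a non-capturing group, and a scoped flag together, this Go sketch parses a sample identifier. It assumes OBITools uses the syntax of Go's `regexp` package; the identifier format and the names `idRe`, `caseRe`, and `parseID` are hypothetical. It uses the `(?P<name>re)` spelling, since Go only accepts `(?<name>re)` from version 1.22 onward.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools uses Go's regexp (RE2) syntax. The identifier format
// "<sample>_<well>_rep<n>" is hypothetical, invented for this example.
var idRe = regexp.MustCompile(
	`^(?P<sample>[[:alnum:]]+)_(?P<well>[A-H][0-9]{2})_(?:rep)?([0-9]+)$`)

// caseRe makes only "acgt" case-insensitive; the final N must be upper case.
var caseRe = regexp.MustCompile(`^(?i:acgt)N$`)

// parseID splits id into its captured parts, or returns ok == false.
func parseID(id string) (sample, well, replicate string, ok bool) {
	m := idRe.FindStringSubmatch(id)
	if m == nil {
		return "", "", "", false
	}
	// m[0] is the whole match; groups are numbered by their opening
	// parenthesis, and named groups can also be found with SubexpIndex.
	return m[idRe.SubexpIndex("sample")], m[idRe.SubexpIndex("well")], m[3], true
}

func main() {
	fmt.Println(parseID("soil12_B07_rep3")) // soil12 B07 3 true
	fmt.Println(parseID("soil12_B07_3"))    // soil12 B07 3 true: "rep" is optional
	fmt.Println(idRe.NumSubexp())           // 3: (?:rep) does not capture

	fmt.Println(caseRe.MatchString("AcGtN"), caseRe.MatchString("acgtn")) // true false
}
```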
Character classes

| Pattern | Description |
|---|---|
| `[\d]` | digits (== `\d`) |
| `[^\d]` | not digits (== `\D`) |
| `[\D]` | not digits (== `\D`) |
| `[^\D]` | not not digits (== `\d`) |
| `[[:name:]]` | named ASCII class inside character class (== `[:name:]`) |
| `[^[:name:]]` | named ASCII class inside negated character class (== `[:^name:]`) |
| `[\p{Name}]` | named Unicode property inside character class (== `\p{Name}`) |
| `[^\p{Name}]` | named Unicode property inside negated character class (== `\P{Name}`) |

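The equivalences in this table can be spot-checked by comparing each pattern with its shorthand on a handful of characters. The Go sketch below does so under the assumption that OBITools uses the syntax of Go's `regexp` package; `equivalent` is a hypothetical helper, and agreement on a sample is evidence, not proof.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools uses Go's regexp (RE2) syntax. equivalent is an
// illustrative helper, not part of OBITools.

// equivalent reports whether single-character patterns a and b accept or
// reject each character of sample in the same way. It is a spot check on
// the given characters, not a proof of equivalence.
func equivalent(a, b, sample string) bool {
	ra := regexp.MustCompile(`^(?:` + a + `)$`)
	rb := regexp.MustCompile(`^(?:` + b + `)$`)
	for _, r := range sample {
		c := string(r)
		if ra.MatchString(c) != rb.MatchString(c) {
			return false
		}
	}
	return true
}

func main() {
	// Digits, letters, underscore, space, hyphen, and two Greek letters.
	sample := "09aZ_ -\u03b1\u03a9"

	fmt.Println(equivalent(`[\d]`, `\d`, sample))                   // true
	fmt.Println(equivalent(`[^\d]`, `\D`, sample))                  // true
	fmt.Println(equivalent(`[^\D]`, `\d`, sample))                  // true
	fmt.Println(equivalent(`[^[:alpha:]]`, `[[:^alpha:]]`, sample)) // true
	fmt.Println(equivalent(`[^\p{Greek}]`, `\P{Greek}`, sample))    // true
	fmt.Println(equivalent(`\d`, `\D`, sample))                     // false

	// One bracket expression can combine several class elements.
	fmt.Println(regexp.MustCompile(`^[\d[:upper:]]+$`).MatchString("A01B")) // true
}
```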
Named character classes

| Pattern | Description |
|---|---|
| `[[:alnum:]]` | alphanumeric (== `[0-9A-Za-z]`) |
| `[[:alpha:]]` | alphabetic (== `[A-Za-z]`) |
| `[[:ascii:]]` | ASCII (== `[\x00-\x7F]`) |
| `[[:blank:]]` | blank (== `[\t ]`) |
| `[[:cntrl:]]` | control (== `[\x00-\x1F\x7F]`) |
| `[[:digit:]]` | digits (== `[0-9]`) |
| `[[:graph:]]` | graphical (== `[!-~]` == `` [A-Za-z0-9!"#$%&'()*+,\-./:;<=>?@[\\\]^_`{\|}~] ``) |
| `[[:lower:]]` | lower case (== `[a-z]`) |
| `[[:print:]]` | printable (== `[ -~]` == `[ [:graph:]]`) |
| `[[:punct:]]` | punctuation (== `` [!-/:-@[-`{-~] ``) |
| `[[:space:]]` | whitespace (== `[\t\n\v\f\r ]`) |
| `[[:upper:]]` | upper case (== `[A-Z]`) |
| `[[:word:]]` | word characters (== `[0-9A-Za-z_]`) |
| `[[:xdigit:]]` | hex digit (== `[0-9A-Fa-f]`) |

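The following Go sketch checks a few of these classes on short strings, assuming that OBITools uses the syntax of Go's `regexp` package. It highlights two points: `[[:print:]]` differs from `[[:graph:]]` only by the space character, and the named classes are ASCII-only, so a letter such as `é` needs a Unicode property like `\p{L}`. The helper `allIn` is illustrative.

```go
package main

import (
	"fmt"
	"regexp"
)

// Assumption: OBITools uses Go's regexp (RE2) syntax. allIn is an
// illustrative helper, not part of OBITools.

// allIn reports whether every character of s belongs to class
// (vacuously true for an empty string).
func allIn(class, s string) bool {
	return regexp.MustCompile(`^` + class + `*$`).MatchString(s)
}

func main() {
	id := "sample 01"
	fmt.Println(allIn(`[[:graph:]]`, id))  // false: the space is not graphical
	fmt.Println(allIn(`[[:print:]]`, id))  // true: printable adds the space
	fmt.Println(allIn(`[ [:graph:]]`, id)) // true: same set as [[:print:]]

	// Named classes are ASCII-only; \p{L} covers Unicode letters.
	fmt.Println(allIn(`[[:alpha:]]`, "\u00e9")) // false
	fmt.Println(allIn(`\p{L}`, "\u00e9"))       // true

	fmt.Println(allIn(`[[:xdigit:]]`, "3fA9")) // true
	fmt.Println(allIn(`[[:word:]]`, "OTU_12")) // true
}
```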