
Using Regular Expressions


Preface

This topic describes the regular expression syntax.

Regular expressions are used to validate user input in the TcxMaskEdit by means of a mask. This editor provides two mask representations:

  • Easy to use Delphi standard mask

  • A powerful regular expressions mask.

Reference

A regular expression can consist of characters (letters, digits and others which are not command characters), command characters and metacharacters.

1. Characters

If one of the characters listed below occurs in the mask, the input string must contain the same character at this position.

The following characters are supported:

A-Z, a-z, 0-9,

and non-letter and non-digit characters:

~, `, !, @, #, $, %, ^, &, -, _, =, ,, <, >, /, ;, :.

Note

You can use other characters by placing \ before them. For instance, \[ will be treated as [ by the regular expression compiler.

To specify a string, enclose it in single quotes: 'This is a string'. Each single quote must have a matching single quote. To denote a single quote in a string, use two single quote characters: 'This ''A'' is in single quotes'.

The compiler ignores spaces unless they are enclosed in single quotes.

Example:

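The Making' 'RAD' 'a' 'Reality! or 'Making RAD a Reality!' expression matches only the Making RAD a Reality! string. In the first form, each space is enclosed in single quotes so that the compiler does not ignore it.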

2. Command characters

These characters are not interpreted as symbols but as commands by the compiler. They are used to organize grouping, quantifications and conditions.

  • Grouping:

Character Sequence Grouping – ( )

This grouping is commonly used in quantifier operations: ?, + and *. (These are described later.)

Examples:

(a)+ matches a, aaa or aaaa (+ means that the preceding expression can occur one or more times).

(abc)+ matches abc, abcabc or abcabcabc.

This grouping is also used in back references. A back reference refers to a parenthesized part of the regular expression via the special metacharacter \n, where n is a digit (1 <= n <= 9).

Back references make an expression shorter and easier to read.

For example, the 3abcabcabc expression is better written as 3(abc)\1\1. Here, \1 refers to the abc sequence.
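The grouping and back reference rules above can be tried out with a short sketch. It uses Python's re module as a stand-in, which is an assumption: the TcxMaskEdit mask compiler is a different engine and may differ in details. The matches helper is a hypothetical name introduced here; it requires the whole string to match, as a mask would.

```python
import re

def matches(pattern: str, text: str) -> bool:
    """Return True only when the entire text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# A group followed by a quantifier repeats the whole sequence.
print(matches(r"(abc)+", "abcabc"))            # True
print(matches(r"(abc)+", "abcab"))             # False

# \1 repeats the text matched by the first group.
print(matches(r"3(abc)\1\1", "3abcabcabc"))    # True
print(matches(r"3(abc)\1\1", "3abcabc"))       # False
```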

OR Grouping – [ ]

Means that any symbol listed within square brackets can represent a character.

Examples:

[bfc]at matches bat, fat and cat.

OR grouping supports all quantifiers: ?, * and + (described later). Quantifiers enclosed in square brackets, however, are treated as ordinary characters.

[bfc]?at matches bat, fat, cat and at.

b[oa]+t matches bot, bat, boot and boat.

a[in]*t matches at, ait and ant.

OR grouping supports ranges:

[A-Z] matches any Latin capital letter.

[0-9A-Fa-f] matches any hexadecimal digit.
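As a rough check of the OR grouping examples, the sketch below filters candidate strings with Python's re module. Python is only an approximation of the mask compiler described here, and the accepted helper is a hypothetical name used for illustration.

```python
import re

def accepted(pattern, candidates):
    """Return the candidates that match the pattern in full."""
    return [c for c in candidates if re.fullmatch(pattern, c)]

print(accepted(r"[bfc]at", ["bat", "fat", "cat", "at", "rat"]))
# ['bat', 'fat', 'cat']
print(accepted(r"[bfc]?at", ["bat", "at", "bbat"]))
# ['bat', 'at']
print(accepted(r"b[oa]+t", ["bot", "boat", "bt"]))
# ['bot', 'boat']
print(accepted(r"a[in]*t", ["at", "ait", "ant", "aint", "axt"]))
# ['at', 'ait', 'ant', 'aint']
print(accepted(r"[0-9A-Fa-f]", ["7", "c", "F", "g"]))
# ['7', 'c', 'F']

# Inside square brackets, quantifier symbols are ordinary characters.
print(accepted(r"[+*?]", ["+", "?", "a"]))
# ['+', '?']
```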

Exclusive OR Grouping – [^]

Means that any symbol except those in square brackets can represent a character.

Example:

[^abc] matches any symbol except a, b, and c.

Exclusive OR grouping supports all quantifiers: ?, * and + (described later). Quantifiers enclosed in square brackets, however, are treated as ordinary characters.

Example:

[^0-9]+ matches any text string without digits (for instance, ExpressQuantumGrid but not ExpressQuantumGrid4).
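The exclusive OR grouping can be illustrated the same way. This is a minimal sketch in Python's re module, assumed here to behave like the mask compiler for these simple patterns; the variable names are illustrative only.

```python
import re

# One or more characters, none of which is a digit.
no_digits = re.compile(r"[^0-9]+")

print(bool(no_digits.fullmatch("ExpressQuantumGrid")))    # True
print(bool(no_digits.fullmatch("ExpressQuantumGrid4")))   # False

# A single character that is not a, b or c.
not_abc = re.compile(r"[^abc]")
print(bool(not_abc.fullmatch("d")))                       # True
print(bool(not_abc.fullmatch("a")))                       # False
```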

Variant Grouping – ( | )

Contains several sequences that can match on an OR basis.

Example:

(b|z)oo matches both zoo and boo.

(abc|123) matches abc and 123.
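A brief sketch of variant grouping, again using Python's re module as an assumed stand-in for the mask compiler (the full helper is a hypothetical name):

```python
import re

def full(pattern: str, text: str) -> bool:
    """Return True when the whole text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

print(full(r"(b|z)oo", "zoo"))      # True
print(full(r"(b|z)oo", "boo"))      # True
print(full(r"(b|z)oo", "moo"))      # False
print(full(r"(abc|123)", "123"))    # True
print(full(r"(abc|123)", "ab3"))    # False
```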

  • Quantifiers

* matches the preceding expression zero or more times.

The expression can be a symbol or group of symbols.

Example:

zo* matches z, zo and zoo.

[#$][0-9a-f]* matches $00ff00, #ff1234234, $ or #.

[0-9]*[.][0-9] matches any decimal value with a single digit after the decimal separator.

(a|b|c)* matches aaaa, bbb, cc, abc, acc, cba or an empty string.

+ matches the preceding expression one or more times.

The expression can be a symbol or group of symbols.

Example:

zo+ matches zo and zoo, but not z.

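$[0-9]+.99 matches strings that represent a price ending in 99, such as $5.99 or $120.99. Because . matches any symbol, strings such as $5x99 also match; use [.] to require a literal full stop.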

[#$][0-9a-f]+ matches $00ff00, #ff1234234, but not $ or #.

? matches the preceding expression zero or one time.

The expression can be a symbol or group of symbols.

Example:

zoo? matches zo and zoo, but not z.

[#$][0-9a-f]? matches $, #e, but not #2a.

{n} matches the preceding expression exactly n times.

The expression can be a symbol or group of symbols.

Example:

zo{2} matches zoo but not zo or zooo.

$[0-9a-f]{8} matches $00ff01de but not $ff or $0ffffe333.

{n,} matches the preceding expression at least n times.

The expression can be a symbol or group of symbols.

Example:

zo{2,} matches zoo or zooo but not z or zo.

$[0-9a-f]{4,} matches $00ff or $ffddee but not $ff.

{n,m} matches the preceding expression at least n times and at most m times.

The expression can be a symbol or group of symbols.

Example:

zo{1,2} matches zo and zoo but not z or zooo.

$[0-9a-f]{4,8} matches $00ff01de and $00ff but not $ff or $0ffffe333.
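The quantifier examples above can be checked in bulk with a small table-driven sketch. It relies on Python's re module, which is an assumption rather than the mask compiler itself. One known difference: outside square brackets Python treats $ as an end-of-string anchor, so it is escaped as \$ here, whereas the mask syntax above treats $ as an ordinary character. CASES and check are hypothetical names.

```python
import re

# (pattern, strings expected to match, strings expected not to match)
CASES = [
    (r"zo*", ["z", "zo", "zoo"], ["o"]),
    (r"zo+", ["zo", "zoo"], ["z"]),
    (r"zoo?", ["zo", "zoo"], ["z"]),
    (r"zo{2}", ["zoo"], ["zo", "zooo"]),
    (r"zo{2,}", ["zoo", "zooo"], ["z", "zo"]),
    (r"zo{1,2}", ["zo", "zoo"], ["z", "zooo"]),
    (r"[#$][0-9a-f]*", ["$00ff00", "#ff1234234", "$", "#"], []),
    (r"\$[0-9a-f]{4,8}", ["$00ff01de", "$00ff"], ["$ff", "$0ffffe333"]),
]

def check(pattern, good, bad):
    """True when every 'good' string matches in full and no 'bad' string does."""
    all_good = all(re.fullmatch(pattern, s) for s in good)
    none_bad = not any(re.fullmatch(pattern, s) for s in bad)
    return all_good and none_bad

for pattern, good, bad in CASES:
    print(pattern, check(pattern, good, bad))   # each line ends with True
```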

3. Metacharacters

Metacharacters are used to represent a symbol or a range of symbols.

Character   Description
\d          Matches a digit character. Equivalent to [0-9].
\D          Matches a non-digit character. Equivalent to [^0-9].
\f          Matches a form-feed character. Equivalent to \x0c.
\n          Matches a new line character. Equivalent to \x0a.
\r          Matches a carriage return character. Equivalent to \x0d.
\s          Matches any white space character including space, tab, form-feed, etc.
\S          Matches any non-white space character.
\t          Matches a tab character. Equivalent to \x09.
\w          Matches any word character including underscore. Equivalent to [A-Za-z0-9_].
\W          Matches any non-word character. Equivalent to [^A-Za-z0-9_].
\xn         Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, \x41 matches A, and \x041 is equivalent to \x04 followed by 1. Allows ASCII codes to be used in regular expressions.
.           Matches any symbol. Within square brackets, it is treated as a literal full stop.

Examples:

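\d\d\d-\d\d-\d\d matches any phone number in the ddd-dd-dd format (for instance 555-65-92 or 123-45-67).

\w+\d? matches any sequence of word characters followed by an optional digit (for instance ExpressQuantumGrid or ExpressQuantumGrid4). Because \w is equivalent to [A-Za-z0-9_] and therefore includes digits, a string such as All4You also fits this pattern; to allow a digit only at the end, use [A-Za-z_]+\d? instead.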
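The metacharacters can be explored with the sketch below, which uses Python's re module as an assumed approximation of the mask compiler. The re.ASCII flag restricts \d and \w to [0-9] and [A-Za-z0-9_], matching the equivalences in the table above. Since \w includes digits, Python accepts All4You for \w+\d?; the letters_then_digit pattern is a hypothetical alternative that allows a digit only at the end.

```python
import re

phone = re.compile(r"\d\d\d-\d\d-\d\d", re.ASCII)
print(bool(phone.fullmatch("555-65-92")))     # True
print(bool(phone.fullmatch("555-6592")))      # False

word_then_digit = re.compile(r"\w+\d?", re.ASCII)
print(bool(word_then_digit.fullmatch("ExpressQuantumGrid4")))   # True
print(bool(word_then_digit.fullmatch("All4You")))               # True

# Hypothetical stricter variant: letters or underscore, then one optional digit.
letters_then_digit = re.compile(r"[A-Za-z_]+\d?", re.ASCII)
print(bool(letters_then_digit.fullmatch("ExpressQuantumGrid4")))  # True
print(bool(letters_then_digit.fullmatch("All4You")))              # False

# \x41 is the two-digit hexadecimal escape for the letter A.
print(bool(re.fullmatch(r"\x41", "A")))       # True
```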

Some complex examples of regular expressions:

([01]?\d|2[0-3]):[0-5]\d:[0-5]\d – a regular expression representing the pattern for 24 hours time format.

Let us consider it in detail:

[01]?\d matches any number between 0 and 19. Zero can precede single digit numbers (for instance 0, 5, 02 or 14).

2[0-3] matches any number between 20 and 23.

([01]?\d|2[0-3]) matches any number between 0 and 23.

: means that a valid time separator must be present at this position. The valid time separator is obtained from the system regional settings.

[0-5]\d matches any number between 0 and 59. Zero precedes all single digit numbers (for instance 00, 05, 34, 59)
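To see the time pattern at work, here is a minimal sketch in Python's re module. It is an approximation only: in Python the colon is always a literal character, whereas the text above says the mask takes the separator from the system regional settings. TIME is an illustrative name.

```python
import re

TIME = re.compile(r"([01]?\d|2[0-3]):[0-5]\d:[0-5]\d", re.ASCII)

samples = ["0:00:00", "09:15:30", "23:59:59", "24:00:00", "12:60:00", "7:5:09"]
for text in samples:
    print(text, bool(TIME.fullmatch(text)))
# 0:00:00 True
# 09:15:30 True
# 23:59:59 True
# 24:00:00 False
# 12:60:00 False
# 7:5:09 False
```

The last sample fails because minutes and seconds always require two digits, while the hour may be written with one.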

[\w-.]+@[\w-]+(\.[\w-]+)+ - a regular expression that represents a pattern for e-mail addresses.

Let us consider it in detail:

[\w-.]+ matches any letter, digit, hyphen, underscore or full stop one or more times.

[\w-]+ matches any letter, digit, underscore or hyphen one or more times.

(\.[\w-]+)+ matches full stop followed by a letter, digit, underscore or hyphen sequence one or more times.
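The e-mail pattern can be tried out in Python's re module, with two adjustments that are assumptions of this sketch rather than part of the mask syntax: the hyphen in the first bracket group is moved to the end ([\w.-]), because Python reads \w-. as an invalid range, and re.ASCII keeps \w limited to [A-Za-z0-9_]. The addresses are made-up samples.

```python
import re

EMAIL = re.compile(r"[\w.-]+@[\w-]+(\.[\w-]+)+", re.ASCII)

samples = [
    "john.smith@example.com",   # local part with a full stop
    "user@mail.example.org",    # two dot-separated parts after the first label
    "user@localhost",           # no full stop after the @ part
    "@example.com",             # empty local part
    "user@example.",            # nothing after the final full stop
]
for text in samples:
    print(text, bool(EMAIL.fullmatch(text)))
# john.smith@example.com True
# user@mail.example.org True
# user@localhost False
# @example.com False
# user@example. False
```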

See Also