Guide to OmniMark 5

Character classes

In a pattern, you can specify literal text (or expressions that resolve to literal text) or you can specify a character class. A character class is a set of characters. A character in the input data will match a character class if it matches any one of the characters in the character class.

For example, the OmniMark built-in character class digit includes the characters "0", "1", "2", "3", "4", "5", "6", "7", "8", and "9". Given the input data "123ABC", the following pattern will match "1":

  find digit

And the following pattern will match "123":

  find digit+
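
To make this behaviour concrete, here is a small Python model of the two patterns above. It is not OmniMark code. The class `digit` is modelled as a Python `frozenset`, and the helper names `match_one` and `match_plus` are hypothetical. Both helpers try to match at a single position (here, the start of the input). `match_plus` takes as many characters as it can, which reproduces how `digit+` matches "123" in this example.

```python
# Python model of OmniMark character-class matching (illustration only).
# DIGIT mirrors the built-in class `digit`: the characters "0" through "9".
DIGIT = frozenset("0123456789")


def match_one(char_class: frozenset, text: str, pos: int = 0) -> str:
    """Model of `find digit`: match exactly one character of char_class."""
    if pos < len(text) and text[pos] in char_class:
        return text[pos]
    return ""  # empty string: no match at this position


def match_plus(char_class: frozenset, text: str, pos: int = 0) -> str:
    """Model of `find digit+`: match one or more characters, as many as possible."""
    end = pos
    while end < len(text) and text[end] in char_class:
        end += 1
    return text[pos:end]  # empty string: no match at this position


print(match_one(DIGIT, "123ABC"))   # 1
print(match_plus(DIGIT, "123ABC"))  # 123
```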

OmniMark provides the following predefined character classes:

Since the predefined character classes may not always meet your needs, OmniMark lets you define your own character classes. A programmer-defined character class is contained between square brackets. For example, the following pattern matches an arithmetic operator:

  find ["+-*/"]

This character class consists of the characters in the string "+-*/". If a class should include nearly every character, it is simpler to list only the characters to leave out. Precede the string with the "except" operator \, and the class then matches every character except those in the string. For example, the following pattern matches any character except the XML markup characters "<", "&", and ">":

  find [\"<&>"]
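
Both kinds of programmer-defined class reduce to simple set operations. In the Python model below (illustration only, not OmniMark code), `["+-*/"]` is the set of the four characters in its string, so the hyphen is an ordinary member rather than a range marker. `[\"<&>"]` becomes a membership test that succeeds for every character outside the string. The names `OPERATOR`, `MARKUP`, `in_not_markup` and `leading_text` are hypothetical.

```python
# Python model (illustration only) of two programmer-defined classes.

# ["+-*/"]: exactly the characters of the string; "-" is a member, not a range.
OPERATOR = frozenset("+-*/")

# [\"<&>"]: every character EXCEPT those in the string.
MARKUP = frozenset("<&>")


def in_not_markup(c: str) -> bool:
    return c not in MARKUP


def leading_text(data: str) -> str:
    # Model of [\"<&>"]+ at the start of data: the longest run of
    # characters that are not XML markup characters.
    end = 0
    while end < len(data) and in_not_markup(data[end]):
        end += 1
    return data[:end]


print([c for c in "3*4 - 10/2" if c in OPERATOR])  # ['*', '-', '/']
print(leading_text("text<b>bold</b>"))              # text
```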

You can also build a character class by adding characters to, or subtracting characters from, a built-in character class. To add characters, join character classes and strings with the "or" operator |. For example, the following pattern matches any hexadecimal digit:

  find [digit | "AaBbCcDdEeFf"]
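
In a Python model (illustration only, not OmniMark code), the "or" operator corresponds to set union. The sketch below builds the hexadecimal-digit class from `digit` plus the letters A through F in both cases. It then checks the result against Python's standard `string.hexdigits`. The names `HEX_DIGIT` and `is_hex_number` are hypothetical.

```python
import string

# Python model (illustration only) of [digit | "AaBbCcDdEeFf"].
DIGIT = frozenset("0123456789")
HEX_DIGIT = DIGIT | frozenset("AaBbCcDdEeFf")  # "|" modelled as set union


def is_hex_number(s: str) -> bool:
    """True if s is non-empty and every character is in HEX_DIGIT."""
    return len(s) > 0 and all(c in HEX_DIGIT for c in s)


print(len(HEX_DIGIT))                               # 22
print(HEX_DIGIT == frozenset(string.hexdigits))     # True
print(is_hex_number("1aF0"), is_hex_number("1G"))   # True False
```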

To subtract characters, use the "except" operator \. For example, the following pattern matches any octal digit:

  find [digit \ "89"]
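
Subtraction maps onto set difference in the same way. This Python model (illustration only) removes "8" and "9" from `digit` and compares the result with Python's `string.octdigits`. It also uses the class to test whether every character of a string is an octal digit.

```python
import string

# Python model (illustration only) of [digit \ "89"].
DIGIT = frozenset("0123456789")
OCTAL_DIGIT = DIGIT - frozenset("89")  # "\" modelled as set difference

print("".join(sorted(OCTAL_DIGIT)))                # 01234567
print(OCTAL_DIGIT == frozenset(string.octdigits))  # True
print(all(c in OCTAL_DIGIT for c in "755"))        # True
print(all(c in OCTAL_DIGIT for c in "758"))        # False
```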

You can also use the "or" operator to join two or more built-in character classes, as in this pattern, which matches any alphanumeric character:

  find [letter | digit]
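
Below is a Python sketch of the union of two built-in classes. It collects each maximal run of characters from the union, which resembles repeatedly matching `[letter | digit]+` and skipping everything else. It is an illustration only. The sketch assumes that `letter` covers just the ASCII letters A-Z and a-z, and the helper `alnum_runs` is hypothetical.

```python
import string

# Python model (illustration only) of [letter | digit].
# Assumption: `letter` is modelled as the ASCII letters A-Z and a-z.
LETTER = frozenset(string.ascii_letters)
DIGIT = frozenset(string.digits)
ALNUM = LETTER | DIGIT


def alnum_runs(text: str) -> list:
    """Collect each maximal run of [letter | digit] characters in text."""
    runs, current = [], ""
    for c in text:
        if c in ALNUM:
            current += c
        elif current:
            runs.append(current)
            current = ""
    if current:
        runs.append(current)
    return runs


print(len(ALNUM))                           # 62
print(alnum_runs("OmniMark 5, build 42!"))  # ['OmniMark', '5', 'build', '42']
```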

Note that while you can use the "or" operator as many times as you like, you can use the "except" operator only once in a character class: everything before the \ is included, and everything after it is excluded. Thus this pattern is not valid:

  find [letter \ "xyz" | digit \ "7"]

You must rewrite it as follows:

  find [letter | digit \ "xyz7"]
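
The rewrite is safe, and you can confirm this with sets. Read each class as "everything before the \ minus everything after it". The Python model below is an illustration only, and it again assumes that `letter` means the ASCII letters. It compares what the invalid pattern was meant to express with the valid rewrite. The two agree because none of "x", "y", or "z" is a digit and "7" is not a letter. Moving every excluded character behind a single \ therefore removes nothing extra.

```python
import string

# Python model (illustration only). Assumption: `letter` = ASCII letters.
LETTER = frozenset(string.ascii_letters)
DIGIT = frozenset(string.digits)

# Intended meaning of the invalid [letter \ "xyz" | digit \ "7"]:
intended = (LETTER - frozenset("xyz")) | (DIGIT - frozenset("7"))

# The valid rewrite [letter | digit \ "xyz7"]: union first, then one exclusion.
rewritten = (LETTER | DIGIT) - frozenset("xyz7")

print(intended == rewritten)  # True
print(len(rewritten))         # 58
```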

You can also specify a range of characters with the keyword to. For example, the following pattern matches any character from lowercase "a" through lowercase "m":

  find ["a" to "m"]
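
A range can be modelled as the set of every character whose code value lies between the two endpoints, inclusive. The Python sketch below is an illustration only, and `char_range` is a hypothetical helper. It uses Unicode code points, which coincide with ASCII for these letters. The result of a range therefore depends on the character encoding in use.

```python
def char_range(first: str, last: str) -> frozenset:
    """Model of `first to last`: all characters with code points in [first, last]."""
    return frozenset(chr(cp) for cp in range(ord(first), ord(last) + 1))


A_TO_M = char_range("a", "m")
print(len(A_TO_M))              # 13
print("".join(sorted(A_TO_M)))  # abcdefghijklm
```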

You can combine ranges with other members of a character class, including other ranges, or exclude them from those members. For example, the following pattern matches any lowercase letter from "a" through "z" and the characters ".", ",", and "?", but not the lowercase letters "i" through "n" or the lowercase letter "t":

  find ["a" to "z" | ".,?" \ "i" to "n" | "t"]
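
Modelled in Python (illustration only), the class above is a union on each side of the \, followed by one set difference. The helper `char_range` is hypothetical and uses code-point order.

```python
def char_range(first: str, last: str) -> frozenset:
    """Model of `first to last`: all characters with code points in [first, last]."""
    return frozenset(chr(cp) for cp in range(ord(first), ord(last) + 1))


# Model of ["a" to "z" | ".,?" \ "i" to "n" | "t"]:
# everything before "\" is included, everything after it is excluded.
included = char_range("a", "z") | frozenset(".,?")
excluded = char_range("i", "n") | frozenset("t")
CLASS = included - excluded

print("".join(sorted(CLASS)))  # ,.?abcdefghopqrsuvwxyz
print(len(CLASS))              # 22
```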

Take care when using ranges in character classes, because the letters of the alphabet are not always contiguous in a character encoding. In the EBCDIC encoding, for example, there are non-alphabetic characters between "A" and "Z", so a range from "A" to "Z" covers more than the 26 uppercase letters.
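
You can see the effect with Python's standard `cp037` codec. This codec implements IBM code page 037, one common EBCDIC variant; it is chosen here only as an example, and other EBCDIC code pages differ in detail. The sketch assumes that a range covers every code value between its endpoints. It takes every byte from the encoding of "A" to the encoding of "Z" and decodes them, then compares the result with the 26 code values that the same range spans in ASCII.

```python
import string

# Assumption: a range covers every code value between its endpoints.
# cp037 (IBM EBCDIC US/Canada) is used as one example EBCDIC code page.
first = "A".encode("cp037")[0]  # 0xC1 in cp037
last = "Z".encode("cp037")[0]   # 0xE9 in cp037
span = bytes(range(first, last + 1)).decode("cp037")

letters = [c for c in span if c in string.ascii_uppercase]
others = [c for c in span if c not in string.ascii_uppercase]

print(ord("Z") - ord("A") + 1)               # 26 (ASCII/Unicode)
print(len(span), len(letters), len(others))  # 41 26 15
print("}" in others, "\\" in others)         # True True
```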

Deprecated syntax

The word except is a deprecated synonym for the "except" operator \.

In previous versions of OmniMark, the keyword any was required before the "except" operator in an "any except" character class. Thus the character class [\ "aeiou"] was written [any except "aeiou"]. The form [any \ "aeiou"] is still permitted and is identical in meaning to [\ "aeiou"].

Prerequisite Concepts
  Pattern matching

----

Generated: August 11, 2000 at 3:06:17 pm
If you have any comments about this section of the documentation, send email to docerrors@omnimark.com

Copyright © OmniMark Technologies Corporation, 1988-2000.