Regular Expressions

Regular expressions are special patterns of characters, used with strings for substring search and substring substitution. They are a powerful but complicated tool; for an introduction to regular expressions see: http://en.wikipedia.org/wiki/Regular_expression .

In Ruby, regular expressions are built into the language as the "Regexp" class. A string is matched against a pattern with the match operator "=~" or with the match method. The to_s method gives a string representation of a Regexp object.

Regular expressions are created from a pattern written between slashes ("/ /"), with the "%r" operator, or with the Regexp.new constructor:

r=/pattern/

r=/pattern/options

r=%r{pattern}         # other non-alphanumeric delimiters can replace "{}", e.g. %r!pattern!

r=%r{pattern}options  # a regular expression with options

r=Regexp.new(pattern,options)
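
As a minimal sketch of the forms above, the following builds the same case-insensitive pattern in three ways and checks that the resulting objects are equal. The pattern text "ruby" and the method name build_patterns are arbitrary choices made for this illustration.

```ruby
# Three ways to build the same case-insensitive pattern.
def build_patterns
  literal     = /ruby/i
  percent_r   = %r{ruby}i
  constructed = Regexp.new("ruby", Regexp::IGNORECASE)
  [literal, percent_r, constructed]
end

patterns = build_patterns
puts patterns.all? { |r| r == patterns.first }  # true: same source and options
puts patterns.first.source                      # ruby
puts patterns.first.to_s                        # (?i-mx:ruby)
```

With Regexp.new the options are given as constants such as Regexp::IGNORECASE rather than as trailing letters.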

A regular expression can take options; some of them are:

i : case insensitive search
o : #{...} interpolation in the pattern is performed only once
m : the dot also matches \n
x : extended syntax
    (whitespace in the pattern is ignored and the pattern can contain comments)
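
The effect of the i, m and x options can be seen in a short sketch like the one below. The sample strings and the method name option_results are made up for the illustration; the o option is left out because its effect only shows up with interpolated patterns.

```ruby
# Each entry pairs an option with the result of a small match.
def option_results
  dated = /
    \d{4}   # year
    -       # separator
    \d{2}   # month
  /x
  {
    i:         "HELLO"   =~ /hello/i,  # matches despite the different case
    without_m: "a\nb"    =~ /a.b/,     # nil: the dot does not match "\n"
    with_m:    "a\nb"    =~ /a.b/m,    # the dot also matches "\n"
    x:         "2024-01" =~ dated      # whitespace and comments are ignored
  }
end

p option_results
```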

In regular expression patterns the backslash is used as an escape character, and some special constructs are interpreted according to the following table:

. the dot matches any character (except \n, unless the m option is set)
+ one or more of the preceding element
* zero or more of the preceding element
? zero or one of the preceding element
[abc123] one of these characters
[^aeiou] a character not in the list; not a vowel for: [^aeiou]
[a-c] a range of characters
[^a-c] a character not in the range
a|b alternation: matches "a" or "b"
{m} exactly m times the preceding element
{n,m} at least n and at most m occurrences of the preceding element
{m,} at least m times
{,n} at most n times
^ the beginning of a line
$ the ending of a line
\Z end of a string (before a trailing \n, if any); \z is the strict end of the string
\A beginning of a string
\b word boundaries
\B non word boundaries
\s whitespace characters (space and also \n \t \r \f )
\S non-whitespace characters
\d a digit, same as [0-9]
\w a word character, same as [A-Za-z0-9_]
\W a non word character
( ) to group characters into a sub-pattern
\1 \2 substrings matched by a preceding sub-pattern
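
A few of the constructs in the table can be tried out with a sketch like the following. The sample strings and the method name construct_samples are invented for this example; String#[] with a Regexp argument returns the matched substring, or nil.

```ruby
# Small checks for some constructs from the table above.
def construct_samples
  text = "line one\nline two"
  {
    quantifier:   "caaat"[/a{2,3}/],        # two to three "a" characters
    char_class:   "hello"[/[aeiou]/],       # first lowercase vowel
    negated:      "hello"[/[^aeiou]+/],     # leading run of non-vowels
    line_anchor:  text =~ /^line two$/,     # ^ and $ work line by line
    string_start: text =~ /\Aline two/,     # \A matches only at the very start
    word_bound:   "foo bar" =~ /\bbar\b/,   # "bar" as a whole word
    digits:       "ab12cd"[/\d+/]           # first run of digits
  }
end

p construct_samples
```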

The "=~" operator returns the index of the first match, or nil if there is no match; the match method instead returns a MatchData object holding the results of the match (or nil if there is no match).

If sub-patterns are used, the substrings they match are saved as well. A substring matched by a sub-pattern can also be referenced later in the pattern itself, as \1, \2 and so on.
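
As an illustration of a sub-pattern referenced inside its own pattern, this sketch looks for the first doubled character in a word. The method name first_doubled_letter is hypothetical.

```ruby
# \1 must repeat exactly what the first group (\w) matched.
def first_doubled_letter(word)
  data = /(\w)\1/.match(word)
  data && data[0]               # the doubled pair, or nil when there is none
end

puts first_doubled_letter("hello")   # ll
p first_doubled_letter("world")      # nil
```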

The match methods have the following syntax:

string =~ /pattern/        # returns a number or nil:
                             n= string =~ /pattern/

/pattern/.match(string)    # returns a MatchData object:
                             matchdata = /pattern/.match(string)

string !~ /pattern/        # is the same as !(string =~ /pattern/)
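
The three forms can be compared side by side in a small sketch. The method name describe_match and the digit pattern are assumptions made only for this example.

```ruby
# Runs the same pattern through =~, match and !~.
def describe_match(text)
  pattern = /\d+/                 # one or more digits
  index   = text =~ pattern       # an Integer, or nil
  data    = pattern.match(text)   # a MatchData object, or nil
  {
    index:    index,
    matched:  data && data[0],
    no_match: text !~ pattern     # true when the pattern does not match
  }
end

p describe_match("abc 42")
p describe_match("abc")
```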

After a match, some global variables and a MatchData object are defined; the MatchData object is saved in the global variable "$~":

$&      matchdata[0]                the matched part of the string
$`      matchdata.pre_match         the part preceding the match
$'      matchdata.post_match        the part following the match
$1 $2   matchdata[1] matchdata[2]   strings matched by sub-patterns
        matchdata.size              number of elements: the whole match plus each sub-pattern
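
The correspondence between the global variables and the MatchData object can be sketched as follows. The method name match_parts and the hour:minute pattern are illustrative assumptions.

```ruby
# Collects the values left behind by a successful match.
def match_parts(text)
  return nil unless text =~ /(\d\d):(\d\d)/
  data = $~                       # MatchData of the last match
  {
    whole:  $&,                   # same as data[0]
    before: $`,                   # same as data.pre_match
    after:  $',                   # same as data.post_match
    groups: [$1, $2],             # same as data[1] and data[2]
    size:   data.size             # the whole match plus two sub-patterns
  }
end

p match_parts("at 12:30 today")
```

These variables are reset by every match attempt, so they should be read right after the match they refer to.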

Regexp objects are also used for string substitution, with the sub and gsub methods; sub makes a single substitution, gsub replaces all the occurrences of the match:

"string".sub(/pattern/,"replacement string")

"string".gsub(/pattern/,"replacement string")  # gsub for multiple substitutions

"string".sub!(/pattern/,"replacement string")  # sub! and gsub!: in-place replacement
"string".gsub!(/pattern/,"replacement string")

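A short sketch of the four methods follows; the sample text and the method name substitution_demo are made up for the illustration. It also shows that the replacement string can refer to sub-patterns with \1 and \2, and that sub! and gsub! return nil when nothing was replaced.

```ruby
# Compares sub, gsub and their in-place variants.
def substitution_demo
  text = "one fish two fish"
  copy = text.dup                        # the bang methods modify their receiver
  changed = copy.gsub!(/fish/, "cat")    # returns the receiver, or nil if nothing changed
  {
    sub:       text.sub(/fish/, "cat"),
    gsub:      text.gsub(/fish/, "cat"),
    original:  text,                     # untouched by sub and gsub
    in_place:  copy,
    bang_hit:  changed.equal?(copy),
    bang_miss: copy.sub!(/bird/, "cat"),
    swapped:   "John Smith".sub(/(\w+) (\w+)/, '\2 \1')
  }
end

p substitution_demo
```
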
A block can be attached to the sub and gsub methods; the block receives the matched string as its argument, and the result of the block is substituted into the string (blocks will be described later):

"string".sub(/pattern/)  { |p| statements }   # "p" is the matched string
"string".gsub(/pattern/) { |p| statements }

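For instance, a block makes it possible to compute each replacement from the text that was matched. The method names double_numbers and capitalize_first_word are hypothetical helpers written for this sketch.

```ruby
# gsub with a block: every run of digits is replaced by twice its value.
def double_numbers(text)
  text.gsub(/\d+/) { |digits| (digits.to_i * 2).to_s }
end

# sub with a block: only the first word is changed.
def capitalize_first_word(text)
  text.sub(/\w+/) { |word| word.capitalize }
end

puts double_numbers("3 apples and 5 pears")   # 6 apples and 10 pears
puts capitalize_first_word("hello world")     # Hello world
```
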
Patterns can be used to subdivide a string, by using the scan method:

"one word or two".scan(/\w+/)   =>  ["one", "word", "or", "two"]

"one word or two".scan(/\w+/) {|w| statements }  # also passing matches to a block
"string".scan(/(t).*(n)/) { |a,b| print a,b }    # sub-matches given to the block

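When the pattern contains sub-patterns, scan yields one array of sub-matches per match instead of the whole matched string. The sketch below uses this to read key=value pairs; the method name parse_pairs and the input format are assumptions made for the example.

```ruby
# Builds a hash from "key=number" pairs found anywhere in the text.
def parse_pairs(text)
  pairs = {}
  text.scan(/(\w+)=(\d+)/) { |key, value| pairs[key] = value.to_i }
  pairs
end

p "a=1, b=22".scan(/(\w+)=(\d+)/)   # [["a", "1"], ["b", "22"]]
p parse_pairs("a=1, b=22")
```
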
Examples of regular expression usage:

match                          position  matched string  meaning
/abc/ =~ "012abc34"            3         "abc"           "abc" substring
/^abc/ =~ "012abc34"           nil       nil             "abc" at the beginning of a line (no match here)
/\d+/ =~ "012abc34"            0         "012"           one or more digits
/.+\s.+/ =~ "123 abc"          0         "123 abc"       a space between other characters
/4$/ =~ "4234"                 3         "4"             "4" at the end
/(34)$/ =~ "34234"             3         "34"            "34" at the end
/^(34)/ =~ "34234"             0         "34"            "34" at the beginning
/3{2}/ =~ "12343312"           4         "33"            "3" two times
/[0-9]/ =~ "abc3de"            3         "3"             a character in a range
/(\d\d):(\d\d)/ =~ "a12:30"    1         "12:30"         sub-patterns: $1=>"12"; $2=>"30"