1. The Equals Tilde
What’s an equalstilde, you say? Who would come up with such a dumb-sounding name for an operator? It actually lies at the foundation of Ruby regular expressions. The equalstilde (=~) applies a regular expression to a string and returns the index within the string where the regular expression first matches, or nil if it doesn’t match at all.
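Here’s a minimal sketch of the equalstilde in action. The sample string and the variable names are made up for illustration.

```ruby
# Hypothetical sample string, chosen only for illustration.
greeting = "hello world"

# =~ returns the index where the match starts...
world_index = greeting =~ /world/
p world_index   # => 6

# ...and nil when the pattern doesn't match anywhere.
ruby_index = greeting =~ /ruby/
p ruby_index    # => nil
```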
2. MatchData
Instead of using the equalstilde, you can also use a string method called match to apply a regular expression to a string. However, instead of returning an index, it returns this weird type of object called MatchData (or nil if there’s no match).
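As a rough sketch, here is what match hands back. The sample string is hypothetical; class, [], begin, and pre_match are standard MatchData methods.

```ruby
# Hypothetical sample string, chosen only for illustration.
greeting = "hello world"

m = greeting.match(/world/)
p m.class       # => MatchData
p m[0]          # => "world" (the matched text)
p m.begin(0)    # => 6 (the same index =~ would return)
p m.pre_match   # => "hello "

# Like =~, match returns nil when nothing matches.
no_match = greeting.match(/ruby/)
p no_match      # => nil
```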
Cool, so then what do you do with a MatchData object? It all makes sense when you learn about this crazy thing called…
3. Capture groups
Get ready to have your mind blown. In Ruby, if you use parentheses in a regular expression, you can utilize capture groups. You can extract multiple parts from a string without using multiple regular expressions, just by putting parentheses around each part of the pattern you want to capture. Get out, I know. It’s awesome.
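A small sketch of the idea, assuming a made-up date string as input. Each pair of parentheses in the pattern becomes one capture group.

```ruby
# Hypothetical input: an ISO-style date.
date = "2024-03-15"

# Three pairs of parentheses, so three capture groups.
m = date.match(/(\d{4})-(\d{2})-(\d{2})/)

# captures returns every captured part at once, in order.
p m.captures   # => ["2024", "03", "15"]
```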
Then, you can access each of the capture groups separately by its index, starting from 1 (index 0 is the entire match).
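For instance, with a hypothetical date string, indexing into the MatchData might look like this:

```ruby
# Hypothetical input, matched against three capture groups.
m = "2024-03-15".match(/(\d{4})-(\d{2})-(\d{2})/)

p m[0]   # => "2024-03-15" (index 0 is the whole match)
p m[1]   # => "2024"
p m[2]   # => "03"
p m[3]   # => "15"
```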
You can even name your capture groups using the (?<name>pat) syntax.
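A sketch of named groups, using the (?<name>pat) syntax. The group names year, month, and day are chosen purely for illustration.

```ruby
# Each group gets a name of our choosing (hypothetical names).
date_pattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/

# The pattern itself knows the names of its groups.
p date_pattern.names   # => ["year", "month", "day"]

# Named groups still capture in order, just like numbered ones.
m = date_pattern.match("2024-03-15")
p m.captures           # => ["2024", "03", "15"]
```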
Then you can access each group using hash syntax, with the group’s name as the key.
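Roughly, hash-style access looks like the sketch below (the input and group names are hypothetical). Both symbols and strings work as keys.

```ruby
m = "2024-03-15".match(/(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/)

p m[:year]     # => "2024"
p m["month"]   # => "03"

# named_captures returns all the groups as a Hash keyed by name.
parts = m.named_captures
p parts["day"] # => "15"
```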
4. Atomic Grouping
An atomic group is a special kind of group, written (?>pat). Unlike an ordinary group, it doesn’t capture anything, and when the regex engine exits it, all backtracking positions remembered inside it are discarded. Let’s go over two cases, one that doesn’t use atomic grouping and one that does, and see how the regex engine would operate.
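Here is a sketch of the first case, without atomic grouping. The exact pattern and the input "Tommy boy" are assumptions pieced together from the walkthrough that follows.

```ruby
# Assumed pattern: an ordinary capture group with three alternatives.
plain = /\A(Tommy|Thomas|Tom)\z/

# Hypothetical input that starts with "Tommy" but keeps going, so the
# end-of-string anchor can never match right after the group.
plain_result = plain.match("Tommy boy")
p plain_result   # => nil
```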
The regex engine first matches the start of the string, \A, and then matches “Tommy”. However, when it leaves the capture group and tries to match \z, the end of the string, the match fails. The engine then goes back and tries to match “Thomas”, which fails, then tries “Tom”, and only then stops and declares failure. But say we want to shorten this process.
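A sketch of the atomic version, again with an assumed pattern and a hypothetical input. The second half uses a different, made-up pair of patterns to show that throwing away backtracking positions can change the result, not just the amount of work.

```ruby
# Assumed pattern: the same alternatives, wrapped in an atomic group (?>...).
atomic = /\A(?>Tommy|Thomas|Tom)\z/
atomic_result = atomic.match("Tommy boy")
p atomic_result   # => nil, after trying only "Tommy"

# With an ordinary group, the engine can back up and try "Tommy".
with_backtracking = /\A(Tom|Tommy)\z/.match?("Tommy")
p with_backtracking      # => true

# With an atomic group, "Tom" is locked in once the group is exited.
without_backtracking = /\A(?>Tom|Tommy)\z/.match?("Tommy")
p without_backtracking   # => false
```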
In this case, again, \A matches the start of the string, and then the engine tries to match “Tommy”. It succeeds and moves on to matching \z, which fails. Because of the atomic grouping, the engine threw out all of its backtracking positions when it left the group, so it fails after trying only “Tommy” rather than all three alternatives.
5. Subexpression Calls
Okay. Okay. This one is probably the most radical thing about regular expressions.
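By using the \g<name> syntax, you can call a named capture group again from inside the pattern, even from inside the group itself, which makes recursive patterns possible.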
Say that you want to make sure all parentheses surrounding a string are always balanced. You can do that with a named capture group, paren, that calls itself.
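One way this could look is sketched below. The pattern is an assumption, not a recovered listing: it uses extended mode (the x flag) so the pieces can be commented, and it wraps the recursive \g<paren> call in a repeatable non-capturing group so that plain text and nested pairs can alternate.

```ruby
# A sketch of a balanced-parentheses check (pattern is an assumption).
balanced = /
  \A
  (?<paren>
    \(              # a literal opening parenthesis
    [^()]*          # text containing no parentheses
    (?:
      \g<paren>     # call the paren group again for a nested pair
      [^()]*        # more plain text after the nested pair
    )*              # nested pairs are optional and repeatable
    \)              # the matching closing parenthesis
  )
  \z
/x

p balanced.match?("(a(b)c(d)e)")   # => true
p balanced.match?("((()))")        # => true
p balanced.match?("((a)")          # => false (one closer is missing)
p balanced.match?("(a))")          # => false (one closer too many)
```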
Cool. Pretty cool. Let’s go over what the regex engine does.
- Enter a capture group named paren.
- Match a literal (.
- Match the text in between the parentheses, which is anything except parentheses.
- Call the paren capture group again, dropping the part in the middle of the parentheses for now.
- Enter the paren capture group again.
- Match a literal (, the second character in the group.
- Match the text in between the parentheses, which is anything except parentheses.
- Try to call paren again, but fail since it would cause the match thus far to fail.
- Match a literal ) n times, where n is the depth of the recursion.
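Note that the * following \g<paren> makes the recursive call optional, which is what lets the recursion stop once there are no more nested parentheses to match.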
6. Lookahead and lookbehind assertions
What if you want to make sure certain characters exist in a string, but you don’t want them to be part of your match? This is when you would want to use a special type of anchor: lookahead and lookbehind assertions.
- (?=pat) is a positive lookahead assertion, and ensures that the characters following your expression match “pat”
- (?!pat) is a negative lookahead assertion, and ensures that the characters following your expression do not match “pat”
- (?<=pat) is a positive lookbehind assertion, and ensures that the characters preceding your expression match “pat”
- (?<!pat) is a negative lookbehind assertion, and ensures that the characters preceding your expression do not match “pat”
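To make the four assertions concrete, here is a tiny sketch using made-up strings. String#[] with a regular expression returns the matched text, or nil when there is no match.

```ruby
# Positive lookahead: "foo" only if "bar" follows ("bar" is not in the match).
ahead = "foobar"[/foo(?=bar)/]
p ahead          # => "foo"

# Negative lookahead: "foo" only if "bar" does NOT follow.
not_ahead = "foobar"[/foo(?!bar)/]
p not_ahead      # => nil
not_ahead_ok = "foobaz"[/foo(?!bar)/]
p not_ahead_ok   # => "foo"

# Positive lookbehind: "bar" only if "foo" comes right before it.
behind = "foobar"[/(?<=foo)bar/]
p behind         # => "bar"

# Negative lookbehind: "bar" only if "foo" does NOT come right before it.
not_behind = "foobar"[/(?<!foo)bar/]
p not_behind     # => nil
not_behind_ok = "bazbar"[/(?<!foo)bar/]
p not_behind_ok  # => "bar"
```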
Pretty fantastic, right? For example, say you have a list of emails, and you’re trying to find the usernames of all of the ones at a certain domain. A positive lookahead for the domain lets you match just the username.
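A rough sketch of the email case follows. The addresses, the domain, and the variable names are all hypothetical, and the username pattern is deliberately simple rather than a full email validator.

```ruby
# Hypothetical data for illustration.
emails = [
  "alice@example.com",
  "bob@other.org",
  "carol@example.com",
  "dave@example.com.evil.net"
]
domain = "example.com"

# Match the username only when "@example.com" follows it and ends the string.
# The lookahead checks the domain without making it part of the match.
username_at_domain = /\A[^@\s]+(?=@#{Regexp.escape(domain)}\z)/

# String#[] returns nil for non-matching addresses, which compact drops.
usernames = emails.map { |email| email[username_at_domain] }.compact
p usernames   # => ["alice", "carol"]
```

Regexp.escape keeps the dot in the domain from acting as a wildcard.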
Other References
If there’s still more you want to know about regular expressions in Ruby, I recommend looking at the Ruby Docs or visiting the website Regular-Expressions.info, which contains more than you’d ever want to know about regular expressions. In the best way possible.