Introduction to Regular Expressions: Key Points

Regular Expressions: The pitch

Regular expressions (regexes) are powerful tools for searching and transforming text.
A regex is a search pattern, written in a defined syntax, that matches any text fitting the pattern rather than one fixed string.

Wildcards (glob patterns), used in the Unix shell for file selection, are a simpler relative of regular expressions; note that * and ? do not mean the same thing here as in a regex.
* matches zero or more characters
? matches exactly one character
[ ] matches one character from the enclosed list or range, e.g. [abc] or [0-9]
[! ] matches one character NOT in the enclosed list or range
{ } (bash/zsh) expands to every comma-separated option listed, e.g. file{1,2}.txt gives file1.txt file2.txt, whether or not those files exist
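As a quick illustration, the POSIX sh script below creates a few hypothetical files in a temporary directory and expands each wildcard against them. LC_ALL=C is set so the sorted order of the matches is predictable; brace expansion is left out because plain POSIX sh does not support it.

```shell
#!/bin/sh
# Shell wildcard (glob) demo in a throwaway directory.
# The file names are invented for illustration only.
set -e
export LC_ALL=C                 # fixed sort order for glob results
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT       # clean up the scratch directory on exit
cd "$dir"
touch data1.txt data2.txt data3.txt data10.txt notes.md

echo *.txt           # * matches zero or more characters: data1.txt data10.txt data2.txt data3.txt
echo data?.txt       # ? matches exactly one character:   data1.txt data2.txt data3.txt
echo data[12].txt    # [ ] one character from the list:   data1.txt data2.txt
echo data[!1].txt    # [! ] one character NOT in the list: data2.txt data3.txt
```

Note that data10.txt is excluded by data?.txt because "10" is two characters.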

grep -E (extended regex mode; egrep is the older, now obsolescent, equivalent) matches complex patterns against each line of files or streams.
| separates alternatives (logical OR)
( ) groups a sub-pattern, e.g. to limit the scope of | or to apply a quantifier to several characters at once
[ ] matches one character from the enclosed list or range
[^ ] matches one character NOT in the enclosed list or range
^ at the start of a regex anchors the match to the start of the line
$ at the end of a regex anchors the match to the end of the line
. is the match-all (any single character) wildcard
? quantifies previous character or group as occurring zero or one time
* quantifies previous character or group as occurring zero or more times
+ quantifies previous character or group as occurring one or more times
{n,m} quantifies previous character or group as occurring between n and m times
Quantifiers are greedy: they match the longest possible run of characters that still lets the whole pattern match.
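The sketch below runs several of the operators above through grep -E on a few invented sample lines, then contrasts a greedy .+ with a negated bracket list using grep -o (print only the matched text), which both GNU and BSD grep provide. LC_ALL=C is set so ranges such as [a-z] behave predictably.

```shell
#!/bin/sh
# Illustrative grep -E patterns; the sample text is invented for this demo.
set -e
export LC_ALL=C   # make bracket ranges such as [a-z] behave predictably

sample='colour
color
colouur
grey
gray
2024-01-15
Error: disk full
warning: low memory'

# ? : the preceding u may occur zero or one time -> colour, color
printf '%s\n' "$sample" | grep -E '^colou?r$'
# ( ) groups the alternatives separated by | -> grey, gray
printf '%s\n' "$sample" | grep -E '^gr(e|a)y$'
# {n,m} bounds a repeat; {4} is shorthand for {4,4} -> 2024-01-15
printf '%s\n' "$sample" | grep -E '^[0-9]{4}-[0-9]{1,2}-[0-9]{1,2}$'
# [^ ] negates the list: first character is not a lowercase letter -> 2024-01-15, Error: disk full
printf '%s\n' "$sample" | grep -E '^[^a-z]'

# Greedy: .+ runs on to the LAST > on the line -> <b>bold</b>
echo '<b>bold</b>' | grep -oE '<.+>'
# Excluding > from the repeated list stops each match at the first > -> <b> and </b>
echo '<b>bold</b>' | grep -oE '<[^>]+>'
```

The last pair shows a common way to rein in greediness in grep -E, which has no lazy quantifiers: restrict what the repeated part is allowed to match.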

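sed -E 's/pattern/replacement/' replaces the first match of pattern on each line (-E selects extended regex syntax, as in grep -E).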
's/pattern/replacement/g' - the g (global) flag replaces every match on each line, not just the first (g stands for global, not greedy).
Use grouping ( ) in the pattern and back-references \1, \2, etc. in the replacement to rearrange or recontextualise parts of the matched input.
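A short sketch of the g flag and of back-references, assuming a sed that accepts -E (current GNU sed and BSD/macOS sed both do). The input strings are invented for illustration.

```shell
#!/bin/sh
# Minimal sed -E sketch; the input strings are made up for this demo.
set -e
export LC_ALL=C   # predictable [A-Za-z] ranges

# No flag: only the first match on each line is replaced
echo 'cat cat cat' | sed -E 's/cat/dog/'     # dog cat cat
# g (global) flag: every match on the line is replaced
echo 'cat cat cat' | sed -E 's/cat/dog/g'    # dog dog dog

# ( ) captures "Smith" as \1 and "John" as \2; the replacement writes them back swapped
echo 'Smith, John' | sed -E 's/([A-Za-z]+), ([A-Za-z]+)/\2 \1/'   # John Smith
```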
Tips for writing complex substitutions:
1- Start with a complete real example pasted as your pattern.
2- Escape with a backslash (\) any forward slashes, literal brackets, dots, etc., so they match literally.
3- Wrap the parts to retain in round brackets to make them capture groups.
4- Write your replacement rules, using back-references.
5- Substitution should now work for your specific real example.
6- Generalise the pattern with wildcards, bracket lists and quantifiers until it matches all required cases (and no unintended ones).
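The six tips can be traced on a single example. The sketch below assumes a hypothetical input line, "reading (14/07/2023): 21.5", that should become "2023-07-14,21.5"; the line format is invented purely to exercise escaping, capture groups and back-references.

```shell
#!/bin/sh
# Worked example of the substitution tips; the "reading" line format is hypothetical.
set -e
export LC_ALL=C
line='reading (14/07/2023): 21.5'

# Steps 1-2: the real example as the pattern, with ( ) / and . escaped to match literally
echo "$line" | sed -E 's/reading \(14\/07\/2023\): 21\.5/MATCHED/'
# -> MATCHED

# Steps 3-5: capture the parts to keep and reorder them with back-references
echo "$line" | sed -E 's/reading \((14)\/(07)\/(2023)\): (21\.5)/\3-\2-\1,\4/'
# -> 2023-07-14,21.5

# Step 6: replace the captured literals with bracket lists and quantifiers
generalised='s/reading \(([0-9]{2})\/([0-9]{2})\/([0-9]{4})\): ([0-9.]+)/\3-\2-\1,\4/'
echo "$line" | sed -E "$generalised"
# -> 2023-07-14,21.5
echo 'reading (01/12/2024): 19' | sed -E "$generalised"
# -> 2024-12-01,19
```

Because sed accepts other characters as the s delimiter, writing the command as s|pattern|replacement| would remove the need to escape the slashes in step 2.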

Most modern text editors support regular expressions in their find-and-replace tools.

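Regular expressions in R (stringr package):
str_detect( string.vector, 'pattern' ) returns TRUE or FALSE for each element.
str_replace( string.vector, 'pattern', 'replacement' ) replaces the first match in each element.
str_replace_all( string.vector, 'pattern', 'replacement' ) replaces every match in each element (the equivalent of sed's g flag).
Each backslash in a pattern must be doubled inside an R string: the regex \d is written '\\d', a literal backslash '\\\\', and a back-reference '\\1'.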