Intro to regular expressions: Glossary

Key Points

Regular Expressions: The pitch	Regexes are powerful tools for searching and transforming text. A search pattern, written in a defined syntax, describes a family of acceptable strings rather than one exact string, which allows non-specific but directed matching.
Shell wildcards - a type of regex	Use of wildcards in the Unix shell for file selection is a simple form of regular expressions. `*` matches zero or more characters; `?` matches exactly one character; `[ ]` matches one character from a list or range of contained options; `[! ]` matches one character NOT in a list or range of contained options; `{ }` expands to one word for each of the comma-separated options it contains.
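The wildcard rules above can be tried outside the shell. The following is a minimal sketch using Python's standard `fnmatch` module, which implements the `*`, `?`, `[ ]` and `[! ]` forms; the file names are invented for illustration. Brace expansion `{ }` is not covered, because `fnmatch` does not implement it.

```python
from fnmatch import fnmatchcase

# Hypothetical file names, invented for this illustration.
names = ["data1.txt", "data2.txt", "data10.txt", "notes.txt", "data1.csv"]

def select(pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [n for n in names if fnmatchcase(n, pattern)]

print(select("*.txt"))         # any run of characters, then ".txt"
print(select("data?.txt"))     # exactly one character after "data"
print(select("data[12].txt"))  # one character from the list 1, 2
print(select("data[!1].txt"))  # one character that is NOT 1
```

Note that `data10.txt` is selected by `*.txt` but not by `data?.txt`, since `?` stands for exactly one character.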
Pattern matching with grep -E, part 1	grep in Extended Regex mode (or egrep) allows complex pattern matching in files/streams. `|` acts as an OR between options (the escaped form `\|` is the GNU spelling for plain grep without -E); `( )` allows grouping, e.g. to limit the scope of an OR or to apply a quantifier to several characters; `[ ]` matches one character from a list or range of contained options; `[^ ]` matches one character NOT in a list or range of contained options; `^` at the start of a regex means match at start of line; `$` at the end of a regex means match at end of line; `.` is the match-all (any single character) wildcard; `?` quantifies previous character or group as occurring zero or one time; `*` quantifies previous character or group as occurring zero or more times; `+` quantifies previous character or group as occurring one or more times; `{n,m}` quantifies previous character or group as occurring between n and m times. Quantifiers are greedy: they always match the longest possible fit.
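The operators listed above behave the same way in Python's `re` module, so a short sketch in Python can stand in for `grep -E` here. The sample lines and the helper `matching` are invented for illustration; like grep, the helper keeps every line in which the pattern is found anywhere.

```python
import re

# Hypothetical input lines, invented for this illustration.
lines = ["cat", "cats and dogs", "concatenate", "dog", "caat", "ct"]

def matching(pattern):
    """Return the lines in which the pattern is found, as grep would."""
    return [s for s in lines if re.search(pattern, s)]

print(matching(r"cat|dog"))      # OR: lines containing "cat" or "dog"
print(matching(r"^(cat|dog)$"))  # grouping plus anchors: whole line is one option
print(matching(r"^c.t$"))        # . is any single character
print(matching(r"^ca?t$"))       # zero or one "a"
print(matching(r"^ca*t$"))       # zero or more "a"
print(matching(r"^ca+t$"))       # one or more "a"
print(matching(r"^ca{2,3}t$"))   # between two and three "a"

# Greedy quantifier: .+ runs to the LAST ">" it can reach, not the first.
greedy = re.search(r"<.+>", "<b>bold</b>").group()
print(greedy)
```

The last example shows why greediness matters: the match is the whole string `<b>bold</b>`, not just `<b>`.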
Pattern matching with grep -E, part 2	grep in Extended Regex mode has a number of predefined character classes, which must be placed inside a bracket expression to be used (e.g. `[[:digit:]]`): `[:alpha:] [:alnum:] [:digit:] [:upper:] [:lower:] [:punct:] [:space:]`; and backslash-escaped shorthand character classes and anchors: `\w` : word character, `[a-zA-Z0-9_]` (letters, digits or underscore); `\W` : `[^\w]`, inverse of `\w`, any non-word character; `\s` : whitespace (spaces, tabs, in some contexts newlines); `\S` : `[^\s]`, inverse of `\s`, any non-whitespace character; `\b` : boundary between a word character and a non-word character (or the edge of the line), 0-length anchor; `\B` : inverse of `\b`, any position that is not a word boundary, i.e. in the middle of a word or of a run of non-word characters, 0-length anchor; `\<` : boundary at start of word, 0-length anchor; `\>` : boundary at end of word, 0-length anchor. You can refer back to an exact copy of the text matched by a `( )` group using `\1`, `\2`, etc.
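As a rough illustration of the shorthand classes, anchors and back-references above, here is a sketch in Python. Python's `re` module does not provide the POSIX bracket classes or the `\<` and `\>` anchors, so only `\w`, `\s`, `\b`, `\B` and `\1` are shown; the sample sentence is invented.

```python
import re

# Hypothetical sentence, invented for this illustration.
text = "This is is a test_case, with 42 items; see the the end."

# \w+ : runs of word characters (letters, digits, underscore).
words = re.findall(r"\w+", text)
print(words)

# \b : "is" as a whole word only, so the "is" inside "This" is skipped.
whole = re.findall(r"\bis\b", text)
print(whole)

# \B : "is" that does NOT start at a word boundary, i.e. the one in "This".
inside = re.findall(r"\Bis\b", text)
print(inside)

# \1 : an exact copy of whatever group 1 matched, here a doubled word.
doubled = re.findall(r"\b(\w+)\s+\1\b", text)
print(doubled)
```

The back-reference pattern finds the accidental repeats `is is` and `the the`, because `\1` must equal the text already captured by `(\w+)`.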
Find... and replace! With sed.	`sed -E 's/pattern/replacement/'` replaces the first match on each line; `'s/pattern/replacement/g'` enables global, replace-all mode (every match on each line). Use grouping `( )` in the pattern and back-references such as `\1` in the replacement to rearrange or recontextualise parts of the matched input. Tips for writing complex substitutions: 1- Start with a complete real example pasted as your pattern. 2- Escape with `\` any forward slashes, literal brackets, etc., as necessary. 3- Circle the parts to retain, with round brackets. 4- Write your replacement rules, using back-references. 5- Substitution should now work for your specific real example. 6- Abstract the pattern with wildcards, etc., to make it general enough for all required cases.
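The six tips above can be walked through on a small example. The sketch below uses Python's `re.sub` as a stand-in for `sed -E`, with an invented input line; one difference to keep in mind is that `re.sub` replaces every match by default, so `count=1` is used to imitate sed without the `g` flag.

```python
import re

# Hypothetical input line, invented for this illustration.
line = "Report dated 2023-04-15, revised 2023-05-02."

# Tips 1-5: paste a real example as the pattern, put round brackets
# around the parts to keep, and reorder them with back-references.
specific = re.sub(r"(2023)-(04)-(15)", r"\3/\2/\1", line)
print(specific)

# Tip 6: abstract the literal digits so that any date of this shape matches.
date = r"([0-9]{4})-([0-9]{2})-([0-9]{2})"

# count=1 imitates sed without g: only the first match is replaced.
first_only = re.sub(date, r"\3/\2/\1", line, count=1)
print(first_only)

# The default imitates sed with g: every match is replaced.
# Roughly: sed -E 's/([0-9]{4})-([0-9]{2})-([0-9]{2})/\3\/\2\/\1/g'
every = re.sub(date, r"\3/\2/\1", line)
print(every)
```

In the sed form, the forward slashes in the replacement have to be escaped (tip 2) because `/` is also the delimiter of the `s` command.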
Regexes within text editors	Regular expression capabilities are incorporated in most modern text editors for find and replace.
Python regular expressions	Regular expressions through Python: `import re`; `match = re.search(r'pattern', 'string')` or `parts = re.split(r'pattern', 'string')` or `result = re.sub(r'pattern', r'replacement', 'string')`. Reference: https://docs.python.org/3/library/re.html
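A minimal sketch of the three calls listed above, run on an invented record string. The variable names are illustrative only; `re.search` returns `None` when nothing matches, so its result is checked before use.

```python
import re

# Hypothetical record, invented for this illustration.
record = "sample_07: 12.5 mg,  3.2 mg;0.8 mg"

# re.search: first match anywhere in the string, or None.
match = re.search(r"sample_([0-9]+)", record)
if match:
    print(match.group(0))  # the whole matched text
    print(match.group(1))  # the text captured by the first ( ) group

missing = re.search(r"control_[0-9]+", record)
print(missing)

# re.split: cut the string wherever the pattern matches.
fields = re.split(r"[,;]\s*", record)
print(fields)

# re.sub: replace every match, reusing group 1 in the replacement.
cleaned = re.sub(r"([0-9]+\.[0-9]+) mg", r"\1mg", record)
print(cleaned)
```

Writing patterns as raw strings (`r'...'`) keeps backslashes such as `\s` and `\1` from being interpreted by Python before the regex engine sees them.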
R regular expressions	Regular expressions through R: `grep( 'pattern', string.vector )`; `sub( 'pattern', 'replacement', string.vector )`. Use gsub instead of sub for global (replace-all) find+replace mode. Backslashes in patterns need to be doubled (`\\`), because the R string literal consumes one level of escaping before the regex engine sees the pattern.

Glossary

FIXME