Regex
learn-regex/README-cn.md at master · ziishaned/learn-regex
re --- 正则表达式操作 — Python 3.10.0 文档
Regex Vis:在线可视化正则编辑器。
以下文章摘自:
匹配单个字符
正则表达式是对文本执行搜索时所指定的“格式”。大多数字符匹配它们自己,例如,正则表达式 test 将完全匹配字符串 test。
正则表达式是大小写敏感的,所以 The 不会匹配 the。
有些字符是特殊的元字符,有特殊作用,它们不匹配自身:
. ^ $ * + ? { } [ ] \ | ( )
[ and ] are used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a '-'.
[a-z] match only lowercase letters.
在 [] 内的元字符会被当作一般的字符。
complementing the set: [^5] will match any character except 5
在 [] 内想匹配元字符,就要加反斜杠转义。
\d 匹配任何十进制数字;这相当于 [0-9]
\D 匹配任何非数字字符;这相当于 [^0-9]
\s 匹配任何空白字符;这相当于 [ \t\n\r\f\v]
\S 匹配任何非空白字符;这相当于 [^ \t\n\r\f\v]
\w 匹配任何字母数字字符;这相当于类 [a-zA-Z0-9_]
\W 匹配任何非字母数字字符;这相当于类 [^a-za-z0-9_]
. matches anything except a newline character.
(xyz) 字符集,匹配与 xyz 完全相等的字符串
匹配多个字符
* specifies that the previous character can be matched zero or more times.
a[bcd]*b匹配 a 开头、b 结尾、中间 bcd 任意字符重复任意次。
+ matches one or more times.
? matches either once or zero time
{m,n} means there must be at least m repetitions, and at most n
更多元字符
^ Matches at the beginning of a line.
$ Matches at the end of a line.
\b Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character.
应用
找出当前目录下的所有 .h .m .mm 后缀的文件:find -E . -regex ".*\.(h|m|mm)"
全量格式化处理:
find -E . -regex ".*\.(h|m|mm|c|cc|cpp)" | xargs clang-format -i -style=file
on Mac OS X (BSD find): man find says -E uses extended regex support.