Skip to main content

Regex

learn-regex/README-cn.md at master · ziishaned/learn-regex

re --- 正则表达式操作 — Python 3.10.0 文档

Regex Vis:在线可视化正则编辑器。

以下文章摘自:

匹配单个字符

正则表达式是对文本执行搜索时所指定的“格式”。大多数字符匹配它们自己,例如,正则表达式 test 将完全匹配字符串 test

正则表达式是大小写敏感的,所以 The 不会匹配 the

有些字符是特殊的元字符,有特殊作用,它们不匹配自身:

. ^ $ * + ? { } [ ] \ | ( )

[ and ] are used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a '-'.

[a-z] match only lowercase letters.

[] 内的元字符会被当作一般的字符。

complementing the set: [^5] will match any character except 5

[] 内想匹配元字符,就要加反斜杠转义。

\d 匹配任何十进制数字;这相当于 [0-9]

\D 匹配任何非数字字符;这相当于 [^0-9]

\s 匹配任何空白字符;这相当于 [ \t\n\r\f\v]

\S 匹配任何非空白字符;这相当于 [^ \t\n\r\f\v]

\w 匹配任何字母数字字符;这相当于类 [a-zA-Z0-9_]

\W 匹配任何非字母数字字符;这相当于类 [^a-za-z0-9_]

. matches anything except a newline character.

(xyz) 字符集,匹配与 xyz 完全相等的字符串

匹配多个字符

* specifies that the previous character can be matched zero or more times.

  • a[bcd]*b 匹配 a 开头、b 结尾、中间 bcd 任意字符重复任意次。

+ matches one or more times.

? matches either once or zero time

{m,n} means there must be at least m repetitions, and at most n

更多元字符

^ Matches at the beginning of a line.

$ Matches at the end of a line.

\b Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character.

应用

找出当前目录下的所有 .h .m .mm 后缀的文件:find -E . -regex ".*\.(h|m|mm)"

全量格式化处理:

find -E . -regex ".*\.(h|m|mm|c|cc|cpp)" | xargs clang-format -i -style=file

on Mac OS X (BSD find): man find says -E uses extended regex support.