GNU 标准正则

GUN 对在正则表达式的功能进行了拓展，我们可以在 sed 或 grep -e 中使用正则表达式，支持的符号如下表所示。

符号	作用
.	任意单个字符
?	至多一个字符
*	任意个字符
+	至少一个字符
{N}	前一个字符或字符组出现 N 次
{N, }	前一个字符或字符组至少出现 N 次
{N, M}	前一个字符或字符组至少出现 N 次，至多出现 M 次
-	用在中括号中表示范围
^	用在中括号中表示反选，中括号外表示行首
$	表示行尾
\<	匹配单词的开头
\>	匹配单词的结尾
(regexp)	字符组 `regexp`
\w	字母与数字，称为字符
\W	非字符，等价于 `[^0-9a-zA-Z]`
\s	空白字符，等价于 `[[:space:]]`
\S	匹配任何非空白字符，等价于 `[^ \f\n\r\t\v]`
regexp1\	regexp2	匹配表达式 1 或表达式 2，注意不能有空格
(.*)配合\N$	表示与正则表达式中第N个用转义括号包裹的子表达式匹配，从左到右自动编号

[1-9][0-9]*  # 匹配一个正整数
abcdef  # 匹配 abcdef
a*b  # 匹配任意个 a 后接 b
a+b  # 匹配至少一个a 后接 b
a?b # 匹配至多一个 a 后接 b
.*  # 匹配所有字符串 包括空
.+  # 匹配至少有一个字符的字符串
^main\s*\(.*\)  #  开头main任意个空格(任意数量字符)
[a-zA-Z0-9]  # 等价于 [[:alnum:]]
[^0-9]  # 非数字
^\(.*\)-\1$  # 匹配 - 两侧一样的字符串 如 123-123
^.{2,4}A$  # 匹配 2 到 4 个任意字符开头后接 A 结尾
(a1){2}  # 字符组 a1 连续出现 2 次
[[:digit:]]|[[:alpha:]]  # 匹配数字或字母，等价于[[:alnum:]]
[[:digit:]][[:alpha:]]  # 匹配数字紧挨着字符

[success]技巧： * 和 + 限定符都会尽可能多的匹配字符，但只要在它们的后面加上一个 ? 就可以实现非贪婪或最小匹配。

[warning]警告：不能将限定符等与定位符 * + ? . 等一起使用。由于在紧靠换行或者单词边界的前面或后面不能有一个以上位置，因此不允许诸如 ^* 之类的表达式。

grep 命令

grep ：在文件中（全局）查找指定的正则表达式，并打印所有匹配的行。 egrep ：支持更多正则表达式元字符（ ? + { } | ( ) 不需要转义），等价于 grep -E 。 fgrep ：忽略所有元字符。

基础用法

grep 是 GNU 提供的工具，用于在输入文件中搜索包含与给定模式列表匹配的行，其基础用法如下：

grep [选项] PATTERN FILE1 FILE2 ...

> grep root /etc/passwd  # 最基本用法
root:x:0:0:root:/root:/bin/bash

> grep -n ^r /etc/passwd  # 带行号
1:root:x:0:0:root:/root:/bin/bash  
30:rtkit:x:111:118:RealtimeKit,,,:/proc:/usr/sbin/nologin  
43:remilia:x:1000:1000:Remilia Scarlet,,,:/home/remilia:/bin/bash  
45:redis:x:124:135::/var/lib/redis:/usr/sbin/nologin  

> grep -v ^[a-z] /etc/passwd  # 反向查找 找不匹配的
_apt:x:105:65534::/nonexistent:/usr/sbin/nologin

> grep -c false /etc/passwd  # 统计匹配的行数
5

> grep -i ps ~/.bash* | grep -v history  # 忽略大小写 管道查询/home/remilia/.bashrc:HISTCONTROL=erasedups
/home/remilia/.bashrc:    PS1='${debian_chroot:+($debian_chroot)}\u@\h:\w\$ '
/home/remilia/.bashrc:    PS1="\[\e]0;${debian_chroot:+($debian_chroot)}\u@\h: \w\a\]$PS1"

# -o 选项只显示匹配内容
> ifconfig | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}'
127.0.0.1
255.0.0.0
192.168.0.104
255.255.255.0
192.168.0.255

# 显示匹配的上下文
> seq 5 | grep -C2 3  # 前后各两行
1
2
3
4
5
> seq 5 | grep -A2 3  # 后两行
3
4
5
> seq 5 | grep -B2 3  # 前两行
1
2
3

返回状态

0 ：找到了指定的模式。 1 ：没有找到指定的模式。 2 ：没有找到文件。

> grep -q root /etc/passwd ; echo $?
0

配合正则表达式

> egrep 'NW' datafile  # 找带 NW 的行
> egrep 'NW|EA' d*  # 找带 NW 或 EA 的行
> egrep 's(h|u)'  # sh 或 su
> egrep 'sh|u'  # sh 或 u
> egrep '^n' datafile  # 查找以 n 开头的行
> egrep TB Savage datafile  # 注意这是从两个文件中找
> egrep 'TB Savage' datafile  # 这是从一个文件中找
> egrep ‘\..5’ datafile  # 例如 .65
> egrep '\.5‘ datafile  # 例如 .56
> egrep ’^[we]' datafile  # 以 w 或 e 开头的行
> egrep ‘\<[a-r].*n\>` datafile  # 以 [a-r] 开头并以 n 结尾的单词
> ifconfig | egrep -o '([0-9]{1,3}\.){3}[0-9]{1,3}'  # 点分十进制 ipv4 地址

字符类别

[:alnum:] Alphanumeric characters
[:alpha:] Alphabetic characters
[:blank:] Space and TAB characters
[:cntrl:] Control characters
[:digit:] Numeric characters
[:graph:] Characters that are both printable and visible (a space is printable but not visible, whereas an ‘a’ is both)
[:lower:] Lowercase alphabetic characters
[:print:] Printable characters (characters that are not control characters)
[:punct:] Punctuation characters (characters that are not letters, digits, control characters, or space characters)
[:space:] Space characters (these are: space, TAB, newline, carriage return, formfeed and vertical tab)
[:upper:] Uppercase alphabetic characters
[:xdigit:] Characters that are hexadecimal digits

正则表达式