Undefined Behavior: NULL Dereference
In the code below, is_valid() dereferences the null pointer str. Intuition says that compiling and running it should end in a SIGSEGV, but that is not what happens.
/*
 * ub_null.c - undefined behavior demo: NULL dereference
 */
#include <stdio.h>
#include <string.h>

int is_valid(const char *str)
{
  if(*str == 0x80) return 0;  /* dereferences str before the NULL check */
  if(str == NULL) return 0;
  return strcmp(str, "expected string") == 0;
}

int main(void)
{
  const char *str = NULL;
  printf("%d\n", is_valid(str));
  return 0;
}
lyazj@HelloWorld:~$ gcc --version
gcc (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

lyazj@HelloWorld:~$ gcc -Wall -Wshadow -Wextra ub_null.c -o ub_null -O0
ub_null.c: In function ‘is_valid’:
ub_null.c:6:11: warning: comparison is always false due to limited range of data type [-Wtype-limits]
    6 |   if(*str == 0x80) return 0;
      |           ^~
lyazj@HelloWorld:~$ ./ub_null
0
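Given the warning GCC emitted, it is easy to infer that the condition *str == 0x80 was evaluated at compile time and the corresponding if statement was removed, so no load through str is ever emitted, and this happens even at optimization level -O0. The disassembly below confirms it.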
lyazj@HelloWorld:~$ objdump --disassemble=is_valid -j.text ub_null

ub_null:     file format elf64-x86-64

Disassembly of section .text:

0000000000001169 <is_valid>:
    1169:  f3 0f 1e fa           endbr64
    116d:  55                    push   %rbp
    116e:  48 89 e5              mov    %rsp,%rbp
    1171:  48 83 ec 10           sub    $0x10,%rsp
    1175:  48 89 7d f8           mov    %rdi,-0x8(%rbp)
    1179:  48 83 7d f8 00        cmpq   $0x0,-0x8(%rbp)
    117e:  75 07                 jne    1187 <is_valid+0x1e>
    1180:  b8 00 00 00 00        mov    $0x0,%eax
    1185:  eb 1e                 jmp    11a5 <is_valid+0x3c>
    1187:  48 8b 45 f8           mov    -0x8(%rbp),%rax
    118b:  48 8d 15 72 0e 00 00  lea    0xe72(%rip),%rdx        # 2004 <_IO_stdin_used+0x4>
    1192:  48 89 d6              mov    %rdx,%rsi
    1195:  48 89 c7              mov    %rax,%rdi
    1198:  e8 d3 fe ff ff        call   1070 <strcmp@plt>
    119d:  85 c0                 test   %eax,%eax
    119f:  0f 94 c0              sete   %al
    11a2:  0f b6 c0              movzbl %al,%eax
    11a5:  c9                    leave
    11a6:  c3                    ret
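The -Wtype-limits warning follows from the type of *str. With GCC on x86-64, plain char is signed, so after integer promotion *str lies in the range [-128, 127] and can never equal the int constant 0x80 (128). The sketch below isolates that comparison. Both function names are hypothetical and are not part of ub_null.c.

```c
#include <limits.h>

/* Mirrors the original test. The char operand is promoted to int before the
 * comparison; the promotion is written out explicitly here. Where plain char
 * is signed (CHAR_MAX == 127), the promoted value can never be 128. */
int first_byte_is_0x80_naive(char c)
{
  int promoted = c;          /* the promotion the compiler applies implicitly */
  return promoted == 0x80;   /* 0x80 is the int constant 128 */
}

/* Reinterprets the byte as unsigned char first, so the value range becomes
 * [0, UCHAR_MAX] and the byte 0x80 matches whether or not char is signed. */
int first_byte_is_0x80(char c)
{
  return (unsigned char)c == 0x80;
}
```

With GCC on x86-64, first_byte_is_0x80_naive((char)0x80) returns 0, while first_byte_is_0x80((char)0x80) returns 1.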
Repeating the experiment in the same environment at optimization level -O3 gives the same result:
lyazj@HelloWorld:~$ gcc -Wall -Wshadow -Wextra ub_null.c -o ub_null -O3
ub_null.c: In function ‘is_valid’:
ub_null.c:6:11: warning: comparison is always false due to limited range of data type [-Wtype-limits]
    6 |   if(*str == 0x80) return 0;
      |           ^~
lyazj@HelloWorld:~$ ./ub_null
0
lyazj@HelloWorld:~$ objdump --disassemble=is_valid -j.text ub_null

ub_null:     file format elf64-x86-64

Disassembly of section .text:

00000000000011a0 <is_valid>:
    11a0:  f3 0f 1e fa           endbr64
    11a4:  48 85 ff              test   %rdi,%rdi
    11a7:  74 27                 je     11d0 <is_valid+0x30>
    11a9:  48 83 ec 08           sub    $0x8,%rsp
    11ad:  48 8d 35 50 0e 00 00  lea    0xe50(%rip),%rsi        # 2004 <_IO_stdin_used+0x4>
    11b4:  e8 a7 fe ff ff        call   1060 <strcmp@plt>
    11b9:  85 c0                 test   %eax,%eax
    11bb:  0f 94 c0              sete   %al
    11be:  48 83 c4 08           add    $0x8,%rsp
    11c2:  0f b6 c0              movzbl %al,%eax
    11c5:  c3                    ret
    11c6:  66 2e 0f 1f 84 00 00  cs nopw 0x0(%rax,%rax,1)
    11cd:  00 00 00
    11d0:  31 c0                 xor    %eax,%eax
    11d2:  c3                    ret
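For contrast, a version whose behavior is well defined for a NULL argument tests the pointer before reading through it. This is only a minimal sketch, not the author's code: is_valid_checked is a hypothetical name, and the cast to unsigned char assumes the intent was to reject strings whose first byte is 0x80.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical variant of is_valid() with the two checks reordered. */
int is_valid_checked(const char *str)
{
  if(str == NULL) return 0;                  /* check before any dereference */
  if((unsigned char)*str == 0x80) return 0;  /* assumed intent: reject a leading 0x80 byte */
  return strcmp(str, "expected string") == 0;
}
```

Called with NULL, this returns 0 without ever evaluating *str, so the result no longer depends on how the compiler treats the dereference.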
Next, we replace the if statement that was optimized away with the following two lines and see what happens:
  char head = *str;
  if(head == 0x80) return 0;
lyazj@HelloWorld:~$ gcc -Wall -Wshadow -Wextra ub_null.c -o ub_null -O0
ub_null.c: In function ‘is_valid’:
ub_null.c:10:11: warning: comparison is always false due to limited range of data type [-Wtype-limits]
   10 |   if(head == 0x80) return 0;
      |           ^~
lyazj@HelloWorld:~$ ./ub_null
Segmentation fault
lyazj@HelloWorld:~$ objdump --disassemble=is_valid -j.text ub_null

ub_null:     file form