Little Stories Behind the Compiler - C++ Programming Basics

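At work I have run into several rather bizarre problems, and in the end nearly all of them turned out to be the compiler's doing. I am summarizing them here for reference; corrections are welcome. (Original work by peakflys; please credit the source when reposting.)
Build environment: GCC 3.4.5 20051201 (the project's earliest code dates back to 2004). To stay independent of the real project code, the examples below are small test programs that reproduce essentially the same problems.
Example 1: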
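/**
 * \author peakflys
 * \brief Demonstrates a compiler "hidden rule"
 */
#include <iostream>
#include <limits>
using namespace std;
typedef unsigned int DWORD;
typedef unsigned long QWORD;

int main()
{
    cout<<"max unsigned int : \t"<<numeric_limits<unsigned int>::max()<<endl;
    cout<<"max unsigned long: \t"<<numeric_limits<unsigned long>::max()<<endl;
    DWORD st1 = 4200000000,st2 = 100000000;
    // Both operands are unsigned int, so the sum is formed in 32 bits and only then widened.
    // Author's note: the literals 4200000000 + 100000000 can be tried directly as well;
    // where long is 64 bits those literals are already 64-bit, so they do not wrap.
    QWORD i1 = st1 + st2;
    cout<<i1<<endl;
    return 0;
}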
Output on my server:
max unsigned int : 4294967295
max unsigned long: 18446744073709551615
5032704
At a high warning level the build reports an overflow warning, and the output confirms that the sum really did overflow.
The disassembly:
0x0000000000400949 : movl $0xfa56ea00,-0x18(%rbp)
0x0000000000400950 : movl $0x5f5e100,-0x14(%rbp)
0x0000000000400957 : mov -0x14(%rbp),%eax
0x000000000040095a : add -0x18(%rbp),%eax
0x000000000040095d : mov %eax,%eax
0x000000000040095f : mov %rax,-0x10(%rbp)
0x0000000000400963 : mov -0x10(%rbp),%rsi
0x0000000000400967 : mov $0x600ed0,%edi
0x000000000040096c : callq 0x400748 <_ZNSolsEm@plt>
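So the compiler loads st2 into eax, adds st1 to it, and leaves the result in eax. eax is a 32-bit register, so the sum wraps around; the `mov %eax,%eax` that follows only zero-extends the already truncated value into rax. This is what the language asks for rather than a compiler whim: both operands are unsigned int, so the addition is carried out in unsigned int (modulo 2^32), and only the result is converted to unsigned long.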
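The wrapped value 5032704 can be checked by hand: 4200000000 + 100000000 = 4300000000, and 4300000000 - 4294967296 = 5032704. The sketch below reproduces that, assuming a C++11 compiler and a platform where int is 32 bits. The fixed-width types and the names `add_then_widen` and `expected_wrapped` are my own stand-ins for the post's `DWORD`/`QWORD` code, chosen so the result does not depend on how wide `unsigned long` is.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>

// Hypothetical helper mirroring `QWORD i1 = st1 + st2;` from the post.
// Both operands are 32-bit unsigned, so the sum is formed in 32 bits and
// wraps modulo 2^32; only the already wrapped value is widened on return.
std::uint64_t add_then_widen(std::uint32_t a, std::uint32_t b)
{
    return a + b;
}

// The same value computed by hand: (a + b) mod 2^32, evaluated in 64 bits.
std::uint64_t expected_wrapped(std::uint32_t a, std::uint32_t b)
{
    const std::uint64_t two_pow_32 = 1ULL << 32;   // 4294967296
    return (static_cast<std::uint64_t>(a) + b) % two_pow_32;
}
```

With the post's inputs, `add_then_widen(4200000000u, 100000000u)` and `expected_wrapped(4200000000u, 100000000u)` both yield 5032704, matching the output above.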
The modified code and its output:
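/**
 * \author peakflys
 * \brief Demonstrates a compiler "hidden rule"
 */
#include <iostream>
#include <limits>
using namespace std;
typedef unsigned int DWORD;
typedef unsigned long QWORD;

int main()
{
    cout<<"max unsigned int : \t"<<numeric_limits<unsigned int>::max()<<endl;
    cout<<"max unsigned long: \t"<<numeric_limits<unsigned long>::max()<<endl;
    DWORD st1 = 4200000000,st2 = 100000000;
    // Widen st1 first; st2 is then converted too and the addition is done in 64 bits.
    QWORD i1 = (QWORD)st1 + st2;
    cout<<i1<<endl;
    return 0;
}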
max unsigned int : 4294967295
max unsigned long: 18446744073709551615
4300000000
This time the result is correct. Disassembling directly, the code for the addition is:
0x0000000000400949 : movl $0xfa56ea00,-0x18(%rbp)
0x0000000000400950 : movl $0x5f5e100,-0x14(%rbp)
0x0000000000400957 : mov -0x18(%rbp),%edx
0x000000000040095a : mov -0x14(%rbp),%eax
0x000000000040095d : lea (%rdx,%rax,1),%rax
0x0000000000400961 : mov %rax,-0x10(%rbp)
0x0000000000400965 : mov -0x10(%rbp),%rsi
0x0000000000400969 : mov $0x600ed0,%edi
0x000000000040096e : callq 0x400748 <_ZNSolsEm@plt>
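This time the compiler loads the two operands into edx and eax and adds them with lea, leaving the result in the 64-bit rax, so nothing overflows. The real case behind this example was the formula that computes how much experience a player gains: the player should have received a large amount of experience but ended up with very little.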
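The same widening rule matters for any 32-bit arithmetic that feeds a 64-bit result. The post does not show the actual experience formula, so the sketch below is purely illustrative: the function names and the idea of a base value times a factor are my assumptions, not the project's code. It again assumes C++11 and a 32-bit int.

```cpp
#include <cassert>   // used by the checks in the usage note
#include <cstdint>

// Hypothetical formula, narrow version: the product is formed in 32 bits,
// wraps modulo 2^32, and only then is widened to the 64-bit return type.
std::uint64_t scaled_exp_narrow(std::uint32_t base_exp, std::uint32_t factor)
{
    return base_exp * factor;
}

// Hypothetical formula, widened version: casting one operand first makes
// the whole multiplication 64-bit. The product of two 32-bit values always
// fits in 64 bits, so this cannot wrap.
std::uint64_t scaled_exp_wide(std::uint32_t base_exp, std::uint32_t factor)
{
    return static_cast<std::uint64_t>(base_exp) * factor;
}
```

For example, `scaled_exp_wide(3000000u, 1500u)` gives 4500000000, while `scaled_exp_narrow(3000000u, 1500u)` gives 205032704 (4500000000 - 4294967296). The cast has to be applied to an operand; casting the finished 32-bit result would be too late.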
Example 2:
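/**
 * \author peakflys
 * \brief Demonstrates automatic compiler optimization
 */
#include <iostream>
#include <string>
#include <cstring>
using namespace std;
class A
{
public:
    A(const int _a) : a(_a){}
    int a;
};
int main()
{
    // Note: every write below modifies an object declared const, which is undefined behavior.
    cout<<"test1(const Class):\t";
    const A ca(100);
    A *pca = (A*)&ca;
    pca->a = 50;
    cout<<"initValue: 100"<<"\tconstValue:"<<ca.a<<"\tnonconstValue:"<<pca->a<<endl;

    cout<<"test2(const int):\t";
    const int a = 100;
    int *pi = (int *)&a;
    *pi = 50;
    cout<<"initValue: 100"<<"\tconstValue:"<<a<<"\tnonconstValue:"<<*pi<<endl;

    cout<<"test3(const string):\t";
    const string s("中国");
    char *ps = const_cast<char*>(s.c_str());
    strcpy(ps,"美国");   // same byte length as the original contents
    cout<<"initValue: 中国"<<"\tconstValue:"<<s<<"\tnonconstValue:"<<ps<<endl;

    return 0;
}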
The output:
test1(const Class): initValue: 100 constValue:50 nonconstValue:50
test2(const int): initValue: 100 constValue:100 nonconstValue:50
test3(const string): initValue: 中国 constValue:美国 nonconstValue:美国
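The casts that strip const here are crude and ugly, but in practice one is sometimes forced to write code like this for various reasons. In the example, forcibly removing const from a stack object and then writing to it runs without crashing both for class types (the user-defined class A and the library class string) and for built-in types (the int above). Such const is enforced only by the compiler; writing to an object that really lives in read-only storage, such as a string literal, makes the operating system throw a fit. Strictly speaking, writing to any object declared const is undefined behavior in C++, so "runs" here only describes what this compiler happened to do. The results differ between the cases, though. Look at the assembly: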
Dump of assembler code for function main:
0x0000000000400b24 : push %rbp
0x0000000000400b25 : mov %rsp,%rbp
0x0000000000400b28 : push %r12
0x0000000000400b2a : push %rbx
0x0000000000400b2b : sub $0x50,%rsp

***********************************test1*********************************************

0x0000000000400b2f : mov $0x400e48,%esi
0x0000000000400b34 : mov $0x601300,%edi
0x0000000000400b39 : call
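
The test2 result above (constValue still prints 100) is what one would expect if the compiler replaced reads of the const int with the literal 100. The listing here stops before that part, so this is an inference rather than something the dump confirms. As a contrast, here is a minimal sketch of the one situation where casting away const is well-defined: the object being written was never declared const in the first place. The function names are hypothetical and not taken from the post.

```cpp
#include <cassert>   // used by the check in the usage note

// Hypothetical helper: receives a read-only view but writes through it.
// This is well-defined ONLY when the referenced object is not itself const.
void set_through_const_ref(const int& ref, int value)
{
    const_cast<int&>(ref) = value;
}

// The underlying object is a plain int, so the write above is legal and the
// compiler must not assume the value stayed at 100.
int demo_non_const_object()
{
    int counter = 100;
    set_through_const_ref(counter, 50);
    return counter;
}
```

Here `demo_non_const_object()` returns 50. Had `counter` been declared `const int`, the same call would be undefined behavior and the compiler would be free to keep using 100, much like test2.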
