Using C++ regex Regular Expressions (Part 1)

2014-11-24 08:48:33

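C++ offers three regex options: C++ regex (the standard <regex> library introduced in C++11), C regex, and Boost regex. On Windows the latter two are not available by default, so if you want to get started quickly, C++ regex is clearly the most convenient choice. This article discusses how to use C++ regex.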

C++ regex provides three functions: regex_match, regex_search, and regex_replace.

regex_match

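regex_match checks whether a regular expression matches an entire character sequence; the example below illustrates it. For a systematic treatment, see the regex_match reference.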

// regex_match example
#include <iostream>
#include <string>
#include <regex>

int main ()
{
  if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
    std::cout << "string literal matched\n";

  std::string s ("subject");
  std::regex e ("(sub)(.*)");
  if (std::regex_match (s,e))
    std::cout << "string object matched\n";

  if ( std::regex_match ( s.begin(), s.end(), e ) )
    std::cout << "range matched\n";

  std::cmatch cm;    // same as std::match_results<const char*> cm;
  std::regex_match ("subject",cm,e);
  std::cout << "string literal with " << cm.size() << " matches\n";

  std::smatch sm;    // same as std::match_results<std::string::const_iterator> sm;
  std::regex_match (s,sm,e);
  std::cout << "string object with " << sm.size() << " matches\n";

  std::regex_match ( s.cbegin(), s.cend(), sm, e);
  std::cout << "range with " << sm.size() << " matches\n";

  // using explicit flags:
  std::regex_match ( "subject", cm, e, std::regex_constants::match_default );

  // print the whole match followed by each submatch
  std::cout << "the matches were: ";
  for (unsigned i=0; i<sm.size(); ++i) {
    std::cout << "[" << sm[i] << "] ";
  }
  std::cout << std::endl;

  return 0;
}

Output:

string literal matched
string object matched
range matched
string literal with 3 matches
string object with 3 matches
range with 3 matches
the matches were: [subject] [sub] [ject]

regex_search

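regex_search is the other regular expression matching function; an example follows. The main difference between the two is that regex_match must match the entire string, whereas regex_search looks for a matching substring anywhere within it. For a systematic treatment, see the regex_search reference.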

// regex_search example
#include <iostream>
#include <string>
#include <regex>

int main()
{
  std::string s ("this subject has a submarine as a subsequence");
  std::smatch m;
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning by "sub"

  std::cout << "Target sequence: " << s << std::endl;
  std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
  std::cout << "The following matches and submatches were found:" << std::endl;

  while (std::regex_search (s,m,e)) {
    for (auto x=m.begin();x!=m.end();x++)
      std::cout << x->str() << " ";
    std::cout << "--> ([^ ]*) match " << m.format("$2") << std::endl;
    s = m.suffix().str();   // continue searching after the current match
  }

  return 0;
}

Output:

Target sequence: this subject has a submarine as a subsequence
Regular expression: /\b(sub)([^ ]*)/
The following matches and submatches were found:
subject sub ject --> ([^ ]*) match ject
submarine sub marine --> ([^ ]*) match marine
subsequence sub sequence --> ([^ ]*) match sequence

/********  merciless divider  *********/
  Author: 没有开花的树
  Blog: blog.csdn.net/mycwq
/********  merciless copy  *********/

regex_replace

regex_replace replaces the text matched by a regular expression; an example follows. For a systematic treatment, see the regex_replace reference.

#include <cstring>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  char buf[20];
  const char *first = "axayaz";
  const char *last = first + std::strlen(first);
  std::regex rx("a");
  std::string fmt("A");
  std::regex_constants::match_flag_type fonly =
      std::regex_constants::format_first_only;

  // iterator form: writes into buf and returns the end of the output,
  // which is then null-terminated
  *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
  std::cout << &buf[0] << std::endl;

  // replace only the first match
  *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
  std::cout << &buf[0] << std::endl;

  // string form: returns a new std::string
  std::string str("adaeaf");
  std::cout << std::regex_replace(str, rx, fmt) << std::endl;
  std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;

  return 0;
}

Output:
AxAyAz
Axayaz
AdAeAf
Adaeaf

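The syntax rules of C++ regex are much the same as in other programming languages (the default grammar is ECMAScript). They are as follows:

Special characters (used to match characters that are hard to write literally):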

characters  description            matches
.           not newline            any character except line terminators (LF, CR, LS, PS).
\t          tab (HT)               a horizontal tab character (same as \u0009).
\n          newline (LF)           a newline (line feed) character (same as \u000A).
\v          vertical tab (VT)      a vertical tab character (same as \u000B).
\f          form feed (FF)         a form feed character (same as \u000C).
\r          carriage return (CR)   a carriage return character (same as \u000D).
\cletter    control code           a control code character whose code unit value is the same as the remainder of dividing the code unit value of letter by 32. For example: \ca is the same as \u0001, \cb the same as \u0002, and so on...
\xhh        ASCII character        a character whose code unit value has a hex value equivalent to the two hex digits hh. For example: \x4c is the same as L, or \x23 the same as #.
\uhhhh      unicode character      a character whose code unit value has a hex value equivalent to the four hex digits hhhh.
\0          null                   a null character (same as \u0000).
\int        backreference          the result of the submatch whose opening parenthesis is the int-th (int shall begin by a digit other than 0). See groups below for more info.
\d          digit                  a decimal digit character
\D          not digit              any character that is not a decimal digit character
\s          whitespace             a whitespace character
\S          not whitespace         any character that is not a whitespace character
\w          word                   an alphanumeric or undersco