Using C++ regex regular expressions (Part 1) - C++ programming basics

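C++ offers three regular expression options: C++ regex (std::regex, part of the standard library since C++11), C regex (the POSIX <regex.h> API), and Boost regex. When developing C++ on Windows, the latter two are not supported by default, so C++ regex is the most convenient choice for getting started quickly. This article covers how to use C++ regex.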

C++ regex provides three functions: regex_match, regex_search, and regex_replace.

regex_match

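regex_match tests whether a regular expression matches. The example below shows how it is used; for a systematic treatment, see the regex_match reference.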

// regex_match example
#include <iostream>
#include <string>
#include <regex>

int main ()
{
  // match a string literal
  if (std::regex_match ("subject", std::regex("(sub)(.*)") ))
    std::cout << "string literal matched\n";

  // match a std::string object
  std::string s ("subject");
  std::regex e ("(sub)(.*)");
  if (std::regex_match (s,e))
    std::cout << "string object matched\n";

  // match an iterator range
  if ( std::regex_match ( s.begin(), s.end(), e ) )
    std::cout << "range matched\n";

  std::cmatch cm;    // same as std::match_results<const char*> cm;
  std::regex_match ("subject",cm,e);
  std::cout << "string literal with " << cm.size() << " matches\n";

  std::smatch sm;    // same as std::match_results<std::string::const_iterator> sm;
  std::regex_match (s,sm,e);
  std::cout << "string object with " << sm.size() << " matches\n";

  std::regex_match ( s.cbegin(), s.cend(), sm, e);
  std::cout << "range with " << sm.size() << " matches\n";

  // using explicit flags:
  std::regex_match ( "subject", cm, e, std::regex_constants::match_default );

  // cm[0] is the whole match, cm[1] and cm[2] are the two groups
  std::cout << "the matches were: ";
  for (unsigned i=0; i<cm.size(); ++i) {
    std::cout << "[" << cm[i] << "] ";
  }
  std::cout << std::endl;

  return 0;
}
       
Output:

string literal matched
string object matched
range matched
string literal with 3 matches
string object with 3 matches
range with 3 matches
the matches were: [subject] [sub] [ject]

        
 
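regex_search

regex_search is another regular expression matching function; an example follows. The main difference between the two is that regex_match succeeds only when the entire target sequence matches the pattern, whereas regex_search looks for a match anywhere inside the target sequence. For a systematic treatment, see the regex_search reference.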
// regex_search example
#include <iostream>
#include <string>
#include <regex>

int main()
{
  std::string s ("this subject has a submarine as a subsequence");
  std::smatch m;
  std::regex e ("\\b(sub)([^ ]*)");   // matches words beginning with "sub"

  std::cout << "Target sequence: " << s << std::endl;
  std::cout << "Regular expression: /\\b(sub)([^ ]*)/" << std::endl;
  std::cout << "The following matches and submatches were found:" << std::endl;

  while (std::regex_search (s,m,e)) {
    // print the whole match followed by each submatch
    for (auto x=m.begin(); x!=m.end(); x++)
      std::cout << x->str() << " ";
    std::cout << "--> ([^ ]*) match " << m.format("$2") << std::endl;
    s = m.suffix().str();   // continue searching after this match
  }

  return 0;
}
             
              
Output:

Target sequence: this subject has a submarine as a subsequence
Regular expression: /\b(sub)([^ ]*)/
The following matches and submatches were found:
subject sub ject --> ([^ ]*) match ject
submarine sub marine --> ([^ ]*) match marine
subsequence sub sequence --> ([^ ]*) match sequence
 
             
Author: 没有开花的树
Blog: blog.csdn.net/mycwq
regex_replace

regex_replace replaces the text matched by a regular expression. An example follows; for a systematic treatment, see the regex_replace reference.
  
              
#include <cstring>
#include <iostream>
#include <regex>
#include <string>

int main()
{
  char buf[20];
  const char *first = "axayaz";
  const char *last = first + std::strlen(first);
  std::regex rx("a");
  std::string fmt("A");
  std::regex_constants::match_flag_type fonly =
      std::regex_constants::format_first_only;

  // iterator overload: writes into buf and returns the end of the output,
  // which is then null-terminated
  *std::regex_replace(&buf[0], first, last, rx, fmt) = '\0';
  std::cout << &buf[0] << std::endl;

  // format_first_only replaces only the first match
  *std::regex_replace(&buf[0], first, last, rx, fmt, fonly) = '\0';
  std::cout << &buf[0] << std::endl;

  // string overload: returns a new std::string
  std::string str("adaeaf");
  std::cout << std::regex_replace(str, rx, fmt) << std::endl;
  std::cout << std::regex_replace(str, rx, fmt, fonly) << std::endl;

  return 0;
}
               
Output:

AxAyAz
Axayaz
AdAeAf
Adaeaf 
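The rules of C++ regex regular expressions are much the same as in other programming languages, as follows:

Special characters (used to match characters that are hard to write literally):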
              
              
               
                
characters	description	matches
`.`	not newline	any character except line terminators (LF, CR, LS, PS).
`\t`	tab (HT)	a horizontal tab character (same as `\u0009`).
`\n`	newline (LF)	a newline (line feed) character (same as `\u000A`).
`\v`	vertical tab (VT)	a vertical tab character (same as `\u000B`).
`\f`	form feed (FF)	a form feed character (same as `\u000C`).
`\r`	carriage return (CR)	a carriage return character (same as `\u000D`).
`\c`letter	control code	a control code character whose code unit value is the remainder of dividing the code unit value of letter by 32. For example: `\ca` is the same as `\u0001`, `\cb` the same as `\u0002`, and so on.
`\x`hh	ASCII character	a character whose code unit value is given by the two hex digits hh. For example: `\x4c` is the same as `L`, and `\x23` the same as `#`.
`\u`hhhh	unicode character	a character whose code unit value is given by the four hex digits hhhh.
`\0`	null	a null character (same as `\u0000`).
`\`int	backreference	the result of the submatch whose opening parenthesis is the int-th (int shall begin with a digit other than `0`). See groups below for more info.
`\d`	digit	a decimal digit character
`\D`	not digit	any character that is not a decimal digit character
`\s`	whitespace	a whitespace character
`\S`	not whitespace	any character that is not a whitespace character
`\w`	word	an alphanumeric or underscore character
            

            
            