Clojure Topics: String Processing (Part 2)

2014-11-24 02:45:30
;; Replace by pattern / (string or function of match).
;; In the replacement string, $0, $1, etc. refer to groups of the match ($0 is the whole match).
(str/replace "@davidgraeber 12.3%,@shanley 19.8%"
             #"(@\S+)\s([.0-9]+)%"
             "$2 ($1)")
;=> "12.3 (@davidgraeber),19.8 (@shanley)"

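;; Replacing with a function is more flexible. When the pattern has groups,
;; the function receives a vector [whole-match group1 group2 ...].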
(println
  (str/replace "@davidgraeber 12.3%,@shanley 19.8%"
               #"(@\w+)\s([.0-9]+)%,?" ; ,? because the last entry has no trailing comma
               (fn [[_ person percent]]
                   (let [points (-> percent Float/parseFloat (* 100) Math/round)]
                     (str person "'s followers grew " points " points.\n")))))
;print=> @davidgraeber's followers grew 1230 points.
;print=> @shanley's followers grew 1980 points.
;print=>
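The examples above only use `$1` and `$2`. Here is a small sketch of two related cases. It assumes `clojure.string` is aliased as `str`, as in the earlier examples. The first case is `$0` for the whole match. The second is `str/re-quote-replacement`, which you need when the replacement must contain a literal `$`.

```clojure
(require '[clojure.string :as str])

;; $0 is the entire match, so each percentage gets wrapped in brackets.
(str/replace "@davidgraeber 12.3%,@shanley 19.8%" #"[.0-9]+%" "[$0]")
;=> "@davidgraeber [12.3%],@shanley [19.8%]"

;; A bare "$5" would be read as a reference to group 5, and this pattern
;; has no groups, so it would throw. re-quote-replacement escapes the $.
(str/replace "price: 5" #"\d+" (str/re-quote-replacement "$5"))
;=> "price: $5"
```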

Context-free grammars

Context-free grammars offer more expressive matching than regular expressions. For example, they can express nesting.
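The sketch below is a rough illustration of what nesting buys you, written by hand rather than with Instaparse. It is a recursive-descent recognizer for the grammar `S = ('(' S ')')*`, i.e. balanced parentheses. Each recursive call handles one level of nesting. A regular expression in the textbook sense cannot track nesting to arbitrary depth. `parse-nested` is a hypothetical helper name, not a library function.

```clojure
(defn parse-nested
  "Parses a string of balanced parentheses into nested vectors,
  e.g. \"(()())\" => [[[] []]]. Returns nil if the input is unbalanced
  or contains other characters."
  [s]
  (letfn [(parse-seq [cs]
            ;; S = ('(' S ')')*  ; returns [forms remaining-chars], or nil on error
            (loop [forms [] cs cs]
              (if (= \( (first cs))
                (if-let [[inner more] (parse-seq (rest cs))]
                  (if (= \) (first more))
                    (recur (conj forms inner) (rest more))
                    nil)
                  nil)
                [forms cs])))]
    (when-let [[forms more] (parse-seq (seq s))]
      ;; Everything must be consumed, otherwise there is a stray character.
      (when (empty? more) forms))))

(parse-nested "(()())") ;=> [[[] []]]
(parse-nested "(()")    ;=> nil
```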

We'll use Instaparse on a JSON grammar. (This example is only lightly tested and is not a full-featured parser. For real work, use data.json.)


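;; Your project.clj needs this dependency (you may need to restart your JVM):
;;   :dependencies [[instaparse "1.2.4"]]
;;
;;  We'll assume your ns macro contains:
;;   (:require [instaparse.core :as insta]
;;             [clojure.edn :as edn])
;; or that you've loaded them in the REPL:
;;   (require '[instaparse.core :as insta] '[clojure.edn :as edn])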

(def barely-tested-json-parser
  (insta/parser
   "object     = <'{'> <w*> (members <w*>)* <'}'>
    <members>  = pair (<w*> <','> <w*> members)*
    <pair>     = string <w*> <':'> <w*> value
    <value>    = string | number | object | array | 'true' | 'false' | 'null'
    array      = <'['> elements* <']'>
    <elements> = value <w*> (<','> <w*> elements)*
    number     = int frac? exp?
    <int>      = '-'? digits
    <frac>     = '.' digits
    <exp>      = e digits
    <e>        = ('e' | 'E') (<'+'> | '-')?
    <digits>   = #'[0-9]+'
    (* First sketched state machine; then it was easier to figure out regex syntax and all the maddening escape-backslashes. *)
    string     = <'\\\"'> #'([^\"\\\\]|\\\\.)*' <'\\\"'>
    <w>        = #'\\s+'"))

(barely-tested-json-parser "{\"foo\": {\"bar\": 99.9e-9, \"quux\": [1, 2, -3]}}")
;=> [:object
;    [:string "foo"]
;    [:object
;     [:string "bar"]
;     [:number "99" "." "9" "e" "-" "9"]
;     [:string "quux"]
;     [:array [:number "1"] [:number "2"] [:number "-" "3"]]]]
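
;; That last output is a bit verbose. Let's clean it up.
(->> (barely-tested-json-parser "{\"foo\": {\"bar\": 99.9e-9, \"quux\": [1, 2, -3]}}")
     (insta/transform {:object hash-map
                       :string str
                       :array vector
                       :number (comp edn/read-string str)}))
;=> {"foo" {"quux" [1 2 -3], "bar" 9.99E-8}}

;; That's all there is to it.
;;
;; Now for what the <angle brackets> in the grammar are for:
;; <brackets> on the right side of = hide information we don't need. For example, we
;; don't care about whitespace, so we hide it by writing <w*>.
;;
;; <brackets> on the left side of = only keep a rule from adding a level of nesting
;; to the output. For example, "members" is just an artificial entity,
;; so writing <members> keeps that meaningless nesting out of the tree.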

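To see what `insta/transform` does with that map of functions, here is a minimal plain-Clojure sketch of the same idea. It walks the parse tree bottom-up and replaces each `[:tag & children]` node with the result of calling that tag's function on the already-transformed children. `transform-tree` is a hypothetical helper that illustrates the concept. It is not Instaparse's actual implementation, and it runs on the parse tree printed above without needing Instaparse.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical helper, not part of Instaparse.
(defn transform-tree
  "For each vector node whose first element is a keyword, transforms the
  children first. If `fns` has a function for that keyword, returns
  (apply f children); otherwise rebuilds the node with the new children."
  [fns node]
  (if (and (vector? node) (keyword? (first node)))
    (let [children (map #(transform-tree fns %) (rest node))]
      (if-let [f (get fns (first node))]
        (apply f children)
        (into [(first node)] children)))
    node))

(transform-tree {:object hash-map
                 :string str
                 :array  vector
                 :number (comp edn/read-string str)}
                [:object
                 [:string "foo"]
                 [:object
                  [:string "bar"]
                  [:number "99" "." "9" "e" "-" "9"]
                  [:string "quux"]
                  [:array [:number "1"] [:number "2"] [:number "-" "3"]]]])
;; Returns a map equal to {"foo" {"bar" 9.99E-8, "quux" [1 2 -3]}}
;; (the printed entry order may differ).
```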
Building complex strings

Redirection

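with-out-str provides a simple way to build strings. It redirects standard output (*out*) to a StringWriter, then returns the resulting string. This lets you build strings with functions like print, even from inside nested function calls.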


(let [shrimp-varieties ["shrimp-kabobs" "shrimp creole" "shrimp gumbo"]]
  (with-out-str
    (print "We have ")
    ;; interpose puts ", " between the varieties; each piece is printed in turn.
    (doseq [piece (interpose ", " shrimp-varieties)]
      (print piece))
    (print "...")))
;=> "We have shrimp-kabobs, shrimp creole, shrimp gumbo..."
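The claim that capturing works "even in nested functions" is worth seeing directly. In this sketch the printing happens inside a separate helper, and `with-out-str` still collects it, because it rebinds `*out*` for everything called during its body. `print-line-item` and `receipt` are hypothetical names chosen for illustration.

```clojure
;; Hypothetical helper that prints instead of returning a string.
(defn print-line-item [label qty]
  (printf "%s x%d\n" label qty))

;; with-out-str captures everything printed while its body runs,
;; including output from functions it calls.
(defn receipt [items]
  (with-out-str
    (doseq [[label qty] items]
      (print-line-item label qty))))

(receipt [["shrimp creole" 2] ["shrimp gumbo" 1]])
;=> "shrimp creole x2\nshrimp gumbo x1\n"
```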

Format strings

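Java's format templates make it easy to build strings. (Reference: the format-string syntax of java.util.Formatter, which Clojure's format uses.)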


;; %s is the most common way to print args. Escape a literal % as %%.
(format "%s enjoyed %s%%." "Mozambique" 19.8) ;=> "Mozambique enjoyed 19.8%."

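;; The 1$ prefix refers to the first argument, so you can keep referring to it.
;; (A java.util.Date is formatted in the JVM's default time zone; this result assumes UTC.)
(format "%1$tY-%1$tm-%1$td" #inst"2000-01-02T00:00:00") ;=> "2000-01-02"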

;; Likewise, 2$ refers to the second argument, and so on.
(format "New year: %2$tY. Old year: %1$tY"
        #inst"2000-01-02T00:00:00"
        #inst"3111-12-31T00:00:00")
;=> "New year: 3111. Old year: 2000"
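java.util.Formatter also has a relative index, `<`, which reuses the argument of the previous specifier. This is a compact alternative to repeating `1$`. The sketch below uses a `java.time.LocalDate` (Java 8+) instead of an `#inst`. A LocalDate carries no time zone, so the result does not depend on the JVM's default zone. `y2k` is just an illustrative name.

```clojure
(import 'java.time.LocalDate)

;; %< reuses the previous specifier's argument, so the date is passed once.
(def y2k (LocalDate/of 2000 1 2))

(format "%tY-%<tm-%<td" y2k)
;=> "2000-01-02"

;; Explicit indices let arguments be used in a different order than given.
(format "%2$s, %1$s" "world" "Hello")
;=> "Hello, world"
```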

CL-Format

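cl-format brings over one of Common Lisp's notorious features, its format mini-language. For example, you can build strings from sequences (and do oddities like spelling numbers out in English, or printing two kinds of Roman numerals). However, it is weaker than plain format at printing dates and at the out-of-order argument references shown above.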
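Here is a short illustration of the oddities mentioned above. These calls use the `~r` family of directives in `clojure.pprint/cl-format`, with `nil` as the destination so a string is returned.

```clojure
(require '[clojure.pprint :as pp])

;; ~r spells a number out as an English cardinal; ~:r as an ordinal.
(pp/cl-format nil "~r" 42)     ;=> "forty-two"
(pp/cl-format nil "~:r" 3)     ;=> "third"

;; ~@r prints new-style Roman numerals; ~:@r prints old-style ones,
;; which never use subtraction (IIII instead of IV).
(pp/cl-format nil "~@r" 1999)  ;=> "MCMXCIX"
(pp/cl-format nil "~:@r" 4)    ;=> "IIII"
```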



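Just remember that cl-format is a little-used, obscure mini-language that most people won't find worth learning in depth. But if you like it and want to learn it, see the tutorial in Practical Common Lisp, or the Common Lisp reference.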


;; The first argument picks the destination: true prints to *out*,
;; false (or nil) returns a string, and a stream is written to.
(pp/cl-format true "~{~{~a had ~s percentage point~:p.~}~^~%~}"
              {"@davidgraeber" 12.3
               "@shanley" 19.8
               "@tjgabbour" 1})
;; (Map entry order isn't guaranteed, so these lines may appear in a different order.)
;print=> @davidgraeber had 12.3 percentage points.
;print=> @tjgabbour had 1 percentage point.
;print=> @shanley had 19.8 percentage points.

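Here is a small sketch of the three destination choices described in the comment above. It returns a string with `nil`, writes to `*out*` with `true` (captured with `with-out-str` so the result is visible), and writes to an explicit `java.io.StringWriter`. The format strings are simplified versions of the one above.

```clojure
(require '[clojure.pprint :as pp])
(import 'java.io.StringWriter)

;; nil as destination: cl-format returns the string.
(pp/cl-format nil "~a had ~s point~:p." "@tjgabbour" 1)
;=> "@tjgabbour had 1 point."

;; true: output goes to *out* (captured here with with-out-str).
(with-out-str (pp/cl-format true "~a had ~s point~:p." "@shanley" 19.8))
;=> "@shanley had 19.8 points."

;; A java.io.Writer: output is written into that writer.
(let [w (StringWriter.)]
  (pp/cl-format w "~{~a~^, ~}" ["a" "b" "c"])
  (str w))
;=> "a, b, c"
```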
(def format-string "~{~#[~;~a~;~