今天在线上Java代码里,处理了一个由ExecutorServicec线程池引发的问题,将处理过程和一些调试沉淀下来。
分布式任务处理模块中,有一个Java daemon进程,通过队列接受Java代码描述的任务(jar),产生子进程(单独的JVM)class loader,处理定义的Java代码,并收集日志、处理结果等,子进程数量在百级。启动参数如下:
java -server -Xss256k -Xmx4096m -Xms4096m -XX:NewRatio=4 -XX:PermSize=96m -XX:MaxPermSize=96m -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:ErrorFile=./stderr -XX:+UseConcMarkSweepGC
问题现象
执行的大量任务集中在凌晨,通常情况下,内存维持在1G ~ 1.5G,由于代码方面上会注意Yong GC的亲和性,并且处理逻辑趋向短时间的内存适用,故YongGC次数稍多,每次回收较多,FullGC很少。
日常的gc.log截取:
2014-04-02T00:15:13.336+0800: 31549.990: [GC 31549.990: [ParNew: 673793K->2029K(755008K), 0.0051360 secs] 731427K->60257K(4110464K), 0.0053540 secs] [Times: user=0.03 sys=0.00, real=0.00 secs] 2014-04-02T02:02:17.091+0800: 37973.745: [GC 37973.745: [ParNew: 673197K->3098K(755008K), 0.0056950 secs] 731425K->61807K(4110464K), 0.0059840 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 2014-04-02T03:25:39.567+0800: 42976.221: [GC 42976.221: [ParNew: 674266K->2130K(755008K), 0.0067750 secs] 732975K->60842K(4110464K), 0.0070530 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 2014-04-02T04:12:15.891+0800: 45772.545: [GC 45772.545: [ParNew: 673298K->4497K(755008K), 0.0103740 secs] 732010K->64271K(4110464K), 0.0106280 secs] [Times: user=0.06 sys=0.00, real=0.01 secs] 2014-04-02T04:40:27.685+0800: 47464.339: [GC 47464.340: [ParNew: 675611K->1931K(755008K), 0.0076410 secs] 735386K->61723K(4110464K), 0.0078830 secs] [Times: user=0.04 sys=0.00, real=0.01 secs] 2014-04-02T05:02:00.719+0800: 48757.373: [GC 48757.373: [ParNew: 673095K->2770K(755008K), 0.0069400 secs] 732887K->62579K(4110464K), 0.0071810 secs] [Times: user=0.04 sys=0.00, real=0.00 secs] 2014-04-02T05:18:45.616+0800: 49762.270: [GC 49762.270: [ParNew: 673911K->4611K(755008K), 0.0084600 secs] 733720K->64424K(4110464K), 0.0087260 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 2014-04-02T05:31:57.960+0800: 50554.614: [GC 50554.614: [ParNew: 675779K->7261K(755008K), 0.0111740 secs] 735592K->67237K(4110464K), 0.0114760 secs] [Times: user=0.06 sys=0.00, real=0.02 secs] 2014-04-02T05:39:29.031+0800: 51005.685: [GC 51005.686: [ParNew: 678429K->8082K(755008K), 0.0091520 secs] 738405K->68060K(4110464K), 0.0094180 secs] [Times: user=0.05 sys=0.00, real=0.01 secs] 2014-04-02T05:45:38.114+0800: 51374.768: [GC 51374.768: [ParNew: 679250K->8359K(755008K), 0.0071780 secs] 739228K->69105K(4110464K), 0.0074630 secs] [Times: user=0.04 sys=0.00, real=0.01 secs]
日常的jstat截取(启动后一段时间):
[xxxxxxxxxxxxxxxxxxxx]$ sudo /usr/java/default/bin/jstat -gcutil 9974 1s S0 S1 E O P YGC YGCT FGC FGCT GCT 0.00 0.60 78.51 2.52 40.91 139 0.718 0 0.000 0.718 0.00 0.60 78.51 2.52 40.91 139 0.718 0 0.000 0.718 0.00 0.60 78.51 2.52 40.91 139 0.718 0 0.000 0.718 0.00 0.60 78.51 2.52 40.91 139 0.718 0 0.000 0.718 0.00 0.60 78.51 2.52 40.91 139 0.718 0 0.000 0.718
当天凌晨 JVM内存异常,触发报警。内存几乎用满,FullGC频率极具增加,子进程由于和该进程有内存交互,故部分任务直接oom,报错out of memroy : java heap space。
内存暴增后的gc.log:
2014-03-31T08:28:49.731+0800: 210168.258: [CMS-concurrent-sweep-start] 2014-03-31T08:28:49.766+0800: 210168.294: [CMS-concurrent-sweep: 0.035/0.035 secs] [Times: user=0.03 sys=0.00, real=0.03 secs] 2014-03-31T08:28:49.766+0800: 210168.294: [CMS-concurrent-reset-