
Converting Text to Sequence Files in Hadoop
2014-11-24 02:58:01
Tags: Hadoop, text, conversion, sequence, file



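After a long break I tried this again today and, to my surprise, it no longer worked. For example, I wanted to write a Java program that converts a text file into a sequence file, as follows: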


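package mahout.fansy.canopy.transformdata;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.common.AbstractJob;
import org.apache.mahout.math.RandomAccessSparseVector;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.VectorWritable;

/**
 * Converts a comma-separated text file (one vector per line) into a
 * SequenceFile of <LongWritable, VectorWritable> records, e.g. as input
 * for Mahout Canopy clustering.
 */
public class Text2VectorWritable extends AbstractJob {

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new Text2VectorWritable(), args));
    }

    @Override
    public int run(String[] args) throws Exception {
        addInputOption();   // registers --input / -i
        addOutputOption();  // registers --output / -o
        if (parseArguments(args) == null) {
            return -1;
        }
        Path input = getInputPath();
        Path output = getOutputPath();
        Configuration conf = getConf();

        Job job = new Job(conf, "text2vectorWritable with input:" + input.getName());
        job.setJarByClass(Text2VectorWritable.class);

        // Plain text in (key = byte offset of each line), SequenceFile out
        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(SequenceFileOutputFormat.class);
        job.setMapperClass(Text2VectorWritableMapper.class);
        job.setNumReduceTasks(0);

        // Map-only job: mapper output goes straight to SequenceFileOutputFormat,
        // which writes using the job's *output* key/value classes. They must be
        // the concrete classes the mapper emits; an interface such as Writable
        // fails the SequenceFile writer's class check.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(VectorWritable.class);

        FileInputFormat.addInputPath(job, input);
        SequenceFileOutputFormat.setOutputPath(job, output);
        if (!job.waitForCompletion(true)) {
            throw new InterruptedException("Text2VectorWritable job failed processing " + input);
        }
        return 0;
    }

    public static class Text2VectorWritableMapper
            extends Mapper<LongWritable, Text, LongWritable, VectorWritable> {

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            String line = value.toString().trim();
            if (line.isEmpty()) {
                return; // skip blank lines instead of failing on parseDouble("")
            }
            String[] str = line.split(",");
            Vector vector = new RandomAccessSparseVector(str.length);
            for (int i = 0; i < str.length; i++) {
                vector.set(i, Double.parseDouble(str[i]));
            }
            context.write(key, new VectorWritable(vector));
        }
    }
}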
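Because the job itself is awkward to test without a cluster, one option is to pull the parsing step out into plain Java and check it locally first. The sketch below is hypothetical: `CsvLineParser`, `parse` and `cardinality` are names introduced here, not part of Hadoop or Mahout. It repeats the mapper's split-and-parse logic and returns only the non-zero entries as index → value, which roughly mirrors what a sparse vector stores.

```java
import java.util.Collections;
import java.util.SortedMap;
import java.util.TreeMap;

/**
 * Hypothetical, Hadoop-free helper: the same "comma-separated line ->
 * sparse vector" logic as Text2VectorWritableMapper, so it can be unit-tested.
 */
final class CsvLineParser {

    private CsvLineParser() {}

    /** Returns column index -> value for every non-zero field in the line. */
    static SortedMap<Integer, Double> parse(String line) {
        SortedMap<Integer, Double> entries = new TreeMap<>();
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            // Blank line: the mapper skips it, here we return an empty vector
            return Collections.unmodifiableSortedMap(entries);
        }
        String[] fields = trimmed.split(",");
        for (int i = 0; i < fields.length; i++) {
            // Double.parseDouble ignores leading/trailing whitespace in a field
            double v = Double.parseDouble(fields[i]);
            if (v != 0.0) {
                entries.put(i, v); // sparse: zero entries are not stored
            }
        }
        return Collections.unmodifiableSortedMap(entries);
    }

    /** Vector size the mapper would pass to RandomAccessSparseVector. */
    static int cardinality(String line) {
        String trimmed = line.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split(",").length;
    }
}
```

For example, `parse("1.5,0,2")` yields entries at indices 0 and 2 while `cardinality("1.5,0,2")` is 3, matching a three-element vector with one zero. Note that an empty field such as `"1,,2"` still throws `NumberFormatException`, just as the mapper would.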

