How to run a MapReduce program on Hadoop

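1. First, be aware of this prerequisite: if you launch a MapReduce program directly from an Eclipse project on Windows, you must first copy all the XML files from the Hadoop cluster's configuration directory into the project's src directory. The program then reads the cluster addresses from them automatically and runs in distributed mode. (Alternatively, you can write Java code that sets the job's Configuration properties yourself.)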
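The copied files (core-site.xml, mapred-site.xml, yarn-site.xml and so on) all use Hadoop's configuration/property/name/value layout, and Hadoop's Configuration class loads them from the classpath. Setting the properties in code instead amounts to calls such as conf.set("fs.defaultFS", "hdfs://master:9000") on an org.apache.hadoop.conf.Configuration. To check which cluster address a copied file actually carries, the minimal sketch below reads such a file using only the JDK's XML parser. The class name SiteXmlReader and the address hdfs://master:9000 are hypothetical examples, not values from this article.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Lists the name/value pairs of a Hadoop *-site.xml file (hypothetical helper). */
final class SiteXmlReader {

    static Map<String, String> readProperties(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(in);
        Map<String, String> props = new LinkedHashMap<>();
        // Each setting is a <property> element with <name> and <value> children.
        NodeList properties = doc.getElementsByTagName("property");
        for (int i = 0; i < properties.getLength(); i++) {
            Element property = (Element) properties.item(i);
            String name = childText(property, "name");
            if (name != null) {
                props.put(name, childText(property, "value"));
            }
        }
        return props;
    }

    // Returns the trimmed text of the first child element with the given tag, or null.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() == 0 ? null : nodes.item(0).getTextContent().trim();
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical core-site.xml content; "master:9000" is only an example address.
        String xml = "<configuration><property><name>fs.defaultFS</name>"
                + "<value>hdfs://master:9000</value></property></configuration>";
        Map<String, String> props = readProperties(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        System.out.println(props.get("fs.defaultFS"));
    }
}
```

Pointing readProperties at a FileInputStream for the real core-site.xml shows the fs.defaultFS value the job will use.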

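Most MapReduce WordCount tutorials online barely mention how to compile WordCount.java… and those that do mostly describe the approach for old versions such as 0.20, i.e. javac -classpath /usr/local/Hadoop/hadoop-1.0.1/hadoop-core-1.0.1.jar WordCount.java. Newer 2.x versions no longer ship a hadoop-core jar (its classes are split across jars such as hadoop-common and hadoop-mapreduce-client-core), so compiling and packaging your own MapReduce program works differently than in the old versions.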

This article uses the WordCount example on Hadoop 2.7.2 to show how to compile your own MapReduce program in 2.x versions.

Compiling and packaging a Hadoop MapReduce program

First add Hadoop's classpath to the CLASSPATH variable by appending the following lines to ~/.bashrc:


export HADOOP_HOME=/usr/local/hadoop

export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH

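Don't forget to run source ~/.bashrc so the variables take effect. You can then compile WordCount.java with javac (this is the WordCount.java from the Hadoop source; its code is at the end of this article):

javac WordCount.java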
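Both javac and java read the CLASSPATH environment variable when no -cp option is given, so one quick way to confirm the setting worked is to check whether the Hadoop classes that WordCount.java imports can be found. A minimal sketch; the class ClasspathCheck is hypothetical and only for illustration:

```java
/** Reports whether the Hadoop classes used by WordCount are visible on the classpath. */
final class ClasspathCheck {

    // Looks a class up by name without initializing it.
    static boolean isAvailable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Not found, or found but one of its dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Two of the Hadoop classes that WordCount.java imports.
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
        };
        for (String name : required) {
            System.out.println(name + (isAvailable(name) ? ": found" : ": MISSING"));
        }
    }
}
```

Compile it with javac ClasspathCheck.java and run it with java -cp ".:$CLASSPATH" ClasspathCheck; the explicit "." makes sure the current directory, where ClasspathCheck.class lives, is searched as well.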

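The compiler prints some warnings, which can be ignored. Afterwards you will see several .class files: WordCount.class plus one file per nested class (WordCount$TokenizerMapper.class and WordCount$IntSumReducer.class).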

Next, package the class files into a jar so that Hadoop can run them:


jar -cvf WordCount.jar ./WordCount*.class
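Because the mapper and reducer are nested classes, the jar has to contain their WordCount$...class files too, which is what the wildcard is for. jar -tf WordCount.jar lists the contents from the shell. The sketch below does the same check from Java with java.util.jar.JarFile; the class name JarContents is hypothetical.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists the .class entries stored in a jar (hypothetical helper). */
final class JarContents {

    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                // Keep only compiled classes, including nested ones such as WordCount$IntSumReducer.class.
                if (entry.getName().endsWith(".class")) {
                    names.add(entry.getName());
                }
            }
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java JarContents WordCount.jar
        for (String name : classEntries(args[0])) {
            System.out.println(name);
        }
    }
}
```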

Run it:


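hadoop jar WordCount.jar WordCount input output

Here input is the input folder on HDFS (relative paths such as input and output resolve to your HDFS home directory, /user/<username>), and the command is run from the directory that contains WordCount.jar. The output directory must not exist beforehand, or the job will fail.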

Because the program declares a package, the command must also spell out the full class name including org.apache.hadoop.examples:


hadoop jar WordCount.jar org.apache.hadoop.examples.WordCount input output
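With the package declaration, the class's full name is org.apache.hadoop.examples.WordCount, and a class loader looks for it at the matching directory path inside a jar. The small sketch below (ClassNamePath is a hypothetical helper, for illustration only) shows that mapping. It also points to a caveat worth checking: a jar built from ./WordCount*.class stores the files at the jar root rather than under org/apache/hadoop/examples/. Compiling with javac -d . WordCount.java, which creates the package directories, and then packaging them with jar -cvf WordCount.jar org keeps the layout consistent. Hadoop's bundled hadoop-mapreduce-examples jar, which is normally on the hadoop classpath, may also contain a class with this name, so it is worth confirming which one actually runs.

```java
/** Maps a binary class name to the path a class loader looks for inside a jar (hypothetical helper). */
final class ClassNamePath {

    static String toJarEntry(String binaryName) {
        // Package separators become directories; nested classes keep their '$' in the file name.
        return binaryName.replace('.', '/') + ".class";
    }

    public static void main(String[] args) {
        System.out.println(toJarEntry("org.apache.hadoop.examples.WordCount"));
        // prints org/apache/hadoop/examples/WordCount.class
    }
}
```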

View the result (each reducer writes one part-r-NNNNN file into the output directory):


hadoop fs -cat output/part-r-00000
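Each line of part-r-00000 holds one word and its count. With the default TextOutputFormat the key and value are separated by a tab, so the file is easy to post-process. A minimal sketch, assuming the default separator has not been changed; the class PartFileParser is hypothetical.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

/** Parses WordCount output lines of the form "word<TAB>count" (hypothetical helper). */
final class PartFileParser {

    static Map<String, Integer> parse(Reader input) throws IOException {
        Map<String, Integer> counts = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (line.isEmpty()) {
                    continue; // skip blank lines
                }
                // Default TextOutputFormat layout: key, a tab, then the value.
                int tab = line.indexOf('\t');
                if (tab < 0) {
                    throw new IllegalArgumentException("No tab separator in line: " + line);
                }
                counts.put(line.substring(0, tab), Integer.parseInt(line.substring(tab + 1).trim()));
            }
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Usage: hadoop fs -cat output/part-r-00000 | java PartFileParser
        Map<String, Integer> counts = parse(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```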

WordCount.java source code

package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  // Mapper: splits each input line into words and emits (word, 1) for every word.
  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  // Reducer (also used as the combiner): sums all counts received for one word.
  public static class IntSumReducer
       extends Reducer<Text,IntWritable,Text,IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // GenericOptionsParser consumes Hadoop generic options (-D, -files, -libjars, ...)
    // and returns the remaining program arguments.
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length != 2) {
      System.err.println("Usage: wordcount <in> <out>");
      System.exit(2);
    }
    // new Job(conf, name) is deprecated in Hadoop 2.x (Job.getInstance is the replacement);
    // this is what triggers the deprecation note when compiling.
    Job job = new Job(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
    // Exit code 0 if the job succeeds, 1 otherwise.
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
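To see what TokenizerMapper and IntSumReducer compute without a cluster, here is a plain-Java sketch of the same map, shuffle and reduce data flow in a single process. It is an illustration of the technique only, not how Hadoop executes the job; the class LocalWordCount is hypothetical, and a TreeMap stands in for the sorted grouping of keys that the shuffle delivers to reducers.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

/** Single-process sketch of the map -> shuffle -> reduce flow of WordCount (hypothetical). */
final class LocalWordCount {

    // Map phase: like TokenizerMapper, emit (word, 1) for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line);
        while (itr.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleImmutableEntry<>(itr.nextToken(), 1));
        }
        return out;
    }

    // Shuffle: group all values by key, with keys in sorted order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: like IntSumReducer, sum the values collected for one key.
    static int reduce(Iterable<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            result.put(group.getKey(), reduce(group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(Arrays.asList("hello world", "hello hadoop")));
        // prints {hadoop=1, hello=2, world=1}
    }
}
```

In the real job, setCombinerClass(IntSumReducer.class) runs the same summing step on each mapper's output before the shuffle, which reduces the data sent over the network; because addition is associative and commutative, the final counts are unchanged.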

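1. When using Hadoop, I always ran it on Linux and never hit this problem. 2. But I did run into it when developing with Nutch on Windows. It is mainly related to permissions when Hadoop and Nutch run under Cygwin, and modifying one method in FileUtil.java in the hadoop-core jar is enough to fix it.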

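To run a JNI program on a Hadoop cluster, first debug it on a single machine until the JNI program runs correctly; moving it to the Hadoop cluster is then straightforward.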

Hadoop runs programs from jar files, so all the class files must be packaged into a jar. The native shared library does not need to be included in the jar.

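Before running a program on the cluster, Hadoop first distributes the jar to all nodes and then starts it. At this stage the shared library can be shipped to all nodes as an attachment alongside the jar, by passing the -files option. Note that -files is a Hadoop generic option: it is parsed by GenericOptionsParser (as in the WordCount main method above), so it goes after the main class name and before the program's own arguments. The command is: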

hadoop jar Segment.jar Segment -files /bin/lib.so input output

With this command, both the jar and the shared library are shipped to every node, and then the MapReduce job starts.

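There are many tutorials online, but they are all complicated, and many of their methods did not work when I tested them. The method above is the simplest.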
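Files shipped with -files are made available (as links) in each task's working directory under their file name, here lib.so as in the command above. One way to load such a library inside a task is System.load with the file's absolute path, which does not depend on java.library.path. A minimal sketch under that assumption; NativeLibLoader and its methods are hypothetical helpers, and the load call would typically go in the mapper's setup() method or in a static initializer of the class that declares the native methods.

```java
import java.io.File;

/** Sketch of loading a native library shipped with -files (hypothetical helper). */
final class NativeLibLoader {

    // At task run time the JVM's working directory is the task directory,
    // so a relative file name resolves to the link created for the shipped file.
    static String resolveInWorkingDir(String fileName) {
        return new File(fileName).getAbsolutePath();
    }

    // System.load needs an absolute path; it throws UnsatisfiedLinkError
    // if the file is missing or is not a loadable library.
    static void loadShippedLibrary(String fileName) {
        System.load(resolveInWorkingDir(fileName));
    }

    public static void main(String[] args) {
        // Shows where a task would look for the library shipped as lib.so.
        System.out.println(resolveInWorkingDir("lib.so"));
    }
}
```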

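Most likely the deployment did not succeed. Check whether the JobTracker and TaskTrackers can reach each other (in Hadoop 2.x under YARN, the corresponding daemons are the ResourceManager and NodeManagers), and whether the NameNode and DataNodes can reach each other over SSH.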

It may also be a hosts problem. I ran into this before, and it was fixed once I replaced all IP addresses with hostnames.

You can also check the job's logs for errors.
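The first two checks (whether host names resolve, and whether one daemon's port can be reached from another node) can be scripted. A minimal sketch using only java.net; the class ClusterReachability, the host name master and the port 9000 are hypothetical and should be replaced with the addresses in your own configuration files, for example the fs.defaultFS value.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;

/** Basic reachability checks between cluster nodes (hypothetical helper). */
final class ClusterReachability {

    // True if the host name resolves; a wrong or missing /etc/hosts entry makes this fail.
    static boolean resolves(String host) {
        try {
            InetAddress.getByName(host);
            return true;
        } catch (UnknownHostException e) {
            return false;
        }
    }

    // True if a TCP connection to host:port succeeds within the timeout.
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical values; use the host names and ports of your own cluster.
        String host = "master";
        int port = 9000;
        System.out.println(host + " resolves: " + resolves(host));
        System.out.println(host + ":" + port + " reachable: " + canConnect(host, port, 3000));
    }
}
```

Running it on each node against the master's address helps separate name-resolution problems from network or firewall problems before digging into the job logs.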


Feel free to share; when reposting, please credit the source: 内存溢出 (Memory Overflow).

Original URL: https://54852.com/zz/9399026.html

