
Automated Hadoop Installation and Running in Single-Node Mode


This article uses a shell script to automate the installation and configuration of Hadoop. The operating system is CentOS, the Hadoop version is 1.x, and the JDK version is 1.7; other versions are untested and may have unknown bugs.

Hadoop Installation Script

Hadoop installation takes three steps: first install the JDK, then install Hadoop, and finally configure passwordless SSH login (optional). [1]

#!/bin/bash
# Usage:   Hadoop auto-configuration script
# History:
#    20140425  annhe  basic functionality

# Hadoop version
HADOOP_VERSION=1.2.1
# JDK version; Oracle offers no direct-link download, so provide the rpm package yourself and set the version number
JDK_VESION=7u51
# Hadoop download mirror, defaults to BIT (mirror.bit.edu.cn)
MIRRORS=mirror.bit.edu.cn
# Operating system architecture
OS=`uname -a |awk '{print $13}'`

# Check if user is root
if [ $(id -u) != "0" ]; then
    printf "Error: You must be root to run this script!\n"
    exit 1
fi

# Check that the system is CentOS
cat /etc/issue | grep CentOS && r=0 || r=1
if [ $r -eq 1 ]; then
    echo "This script can only run on CentOS!"
    exit 1
fi

# Packages
HADOOP_FILE=hadoop-$HADOOP_VERSION-1.$OS.rpm
if [ "$OS"x = "x86_64"x ]; then
    JDK_FILE=jdk-$JDK_VESION-linux-x64.rpm
else
    JDK_FILE=jdk-$JDK_VESION-linux-i586.rpm
fi

function Install ()
{
    # Remove any previously installed versions
    rpm -qa | grep hadoop
    rpm -e hadoop
    rpm -qa | grep jdk
    rpm -e jdk
    # Restore the /etc/profile backup
    mv /etc/profile.bak /etc/profile
    # Prepare the packages
    if [ ! -f $HADOOP_FILE ]; then
        wget "http://$MIRRORS/apache/hadoop/common/stable1/$HADOOP_FILE" && r=0 || r=1
        [ $r -eq 1 ] && { echo "download error, please check your mirrors or check your network....exit"; exit 1; }
    fi
    [ ! -f $JDK_FILE ] && { echo "$JDK_FILE not found! Please download yourself....exit"; exit 1; }
    # Install
    rpm -ivh $JDK_FILE && r=0 || r=1
    if [ $r -eq 1 ]; then
        echo "$JDK_FILE install failed, please verify your rpm file....exit"
        exit 1
    fi
    rpm -ivh $HADOOP_FILE && r=0 || r=1
    if [ $r -eq 1 ]; then
        echo "$HADOOP_FILE install failed, please verify your rpm file....exit"
        exit 1
    fi
    # Back up /etc/profile
    cp /etc/profile /etc/profile.bak
    # Configure the Java environment variables.
    # NOTE: the body of this here-document and the beginning of the SSHlogin function
    # were lost when the article was republished; the lines below are a plausible
    # reconstruction, not the author's original text.
    cat >> /etc/profile <<EOF
export JAVA_HOME=/usr/java/default
export PATH=\$JAVA_HOME/bin:\$PATH
EOF
    source /etc/profile
}

# Configure passwordless SSH login (reconstructed, see note above)
function SSHlogin ()
{
    [ ! -f ~/.ssh/id_rsa ] && ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 644 ~/.ssh/authorized_keys
}

Install 2>&1 | tee -a hadoop_install.log
SSHlogin 2>&1 | tee -a hadoop_install.log
# A reboot is required after changing HADOOP_CLIENT_OPTS: shutdown -r now
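
The script can then be run roughly as follows. This is only a usage sketch and is not part of the original article; the file name hadoop_install.sh is a placeholder.

# Hypothetical usage: save the script above as hadoop_install.sh (placeholder name),
# place the Oracle JDK rpm (e.g. jdk-7u51-linux-x64.rpm, per the variables above)
# in the same directory, then run it as root on CentOS.
chmod +x hadoop_install.sh
./hadoop_install.sh
tail -f hadoop_install.log    # the script tees all of its output into this log file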

Running the Bundled Example on a Single Node

By default, Hadoop is configured to run in non-distributed mode as a single Java process, which is very helpful for debugging. Create some test text files:

[root@linux hadoop]# echo "hello world" >input/hello.txt[root@linux hadoop]# echo "hello hadoop" >input/hadoop.txt

Running WordCount

[root@linux hadoop]# hadoop jar /usr/share/hadoop/hadoop-examples-1.2.1.jar wordcount input output
14/04/26 02:56:23 INFO util.NativeCodeLoader: Loaded the native-hadoop library
14/04/26 02:56:23 INFO input.FileInputFormat: Total input paths to process : 2
14/04/26 02:56:24 WARN snappy.LoadSnappy: Snappy native library not loaded
14/04/26 02:56:24 INFO mapred.JobClient: Running job: job_local275273933_0001
14/04/26 02:56:24 INFO mapred.LocalJobRunner: Waiting for map tasks
14/04/26 02:56:24 INFO mapred.LocalJobRunner: Starting task: attempt_local275273933_0001_m_000000_0
14/04/26 02:56:25 INFO util.ProcessTree: setsid exited with exit code 0
14/04/26 02:56:25 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@7e86fe3a
14/04/26 02:56:25 INFO mapred.MapTask: Processing split: file:/root/hadoop/input/hadoop.txt:0+13
14/04/26 02:56:25 INFO mapred.MapTask: io.sort.mb = 100
14/04/26 02:56:25 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/26 02:56:25 INFO mapred.MapTask: record buffer = 262144/327680
14/04/26 02:56:25 INFO mapred.MapTask: Starting flush of map output
14/04/26 02:56:25 INFO mapred.MapTask: Finished spill 0
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_m_000000_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_m_000000_0' done.
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Finishing task: attempt_local275273933_0001_m_000000_0
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Starting task: attempt_local275273933_0001_m_000001_0
14/04/26 02:56:25 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@16ed889d
14/04/26 02:56:25 INFO mapred.MapTask: Processing split: file:/root/hadoop/input/hello.txt:0+12
14/04/26 02:56:25 INFO mapred.MapTask: io.sort.mb = 100
14/04/26 02:56:25 INFO mapred.MapTask: data buffer = 79691776/99614720
14/04/26 02:56:25 INFO mapred.MapTask: record buffer = 262144/327680
14/04/26 02:56:25 INFO mapred.MapTask: Starting flush of map output
14/04/26 02:56:25 INFO mapred.MapTask: Finished spill 0
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_m_000001_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_m_000001_0' done.
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Finishing task: attempt_local275273933_0001_m_000001_0
14/04/26 02:56:25 INFO mapred.LocalJobRunner: Map task executor complete.
14/04/26 02:56:25 INFO mapred.Task:  Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@42701c57
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Merger: Merging 2 sorted segments
14/04/26 02:56:25 INFO mapred.Merger: Down to the last merge-pass, with 2 segments left of total size: 53 bytes
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task:attempt_local275273933_0001_r_000000_0 is done. And is in the process of commiting
14/04/26 02:56:25 INFO mapred.LocalJobRunner:
14/04/26 02:56:25 INFO mapred.Task: Task attempt_local275273933_0001_r_000000_0 is allowed to commit now
14/04/26 02:56:25 INFO output.FileOutputCommitter: Saved output of task 'attempt_local275273933_0001_r_000000_0' to output
14/04/26 02:56:25 INFO mapred.LocalJobRunner: reduce > reduce
14/04/26 02:56:25 INFO mapred.Task: Task 'attempt_local275273933_0001_r_000000_0' done.
14/04/26 02:56:25 INFO mapred.JobClient:  map 100% reduce 100%
14/04/26 02:56:25 INFO mapred.JobClient: Job complete: job_local275273933_0001
14/04/26 02:56:25 INFO mapred.JobClient: Counters: 20
14/04/26 02:56:25 INFO mapred.JobClient:   File Output Format Counters
14/04/26 02:56:25 INFO mapred.JobClient:     Bytes Written=37
14/04/26 02:56:25 INFO mapred.JobClient:   FileSystemCounters
14/04/26 02:56:25 INFO mapred.JobClient:     FILE_BYTES_READ=429526
14/04/26 02:56:25 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=586463
14/04/26 02:56:25 INFO mapred.JobClient:   File Input Format Counters
14/04/26 02:56:25 INFO mapred.JobClient:     Bytes Read=25
14/04/26 02:56:25 INFO mapred.JobClient:   Map-Reduce Framework
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce input groups=3
14/04/26 02:56:25 INFO mapred.JobClient:     Map output materialized bytes=61
14/04/26 02:56:25 INFO mapred.JobClient:     Combine output records=4
14/04/26 02:56:25 INFO mapred.JobClient:     Map input records=2
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce shuffle bytes=0
14/04/26 02:56:25 INFO mapred.JobClient:     Physical memory (bytes) snapshot=0
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce output records=3
14/04/26 02:56:25 INFO mapred.JobClient:     Spilled Records=8
14/04/26 02:56:25 INFO mapred.JobClient:     Map output bytes=41
14/04/26 02:56:25 INFO mapred.JobClient:     CPU time spent (ms)=0
14/04/26 02:56:25 INFO mapred.JobClient:     Total committed heap usage (bytes)=480915456
14/04/26 02:56:25 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=0
14/04/26 02:56:25 INFO mapred.JobClient:     Combine input records=4
14/04/26 02:56:25 INFO mapred.JobClient:     Map output records=4
14/04/26 02:56:25 INFO mapred.JobClient:     SPLIT_RAW_BYTES=197
14/04/26 02:56:25 INFO mapred.JobClient:     Reduce input records=

Result

[root@linux hadoop]# cat output/*
hadoop  1
hello   2
world   1

Running a Self-Written WordCount

package net.annhe.wordcount;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapreduce.*;
import org.apache.hadoop.mapreduce.lib.input.*;
import org.apache.hadoop.mapreduce.lib.output.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {

    // The generic type parameters on Mapper/Reducer were stripped by the page's
    // HTML filter; they are restored here to the standard WordCount signatures.
    public static class Map extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class Reduce extends Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public int run(String[] args) throws Exception {
        Job job = new Job(getConf());
        job.setJarByClass(WordCount.class);
        job.setJobName("wordcount");

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        job.setMapperClass(Map.class);
        job.setReducerClass(Reduce.class);

        job.setInputFormatClass(TextInputFormat.class);
        job.setOutputFormatClass(TextOutputFormat.class);

        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        boolean success = job.waitForCompletion(true);
        return success ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        int ret = ToolRunner.run(new WordCount(), args);
        System.exit(ret);
    }
}

Compiling

javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar -d . WordCount.java

Packaging

jar -vcf wordcount.jar -C demo/ .
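
Note that this packaging command reads from a demo/ directory, so it assumes the compiled classes were placed there (the compile command above writes them to the current directory instead). A quick way to confirm that the package path ended up inside the jar, shown only as a sketch:

jar tf wordcount.jar | grep WordCount
# Expected entries (class names taken from the source above):
#   net/annhe/wordcount/WordCount.class
#   net/annhe/wordcount/WordCount$Map.class
#   net/annhe/wordcount/WordCount$Reduce.class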

Running

hadoop jar wordcount.jar net.annhe.wordcount.WordCount input/ out

Result

[root@linux hadoop]# cat out/*
hadoop  1
hello   2
world   1
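
One practical note not in the original article: MapReduce's FileOutputFormat refuses to write into an existing output directory, so remove it before re-running the job:

rm -rf out    # clear the previous local-mode output directory
hadoop jar wordcount.jar net.annhe.wordcount.WordCount input/ out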

Problems Encountered

1. Insufficient memory

The virtual machine had only 180 MB of memory, and running the example program produced this error:

java.lang.Exception: java.lang.OutOfMemoryError: Java heap space

Solution:
Increase the virtual machine's memory and edit /etc/hadoop/hadoop-env.sh, changing:

export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"   # changed to 512m

The JVM was originally started with a maximum heap of 128 MB, so some of Hadoop's bundled examples throw an out-of-memory error. The heap size can be raised here, but if you do not need to, there is no need to change it. [2]
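
If you prefer to apply the change non-interactively, here is a minimal sketch, assuming the RPM install puts the file at /etc/hadoop/hadoop-env.sh as described above:

# Rewrite the HADOOP_CLIENT_OPTS line in place (commented or not), then confirm the new value.
sed -i 's/^#\{0,1\}export HADOOP_CLIENT_OPTS=.*/export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"/' /etc/hadoop/hadoop-env.sh
grep HADOOP_CLIENT_OPTS /etc/hadoop/hadoop-env.sh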

2. Referencing a class that has a package name

A class with a package name must be invoked by its fully qualified name, following the package hierarchy, as in net.annhe.wordcount.WordCount above. [3]

3. Compiling a class that has a package name

It must be compiled into its package structure by adding the -d option.

Java class files belong inside their package: with package abc; public class ls {...}, the package of class ls is abc, so the corresponding abc directory has to be created at compile time. That is exactly what javac's -d option does: it generates the package directories for the class files. The class above would therefore be compiled as javac -d . ls.java (note the spaces between javac and -d, between -d and ., and between . and ls.java). [4]
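
For the WordCount above, the effect of -d can be seen from the directory layout it creates. This sketch assumes the classes are compiled into a demo/ output directory so that the jar -C demo/ . packaging step finds them:

mkdir -p demo
javac -classpath /usr/share/hadoop/hadoop-core-1.2.1.jar -d demo WordCount.java
find demo -name '*.class'
# demo/net/annhe/wordcount/WordCount.class
# demo/net/annhe/wordcount/WordCount$Map.class
# demo/net/annhe/wordcount/WordCount$Reduce.class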

References

[1]. 陆嘉桓. Hadoop实战, 2nd ed. 机械工业出版社.

[2]. OSChina blog: http://my.oschina.net/mynote/blog/93340

[3]. CSDN blog: http://blog.csdn.net/xw13106209/article/details/6861855

[4]. Baidu Zhidao: http://zhidao.baidu.com/link?url=ND1BWmyGb_5a05Jntd9vGZNWGtmJmcKF1V6dhVNM1eFNuHL6kbQyVrEWtCUmy7KYP5F66R2BumCifCnPQnYdD_


This article is released under a CC license; when reposting, please cite the source with a link.
Original article: http://www.annhe.net/article-2672.html

