最近要做一次关于yarn的分享,于是想搭建一个Hadoop环境。Hadoop 2.0较之前的Hadoop 0.1x变化比较大,折腾了好久了,终于把环境搞好了。我搭建了一个两节点的集群,只配置了一些必须的参数,让集群勉强跑起来。 1、core-site.xml configurationpropertynamef
最近要做一次关于yarn的分享,于是想搭建一个Hadoop环境。Hadoop 2.0较之前的Hadoop 0.1x变化比较大,折腾了好久了,终于把环境搞好了。我搭建了一个两节点的集群,只配置了一些必须的参数,让集群勉强跑起来。
1、core-site.xml
fs.defaultFShdfs://10.232.42.91:19000/
2、mapred-site.xml
mapreduce.framework.nameyarn
3、yarn-site.xml
yarn.resourcemanager.addresshdfs://10.232.42.91:19001/yarn.resourcemanager.resource-tracker.addresshdfs://10.232.42.91:19002/yarn.nodemanager.aux-servicesmapreduce.shuffleyarn.resourcemanager.scheduler.address10.232.42.91:8030
把JAVA_HOME、HADOOP_HOME都设置到.bashrc里面去,然后运行sbin/start-all.sh。使用jps可以看到两个节点下运行的进程如下。
[master] jps31318 ResourceManager28981 DataNode11580 JobHistoryServer28858 NameNode29155 SecondaryNameNode31426 NodeManager11016 Jps[slave] jps12592 NodeManager11711 DataNode17699 Jps
上面这个JobHistoryServer需要单独启动,通过它可以看到每个application的详细日志。启动命令如下。
sbin/mr-jobhistory-daemon.sh start historyserver
打开http://10.232.42.91:8088/cluster/cluster这个地址可以看到cluster的介绍信息。这里再也看不到slot相关的数据了。
万事俱备。放点文本数据到hdfs://10.232.42.91:19000/input这个目录下,运行wordcount看看效果。
$ cd hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount hdfs://10.232.42.91:19000/input hdfs://10.232.42.91:19000/output13/03/07 21:08:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.13/03/07 21:08:26 INFO input.FileInputFormat: Total input paths to process : 313/03/07 21:08:26 INFO mapreduce.JobSubmitter: number of splits:313/03/07 21:08:26 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar13/03/07 21:08:26 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.v<span>本文来源gaodai#ma#com搞*!代#%^码网5</span>alue.class13/03/07 21:08:26 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class13/03/07 21:08:26 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class13/03/07 21:08:26 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name13/03/07 21:08:26 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class13/03/07 21:08:26 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir13/03/07 21:08:26 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir13/03/07 21:08:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps13/03/07 21:08:26 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class13/03/07 21:08:26 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir13/03/07 21:08:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1362658309553_001913/03/07 21:08:26 INFO client.YarnClientImpl: Submitted application application_1362658309553_0019 to ResourceManager at /10.232.42.91:1900113/03/07 21:08:26 INFO mapreduce.Job: The url to track the job: http://search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0019/13/03/07 21:08:26 INFO mapreduce.Job: Running job: job_1362658309553_001913/03/07 21:08:33 INFO mapreduce.Job: Job job_1362658309553_0019 running in uber mode : false13/03/07 21:08:33 INFO mapreduce.Job: map 0% reduce 0%13/03/07 21:08:39 INFO mapreduce.Job: map 100% reduce 0%13/03/07 21:08:44 INFO mapreduce.Job: map 100% reduce 100%13/03/07 21:08:44 INFO mapreduce.Job: Job job_1362658309553_0019 completed successfully13/03/07 21:08:44 INFO mapreduce.Job: Counters: 43 File System Counters FILE: Number of bytes read=12698 FILE: Number of bytes written=312593 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=16947 HDFS: Number of bytes written=8739 HDFS: Number of read operations=12 HDFS: Number of large read operations=0 HDFS: Number of write operations=2 Job Counters Launched map tasks=3 Launched reduce tasks=1 Rack-local map tasks=3 Total time spent by all maps in occupied slots (ms)=10750 Total time spent by all reduces in occupied slots (ms)=4221 Map-Reduce Framework Map input records=317 Map output records=2324 Map output bytes=24586 Map output materialized bytes=12710 Input split bytes=316 Combine input records=2324 Combine output records=885 Reduce input groups=828 Reduce shuffle bytes=12710 Reduce input records=885 Reduce output records=828 Spilled Records=1770 Shuffled Maps =3 Failed Shuffles=0 Merged Map outputs=3 GC time elapsed (ms)=376 CPU time spent (ms)=4480 Physical memory (bytes) snapshot=557428736 Virtual memory (bytes) snapshot=2105122816 Total committed heap usage (bytes)=254607360 Shuffle Errors BAD_ID=0 CONNECTION=0 IO_ERROR=0 WRONG_LENGTH=0 WRONG_MAP=0 WRONG_REDUCE=0 File Input Format Counters Bytes Read=16631 File Output Format Counters Bytes Written=8739
接下来玩玩yarn吧。Hadoop官方文档那篇WritingYarnApplications太让人蛋碎了,好在我领悟到distributedshell就是使用yarn编写的。要研究yarn的话,直接去Hadoop source里面找相应的代码研究即可。
$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command uname --shell_args '-a'13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.13/03/07 21:42:44 INFO distributedshell.Client: Initializing Client13/03/07 21:42:44 INFO distributedshell.Client: Running Client13/03/07 21:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=213/03/07 21:42:44 INFO distributedshell.Client: Got Cluster node info from ASM13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search042091.sqa.cm4:39557, nodeAddresssearch042091.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663711950, 13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search041134.sqa.cm4:49313, nodeAddresssearch041134.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663712038, 13/03/07 21:42:44 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=17, queueChildQueueCount=013/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE13/03/07 21:42:44 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 102413/03/07 21:42:44 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 819213/03/07 21:42:44 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=102413/03/07 21:42:44 INFO distributedshell.Client: Setting up application submission context for ASM13/03/07 21:42:44 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment13/03/07 21:42:45 INFO distributedshell.Client: Set the environment for the application master13/03/07 21:42:45 INFO distributedshell.Client: Setting up app master command13/03/07 21:42:45 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx1024m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command uname --shell_args -a --debug 1>/AppMaster.stdout 2>/AppMaster.stderr 13/03/07 21:42:45 INFO distributedshell.Client: Submitting application to ASM13/03/07 21:42:45 INFO client.YarnClientImpl: Submitted application application_1362658309553_0020 to ResourceManager at /10.232.42.91:1900113/03/07 21:42:46 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:47 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:48 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:49 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:50 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:51 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:52 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:52 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop13/03/07 21:42:52 INFO distributedshell.Client: Application completed successfully
运行完成之后,找不到输出在哪儿,费了好大的劲,终于在hadoop/logs/userlogs下面找到输出了。不知道为何运行了两个container。
$ tree hadoop/logs/userlogs/application_1362658309553_0018application_1362658309553_0018|-- container_1362658309553_0018_01_000001| |-- AppMaster.stderr| `-- AppMaster.stdout`-- container_1362658309553_0018_01_000002 |-- stderr `-- stdout$ cat hadoop/logs/userlogs/application_1362658309553_0018/container_1362658309553_0018_01_000002/stdoutLinux search042091.sqa.cm4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux
好,开始用yarn调度一个程序。我写了一个脚本,里面启动了服务器。
$ cat ~/start_sp.sh #!/bin/env bashsource /home/admin/.bashrc/home/admin/sp/bin/sap_server -c /home/admin/sp/sp_worker/etc/sap_server_app.cfg -l /home/admin/sp/sp_worker/etc/sap_server_log.cfg -k restart
启动起来之后,进程关系图如下。
接着我把脚本直接kill掉,期待yarn给我重启脚本。发现application运行结束了,AppMaster.stderr日志里面有如下内容。
13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=113/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1362747551045_0017_01_000002, state=COMPLETE, exitStatus=137, diagnostics=Killed by external signal13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Current application state: loop=464, appDone=true, total=1, requested=1, completed=1, failed=1, currentAllocated=113/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM13/03/08 21:40:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.AMRMClientImpl is stopped.13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application Master failed. exiting
原文地址:Hadoop 2.0配置, 感谢原作者分享。