• 欢迎访问搞代码网站,推荐使用最新版火狐浏览器和Chrome浏览器访问本网站!
  • 如果您觉得本站非常有看点,那么赶紧使用Ctrl+D 收藏搞代码吧

Hadoop 2.0配置

mysql 搞代码 4年前 (2022-01-09) 22次浏览 已收录 0个评论

最近要做一次关于yarn的分享,于是想搭建一个Hadoop环境。Hadoop 2.0较之前的Hadoop 0.1x变化比较大,折腾了好久了,终于把环境搞好了。我搭建了一个两节点的集群,只配置了一些必须的参数,让集群勉强跑起来。 1、core-site.xml configurationpropertynamef

最近要做一次关于yarn的分享,于是想搭建一个Hadoop环境。Hadoop 2.0较之前的Hadoop 0.1x变化比较大,折腾了好久了,终于把环境搞好了。我搭建了一个两节点的集群,只配置了一些必须的参数,让集群勉强跑起来。

1、core-site.xml

fs.defaultFShdfs://10.232.42.91:19000/

2、mapred-site.xml

mapreduce.framework.nameyarn

3、yarn-site.xml

yarn.resourcemanager.addresshdfs://10.232.42.91:19001/yarn.resourcemanager.resource-tracker.addresshdfs://10.232.42.91:19002/yarn.nodemanager.aux-servicesmapreduce.shuffleyarn.resourcemanager.scheduler.address10.232.42.91:8030

把JAVA_HOME、HADOOP_HOME都设置到.bashrc里面去,然后运行sbin/start-all.sh。使用jps可以看到两个节点下运行的进程如下。

[master] jps31318 ResourceManager28981 DataNode11580 JobHistoryServer28858 NameNode29155 SecondaryNameNode31426 NodeManager11016 Jps[slave] jps12592 NodeManager11711 DataNode17699 Jps

上面这个JobHistoryServer需要单独启动,通过它可以看到每个application的详细日志。启动命令如下。

sbin/mr-jobhistory-daemon.sh start historyserver

打开http://10.232.42.91:8088/cluster/cluster这个地址可以看到cluster的介绍信息。这里再也看不到slot相关的数据了。

万事俱备。放点文本数据到hdfs://10.232.42.91:19000/input这个目录下,运行wordcount看看效果。

$ cd hadoop/share/hadoop/mapreduce$ hadoop jar hadoop-mapreduce-examples-2.0.3-alpha.jar wordcount hdfs://10.232.42.91:19000/input hdfs://10.232.42.91:19000/output13/03/07 21:08:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.13/03/07 21:08:26 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.13/03/07 21:08:26 INFO input.FileInputFormat: Total input paths to process : 313/03/07 21:08:26 INFO mapreduce.JobSubmitter: number of splits:313/03/07 21:08:26 WARN conf.Configuration: mapred.jar is deprecated. Instead, use mapreduce.job.jar13/03/07 21:08:26 WARN conf.Configuration: mapred.output.value.class is deprecated. Instead, use mapreduce.job.output.v<span>本文来源gaodai#ma#com搞*!代#%^码网5</span>alue.class13/03/07 21:08:26 WARN conf.Configuration: mapreduce.combine.class is deprecated. Instead, use mapreduce.job.combine.class13/03/07 21:08:26 WARN conf.Configuration: mapreduce.map.class is deprecated. Instead, use mapreduce.job.map.class13/03/07 21:08:26 WARN conf.Configuration: mapred.job.name is deprecated. Instead, use mapreduce.job.name13/03/07 21:08:26 WARN conf.Configuration: mapreduce.reduce.class is deprecated. Instead, use mapreduce.job.reduce.class13/03/07 21:08:26 WARN conf.Configuration: mapred.input.dir is deprecated. Instead, use mapreduce.input.fileinputformat.inputdir13/03/07 21:08:26 WARN conf.Configuration: mapred.output.dir is deprecated. Instead, use mapreduce.output.fileoutputformat.outputdir13/03/07 21:08:26 WARN conf.Configuration: mapred.map.tasks is deprecated. Instead, use mapreduce.job.maps13/03/07 21:08:26 WARN conf.Configuration: mapred.output.key.class is deprecated. Instead, use mapreduce.job.output.key.class13/03/07 21:08:26 WARN conf.Configuration: mapred.working.dir is deprecated. Instead, use mapreduce.job.working.dir13/03/07 21:08:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1362658309553_001913/03/07 21:08:26 INFO client.YarnClientImpl: Submitted application application_1362658309553_0019 to ResourceManager at /10.232.42.91:1900113/03/07 21:08:26 INFO mapreduce.Job: The url to track the job: http://search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0019/13/03/07 21:08:26 INFO mapreduce.Job: Running job: job_1362658309553_001913/03/07 21:08:33 INFO mapreduce.Job: Job job_1362658309553_0019 running in uber mode : false13/03/07 21:08:33 INFO mapreduce.Job:  map 0% reduce 0%13/03/07 21:08:39 INFO mapreduce.Job:  map 100% reduce 0%13/03/07 21:08:44 INFO mapreduce.Job:  map 100% reduce 100%13/03/07 21:08:44 INFO mapreduce.Job: Job job_1362658309553_0019 completed successfully13/03/07 21:08:44 INFO mapreduce.Job: Counters: 43	File System Counters		FILE: Number of bytes read=12698		FILE: Number of bytes written=312593		FILE: Number of read operations=0		FILE: Number of large read operations=0		FILE: Number of write operations=0		HDFS: Number of bytes read=16947		HDFS: Number of bytes written=8739		HDFS: Number of read operations=12		HDFS: Number of large read operations=0		HDFS: Number of write operations=2	Job Counters 		Launched map tasks=3		Launched reduce tasks=1		Rack-local map tasks=3		Total time spent by all maps in occupied slots (ms)=10750		Total time spent by all reduces in occupied slots (ms)=4221	Map-Reduce Framework		Map input records=317		Map output records=2324		Map output bytes=24586		Map output materialized bytes=12710		Input split bytes=316		Combine input records=2324		Combine output records=885		Reduce input groups=828		Reduce shuffle bytes=12710		Reduce input records=885		Reduce output records=828		Spilled Records=1770		Shuffled Maps =3		Failed Shuffles=0		Merged Map outputs=3		GC time elapsed (ms)=376		CPU time spent (ms)=4480		Physical memory (bytes) snapshot=557428736		Virtual memory (bytes) snapshot=2105122816		Total committed heap usage (bytes)=254607360	Shuffle Errors		BAD_ID=0		CONNECTION=0		IO_ERROR=0		WRONG_LENGTH=0		WRONG_MAP=0		WRONG_REDUCE=0	File Input Format Counters 		Bytes Read=16631	File Output Format Counters 		Bytes Written=8739

接下来玩玩yarn吧。Hadoop官方文档那篇WritingYarnApplications太让人蛋碎了,好在我领悟到distributedshell就是使用yarn编写的。要研究yarn的话,直接去Hadoop source里面找相应的代码研究即可。

$ hadoop jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar --jar hadoop-yarn-applications-distributedshell-2.0.3-alpha.jar org.apache.hadoop.yarn.applications.distributedshell.Client --shell_command uname --shell_args '-a'13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is inited.13/03/07 21:42:44 INFO distributedshell.Client: Initializing Client13/03/07 21:42:44 INFO distributedshell.Client: Running Client13/03/07 21:42:44 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable13/03/07 21:42:44 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.YarnClientImpl is started.13/03/07 21:42:44 INFO distributedshell.Client: Got Cluster metric info from ASM, numNodeManagers=213/03/07 21:42:44 INFO distributedshell.Client: Got Cluster node info from ASM13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search042091.sqa.cm4:39557, nodeAddresssearch042091.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663711950, 13/03/07 21:42:44 INFO distributedshell.Client: Got node report from ASM for, nodeId=search041134.sqa.cm4:49313, nodeAddresssearch041134.sqa.cm4:8042, nodeRackName/default-rack, nodeNumContainers0, nodeHealthStatusis_node_healthy: true, health_report: "", last_health_report_time: 1362663712038, 13/03/07 21:42:44 INFO distributedshell.Client: Queue info, queueName=default, queueCurrentCapacity=0.0, queueMaxCapacity=1.0, queueApplicationCount=17, queueChildQueueCount=013/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=SUBMIT_APPLICATIONS13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=root, userAcl=ADMINISTER_QUEUE13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=SUBMIT_APPLICATIONS13/03/07 21:42:44 INFO distributedshell.Client: User ACL Info for Queue, queueName=default, userAcl=ADMINISTER_QUEUE13/03/07 21:42:44 INFO distributedshell.Client: Min mem capabililty of resources in this cluster 102413/03/07 21:42:44 INFO distributedshell.Client: Max mem capabililty of resources in this cluster 819213/03/07 21:42:44 INFO distributedshell.Client: AM memory specified below min threshold of cluster. Using min value., specified=10, min=102413/03/07 21:42:44 INFO distributedshell.Client: Setting up application submission context for ASM13/03/07 21:42:44 INFO distributedshell.Client: Copy App Master jar from local filesystem and add to local environment13/03/07 21:42:45 INFO distributedshell.Client: Set the environment for the application master13/03/07 21:42:45 INFO distributedshell.Client: Setting up app master command13/03/07 21:42:45 INFO distributedshell.Client: Completed setting up app master command ${JAVA_HOME}/bin/java -Xmx1024m org.apache.hadoop.yarn.applications.distributedshell.ApplicationMaster --container_memory 10 --num_containers 1 --priority 0 --shell_command uname --shell_args -a --debug 1>/AppMaster.stdout 2>/AppMaster.stderr 13/03/07 21:42:45 INFO distributedshell.Client: Submitting application to ASM13/03/07 21:42:45 INFO client.YarnClientImpl: Submitted application application_1362658309553_0020 to ResourceManager at /10.232.42.91:1900113/03/07 21:42:46 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:47 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=N/A, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=ACCEPTED, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:48 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:49 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:50 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:51 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=RUNNING, distributedFinalState=UNDEFINED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:52 INFO distributedshell.Client: Got application report from ASM for, appId=20, clientToken=null, appDiagnostics=, appMasterHost=, appQueue=default, appMasterRpcPort=0, appStartTime=1362663765373, yarnAppState=FINISHED, distributedFinalState=SUCCEEDED, appTrackingUrl=search042091.sqa.cm4.tbsite.net:8088/proxy/application_1362658309553_0020/, appUser=henshao13/03/07 21:42:52 INFO distributedshell.Client: Application has completed successfully. Breaking monitoring loop13/03/07 21:42:52 INFO distributedshell.Client: Application completed successfully

运行完成之后,找不到输出在哪儿,费了好大的劲,终于在hadoop/logs/userlogs下面找到输出了。不知道为何运行了两个container。

$ tree hadoop/logs/userlogs/application_1362658309553_0018application_1362658309553_0018|-- container_1362658309553_0018_01_000001|   |-- AppMaster.stderr|   `-- AppMaster.stdout`-- container_1362658309553_0018_01_000002    |-- stderr    `-- stdout$ cat hadoop/logs/userlogs/application_1362658309553_0018/container_1362658309553_0018_01_000002/stdoutLinux search042091.sqa.cm4 2.6.18-164.el5 #1 SMP Tue Aug 18 15:51:48 EDT 2009 x86_64 x86_64 x86_64 GNU/Linux

好,开始用yarn调度一个程序。我写了一个脚本,里面启动了服务器。

$ cat ~/start_sp.sh #!/bin/env bashsource /home/admin/.bashrc/home/admin/sp/bin/sap_server -c /home/admin/sp/sp_worker/etc/sap_server_app.cfg -l /home/admin/sp/sp_worker/etc/sap_server_log.cfg -k restart

启动起来之后,进程关系图如下。

接着我把脚本直接kill掉,期待yarn给我重启脚本。发现application运行结束了,AppMaster.stderr日志里面有如下内容。

13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got response from RM for container ask, completedCnt=113/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Got container status for containerID=container_1362747551045_0017_01_000002, state=COMPLETE, exitStatus=137, diagnostics=Killed by external signal13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Current application state: loop=464, appDone=true, total=1, requested=1, completed=1, failed=1, currentAllocated=113/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application completed. Signalling finish to RM13/03/08 21:40:02 INFO service.AbstractService: Service:org.apache.hadoop.yarn.client.AMRMClientImpl is stopped.13/03/08 21:40:02 INFO distributedshell.ApplicationMaster: Application Master failed. exiting

搞代码网(gaodaima.com)提供的所有资源部分来自互联网,如果有侵犯您的版权或其他权益,请说明详细缘由并提供版权或权益证明然后发送到邮箱[email protected],我们会在看到邮件的第一时间内为您处理,或直接联系QQ:872152909。本网站采用BY-NC-SA协议进行授权
转载请注明原文链接:Hadoop 2.0配置

喜欢 (0)
[搞代码]
分享 (0)
发表我的评论
取消评论

表情 贴图 加粗 删除线 居中 斜体 签到

Hi,您需要填写昵称和邮箱!

  • 昵称 (必填)
  • 邮箱 (必填)
  • 网址