Hadoop 3.2.2 Installation

Reference: https://zhuanlan.zhihu.com/p/1163949451


 

Contents

1 Prepare machines
2 Grant the regular user liucf sudo privileges
3 Fix each node's IP, hostname, and IP-to-hostname mapping
4 Install the JDK
4.1 Remove the bundled OpenJDK and related Java files
4.2 Download and extract JDK 1.8 onto the Linux machines
4.3 Configure JAVA_HOME
5 Configure passwordless SSH login
6 Disable the firewall and SELinux
6.1 Disable the firewall
6.2 Disable SELinux
7 Clock synchronization
8 Install and configure Hadoop
8.1 Configure Hadoop on a single machine first
8.2 Configure HDFS
8.2.1 core-site.xml
8.2.2 hdfs-site.xml
8.2.3 hadoop-env.sh
8.2.4 Configure the DataNodes in the workers file
8.2.5 Distribute the files to node2 and node3
8.2.6 Format HDFS
8.2.7 Start the NameNode and DataNodes
8.2.8 Verify
8.3 Configure YARN
8.3.1 yarn-site.xml
8.3.2 mapred-site.xml
8.3.3 yarn-env.sh and mapred-env.sh
8.3.4 Distribute the files to node2 and node3
8.3.5 Start YARN
8.3.6 Verify
9 Configure the JobHistoryServer
9.1 Add log-aggregation settings to yarn-site.xml
9.2 Add JobHistoryServer settings to mapred-site.xml
9.3 Distribute the configuration to node2 and node3
9.4 Start the JobHistoryServer
10 Testing
10.1 Test uploading a file to HDFS
10.2 Test running a MapReduce job on YARN
10.3 Test the JobHistoryServer


1 Prepare machines

Cluster plan:

ip               hostname  specs           role
192.168.109.151  node1     2 cores, 8 GB   master
192.168.109.152  node2     2 cores, 4 GB
192.168.109.153  node3     2 cores, 4 GB

2 Grant the regular user liucf sudo privileges

Do this on all three machines.

Regular Linux users have limited privileges, so some operations require sudo. To grant a regular user sudo rights: log in, switch to root with su - root, and run visudo, which opens /etc/sudoers. Find:

## Allow root to run any commands anywhere 
root    ALL=(ALL)       ALL

Add the following line below it:

liucf   ALL=(ALL)       ALL

Save and exit, switch back to the regular user, and sudo commands will now work.
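As an alternative to editing /etc/sudoers through visudo, a drop-in file achieves the same thing (a minimal sketch, assuming the standard CentOS 7 /etc/sudoers.d layout; run as root):

echo 'liucf   ALL=(ALL)       ALL' > /etc/sudoers.d/liucf
chmod 440 /etc/sudoers.d/liucf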

3 Fix each node's IP, hostname, and IP-to-hostname mapping

Do this on all three machines.

[Screenshot: static IP, hostname, and /etc/hosts configuration]
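The screenshot is not reproduced here. A sketch of the equivalent /etc/hosts entries on every node, using the addresses from the cluster plan:

192.168.109.151 node1
192.168.109.152 node2
192.168.109.153 node3

Each machine's hostname is set once, e.g. sudo hostnamectl set-hostname node1.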

4 Install the JDK

Do this on all three machines.

4.1 Remove the bundled OpenJDK and related Java files

① Check whether the system ships with a JDK:

[liucf@node1 ~]$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
[liucf@node1 ~]$ rpm -qa | grep java
java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2020a-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.262.b10-1.el7.x86_64
[liucf@node1 ~]$ 

② Remove the bundled JDK.

Remove all of the openjdk packages found in the previous step. The exact packages vary by installed version; the ones listed above serve as the example here.

[liucf@node2 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64
[liucf@node2 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.262.b10-1.el7.x86_64
[liucf@node2 ~]$ java -version
-bash: /usr/bin/java: No such file or directory
[liucf@node2 ~]$ 

4.2 Download and extract JDK 1.8 onto the Linux machines

[liucf@node1 soft]$ tar -zxvf jdk-8u121-linux-x64.tar.gz -C /home/liucf/soft

4.3 Configure JAVA_HOME

[liucf@node1 jdk1.8.0_121]$ sudo vim /etc/profile

[Screenshot: JAVA_HOME settings added to /etc/profile]
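A sketch of the lines the screenshot adds to /etc/profile, matching the JDK path extracted above:

export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin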

[liucf@node1 jdk1.8.0_121]$ source /etc/profile
[liucf@node1 jdk1.8.0_121]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[liucf@node1 jdk1.8.0_121]$ 

5 Configure passwordless SSH login

CentOS 7 generates an RSA key pair by default, saved under ~/.ssh as id_rsa (private key) and id_rsa.pub (public key). The "-t dsa" option selects the DSA algorithm instead, producing id_dsa and id_dsa.pub. Generation prompts for a passphrase to protect the private key; pressing Enter leaves it unprotected.
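The generation command itself is not shown in the original; the standard form, accepting all defaults, is:

ssh-keygen -t rsa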

Distribute the public key to every machine; note that the local machine also gets a copy:

[liucf@node2 ~]$ ssh-copy-id -i node1
[liucf@node2 ~]$ ssh-copy-id -i node2
[liucf@node2 ~]$ ssh-copy-id -i node3

6 Disable the firewall and SELinux

6.1 Disable the firewall

sudo systemctl stop firewalld
sudo systemctl disable firewalld

6.2 Disable SELinux

sudo vim /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled (a reboot is needed for this to take effect).

7 Clock synchronization

See section 3.4 ("Configure ntp") of https://blog.csdn.net/m0_37813354/article/details/105118147.
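A minimal sketch of one common setup, assuming ntpd/ntpdate on CentOS 7 with node1 acting as the cluster's time server (the linked post has the full configuration):

# one-shot sync on node2 and node3
sudo ntpdate node1
# ongoing sync: add "server node1" to /etc/ntp.conf, then
sudo systemctl enable ntpd && sudo systemctl start ntpd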

8 Install and configure Hadoop

Download: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz

8.1 Configure Hadoop on a single machine first

[liucf@node1 softfile]$ tar -zxvf hadoop-3.2.2.tar.gz -C /home/liucf/soft
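Later steps invoke hdfs and hadoop without full paths, and mapred-site.xml below references ${HADOOP_HOME}, so the shell environment presumably also gets lines like these in /etc/profile (an assumption; the original does not show this step):

export HADOOP_HOME=/home/liucf/soft/hadoop-3.2.2
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin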

8.2 Configure HDFS

8.2.1 core-site.xml

<configuration>
	<!-- Host where the NameNode runs -->
	<property>
		<name>fs.defaultFS</name>
		<value>hdfs://node1:8020/</value>
	</property>
	<!-- Base directory for Hadoop's data files -->
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/home/liucf/data/dfs/tmp</value>
	</property>
</configuration>

8.2.2 hdfs-site.xml

<configuration>
	<!-- Keep two replicas of each block -->
	<property>
		<name>dfs.replication</name>
		<value>2</value>
	</property>
	<!-- Disable HDFS permission checking -->
	<property>
		<name>dfs.permissions.enabled</name>
		<value>false</value>
	</property>
</configuration>

8.2.3 hadoop-env.sh

export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121

8.2.4 Configure the DataNodes in the workers file

[Screenshot: etc/hadoop/workers]
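The screenshot shows the workers file; judging by the start commands in 8.2.7, where DataNodes run on node2 and node3, etc/hadoop/workers presumably contains:

node2
node3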

8.2.5 Distribute the files to node2 and node3

[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node2:/home/liucf/soft
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node3:/home/liucf/soft

8.2.6 Format HDFS

[liucf@node1 soft]$ hdfs namenode  -format
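Note: format exactly once, on the NameNode host only. Reformatting an existing cluster generates a new cluster ID, and the DataNodes' old data directories will no longer register against it.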

8.2.7 Start the NameNode and DataNodes

[liucf@node1 soft]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start namenode
[liucf@node2 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start datanode
[liucf@node3 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start datanode
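In Hadoop 3 the hadoop-daemon.sh script still works but prints a deprecation warning; the current equivalents are:

hdfs --daemon start namenode
hdfs --daemon start datanode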

8.2.8 Verify

① Run jps on each machine to check the processes: node1 should show a NameNode, node2 and node3 a DataNode each.

② Check the web UI.

Look up the NameNode HTTP port:

[liucf@node1 soft]$ hdfs getconf -confKey dfs.namenode.http-address
0.0.0.0:9870

http://192.168.109.151:9870/

8.3 Configure YARN

8.3.1 yarn-site.xml

<configuration>
	<!-- Host running the ResourceManager -->
	<property>
		<name>yarn.resourcemanager.hostname</name>
		<value>node1</value>
	</property>
	<!-- Enable the MapReduce shuffle service -->
	<property>
		<name>yarn.nodemanager.aux-services</name>
		<value>mapreduce_shuffle</value>
	</property>
	<property>
		<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
		<value>org.apache.hadoop.mapred.ShuffleHandler</value>
	</property>
</configuration>

8.3.2 mapred-site.xml

<configuration>

	<property>
		<name>mapreduce.framework.name</name>
		<value>yarn</value>
	</property>
	<property>
		<name>yarn.app.mapreduce.am.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>
	<property>
		<name>mapreduce.map.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>
	<property>
		<name>mapreduce.reduce.env</name>
		<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
	</property>

</configuration>

If yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env are not configured, jobs fail with an error like the following:

[2021-05-30 11:22:28.097]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster

Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:

8.3.3 yarn-env.sh and mapred-env.sh

Configure in both files:

export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121

8.3.4 Distribute the files to node2 and node3

[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node2:/home/liucf/soft
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node3:/home/liucf/soft

8.3.5 Start YARN

[liucf@node1 soft]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start resourcemanager
[liucf@node2 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start nodemanager
[liucf@node3 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start nodemanager
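As with HDFS, yarn-daemon.sh is deprecated in Hadoop 3; the non-deprecated equivalents are:

yarn --daemon start resourcemanager
yarn --daemon start nodemanager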

8.3.6 Verify

jps shows the expected processes running:

ResourceManager (on node1)

NodeManager (on node2 and node3)

[Screenshot: ResourceManager web UI, by default at http://node1:8088]

Done.

9 Configure the JobHistoryServer

9.1 Add log-aggregation settings to yarn-site.xml

	<!-- Enable log aggregation -->
	<property>
		<name>yarn.log-aggregation-enable</name>
		<value>true</value>
	</property>
	<!-- URL of the log server -->
	<property>
		<name>yarn.log.server.url</name>
		<value>http://node1:19888/jobhistory/logs</value>
	</property>

[Screenshot: yarn-site.xml with log-aggregation settings]

9.2 Add JobHistoryServer settings to mapred-site.xml

	<!-- JobHistoryServer address; without it the History link is unusable -->
	<property>
		<name>mapreduce.jobhistory.address</name>
		<value>node1:10020</value>
	</property>
	<!-- Web UI address -->
	<property>
		<name>mapreduce.jobhistory.webapp.address</name>
		<value>node1:19888</value>
	</property>
	<!-- HDFS path for logs of jobs still running -->
	<property>
		<name>mapreduce.jobhistory.intermediate-done-dir</name>
		<value>/jobhistory/done_intermediate</value>
	</property>
	<!-- HDFS path for logs of completed jobs -->
	<property>
		<name>mapreduce.jobhistory.done-dir</name>
		<value>/jobhistory/done</value>
	</property>

[Screenshot: mapred-site.xml with the JobHistoryServer settings]

9.3 Distribute the configuration to node2 and node3

scp mapred-site.xml liucf@node2:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
scp mapred-site.xml liucf@node3:/home/liucf/soft/hadoop-3.2.2/etc/hadoop

scp yarn-site.xml liucf@node2:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
scp yarn-site.xml liucf@node3:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
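Judging by the relative paths, the scp commands above are run from $HADOOP_HOME/etc/hadoop. Also note that running daemons only read their configuration at startup, so restart the ResourceManager and NodeManagers for the new yarn-site.xml settings to take effect.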

9.4 Start the JobHistoryServer

/home/liucf/soft/hadoop-3.2.2/sbin/mr-jobhistory-daemon.sh start historyserver
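mr-jobhistory-daemon.sh is likewise deprecated in Hadoop 3; the current equivalent is:

mapred --daemon start historyserver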

[Screenshot: JobHistoryServer running]

10 Testing

10.1 Test uploading a file to HDFS
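The original only shows the verification read below; the upload itself would look something like this (a sketch, assuming a local wc.txt):

hadoop fs -mkdir -p /data/input
hadoop fs -put wc.txt /data/input

Then read the file back: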

hadoop fs -cat /data/input/wc.txt

[Screenshot: contents of wc.txt read from HDFS]

10.2 Test running a MapReduce job on YARN

Run the wordcount example that ships with Hadoop:

hadoop jar /home/liucf/soft/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /data/input /data/output/wc

Result:

[liucf@node1 myShell]$ hadoop jar  /home/liucf/soft/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /data/input /data/output/wc
2021-05-30 13:08:56,391 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.109.151:8032
2021-05-30 13:08:56,832 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/liucf/.staging/job_1622350103597_0003
2021-05-30 13:08:57,009 INFO input.FileInputFormat: Total input files to process : 1
2021-05-30 13:08:57,090 INFO mapreduce.JobSubmitter: number of splits:1
2021-05-30 13:08:57,214 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1622350103597_0003
2021-05-30 13:08:57,215 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-05-30 13:08:57,353 INFO conf.Configuration: resource-types.xml not found
2021-05-30 13:08:57,354 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-05-30 13:08:57,419 INFO impl.YarnClientImpl: Submitted application application_1622350103597_0003
2021-05-30 13:08:57,456 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1622350103597_0003/
2021-05-30 13:08:57,456 INFO mapreduce.Job: Running job: job_1622350103597_0003
2021-05-30 13:09:02,528 INFO mapreduce.Job: Job job_1622350103597_0003 running in uber mode : false
2021-05-30 13:09:02,529 INFO mapreduce.Job:  map 0% reduce 0%
2021-05-30 13:09:06,626 INFO mapreduce.Job:  map 100% reduce 0%
2021-05-30 13:09:11,672 INFO mapreduce.Job:  map 100% reduce 100%
2021-05-30 13:09:11,684 INFO mapreduce.Job: Job job_1622350103597_0003 completed successfully
2021-05-30 13:09:11,750 INFO mapreduce.Job: Counters: 54
	File System Counters
		FILE: Number of bytes read=54
		FILE: Number of bytes written=469925
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=167
		HDFS: Number of bytes written=32
		HDFS: Number of read operations=8
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
		HDFS: Number of bytes read erasure-coded=0
	Job Counters 
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=1911
		Total time spent by all reduces in occupied slots (ms)=1846
		Total time spent by all map tasks (ms)=1911
		Total time spent by all reduce tasks (ms)=1846
		Total vcore-milliseconds taken by all map tasks=1911
		Total vcore-milliseconds taken by all reduce tasks=1846
		Total megabyte-milliseconds taken by all map tasks=1956864
		Total megabyte-milliseconds taken by all reduce tasks=1890304
	Map-Reduce Framework
		Map input records=4
		Map output records=11
		Map output bytes=111
		Map output materialized bytes=54
		Input split bytes=100
		Combine input records=11
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=54
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=108
		CPU time spent (ms)=900
		Physical memory (bytes) snapshot=450744320
		Virtual memory (bytes) snapshot=5566611456
		Total committed heap usage (bytes)=385351680
		Peak Map Physical memory (bytes)=277688320
		Peak Map Virtual memory (bytes)=2780983296
		Peak Reduce Physical memory (bytes)=173056000
		Peak Reduce Virtual memory (bytes)=2785628160
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=67
	File Output Format Counters 
		Bytes Written=32
[liucf@node1 myShell]$ 

[Screenshot: wordcount result]

 

10.3 Test the JobHistoryServer

View the history record of the wordcount job just run.

[Screenshot: JobHistory web UI listing the finished job]

After clicking the application ID above:

[Screenshot: application overview page]

After clicking the History link (marker 1 in the screenshot):

[Screenshot: job history page]

After clicking Logs (marker 2):

[Screenshot: aggregated task logs]

Done.
