1 Prepare the machines
IP | hostname | specs | role |
---|---|---|---|
192.168.109.151 | node1 | 2 cores, 8 GB | master |
192.168.109.152 | node2 | 2 cores, 4 GB | worker |
192.168.109.153 | node3 | 2 cores, 4 GB | worker |
2 Give the regular user liucf sudo privileges
Do this on all three machines.
On Linux, regular users have limited privileges, so some operations require sudo. Here is how to grant a regular user sudo rights.
After logging in as a regular user, switch to root with su - root. As root, run visudo to open /etc/sudoers and find:
## Allow root to run any commands anywhere
root ALL=(ALL) ALL
Add the following line directly below it:
liucf ALL=(ALL) ALL
Save and exit, then switch back to the regular user; sudo commands will now work.
3 Fix each node's IP address, hostname, and the IP-to-hostname mapping
Do this on all three machines.
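A sketch of this step: set each node's hostname (e.g. sudo hostnamectl set-hostname node1 on the first machine, node2 and node3 on the others), then add the same IP-to-hostname mapping to /etc/hosts on all three nodes, using the addresses from the table in section 1:

```
# Appended to /etc/hosts on node1, node2, and node3
192.168.109.151 node1
192.168.109.152 node2
192.168.109.153 node3
```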
4 Install the JDK
Do this on all three machines.
4.1 Remove the bundled OpenJDK and related Java packages
① Check whether the system ships with a JDK
[liucf@node1 ~]$ java -version
openjdk version "1.8.0_262"
OpenJDK Runtime Environment (build 1.8.0_262-b10)
OpenJDK 64-Bit Server VM (build 25.262-b10, mixed mode)
[liucf@node1 ~]$ rpm -qa | grep java
java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64
javapackages-tools-3.4.1-11.el7.noarch
tzdata-java-2020a-1.el7.noarch
python-javapackages-3.4.1-11.el7.noarch
java-1.8.0-openjdk-headless-1.8.0.262.b10-1.el7.x86_64
[liucf@node1 ~]$
② Remove the bundled JDK
Remove all of the openjdk packages found in the previous step. The exact package names depend on the installed version; the ones below match the output above.
[liucf@node2 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-1.8.0.262.b10-1.el7.x86_64
[liucf@node2 ~]$ sudo rpm -e --nodeps java-1.8.0-openjdk-headless-1.8.0.262.b10-1.el7.x86_64
[liucf@node2 ~]$ java -version
-bash: /usr/bin/java: No such file or directory
[liucf@node2 ~]$
4.2 Download JDK 1.8 and extract it on the Linux machines
[liucf@node1 soft]$ tar -zxvf jdk-8u121-linux-x64.tar.gz -C /home/liucf/soft
4.3 配置JAVA_HOME
[liucf@node1 jdk1.8.0_121]$ sudo vim /etc/profile
[liucf@node1 jdk1.8.0_121]$ source /etc/profile
[liucf@node1 jdk1.8.0_121]$ java -version
java version "1.8.0_121"
Java(TM) SE Runtime Environment (build 1.8.0_121-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.121-b13, mixed mode)
[liucf@node1 jdk1.8.0_121]$
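For reference, the /etc/profile edit above typically appends lines like the following (the path matches the JDK extracted in 4.2; adjust it if your version differs):

```shell
# Point JAVA_HOME at the extracted JDK and put its binaries on the PATH
export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121
export PATH=$PATH:$JAVA_HOME/bin
```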
5 Configure passwordless SSH login
By default, CentOS 7's ssh-keygen generates an RSA key pair, saved under ~/.ssh as id_rsa (private key) and id_rsa.pub (public key). You can pass "-t dsa" to use the DSA algorithm instead, producing id_dsa and id_dsa.pub. Key generation prompts for a passphrase to encrypt the private key; press Enter to leave it unprotected.
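The generation step can also be run non-interactively. A sketch, written against a temporary path so it never clobbers an existing key (on the real nodes, use the default ~/.ssh/id_rsa so that ssh-copy-id finds it automatically):

```shell
# Generate an RSA key pair with an empty passphrase (-N "") and no prompts (-q)
keyfile=$(mktemp -u /tmp/id_rsa.XXXXXX)
ssh-keygen -t rsa -N "" -f "$keyfile" -q

# Both halves of the pair now exist
ls "$keyfile" "$keyfile.pub"
```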
Distribute the public key to every machine, including the local one:
[liucf@node2 ~]$ ssh-copy-id -i node1
[liucf@node2 ~]$ ssh-copy-id -i node2
[liucf@node2 ~]$ ssh-copy-id -i node3
6 Disable the firewall and SELinux
6.1 Disable the firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld
6.2 Disable SELinux
sudo vim /etc/selinux/config
SELINUX=enforcing --> SELINUX=disabled
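The same edit can be scripted with sed. A sketch that demonstrates the substitution on a throwaway copy of the file (on the real nodes, target /etc/selinux/config with sudo):

```shell
# Stand-in for /etc/selinux/config (editing the real file needs sudo)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > /tmp/selinux-demo

# Flip enforcing -> disabled, the same change made with vim above
sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /tmp/selinux-demo

grep '^SELINUX=' /tmp/selinux-demo   # → SELINUX=disabled
```

Note that SELINUX=disabled only takes effect after a reboot; sudo setenforce 0 switches to permissive mode immediately for the current session.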
7 Clock synchronization
See section 3.4 (NTP configuration) of https://blog.csdn.net/m0_37813354/article/details/105118147.
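As an alternative to the ntpd setup in the linked post, a minimal sketch using chronyd (the CentOS 7 default), with node1 serving time to the other nodes; the server and allow directives are standard chrony.conf options:

```
# On node1, allow the cluster subnet in /etc/chrony.conf:
#   allow 192.168.109.0/24
# On node2 and node3, replace the default server lines with:
#   server node1 iburst
# Then on all three nodes:
sudo yum install -y chrony
sudo systemctl enable --now chronyd
```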
8 Install and configure Hadoop
Download: https://www.apache.org/dyn/closer.cgi/hadoop/common/hadoop-3.2.2/hadoop-3.2.2.tar.gz
8.1 Configure Hadoop on a single machine first
[liucf@node1 softfile]$ tar -zxvf hadoop-3.2.2.tar.gz -C /home/liucf/soft
8.2 Configure HDFS
8.2.1 core-site.xml
<configuration>
<!-- Host that runs the NameNode -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://node1:8020/</value>
</property>
<!-- Directory where Hadoop stores its data -->
<property>
<name>hadoop.tmp.dir</name>
<value>/home/liucf/data/dfs/tmp</value>
</property>
</configuration>
8.2.2 hdfs-site.xml
<configuration>
<!-- Keep two replicas of each block in the cluster -->
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<!-- Disable HDFS permission checking -->
<property>
<name>dfs.permissions.enabled</name>
<value>false</value>
</property>
</configuration>
8.2.3 hadoop-env.sh
export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121
8.2.4 Configure the DataNodes in the workers file
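For example, with DataNodes on node2 and node3 (matching the daemons started in 8.2.7), $HADOOP_HOME/etc/hadoop/workers lists one hostname per line:

```
node2
node3
```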
8.2.5 Distribute the files to node2 and node3
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node2:/home/liucf/soft
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node3:/home/liucf/soft
8.2.6 Format HDFS
[liucf@node1 soft]$ hdfs namenode -format
8.2.7 Start the NameNode and DataNodes
(hadoop-daemon.sh is deprecated in Hadoop 3.x in favor of hdfs --daemon start namenode, but the old script still works.)
[liucf@node1 soft]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start namenode
[liucf@node2 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start datanode
[liucf@node3 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/hadoop-daemon.sh start datanode
8.2.8 Verify
① Run jps on each machine to check that the expected processes are running.
② Check the web UI.
Find the HTTP port:
[liucf@node1 soft]$ hdfs getconf -confKey dfs.namenode.http-address
0.0.0.0:9870
The NameNode web UI is then reachable at http://node1:9870.
8.3 Configure YARN
8.3.1 yarn-site.xml
<configuration>
<!-- Host that runs the ResourceManager -->
<property>
<name>yarn.resourcemanager.hostname</name>
<value>node1</value>
</property>
<!-- Enable the MapReduce shuffle service -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
</configuration>
8.3.2 mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>yarn.app.mapreduce.am.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.map.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<property>
<name>mapreduce.reduce.env</name>
<value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
</configuration>
If yarn.app.mapreduce.am.env, mapreduce.map.env, and mapreduce.reduce.env are not configured, jobs fail with the following error:
[2021-05-30 11:22:28.097]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster
Please check whether your etc/hadoop/mapred-site.xml contains the below configuration:
8.3.3 yarn-env.sh and mapred-env.sh
Add the following to both files:
export JAVA_HOME=/home/liucf/soft/jdk1.8.0_121
8.3.4 Distribute the files to node2 and node3
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node2:/home/liucf/soft
[liucf@node1 soft]$ scp -r hadoop-3.2.2 liucf@node3:/home/liucf/soft
8.3.5 Start YARN
(yarn-daemon.sh is likewise deprecated in Hadoop 3.x in favor of yarn --daemon start resourcemanager, but it still works.)
[liucf@node1 soft]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start resourcemanager
[liucf@node2 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start nodemanager
[liucf@node3 data]$ /home/liucf/soft/hadoop-3.2.2/sbin/yarn-daemon.sh start nodemanager
8.3.6 Verify
jps shows both processes running:
ResourceManager
NodeManager
Done.
9 Configure the JobHistoryServer
9.1 Add log-aggregation settings to yarn-site.xml
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- URL of the JobHistoryServer log page -->
<property>
<name>yarn.log.server.url</name>
<value>http://node1:19888/jobhistory/logs</value>
</property>
9.2 Add JobHistoryServer settings to mapred-site.xml
<!-- JobHistoryServer address; without it the History link is unusable -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>node1:10020</value>
</property>
<!-- Web UI address -->
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>node1:19888</value>
</property>
<!-- HDFS path for logs of jobs that are still running -->
<property>
<name>mapreduce.jobhistory.intermediate-done-dir</name>
<value>/jobhistory/done_intermediate</value>
</property>
<!-- HDFS path for logs of completed jobs -->
<property>
<name>mapreduce.jobhistory.done-dir</name>
<value>/jobhistory/done</value>
</property>
9.3 Distribute the configuration to node2 and node3
scp mapred-site.xml liucf@node2:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
scp mapred-site.xml liucf@node3:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
scp yarn-site.xml liucf@node2:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
scp yarn-site.xml liucf@node3:/home/liucf/soft/hadoop-3.2.2/etc/hadoop
9.4 Start the JobHistoryServer
(mr-jobhistory-daemon.sh is deprecated in Hadoop 3.x; the current form is mapred --daemon start historyserver.)
/home/liucf/soft/hadoop-3.2.2/sbin/mr-jobhistory-daemon.sh start historyserver
10 Tests
10.1 Test uploading a file to HDFS
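The upload itself can be done with hadoop fs -mkdir/-put (assuming a local text file named wc.txt in the current directory):

```
# Create the input directory in HDFS and upload the local file
hadoop fs -mkdir -p /data/input
hadoop fs -put wc.txt /data/input
```

Then read the file back to confirm: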
hadoop fs -cat /data/input/wc.txt
10.2 Test running a MapReduce job on YARN
Run the WordCount example that ships with Hadoop:
hadoop jar /home/liucf/soft/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /data/input /data/output/wc
Result:
[liucf@node1 myShell]$ hadoop jar /home/liucf/soft/hadoop-3.2.2/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.2.jar wordcount /data/input /data/output/wc
2021-05-30 13:08:56,391 INFO client.RMProxy: Connecting to ResourceManager at node1/192.168.109.151:8032
2021-05-30 13:08:56,832 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/liucf/.staging/job_1622350103597_0003
2021-05-30 13:08:57,009 INFO input.FileInputFormat: Total input files to process : 1
2021-05-30 13:08:57,090 INFO mapreduce.JobSubmitter: number of splits:1
2021-05-30 13:08:57,214 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1622350103597_0003
2021-05-30 13:08:57,215 INFO mapreduce.JobSubmitter: Executing with tokens: []
2021-05-30 13:08:57,353 INFO conf.Configuration: resource-types.xml not found
2021-05-30 13:08:57,354 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2021-05-30 13:08:57,419 INFO impl.YarnClientImpl: Submitted application application_1622350103597_0003
2021-05-30 13:08:57,456 INFO mapreduce.Job: The url to track the job: http://node1:8088/proxy/application_1622350103597_0003/
2021-05-30 13:08:57,456 INFO mapreduce.Job: Running job: job_1622350103597_0003
2021-05-30 13:09:02,528 INFO mapreduce.Job: Job job_1622350103597_0003 running in uber mode : false
2021-05-30 13:09:02,529 INFO mapreduce.Job: map 0% reduce 0%
2021-05-30 13:09:06,626 INFO mapreduce.Job: map 100% reduce 0%
2021-05-30 13:09:11,672 INFO mapreduce.Job: map 100% reduce 100%
2021-05-30 13:09:11,684 INFO mapreduce.Job: Job job_1622350103597_0003 completed successfully
2021-05-30 13:09:11,750 INFO mapreduce.Job: Counters: 54
File System Counters
FILE: Number of bytes read=54
FILE: Number of bytes written=469925
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=167
HDFS: Number of bytes written=32
HDFS: Number of read operations=8
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
HDFS: Number of bytes read erasure-coded=0
Job Counters
Launched map tasks=1
Launched reduce tasks=1
Data-local map tasks=1
Total time spent by all maps in occupied slots (ms)=1911
Total time spent by all reduces in occupied slots (ms)=1846
Total time spent by all map tasks (ms)=1911
Total time spent by all reduce tasks (ms)=1846
Total vcore-milliseconds taken by all map tasks=1911
Total vcore-milliseconds taken by all reduce tasks=1846
Total megabyte-milliseconds taken by all map tasks=1956864
Total megabyte-milliseconds taken by all reduce tasks=1890304
Map-Reduce Framework
Map input records=4
Map output records=11
Map output bytes=111
Map output materialized bytes=54
Input split bytes=100
Combine input records=11
Combine output records=4
Reduce input groups=4
Reduce shuffle bytes=54
Reduce input records=4
Reduce output records=4
Spilled Records=8
Shuffled Maps =1
Failed Shuffles=0
Merged Map outputs=1
GC time elapsed (ms)=108
CPU time spent (ms)=900
Physical memory (bytes) snapshot=450744320
Virtual memory (bytes) snapshot=5566611456
Total committed heap usage (bytes)=385351680
Peak Map Physical memory (bytes)=277688320
Peak Map Virtual memory (bytes)=2780983296
Peak Reduce Physical memory (bytes)=173056000
Peak Reduce Virtual memory (bytes)=2785628160
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=67
File Output Format Counters
Bytes Written=32
[liucf@node1 myShell]$
10.3 Test the JobHistoryServer
View the history of the WordCount job that just ran:
In the web UI, click the application ID above,
then click 1 History,
then click 2 Logs.
Done.