
Hadoop Big Data Applications: Deploying an HDFS Distributed Cluster on Linux

Contents

I. Experiment

1. Environment

2. Deploying an HDFS Distributed Cluster on Linux

3. Using the HDFS File System on Linux

II. Issues

1. ssh-copy-id error

2. How to disable SSH host key checking

3. Which configuration files does HDFS use

4. Error when checking the Hadoop version

5. Error when starting the cluster

6. Hadoop start and stop commands

7. Error when uploading files

8. Common HDFS commands


I. Experiment

1. Environment

(1) Hosts

Table 1  Hosts

Host     Role                          Software        IP               Notes
hadoop   NameNode, SecondaryNameNode   hadoop 2.7.7    192.168.204.50
node01   DataNode                      hadoop 2.7.7    192.168.204.51
node02   DataNode                      hadoop 2.7.7    192.168.204.52
node03   DataNode                      hadoop 2.7.7    192.168.204.53

(2) Security mechanism (SELinux)

Check the current status:

[root@localhost ~]# sestatus

Disable it:

[root@localhost ~]# vim /etc/selinux/config
……
SELINUX=disabled
……

Check again (a reboot is required for the change to take effect):

[root@localhost ~]# sestatus

(3) Firewall

Disable it:

[root@localhost ~]# systemctl stop firewalld
[root@localhost ~]# systemctl mask firewalld

(4) Install Java

[root@localhost ~]# yum install -y java-1.8.0-openjdk-devel.x86_64

Check:

[root@localhost ~]# jps

Repeat the installation and the jps check on hadoop, node01, node02, and node03.

(5) Hostname resolution

[root@localhost ~]# vim /etc/hosts
……
192.168.204.50 hadoop
192.168.204.51 node01
192.168.204.52 node02
192.168.204.53 node03

(6) Set the hostname

[root@localhost ~]# hostnamectl set-hostname <hostname>
[root@localhost ~]# bash

(7) Generate an SSH key pair on the hadoop node

[root@hadoop ~]# mkdir /root/.ssh
[root@hadoop ~]# cd /root/.ssh/
[root@hadoop .ssh]# ssh-keygen -t rsa -b 2048 -N ''

(8) Set up passwordless login

[root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub hadoop
[root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub node01
[root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub node02
[root@hadoop .ssh]# ssh-copy-id -i id_rsa.pub node03
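As a quick sanity check (a minimal sketch, assuming the /etc/hosts entries above are in place), the following loop should print each node's hostname without prompting for a password:

  for h in hadoop node01 node02 node03; do
      ssh -o BatchMode=yes "$h" hostname   # BatchMode makes ssh fail instead of prompting
  done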

2. Deploying an HDFS Distributed Cluster on Linux

(1) Official site

https://hadoop.apache.org/

Browse the available releases:

https://archive.apache.org/dist/hadoop/common/

(2) Download

wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.7/hadoop-2.7.7.tar.gz

(3) Extract

tar -zxf hadoop-2.7.7.tar.gz

(4) Move

mv hadoop-2.7.7 /usr/local/hadoop

(5) Change ownership

chown -R root.root /usr/local/hadoop

(6) Verify the version

(The environment file hadoop-env.sh must first be edited to declare the Java installation path and the Hadoop configuration directory.)

Edit the configuration file:

[root@hadoop hadoop]# vim hadoop-env.sh
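The two entries to adjust look roughly like this (the exact OpenJDK path depends on the installed package; the values below are the ones determined later in issue II.4):

  export JAVA_HOME="/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64/jre"
  export HADOOP_CONF_DIR="/usr/local/hadoop/etc/hadoop"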

 

Verify:

[root@hadoop hadoop]# ./bin/hadoop version

(7) Edit the worker node list (slaves)

[root@hadoop hadoop]# vim slaves

Before:

After:

node01
node02
node03

(8) Check the official documentation

https://hadoop.apache.org/docs/

Documentation for this release:

https://hadoop.apache.org/docs/r2.7.7/

Core configuration reference:

https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-common/core-default.xml

File system parameter (fs.defaultFS):

Data directory parameter (hadoop.tmp.dir):

(9) Edit the core configuration file

[root@hadoop hadoop]# vim core-site.xml

Before:

After:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop:9000</value>
        <description>hdfs file system</description>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop</value>
    </property>
</configuration>
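Once the file is saved, the effective value can be confirmed with the getconf subcommand (listed later in step (14)); it should echo back hdfs://hadoop:9000:

  [root@hadoop hadoop]# ./bin/hdfs getconf -confKey fs.defaultFS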

(10) Check the HDFS configuration reference

https://hadoop.apache.org/docs/r2.7.7/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml

NameNode addresses:

Replication factor:

(11) Edit the HDFS configuration file

[root@hadoop hadoop]# vim hdfs-site.xml

Before:

After (note that each parameter needs its own property element):

<configuration>
    <property>
        <name>dfs.namenode.http-address</name>
        <value>hadoop:50070</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop:50090</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
</configuration>

(12) Check that rsync is installed

[root@hadoop ~]# rpm -q rsync

Synchronize the Hadoop directory to the DataNodes:

[root@hadoop ~]# rsync -aXSH --delete /usr/local/hadoop node01:/usr/local/
[root@hadoop ~]# rsync -aXSH --delete /usr/local/hadoop node02:/usr/local/
[root@hadoop ~]# rsync -aXSH --delete /usr/local/hadoop node03:/usr/local/
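The same synchronization can also be written as a loop (a small convenience sketch using the node names defined earlier):

  for h in node01 node02 node03; do
      rsync -aXSH --delete /usr/local/hadoop "$h":/usr/local/
  done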

(13) Initialize the HDFS data directory

[root@hadoop ~]# mkdir /var/hadoop

(14) View the available hdfs subcommands

[root@hadoop hadoop]# ./bin/hdfs
Usage: hdfs [--config confdir] [--loglevel loglevel] COMMAND
       where COMMAND is one of:
  dfs                  run a filesystem command on the file systems supported in Hadoop.
  classpath            prints the classpath
  namenode -format     format the DFS filesystem
  secondarynamenode    run the DFS secondary namenode
  namenode             run the DFS namenode
  journalnode          run the DFS journalnode
  zkfc                 run the ZK Failover Controller daemon
  datanode             run a DFS datanode
  dfsadmin             run a DFS admin client
  haadmin              run a DFS HA admin client
  fsck                 run a DFS filesystem checking utility
  balancer             run a cluster balancing utility
  jmxget               get JMX exported values from NameNode or DataNode.
  mover                run a utility to move block replicas across
                       storage types
  oiv                  apply the offline fsimage viewer to an fsimage
  oiv_legacy           apply the offline fsimage viewer to an legacy fsimage
  oev                  apply the offline edits viewer to an edits file
  fetchdt              fetch a delegation token from the NameNode
  getconf              get config values from configuration
  groups               get the groups which users belong to
  snapshotDiff         diff two snapshots of a directory or diff the
                       current directory contents with a snapshot
  lsSnapshottableDir   list all snapshottable dirs owned by the current user
                       Use -help to see options
  portmap              run a portmap service
  nfs3                 run an NFS version 3 gateway
  cacheadmin           configure the HDFS cache
  crypto               configure HDFS encryption zones
  storagepolicies      list/get/set block storage policies
  version              print the version

Most commands print help when invoked w/o parameters.

(15) Format HDFS

[root@hadoop hadoop]# ./bin/hdfs namenode -format

Check the resulting directory tree:

[root@hadoop hadoop]# cd /var/hadoop/
[root@hadoop hadoop]# tree .
.
└── dfs
    └── name
        └── current
            ├── fsimage_0000000000000000000
            ├── fsimage_0000000000000000000.md5
            ├── seen_txid
            └── VERSION

3 directories, 4 files

(16) Start the cluster

Check the installation directory:

[root@hadoop hadoop]# cd ~
[root@hadoop ~]# cd /usr/local/hadoop/
[root@hadoop hadoop]# ls

Start the HDFS daemons:

[root@hadoop hadoop]# ./sbin/start-dfs.sh

Check the logs (a new logs directory is created):

[root@hadoop hadoop]# cd logs/ ; ll

Check the running Java processes with jps:

[root@hadoop hadoop]# jps
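On the hadoop node this should list roughly the following processes (PIDs omitted; the exact list assumes the configuration above, where the NameNode and SecondaryNameNode both run on hadoop):

  NameNode
  SecondaryNameNode
  Jps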

Run the same check on each DataNode (node01, node02, node03); each should show a DataNode process.

(17) View the dfsadmin options

[root@hadoop hadoop]# ./bin/hdfs dfsadmin
Usage: hdfs dfsadmin
Note: Administrative commands can only be run as the HDFS superuser.
        [-report [-live] [-dead] [-decommissioning]]
        [-safemode <enter | leave | get | wait>]
        [-saveNamespace]
        [-rollEdits]
        [-restoreFailedStorage true|false|check]
        [-refreshNodes]
        [-setQuota <quota> <dirname>...<dirname>]
        [-clrQuota <dirname>...<dirname>]
        [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
        [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
        [-finalizeUpgrade]
        [-rollingUpgrade [<query|prepare|finalize>]]
        [-refreshServiceAcl]
        [-refreshUserToGroupsMappings]
        [-refreshSuperUserGroupsConfiguration]
        [-refreshCallQueue]
        [-refresh <host:ipc_port> <key> [arg1..argn]
        [-reconfig <datanode|...> <host:ipc_port> <start|status>]
        [-printTopology]
        [-refreshNamenodes datanode_host:ipc_port]
        [-deleteBlockPool datanode_host:ipc_port blockpoolId [force]]
        [-setBalancerBandwidth <bandwidth in bytes per second>]
        [-fetchImage <local directory>]
        [-allowSnapshot <snapshotDir>]
        [-disallowSnapshot <snapshotDir>]
        [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
        [-getDatanodeInfo <datanode_host:ipc_port>]
        [-metasave filename]
        [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
        [-help [cmd]]

Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|resourcemanager:port>                specify a ResourceManager
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

(18) Verify the cluster

Check the report; three live DataNodes are listed:

[root@hadoop hadoop]# ./bin/hdfs dfsadmin -report
Configured Capacity: 616594919424 (574.25 GB)
Present Capacity: 598915952640 (557.78 GB)
DFS Remaining: 598915915776 (557.78 GB)
DFS Used: 36864 (36 KB)
DFS Used%: 0.00%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0
Missing blocks (with replication factor 1): 0

-------------------------------------------------
Live datanodes (3):

Name: 192.168.204.53:50010 (node03)
Hostname: node03
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 5620584448 (5.23 GB)
DFS Remaining: 199911043072 (186.18 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.27%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024

Name: 192.168.204.51:50010 (node01)
Hostname: node01
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 6028849152 (5.61 GB)
DFS Remaining: 199502778368 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024

Name: 192.168.204.52:50010 (node02)
Hostname: node02
Decommission Status : Normal
Configured Capacity: 205531639808 (191.42 GB)
DFS Used: 12288 (12 KB)
Non DFS Used: 6029533184 (5.62 GB)
DFS Remaining: 199502094336 (185.80 GB)
DFS Used%: 0.00%
DFS Remaining%: 97.07%
Configured Cache Capacity: 0 (0 B)
Cache Used: 0 (0 B)
Cache Remaining: 0 (0 B)
Cache Used%: 100.00%
Cache Remaining%: 0.00%
Xceivers: 1
Last contact: Thu Mar 14 10:30:18 CST 2024

(19) Verify via the web UI

http://192.168.204.50:50070/   (NameNode)

http://192.168.204.50:50090/   (SecondaryNameNode)

http://192.168.204.51:50075/   (DataNode, node01)
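From the command line, reachability of the NameNode UI can be checked with curl (a simple HTTP HEAD request; a 200 response means the page is being served):

  curl -sI http://192.168.204.50:50070/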

(20) Browse the file system

It is currently empty.

3. Using the HDFS File System on Linux

(1) View the available commands

[root@hadoop hadoop]# ./bin/hadoop
Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
  CLASSNAME            run the class named CLASSNAME
 or
  where COMMAND is one of:
  fs                   run a generic filesystem user client
  version              print the version
  jar <jar>            run a jar file
                       note: please use "yarn jar" to launch
                             YARN applications, not this command.
  checknative [-a|-h]  check native hadoop and compression libraries availability
  distcp <srcurl> <desturl> copy file or directories recursively
  archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
  classpath            prints the class path needed to get the
                       Hadoop jar and the required libraries
  credential           interact with credential providers
  daemonlog            get/set the log level for each daemon
  trace                view and modify Hadoop tracing settings

Most commands print help when invoked w/o parameters.

[root@hadoop hadoop]# ./bin/hadoop fs
Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
        [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] <path> ...]
        [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] <path> ...]
        [-expunge]
        [-find <path> ... <expression> ...]
        [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] <src> <localdst>]
        [-help [cmd ...]]
        [-ls [-d] [-h] [-R] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] <file>]
        [-test -[defsz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]

Generic options supported are
-conf <configuration file>                      specify an application configuration file
-D <property=value>                             use value for given property
-fs <local|namenode:port>                       specify a namenode
-jt <local|resourcemanager:port>                specify a ResourceManager
-files <comma separated list of files>          specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>         specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>    specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]

(2) List the root directory

[root@hadoop hadoop]# ./bin/hadoop fs -ls /

(3) Create a directory

[root@hadoop hadoop]# ./bin/hadoop fs -mkdir /devops

Check:

Check in the web UI:

(4) Upload files

[root@hadoop hadoop]# ./bin/hadoop fs -put *.txt /devops/

Check:

[root@hadoop hadoop]# ./bin/hadoop fs -ls /devops/

Check in the web UI:

Permission   Owner   Group        Size       Last Modified         Replication   Block Size   Name
-rw-r--r--   root    supergroup   84.4 KB    2024/3/14 11:05:33    2             128 MB       LICENSE.txt
-rw-r--r--   root    supergroup   14.63 KB   2024/3/14 11:05:34    2             128 MB       NOTICE.txt
-rw-r--r--   root    supergroup   1.33 KB    2024/3/14 11:05:34    2             128 MB       README.txt
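To confirm the upload from the shell (using the cat subcommand summarized in section II.8), the contents of one of the files can be printed directly from HDFS:

  [root@hadoop hadoop]# ./bin/hadoop fs -cat /devops/README.txt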

Files can also be downloaded from the web page.

(5) Create an empty file

[root@hadoop hadoop]# ./bin/hadoop fs -touchz /tfile

Check:

[root@hadoop hadoop]# ./bin/hadoop fs -ls /

(6) Download a file

[root@hadoop hadoop]# ./bin/hadoop fs -get /tfile /tmp/

Check:

[root@hadoop hadoop]# ls -l /tmp/ | grep tfile

Check in the web UI:

(7) Comparing path forms

Because fs.defaultFS was set to hdfs://hadoop:9000 earlier, the following two commands list the same file system:

[root@hadoop hadoop]# ./bin/hadoop fs -ls /
[root@hadoop hadoop]# ./bin/hadoop fs -ls hdfs://hadoop:9000/

The upstream default, by contrast, is file:///, which points at the local file system:

[root@hadoop hadoop]# ./bin/hadoop fs -ls file:///

II. Issues

1. ssh-copy-id error

(1) Error

/usr/bin/ssh-copy-id: ERROR: ssh: connect to host hadoop port 22: Connection refused

(2) Cause

The hostname resolved to the wrong address.

(3) Solution

Correct the corresponding entry in /etc/hosts.

Before:

After:

Success:

2. How to disable SSH host key checking

(1) Edit the configuration file

[root@hadoop .ssh]# vim /etc/ssh/ssh_config

Add the following line:

StrictHostKeyChecking no

Success:
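Alternatively, the check can be skipped for a single connection with a standard OpenSSH option instead of editing the global configuration:

  ssh -o StrictHostKeyChecking=no node01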

3. Which configuration files does HDFS use

(1) Configuration files

1) Environment file:      hadoop-env.sh
2) Core configuration:    core-site.xml
3) HDFS configuration:    hdfs-site.xml
4) Worker node list:      slaves

4. Error when checking the Hadoop version

(1) Error

(2) Cause

The Java environment was not declared in hadoop-env.sh.

(3) Solution

Declare the Java environment.

Find the installed files:

rpm -ql java-1.8.0-openjdk

The Java home is:

/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.402.b06-1.el7_9.x86_64/jre

The Hadoop configuration directory is:

/usr/local/hadoop/etc/hadoop

Edit the configuration file:

[root@hadoop hadoop]# vim hadoop-env.sh

Before:

After:

Success:

[root@hadoop hadoop]# ./bin/hadoop version

5. Error when starting the cluster

(1) Error

(2) Cause

The SSH key was never copied to the local host itself, so start-dfs.sh cannot log in to it without a password.

(3) Solution

Copy the key to the local host as well:

[root@hadoop hadoop]# ssh-copy-id hadoop

If the error persists, stop the HDFS daemons (NameNode, SecondaryNameNode, and DataNode) first:

[root@hadoop hadoop]# ./sbin/stop-dfs.sh

and then start the cluster again.

6. Hadoop start and stop commands

(1) Commands

sbin/start-all.sh                                   Start all Hadoop daemons: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
sbin/stop-all.sh                                    Stop all Hadoop daemons: NameNode, SecondaryNameNode, DataNode, ResourceManager, NodeManager
sbin/start-dfs.sh                                   Start the HDFS daemons: NameNode, SecondaryNameNode, DataNode
sbin/stop-dfs.sh                                    Stop the HDFS daemons: NameNode, SecondaryNameNode, DataNode
sbin/hadoop-daemons.sh start namenode               Start only the NameNode daemon
sbin/hadoop-daemons.sh stop namenode                Stop only the NameNode daemon
sbin/hadoop-daemons.sh start datanode               Start only the DataNode daemons
sbin/hadoop-daemons.sh stop datanode                Stop only the DataNode daemons
sbin/hadoop-daemons.sh start secondarynamenode      Start only the SecondaryNameNode daemon
sbin/hadoop-daemons.sh stop secondarynamenode       Stop only the SecondaryNameNode daemon
sbin/start-yarn.sh                                  Start the ResourceManager and NodeManagers
sbin/stop-yarn.sh                                   Stop the ResourceManager and NodeManagers
sbin/yarn-daemon.sh start resourcemanager           Start only the ResourceManager
sbin/yarn-daemons.sh start nodemanager              Start only the NodeManagers
sbin/yarn-daemon.sh stop resourcemanager            Stop only the ResourceManager
sbin/yarn-daemons.sh stop nodemanager               Stop only the NodeManagers
sbin/mr-jobhistory-daemon.sh start historyserver    Start the JobHistory server manually
sbin/mr-jobhistory-daemon.sh stop historyserver     Stop the JobHistory server manually

7. Error when uploading files

(1) Error

(2) Cause

The command was wrong.

(3) Solution

Use the correct command:

[root@hadoop hadoop]# ./bin/hadoop fs -put *.txt /devops/

8. Common HDFS commands

(1) Commands

ls     list files or directories
cat    print file contents
put    upload a file
get    download a file
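Putting them together (a short recap using the /devops directory created earlier; the paths are simply the ones from this walkthrough):

  [root@hadoop hadoop]# ./bin/hadoop fs -ls /devops
  [root@hadoop hadoop]# ./bin/hadoop fs -cat /devops/README.txt
  [root@hadoop hadoop]# ./bin/hadoop fs -put -f README.txt /devops/
  [root@hadoop hadoop]# ./bin/hadoop fs -get /devops/README.txt /tmp/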
