
How HDFS Uses Multiple Disks
2019-03-04 12:13:48
1 fs.default.name

To run HDFS, you need to designate one machine as a namenode. In this case, the
property fs.default.name is an HDFS filesystem URI, whose host is the namenode's
hostname or IP address, and whose port is the port that the namenode will listen on for RPCs.
If no port is specified, the default of 8020 is used.

The fs.default.name property also doubles as specifying the default filesystem. The
default filesystem is used to resolve relative paths, which are handy since they
save typing (and avoid hardcoding knowledge of a particular namenode's address). For
example, with the default filesystem defined in Example 9-1, the relative URI /a/b is
resolved to hdfs://namenode/a/b.
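As a minimal sketch, this is what such a setting looks like in core-site.xml; the host name namenode is a placeholder for your own namenode's hostname:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: point clients and daemons at the namenode -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- host "namenode" is hypothetical; port 8020 is the default -->
    <value>hdfs://namenode:8020/</value>
  </property>
</configuration>
```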

2 dfs.name.dir


There are a few other configuration properties you should set for HDFS: those that set
the storage directories for the namenode and for datanodes. The property
dfs.name.dir specifies a list of directories where the namenode stores persistent
filesystem metadata (the edit log and the filesystem image). A copy of each of the metadata
files is stored in each directory for redundancy (that is, the namenode writes identical data to every location listed in dfs.name.dir).

It's common to configure dfs.name.dir so that the namenode metadata is written to one or two local disks, and
a remote disk, such as an NFS-mounted directory. Such a setup guards against failure
of a local disk, and failure of the entire namenode, since in both cases the files can be
recovered and used to start a new namenode. (The secondary namenode takes only
periodic checkpoints of the namenode, so it does not provide an up-to-date backup of
the namenode.)
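A sketch of such a setup in hdfs-site.xml; the two local disk paths and the NFS mount point are assumptions for illustration:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: redundant namenode metadata directories -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <!-- identical metadata is written to every directory in this list;
         /disk1, /disk2 and /remote/hdfs are hypothetical paths
         (the last one NFS-mounted) -->
    <value>/disk1/hdfs/name,/disk2/hdfs/name,/remote/hdfs/name</value>
  </property>
</configuration>
```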


3 dfs.data.dir


You should also set the dfs.data.dir property, which specifies a list of directories for
a datanode to store its blocks. Unlike the namenode, which uses multiple directories
for redundancy, a datanode round-robins writes between its storage directories (that is, the data stored in each location listed in dfs.data.dir is different), so for
performance you should specify a storage directory for each local disk. Read performance
also benefits from having multiple disks for storage, because blocks will be spread
across them, and concurrent reads for distinct blocks will be correspondingly spread
across disks.
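For illustration, a datanode with three local disks might be configured as follows; the /disk* paths are assumptions:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: one block-storage directory per local disk -->
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <!-- block writes are round-robined across these directories,
         so each entry should sit on a different physical disk -->
    <value>/disk1/hdfs/data,/disk2/hdfs/data,/disk3/hdfs/data</value>
  </property>
</configuration>
```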

4 fs.checkpoint.dir

Finally, you should configure where the secondary namenode stores its checkpoints of
the filesystem. The fs.checkpoint.dir property specifies a list of directories where the
checkpoints are kept. Like the storage directories for the namenode, which keep redundant
copies of the namenode metadata, the checkpointed filesystem image is stored
in each checkpoint directory for redundancy.
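A sketch, again with hypothetical paths:

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml: redundant secondary-namenode checkpoint dirs -->
<configuration>
  <property>
    <name>fs.checkpoint.dir</name>
    <!-- the checkpointed filesystem image is stored in each of
         these directories; the paths are placeholders -->
    <value>/disk1/hdfs/namesecondary,/disk2/hdfs/namesecondary</value>
  </property>
</configuration>
```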

Note that the storage directories for HDFS are under Hadoop's temporary
directory by default (the hadoop.tmp.dir property, whose default
is /tmp/hadoop-${user.name}).
Therefore it is critical that these properties
are set so that data is not lost by the system clearing out temporary
directories.
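In addition to setting the storage properties explicitly, one belt-and-braces option is to move hadoop.tmp.dir itself off /tmp in core-site.xml. The path below is an assumption; the derived default locations shown in the comment reflect Hadoop 1.x behavior and are worth verifying against your release:

```xml
<?xml version="1.0"?>
<!-- core-site.xml: keep Hadoop's working data out of /tmp -->
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <!-- /data/hadoop is a placeholder; the HDFS storage defaults
         (e.g. ${hadoop.tmp.dir}/dfs/name for the namenode metadata)
         are derived from this value -->
    <value>/data/hadoop/tmp-${user.name}</value>
  </property>
</configuration>
```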
