From http://developer.yahoo.com/hadoop/tutorial/module2.html
Rebalancing Blocks
How to add new nodes to the cluster:
New nodes can be added to a cluster in a straightforward manner. On the new node, the same Hadoop version and configuration (conf/hadoop-site.xml) as on the rest of the cluster should be installed. Starting the DataNode daemon on the machine will cause it to contact the NameNode and join the cluster. (The new node should be added to the slaves file on the master server as well, to inform the master how to invoke script-based commands on the new node.)
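As a concrete sketch of this procedure, assuming the new node's hostname is new-node-01 (a hypothetical name) and Hadoop is installed at the same path on every machine:

  # On the master: register the new node so script-based commands reach it
  echo "new-node-01" >> conf/slaves

  # On the new node: start the DataNode daemon; it will contact the
  # NameNode and join the cluster
  bin/hadoop-daemon.sh start datanode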
How to balance data onto the new node:
But the new DataNode will have no data on board initially; it is
therefore not alleviating space concerns on the existing nodes. New
files will be stored on the new DataNode in addition to the existing
ones, but for optimum usage, storage should be evenly balanced across
all nodes.
This can be achieved with the automatic balancer tool
included with Hadoop. The Balancer
class will intelligently balance blocks across the nodes to achieve
an even distribution of blocks within a given threshold, expressed as
a percentage. (The default is 10%.) Smaller percentages make nodes
more evenly balanced, but may require more time to achieve this state.
Perfect balancing (0%) is unlikely to actually be achieved.
The balancer script can be run by starting bin/start-balancer.sh in the Hadoop directory. The script can be provided a balancing threshold percentage with the -threshold parameter; e.g., bin/start-balancer.sh -threshold 5.
The balancer will automatically terminate when it achieves its goal, when an error occurs, or when it cannot find more candidate blocks to move to achieve better balance. The balancer can always be terminated safely by the administrator by running bin/stop-balancer.sh.
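A minimal sketch of a balancing session, using the threshold from the example above:

  # Start balancing; run until nodes are within 5% of mean utilization
  bin/start-balancer.sh -threshold 5

  # Terminate the balancer safely at any point, if needed
  bin/stop-balancer.sh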
The balancing script can be run when nobody else is using the cluster (e.g., overnight), but it can also be run in an "online" fashion while many other jobs are ongoing. To prevent the rebalancing process from consuming large amounts of bandwidth and significantly degrading the performance of other processes on the cluster, the dfs.balance.bandwidthPerSec configuration parameter can be used to limit the number of bytes/sec each node may devote to rebalancing its data store.
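For example, a bandwidth cap could be set in conf/hadoop-site.xml as follows; the value of 1048576 (1 MB/s) is illustrative, not a recommendation:

  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>1048576</value>
    <description>Maximum bytes/sec each node may devote to rebalancing.</description>
  </property>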
Copying Large Sets of Files
When migrating a large number of files from one location to another (either from one HDFS cluster to another, from S3 into HDFS or vice versa, etc.), the task should be divided between multiple nodes to allow them all to share in the bandwidth required for the process. Hadoop includes a tool called distcp for this purpose.
By invoking bin/hadoop distcp src dest, Hadoop will start a MapReduce task to distribute the burden of copying a large number of files from src to dest. These two parameters may specify a full URL for the path to copy. E.g., "hdfs://SomeNameNode:9000/foo/bar/" and "hdfs://OtherNameNode:2000/baz/quux/" will copy the children of /foo/bar on one cluster to the directory tree rooted at /baz/quux on the other. The paths are assumed to be directories, and are copied recursively. S3 URLs can be specified with s3://bucket-name/key.
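Putting the example above on a single command line (the hostnames and ports are the illustrative ones from the text, and the S3 bucket name is hypothetical):

  # Distribute the copy of /foo/bar across a MapReduce job
  bin/hadoop distcp hdfs://SomeNameNode:9000/foo/bar/ hdfs://OtherNameNode:2000/baz/quux/

  # The same tool works with an S3 source or destination
  bin/hadoop distcp s3://my-bucket/key hdfs://SomeNameNode:9000/imported/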
Decommissioning Nodes
How to remove nodes from the cluster:
In addition to allowing nodes to be added to the cluster on the fly, nodes can also be removed from a cluster while it is running, without data loss. But if nodes are simply shut down "hard," data loss may occur, as they may hold the sole copy of one or more file blocks. Nodes must be retired on a schedule that allows HDFS to ensure that no blocks are entirely replicated within the to-be-retired set of DataNodes. HDFS provides a decommissioning feature which ensures that this process is performed safely. To use it, follow the steps below:
Step 1: Cluster configuration. If it is assumed that nodes may be retired in your cluster, then before it is started, an excludes file must be configured. Add a key named dfs.hosts.exclude to your conf/hadoop-site.xml file. The value associated with this key provides the full path to a file on the NameNode's local file system which contains a list of machines which are not permitted to connect to HDFS.
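A sketch of the property, assuming the illustrative path /home/hadoop/excludes:

  <property>
    <name>dfs.hosts.exclude</name>
    <value>/home/hadoop/excludes</value>
  </property>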
Step 2: Determine hosts to decommission. Each machine to be decommissioned should be added to the file identified by dfs.hosts.exclude, one per line. This will prevent them from connecting to the NameNode.
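For example, an excludes file listing two hypothetical hosts would read:

  datanode07.example.com
  datanode12.example.com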
Step 3: Force configuration reload. Run the command bin/hadoop dfsadmin -refreshNodes. This will force the NameNode to reread its configuration, including the newly-updated excludes file. It will decommission the nodes over a period of time, allowing time for each node's blocks to be replicated onto machines which are scheduled to remain active.
Step 4: Shut down nodes. After the decommission process has completed, the decommissioned hardware can be safely shut down for maintenance, etc. The bin/hadoop dfsadmin -report command will describe which nodes are connected to the cluster.
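The refresh-and-verify sequence from Steps 3 and 4, as a sketch:

  # Make the NameNode reread the excludes file and begin decommissioning
  bin/hadoop dfsadmin -refreshNodes

  # Check progress; decommissioning nodes appear in the report until
  # their blocks have been re-replicated elsewhere
  bin/hadoop dfsadmin -report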
Step 5: Edit excludes file again. Once the machines have been decommissioned, they can be removed from the excludes file. Running bin/hadoop dfsadmin -refreshNodes again will read the excludes file back into the NameNode, allowing the DataNodes to rejoin the cluster after maintenance has been completed, or additional capacity is needed in the cluster again, etc.
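A sketch of re-admitting a node after maintenance, using the hypothetical host and excludes path from the earlier steps:

  # Remove the host's line from the excludes file, then reload it
  sed -i '/datanode07.example.com/d' /home/hadoop/excludes
  bin/hadoop dfsadmin -refreshNodes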
Verifying File System Health
After decommissioning nodes, restarting a cluster, or periodically during its lifetime, you may want to ensure that the file system is healthy--that files are not corrupted or under-replicated, and that blocks are not missing. Hadoop provides an fsck command to do exactly this. It can be launched at the command line like so:
bin/hadoop fsck [path] [options]
If run with no arguments, it will print usage information and exit. If run with the argument /, it will check the health of the entire file system and print a report. If provided with a path to a particular directory or file, it will only check files under that path. If an option argument is given but no path, it will start from the file system root (/). The options are of two different types:
Action options specify what action should be taken when corrupted files are found. This can be -move, which moves corrupt files to /lost+found, or -delete, which deletes corrupted files.
Information options specify how verbose the tool should be in its report. The -files option will list all files it checks as it encounters them. This information can be further expanded by adding the -blocks option, which prints the list of blocks for each file. Adding -locations to these two options will then print the addresses of the DataNodes holding these blocks. Still more information can be retrieved by adding -racks to the end of this list, which then prints the rack topology information for each location. (See the next subsection for more information on configuring network rack awareness.) Note that the latter options do not imply the former; you must use them in conjunction with one another. Also, note that the Hadoop program uses -files in a "common argument parser" shared by the different commands such as dfsadmin, fsck, dfs, etc. This means that if you omit a path argument to fsck, it will not receive the -files option that you intend. You can separate common options from fsck-specific options by using -- as an argument, like so:
bin/hadoop fsck -- -files -blocks
The -- is not required if you provide a path to start the check from, or if you specify another argument first such as -move.
By default, fsck will not operate on files still open for write by another client. A list of such files can be produced with the -openforwrite option.
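Pulling the options together, a few representative invocations (all built from the flags described above):

  # Check the entire file system and print a summary report
  bin/hadoop fsck /

  # Maximum verbosity: files, their blocks, block locations, and racks
  bin/hadoop fsck / -files -blocks -locations -racks

  # Include files currently open for write, which are skipped by default
  bin/hadoop fsck / -openforwrite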