Lab – Components failure and recovery mechanisms

In this exercise we will run a few Hadoop commands to check the status of our cluster and to balance the distribution of file blocks across the nodes.

1. As an admin, you can get a status report for the Hadoop cluster with the following command (if you are working as the cent user, your terminal prompt will show [cent@localhost ~]):

hadoop dfsadmin -report

https://lh4.googleusercontent.com/kTw0vkWSVuH0HMlkHfxEQmJZHppm1BBNT3rRep2mmcj9F_s3c8JMpNUzbbeDP8-ezNaEEvLUxsty9ltvluQx8q8T5e2-oJkGDSqJCCNwWPpi8mnfcct9_ZSkrqSqqQxYOAyzAzJH
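The report can be long on clusters with many nodes, so a small filter helps when you only want the usage figure per data node. The following is a minimal sketch, not part of the lab itself: the function name datanode_usage is hypothetical, and it assumes each data node section of the report starts with a "Name:" line and contains a "DFS Used%:" line. The exact layout can differ between Hadoop versions.

```shell
# Hypothetical helper: reads the text of a dfsadmin report on stdin and
# prints one line per data node in the form "<host:port> <DFS used percent>".
# Assumes each data node section begins with "Name:" and has a "DFS Used%:" line.
datanode_usage() {
  awk '
    /^Name:/ { name = $2 }                 # remember the current data node
    /^DFS Used%:/ && name != "" {          # skip the cluster-wide summary line
      gsub(/%/, "", $3)                    # drop the percent sign
      print name, $3
      name = ""
    }
  '
}

# Usage on a live cluster (not run here):
#   hadoop dfsadmin -report | datanode_usage
# On newer releases the same report is available via: hdfs dfsadmin -report
```

The cluster-wide "DFS Used%" line at the top of the report is skipped because no "Name:" line has been seen yet at that point.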

2. To see the topology of your Hadoop cluster, that is, the racks and the data nodes attached to each rack, run the command:

hadoop dfsadmin -printTopology

https://lh3.googleusercontent.com/wh1B5IvsCeYhrxaWF0tZv15q6gMrhrPA649kYYWznjMN5wHAINWMQvrDJp9dNOOz7Aic5ThfNd7WspciMstsStRcyoxP7nmc9eLCq5egnoKGl9I9bh0PReql03uRTvKH9xb_cbGh
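To turn the topology listing into a quick count of data nodes per rack, something like the sketch below may be useful. The function name nodes_per_rack is hypothetical, and the sketch assumes the output consists of "Rack: <path>" lines, each followed by one indented line per data node. Check this against the output on your own cluster before relying on it.

```shell
# Hypothetical helper: reads printTopology output on stdin and prints
# "<rack> <number of data nodes>" for each rack, in input order.
# Assumes "Rack: <path>" header lines followed by one line per data node.
nodes_per_rack() {
  awk '
    /^Rack:/ {
      if (rack != "") print rack, n        # flush the previous rack
      rack = $2; n = 0; next
    }
    NF > 0 && rack != "" { n++ }           # any non-blank line is a data node
    END { if (rack != "") print rack, n }  # flush the last rack
  '
}

# Usage on a live cluster (not run here):
#   hadoop dfsadmin -printTopology | nodes_per_rack
```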

3. To rebalance the distribution of blocks across the available data nodes, use the command:

hadoop balancer

https://lh5.googleusercontent.com/vDP1eYL_2EACXWYr0nrIdv7Ye0v7LD0E_lZQ2u_nn2MVDv2gpW8fAxrhIrwM8OGXzrKm00iurk4nGYvm_GTOZ67hyYiczbSox_gHb61XXRlGOn_yr6YrO74bZuKsRlJczZy80A7Z

Running hadoop balancer without any arguments balances the cluster using the default threshold of 10.0 percent. The balancer moves blocks until the disk usage of every data node is within 10 percentage points of the overall cluster usage.

The threshold in effect is shown in the log output that the command prints to the terminal.
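The threshold rule can be checked by hand for a single node. The sketch below is illustrative only: within_threshold is a hypothetical helper, not a Hadoop command. It takes a node's DFS usage percentage, the cluster's usage percentage, and an optional threshold (assumed to default to 10, matching the balancer default), and succeeds when the node counts as balanced.

```shell
# Hypothetical helper: exit status 0 when the node's usage is within
# <threshold> percentage points of the cluster usage, 1 otherwise.
#   $1 = node DFS used %, $2 = cluster DFS used %, $3 = threshold (default 10)
within_threshold() {
  awk -v n="$1" -v c="$2" -v t="${3:-10}" 'BEGIN {
    d = n - c
    if (d < 0) d = -d                      # absolute difference
    exit !(d <= t)                         # 0 = balanced, 1 = needs balancing
  }'
}

# Example: a node at 35% on a cluster averaging 20% is outside the default
# threshold, so the balancer would move blocks away from it.
if within_threshold 35 20; then echo "balanced"; else echo "needs balancing"; fi

# To balance more tightly on a live cluster (not run here), pass a smaller
# threshold to the balancer:
#   hadoop balancer -threshold 5
```

A smaller threshold gives a more even distribution but makes the balancer move more data and run longer.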