Lab – HDFS Introduction and Architecture

Note: To perform these exercises you need a configured Hadoop environment. We used Hadoop 2.8.4 with Java 8 in our cluster setup. If you have gone through our “Hadoop Administration (Basic)” course module, you already created this configuration during its lab exercises; use it for all exercises in this module.

1. Open a terminal by clicking the terminal icon on the Desktop.

2. Start the Hadoop cluster by running the command start-all.sh in your terminal.

Note: start-all.sh is an executable script that runs a set of commands to start all the components of the Hadoop Distributed File System (HDFS) and the YARN resource manager. The use of start-all.sh is now deprecated, so you can instead run start-dfs.sh to start only the HDFS components and start-yarn.sh to start the YARN components separately. Provide your password whenever the system asks for it.
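The step above can be sketched as a small shell helper. Note that start-dfs.sh and start-yarn.sh are the real Hadoop sbin scripts, but the wrapper function below is our own illustration: here it only echoes which script would be run, so it works even without a Hadoop installation on the PATH. On a real cluster you would replace the echo lines with the actual script invocations.

```shell
#!/bin/sh
# Hypothetical helper: start HDFS and YARN separately, the
# recommended replacement for the deprecated start-all.sh.
start_cluster() {
    # On a real cluster, drop the `echo` so the scripts actually run.
    echo "start-dfs.sh"   # would launch NameNode, DataNode(s), SecondaryNameNode
    echo "start-yarn.sh"  # would launch ResourceManager, NodeManager(s)
}

start_cluster
```

Running the two scripts separately also makes it easier to restart only the HDFS or only the YARN side of the cluster when troubleshooting.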

3. Check whether all components have started using the jps command.

Note: the jps command returns a list of all Java processes running on the cluster, along with their process IDs. A process ID is allocated by the operating system when a process starts, so it may differ from machine to machine, and even on the same machine a process may receive a different ID each time it restarts. Here the NameNode and DataNode are listed along with the other components.
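As a sketch of what checking the jps output looks like, the snippet below greps a sample listing for the expected daemons. The PIDs and the sample output are illustrative, not from a real run; on a live cluster you would pipe the output of `jps` itself into the loop instead of the sample variable.

```shell
#!/bin/sh
# Sample jps-style output (PIDs are illustrative only).
sample_jps_output="4321 NameNode
4456 DataNode
4590 SecondaryNameNode
4712 ResourceManager
4855 NodeManager
5001 Jps"

# Check that each expected daemon appears in the listing.
# On a real cluster: replace the printf with `jps |` in the pipeline.
for daemon in NameNode DataNode ResourceManager NodeManager; do
    if printf '%s\n' "$sample_jps_output" | grep -qw "$daemon"; then
        echo "$daemon: running"
    else
        echo "$daemon: NOT running"
    fi
done
```

If a daemon is missing from the output, check its log file under the Hadoop logs directory before restarting it.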

All other components will be discussed in detail in upcoming topics.