Sunday, 26 May 2013

Hadoop Lab

D1: Getting Started

 

Here is a quick reference for installing Hadoop and running a sample MapReduce (M/R) program on a single node.

Download the latest Hadoop distribution, hadoop-X.tar.gz, from here. At the
time of writing, the latest release is hadoop-1.2.0.


1. Untar hadoop-1.2.0.tar.gz.

                $ tar -xvf hadoop-1.2.0.tar.gz
                $ cd hadoop-1.2.0

2. JAVA_HOME:  

Make sure that Java is available on your sandbox. Now open the script
conf/hadoop-env.sh, uncomment the JAVA_HOME line, and set it to your Java installation path.

        Eg: export JAVA_HOME=/usr/local/jdk1.6.0_38/
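If you are not sure where Java is installed, a small sketch like the following can derive the value, assuming `java` is on your PATH and the usual `<jdk-home>/bin/java` layout (paths shown are examples only):

```shell
# Sketch: derive JAVA_HOME from the java binary found on the PATH.
# Assumes the standard <jdk-home>/bin/java directory layout.
JAVA_BIN="$(readlink -f "$(which java)")"   # e.g. /usr/local/jdk1.6.0_38/bin/java
export JAVA_HOME="${JAVA_BIN%/bin/java}"    # strip the trailing /bin/java
echo "$JAVA_HOME"                           # e.g. /usr/local/jdk1.6.0_38
```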

3. Configuration

To set up a pseudo-distributed cluster on a single box, you need to add the
following to the three config files below.

        a.  conf/core-site.xml :

            <configuration>
                      <property>
                         <name>fs.default.name</name>
                         <value>hdfs://127.0.0.1:9000</value>
                      </property>
           </configuration>

      b. conf/hdfs-site.xml

          <configuration>
                <property>
                  <name>dfs.replication</name>
                  <value>1</value>
               </property>
         </configuration>

   
     c. conf/mapred-site.xml

        <configuration>
               <property>
                     <name>mapred.job.tracker</name>
                     <value>127.0.0.1:9001</value>
              </property>
        </configuration>       

Note: We have set the HDFS and MapReduce machines as localhost. If we were using a cluster, we would have given the target machine's IP or hostname. One interesting point to remember here is that we can use the same set of conf files on all the machines in the cluster.
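For instance, on a real cluster the same conf/core-site.xml would simply name the master node instead of localhost (the hostname `master` below is a placeholder for your actual NameNode host):

```xml
<configuration>
          <property>
             <name>fs.default.name</name>
             <value>hdfs://master:9000</value>
          </property>
</configuration>
```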

4. Set up SSH :

We should set up passwordless SSH so that we can log in to any machine in the
cluster from any other machine. As this is a single-node pseudo-cluster setup,
we only need a passwordless SSH login to localhost. Try
  
                         $ ssh localhost

If a prompt appears for a password, skip this and jump to step 5. If instead
you see the message "Connection refused on port 22", it is because sshd is
not running.
                         $ /etc/init.d/sshd start

Now try 'ssh localhost'; if it prompts for a password, go to step 5.
If you don't have the sshd binary, get it from your Linux distribution's repo.
For Ubuntu users:
           $ sudo apt-get install openssh-server
                                       or
download from here and install it using the command:

      $ sudo dpkg -i openssh-server_5.3p1-3ubuntu7_i386.deb

5. Passwordless SSH login setup:

To achieve this we have to generate an SSH key pair and authorize its public key.

  $ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
  $ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

The first command generates a DSA key pair; you can find the public key in
~/.ssh/id_dsa.pub. The second command appends it to the authorized_keys file
in the same directory.
    
     
Now you should be able to ssh to localhost without a password.
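If ssh still prompts for a password after this, sshd is often silently rejecting the key because the .ssh directory or authorized_keys file is group- or world-readable. A quick fix, per the standard OpenSSH permission requirements:

```shell
# sshd ignores authorized_keys unless these permissions are strict.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh localhost   # should now log in without asking for a password
```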

6. Execution

Format a new distributed filesystem:
              $ bin/hadoop namenode -format
   
Start the hadoop daemons:
             $ bin/start-all.sh
   
Browse the web interfaces for the NameNode and the JobTracker; by default
they are available at http://localhost:50070/ (NameNode) and
http://localhost:50030/ (JobTracker).
    

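To round things off, here is a sketch of running the sample M/R program promised at the start, using the wordcount example bundled with the release (the jar name assumes the 1.2.0 release; adjust it for your version):

```shell
# Copy some text files into HDFS as job input.
$ bin/hadoop fs -mkdir input
$ bin/hadoop fs -put conf/*.xml input

# Run the bundled wordcount example.
$ bin/hadoop jar hadoop-examples-1.2.0.jar wordcount input output

# Inspect the results, then stop the daemons.
$ bin/hadoop fs -cat 'output/*'
$ bin/stop-all.sh
```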