D1: Getting Started
Here is a quick reference for installing Hadoop and running a sample M/R (MapReduce) program on a single node.
Download the latest Hadoop distribution hadoop-X.tar.gz from here. At the
time of writing this article the latest is hadoop-1.2.0.
1. Untar hadoop-1.2.0.tar.gz.
$ tar -xvf hadoop-1.2.0.tar.gz
$ cd hadoop-1.2.0
2. JAVA_HOME:
Make sure that Java is available on your sandbox. Now open the script
conf/hadoop-env.sh, then uncomment and set JAVA_HOME to your Java installation path.
E.g.: export JAVA_HOME=/usr/local/jdk1.6.0_38/
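If you are not sure where your JDK lives, a small sketch like the following can derive a candidate JAVA_HOME from the java binary on your PATH (the resolved path is machine-specific; treat the output as a suggestion to verify, not something from the Hadoop docs):

```shell
# Derive a candidate JAVA_HOME from the `java` binary on the PATH.
# `readlink -f` resolves symlinks (e.g. /usr/bin/java -> the real JDK
# location); dirname twice strips the trailing /bin/java.
if command -v java >/dev/null 2>&1; then
    JAVA_BIN=$(readlink -f "$(command -v java)")
    JAVA_HOME=$(dirname "$(dirname "$JAVA_BIN")")
    echo "export JAVA_HOME=$JAVA_HOME"
else
    echo "java not found on PATH" >&2
fi
```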
3. Configuration:
To set up a pseudo-cluster on a single box, you need to add the following
entries to the three config files below to get started.
a. conf/core-site.xml :
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://127.0.0.1:9000</value>
</property>
</configuration>
b. conf/hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
c. conf/mapred-site.xml
<configuration>
<property>
<name>mapred.job.tracker</name>
<value>127.0.0.1:9001</value>
</property>
</configuration>
Note: We have set the HDFS and map-red machines as localhost. If
we were using a cluster we would have given the target machine's IP. One
interesting point to remember here is that we can use the same set of
conf files on all the machines in the cluster.
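For example, on a real cluster the core-site.xml entry above would point at the namenode's address instead of localhost (the IP below is a made-up example, not from the original setup):

```xml
<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://192.168.1.10:9000</value>
</property>
</configuration>
```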
4. Set up SSH :
We should set up passwordless SSH so we can log in to any machine in the cluster
from any other machine. As this is a single-node pseudo-cluster setup, we
only need to set up passwordless SSH login to localhost. Try
$ ssh localhost
If a prompt appears for a password, skip the rest of this step and jump to step 5. If instead you
see a message like "Connection refused on port 22", it's because sshd is
not up.
$ sudo /etc/init.d/sshd start
Now try 'ssh localhost'; if it prompts for a password, go to step 5.
If you don't see the sshd binary at all, get it from your Linux distribution's repo;
for Ubuntu users:
$ sudo apt-get install openssh-server
or
download from here and install using command
$ sudo dpkg -i openssh-server_5.3p1-3ubuntu7_i386.deb
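Before installing anything, it can help to check whether sshd is already running. A quick probe (assuming pgrep is available, which it is on most Linux boxes) looks like this:

```shell
# Report whether an sshd process exists; prints one of two messages.
if pgrep -x sshd >/dev/null 2>&1; then
    echo "sshd running"
else
    echo "sshd not running"
fi
```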
5. Passwordless SSH login setup:
To achieve this we have to generate a DSA key pair.
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
You can find the public key in ~/.ssh/id_dsa.pub; the cat command above
appends it to the authorized_keys file in the same directory.
Now you should be able to 'ssh localhost' without a password.
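If key-based login still prompts for a password, a common cause is that sshd rejects keys when the ~/.ssh directory or authorized_keys file is group- or world-writable. The following tightens the permissions and then retries login in batch mode (BatchMode makes ssh fail instead of prompting); this is a general OpenSSH fix-up, not something specific to Hadoop:

```shell
# sshd ignores authorized_keys if permissions are too loose;
# restrict them, then attempt a non-interactive login.
chmod 700 ~/.ssh
chmod 600 ~/.ssh/authorized_keys
ssh -o BatchMode=yes localhost 'echo key-based login works'
```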
6. Execution
Format a new distributed-filesystem:
$ bin/hadoop namenode -format
Start the hadoop daemons:
$ bin/start-all.sh
Browse the web interfaces for the NameNode and the JobTracker; by default
they are available at:
- NameNode - http://localhost:50070/
- JobTracker - http://localhost:50030/
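With the daemons up, you can smoke-test the pseudo-cluster by running the wordcount example bundled with the release. The jar name below assumes the 1.2.0 release and /tmp/words.txt is just a sample input file; check the exact jar name in your untarred directory:

```shell
# Put a small file into HDFS and run the bundled wordcount example.
echo "hello hadoop hello" > /tmp/words.txt
bin/hadoop fs -mkdir /input
bin/hadoop fs -put /tmp/words.txt /input/
bin/hadoop jar hadoop-examples-1.2.0.jar wordcount /input /output
bin/hadoop fs -cat /output/part-*   # prints each word with its count
```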