Sunday, May 1, 2016

Hadoop Install and Config

Install

  • Virtual Machine (optional)
  • Config iptables and ssh
  • Config hostname
  • Make sure we have JDK
  • Extract hadoop to /usr/local/hadoop
  • Set environment
  • Config files
  • Format Hadoop
  • Start Hadoop
  • Auto Start Hadoop
  • Warning
Virtual Machine (optional)
The virtual machine and the host need to be able to connect to each other, so that we can use ssh to configure the hosts.
http://gvace.blogspot.com/2016/05/virtual-box-network-config.html

Config iptables and ssh

iptables: make sure the IP is not blocked
http://gvace.blogspot.com/2016/04/linux-network-configs.html

ssh: allows remote terminal access, and lets the Hadoop nodes communicate with each other
http://gvace.blogspot.com/2016/04/ssh.html
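
For example, passwordless ssh for the current user can be set up like this (a minimal sketch; run it as the user that will start Hadoop):

ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
#should now log in without asking for a password
ssh localhost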

Config hostname
Configure a hostname, and add the known hostnames to the config, so each node can communicate by hostname instead of by IP.
http://gvace.blogspot.com/2016/04/linux-network-configs.html
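
For example, entries in /etc/hosts could look like this (the IPs are placeholders for your own network; hmain is the master hostname used later in this post, hslave1 is a hypothetical second node):

192.168.56.101  hmain
192.168.56.102  hslave1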


Make sure we have JDK
java -version
javac -version
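
If the JDK is missing, install one first. This post assumes the Oracle JDK 7 installed under /usr/lib/jvm/java-7-oracle; on Ubuntu that package was commonly available through the webupd8team PPA (adjust for your own distro):

sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java7-installer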

Download Hadoop and extract
http://hadoop.apache.org/

Extract hadoop to /usr/local/hadoop 
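
For example (the version number is just a placeholder; this post is written against Hadoop 1.x):

tar -xzf hadoop-1.2.1.tar.gz
sudo mv hadoop-1.2.1 /usr/local/hadoop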

Set Environment

/etc/profile.d/jdk.sh
#JDK Environment
export J2SDKDIR=/usr/lib/jvm/java-7-oracle
export J2REDIR=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin:/usr/lib/jvm/java-7-oracle/db/bin:/usr/lib/jvm/java-7-oracle/jre/bin
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export DERBY_HOME=/usr/lib/jvm/java-7-oracle/db

/etc/profile.d/hadoop-env.sh
#Hadoop Environment
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop 
#Optional: suppresses the "HADOOP_HOME is deprecated" warning
export HADOOP_HOME_WARN_SUPPRESS=1
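
These profile scripts only run at login, so either log out and back in, or load them into the current shell:

source /etc/profile.d/jdk.sh
source /etc/profile.d/hadoop-env.sh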


Config files


In /usr/local/hadoop/conf, edit these files:
  1. hadoop-env.sh
  2. core-site.xml
  3. hdfs-site.xml
  4. mapred-site.xml
hadoop-env.sh

Change JAVA_HOME to the JDK directory
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

core-site.xml
Assign the HDFS default name and the tmp directory
<configuration>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://hmain:9000</value>
  <description>NameNode</description>
 </property>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/usr/local/hadoop/tmp</value>
 </property>
</configuration>


hdfs-site.xml
Set the replication factor to 1 (single-node setup) and disable permission checking
<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.permissions</name>
                <value>false</value>
        </property>
</configuration>


mapred-site.xml
Assign the JobTracker hostname and port
<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>hmain:9001</value>
  <description>Change hostname and port</description>
 </property>
</configuration>


Format Hadoop

hadoop namenode -format

We may run into issues caused by user permissions.

This is the error I got.
16/05/02 00:47:35 INFO namenode.FSNamesystem: fsOwner=yushan
16/05/02 00:47:35 INFO namenode.FSNamesystem: supergroup=supergroup
16/05/02 00:47:35 INFO namenode.FSNamesystem: isPermissionEnabled=false
16/05/02 00:47:35 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/05/02 00:47:35 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/05/02 00:47:36 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/05/02 00:47:36 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/05/02 00:47:36 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/local/hadoop/tmp/dfs/name/current

As we see, by default the Hadoop user is the username of the user who ran the format, and the group is "supergroup".
We can change the ownership of the whole /usr/local/hadoop folder and all its subfolders/files to that username and group (creating the group if it does not exist), and set 755 permissions on the whole folder. This fixes the error, as shown below.
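
For example, assuming the user is yushan as in the log above:

#create the group if it does not exist yet
sudo groupadd supergroup
sudo chown -R yushan:supergroup /usr/local/hadoop
sudo chmod -R 755 /usr/local/hadoop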

Do not run hadoop namenode -format more than once.
If we did format more than once, we can delete the /usr/local/hadoop/tmp folder and format again:
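
For example, run as the Hadoop user:

rm -rf /usr/local/hadoop/tmp
hadoop namenode -format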


Start Hadoop

Run start-all.sh

To see the Java processes, run jps.
We will have 5 processes from Hadoop (plus jps itself):

2701 SecondaryNameNode
2989 Jps
2793 JobTracker
2550 DataNode
2940 TaskTracker
2410 NameNode
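
We can also check the web UIs on the default Hadoop 1.x ports (hmain is the hostname from our config):

http://hmain:50070 for the NameNode web UI
http://hmain:50030 for the JobTracker web UI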


Auto Start Hadoop
If we want Hadoop to run as user "yushan" and start automatically when the machine boots, before any user or ssh login:

We need to add the command to /etc/rc.local, because this file runs at boot before user login (make sure it goes in before any final exit 0 line):
#Start Hadoop at machine startup(before login) as user yushan
su yushan -c '/usr/local/hadoop/bin/start-all.sh'
If instead we want Hadoop to start when user "yushan" logs in,
just put the above command into /etc/profile.d/hadoop.sh.


Stop Hadoop

Run stop-all.sh


Warning

If a warning like "HADOOP_HOME is deprecated" is raised,
go back to the environment config /etc/profile.d/hadoop-env.sh and add the last line shown here:

export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME_WARN_SUPPRESS=1






