- Virtual Machine (optional)
- Config iptables and ssh
- Config hostname
- Make sure we have JDK
- Extract hadoop to /usr/local/hadoop
- Set environment
- Config files
- Format Hadoop
- Start Hadoop
- Auto Start Hadoop
- Warning
The virtual machine and the host must be able to connect to each other, so that we can use ssh to configure the hosts
http://gvace.blogspot.com/2016/05/virtual-box-network-config.html
Config iptables and ssh
iptables: make sure the IP is not blocked
http://gvace.blogspot.com/2016/04/linux-network-configs.html
ssh: allows remote login from a terminal, and lets the Hadoop nodes communicate with each other
http://gvace.blogspot.com/2016/04/ssh.html
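For the ssh part, passwordless login is needed so start-all.sh can reach every node without prompting for a password. A minimal sketch (the user name "yushan" and hostname "hmain" follow the examples later in this post; adjust them to your own setup):

```shell
# Generate a key pair with an empty passphrase for the hadoop user
ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
# Authorize the key on each node (repeat for every slave node)
ssh-copy-id yushan@hmain
# Verify: this should log in without asking for a password
ssh yushan@hmain hostname
```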
Config hostname
Configure a hostname, and add the known hostnames of the other nodes to the config, so each node can reach the others by hostname instead of IP.
http://gvace.blogspot.com/2016/04/linux-network-configs.html
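For example, the hostname-to-IP mapping can live in /etc/hosts on every node. A sketch (the IP addresses and the slave hostname are assumptions; "hmain" matches the config examples below):

```shell
# Append the cluster hostnames to /etc/hosts on every node
# (IPs below are placeholders -- use your own)
sudo tee -a /etc/hosts <<'EOF'
192.168.56.101  hmain
192.168.56.102  hslave1
EOF
```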
Make sure we have JDK
java -version
javac -version
Download Hadoop and extract
http://hadoop.apache.org/
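A minimal sketch of the extract step, matching the /usr/local/hadoop path used throughout this post (the 1.2.1 tarball name is an assumption; use whichever version you downloaded):

```shell
tar -xzf hadoop-1.2.1.tar.gz               # unpack the downloaded tarball
sudo mv hadoop-1.2.1 /usr/local/hadoop     # move it to the expected location
```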
Set Environment
/etc/profile.d/jdk.sh
#JDK Environment
export J2SDKDIR=/usr/lib/jvm/java-7-oracle
export J2REDIR=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin:/usr/lib/jvm/java-7-oracle/db/bin:/usr/lib/jvm/java-7-oracle/jre/bin
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export DERBY_HOME=/usr/lib/jvm/java-7-oracle/db
/etc/profile.d/hadoop-env.sh
#Hadoop Environment
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
#Optional if see warning "HADOOP_HOME is deprecated"
export HADOOP_HOME_WARN_SUPPRESS=1
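To check that both scripts took effect without rebooting, we can source them in the current shell (a quick sanity check, assuming the file paths above):

```shell
source /etc/profile.d/jdk.sh
source /etc/profile.d/hadoop-env.sh
echo "$JAVA_HOME"      # expect /usr/lib/jvm/java-7-oracle
hadoop version         # should print the installed Hadoop version
```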
Config files
In /usr/local/hadoop/conf
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
hadoop-env.sh
Change JAVA_HOME to the JDK directory
JAVA_HOME=/usr/lib/jvm/java-7-oracle/
core-site.xml
Assign the HDFS default name and the tmp directory
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hmain:9000</value>
    <description>NameNode</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hmain:9001</value>
    <description>Change hostname and port</description>
  </property>
</configuration>
Format Hadoop
hadoop namenode -format
We may run into issues caused by user permissions.
This is the error I got:
16/05/02 00:47:35 INFO namenode.FSNamesystem: fsOwner=yushan
16/05/02 00:47:35 INFO namenode.FSNamesystem: supergroup=supergroup
16/05/02 00:47:35 INFO namenode.FSNamesystem: isPermissionEnabled=false
16/05/02 00:47:35 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/05/02 00:47:35 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/05/02 00:47:36 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/05/02 00:47:36 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/05/02 00:47:36 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/local/hadoop/tmp/dfs/name/current
To fix this, we can change the owner of the whole /usr/local/hadoop folder and all of its subfolders/files to the username and group (create the group first if it does not exist), and set permission 755 on the whole folder.
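A sketch of that fix as shell commands (the user "yushan" matches the fsOwner in the log above; the group name "hadoop" is an assumption, so substitute the account and group that will run Hadoop):

```shell
# Create the group if it does not exist yet
sudo groupadd hadoop 2>/dev/null || true
# Give the hadoop user ownership of the whole tree
sudo chown -R yushan:hadoop /usr/local/hadoop
# rwx for the owner, r-x for group and others
sudo chmod -R 755 /usr/local/hadoop
```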
Do not run hadoop namenode -format more than once
If we did format more than once, we can delete the /usr/local/hadoop/tmp folder and format again
Start Hadoop
Run start-all.sh
To see the Java processes, run jps
We should see 5 Hadoop processes (plus Jps itself)
2701 SecondaryNameNode
2989 Jps
2793 JobTracker
2550 DataNode
2940 TaskTracker
2410 NameNode
Auto Start Hadoop
If we want Hadoop to run as user "yushan" and start automatically when the machine boots, before any user or ssh login, we need to add the commands to /etc/rc.local, because this file runs before user login:
#Start Hadoop at machine startup (before login) as user yushan
su yushan -c '/usr/local/hadoop/bin/start-all.sh'
If instead we want Hadoop to start when user "yushan" logs in, just put the same script into /etc/profile.d/hadoop.sh
Stop Hadoop
Run stop-all.sh
Warning
If a warning like "HADOOP_HOME is deprecated" is raised
Go back to the environment config /etc/profile.d/hadoop-env.sh, and add the new line
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_HOME_WARN_SUPPRESS=1