- Virtual Machine (optional)
- Config iptables and ssh
- Config hostname
- Make sure we have JDK
- Extract hadoop to /usr/local/hadoop
- Set environment
- Config files
- Format Hadoop
- Start Hadoop
- Auto Start Hadoop
- Warning
Virtual Machine (optional)
If Hadoop runs in a virtual machine, the virtual machine and the host must be able to connect to each other, so we can use ssh to configure the hosts.
Config iptables and ssh
iptables: make sure the nodes' IPs are not blocked
ssh: lets us work in a remote terminal, and lets the Hadoop nodes communicate with each other
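For the ssh part, the start scripts usually need to log in to each node without a password prompt. A minimal sketch of that setup, assuming an RSA key with an empty passphrase is acceptable on this machine; setup_ssh_key is a hypothetical helper name, not something Hadoop ships:

```shell
# Sketch: create a passphrase-less key pair and authorize it for logins.
# setup_ssh_key is a hypothetical helper; pass "$HOME/.ssh" for real use.
setup_ssh_key() {
    ssh_dir=$1
    mkdir -p "$ssh_dir" || return 1
    chmod 700 "$ssh_dir" || return 1
    # Generate the key only once, so re-running does not overwrite it.
    if [ ! -f "$ssh_dir/id_rsa" ]; then
        ssh-keygen -q -t rsa -N "" -f "$ssh_dir/id_rsa" || return 1
    fi
    # Authorize the public key if it is not already listed.
    touch "$ssh_dir/authorized_keys"
    if ! grep -qxF "$(cat "$ssh_dir/id_rsa.pub")" "$ssh_dir/authorized_keys"; then
        cat "$ssh_dir/id_rsa.pub" >> "$ssh_dir/authorized_keys"
    fi
    chmod 600 "$ssh_dir/authorized_keys"
}
```

After running setup_ssh_key "$HOME/.ssh", ssh localhost should log in without asking for a password.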
Config hostname
Set a hostname on each node and add the hostnames of the other nodes to its configuration, so the nodes can reach each other by hostname instead of by IP.
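One common place for the known hostnames is /etc/hosts. The sketch below assumes that file is used; add_host_entry is a hypothetical helper and 192.168.56.10 is a made-up address, while hmain is the hostname used in the config files of this post:

```shell
# Sketch: add "ip hostname" to a hosts file unless the hostname is already there.
# add_host_entry is a hypothetical helper; 192.168.56.10 is a made-up address.
add_host_entry() {
    hosts_file=$1
    ip=$2
    name=$3
    # Look for the hostname as a whole word on any non-comment line.
    if grep -v '^[[:space:]]*#' "$hosts_file" 2>/dev/null | grep -qw -- "$name"; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$hosts_file"
}
```

With root privileges, add_host_entry /etc/hosts 192.168.56.10 hmain adds the entry once; running it again changes nothing.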
Make sure we have JDK
java -version
javac -version
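The two commands above fail if the JDK is not on the PATH. A small sketch that checks for both tools up front; require_tools is a hypothetical helper name:

```shell
# Sketch: report any required command that is missing from PATH.
# require_tools is a hypothetical helper name.
require_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# A JDK provides both the runtime (java) and the compiler (javac).
require_tools java javac || echo "Install a JDK before continuing" >&2
```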
Download Hadoop and extract
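A possible way to extract the downloaded archive into /usr/local/hadoop, assuming it is a gzip-compressed tarball with a single top-level folder; extract_hadoop is a hypothetical helper, and the archive name depends on the version downloaded:

```shell
# Sketch: unpack a Hadoop tarball into a target directory.
# extract_hadoop is a hypothetical helper name.
extract_hadoop() {
    tarball=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # --strip-components=1 drops the top-level folder inside the archive,
    # so the files land directly in $dest.
    tar -xzf "$tarball" -C "$dest" --strip-components=1
}
```

Run it as a user who can write to /usr/local, passing the downloaded archive and /usr/local/hadoop as the two arguments.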
Set Environment
#JDK Environment
export J2SDKDIR=/usr/lib/jvm/java-7-oracle
export J2REDIR=/usr/lib/jvm/java-7-oracle/jre
export PATH=$PATH:/usr/lib/jvm/java-7-oracle/bin:/usr/lib/jvm/java-7-oracle/db/bin:/usr/lib/jvm/java-7-oracle/jre/bin
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export DERBY_HOME=/usr/lib/jvm/java-7-oracle/db
#Hadoop Environment
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
#Optional: only needed if the warning "HADOOP_HOME is deprecated" appears
Config files
In /usr/local/hadoop/conf
- hadoop-env.sh
- core-site.xml
- hdfs-site.xml
- mapred-site.xml
hadoop-env.sh: change JAVA_HOME to the JDK directory
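The JAVA_HOME change can also be scripted. This is a sketch, assuming hadoop-env.sh contains at most simple export JAVA_HOME=... lines (commented out or not); set_java_home is a hypothetical helper name:

```shell
# Sketch: point JAVA_HOME in hadoop-env.sh at the JDK directory.
# set_java_home is a hypothetical helper name.
set_java_home() {
    env_file=$1
    jdk_dir=$2
    tmp_file="$env_file.tmp"
    # Drop any existing JAVA_HOME export, commented out or not.
    grep -v '^[[:space:]]*#\{0,1\}[[:space:]]*export JAVA_HOME=' "$env_file" > "$tmp_file" || true
    # Append the new export, then copy the result back over the original
    # (copying keeps the original file's permissions).
    printf 'export JAVA_HOME=%s\n' "$jdk_dir" >> "$tmp_file"
    cat "$tmp_file" > "$env_file" && rm -f "$tmp_file"
}
```

For the paths in this post: set_java_home /usr/local/hadoop/conf/hadoop-env.sh /usr/lib/jvm/java-7-oracle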
core-site.xml: assign the HDFS default name and the tmp directory
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hmain:9000</value>
    <description>NameNode</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
hdfs-site.xml: set the replication factor and turn off permission checks
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
mapred-site.xml: set the JobTracker address
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hmain:9001</value>
    <description>Change hostname and port</description>
  </property>
</configuration>
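Since the same hostname appears in both core-site.xml and mapred-site.xml, it can help to generate the two files from one value. A rough sketch, assuming the ports 9000 and 9001 and the tmp directory used in this post; write_master_config is a hypothetical helper, and it overwrites any existing copies of the two files:

```shell
# Sketch: write core-site.xml and mapred-site.xml for one master hostname.
# write_master_config is a hypothetical helper; it overwrites both files.
write_master_config() {
    conf_dir=$1
    host=$2
    cat > "$conf_dir/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://${host}:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/usr/local/hadoop/tmp</value>
  </property>
</configuration>
EOF
    cat > "$conf_dir/mapred-site.xml" <<EOF
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>${host}:9001</value>
  </property>
</configuration>
EOF
}
```

For this post's setup: write_master_config /usr/local/hadoop/conf hmain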
Format Hadoop
hadoop namenode -format
The format may fail because of user permissions. This is the error I got:
16/05/02 00:47:35 INFO namenode.FSNamesystem: fsOwner=yushan
16/05/02 00:47:35 INFO namenode.FSNamesystem: supergroup=supergroup
16/05/02 00:47:35 INFO namenode.FSNamesystem: isPermissionEnabled=false
16/05/02 00:47:35 INFO namenode.FSNamesystem: dfs.block.invalidate.limit=100
16/05/02 00:47:35 INFO namenode.FSNamesystem: isAccessTokenEnabled=false accessKeyUpdateInterval=0 min(s), accessTokenLifetime=0 min(s)
16/05/02 00:47:36 INFO namenode.FSEditLog: dfs.namenode.edits.toleration.length = 0
16/05/02 00:47:36 INFO namenode.NameNode: Caching file names occuring more than 10 times
16/05/02 00:47:36 ERROR namenode.NameNode: java.io.IOException: Cannot create directory /usr/local/hadoop/tmp/dfs/name/current
To fix this, change the owner of the whole folder /usr/local/hadoop and all its subfolders and files to our username and group (create the group if it does not exist), and set mode 755 on the whole folder.
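The ownership and mode change can be sketched as follows; own_tree is a hypothetical helper name, and on the real folder it normally has to run as root:

```shell
# Sketch: hand a directory tree to one user and group, then set mode 755.
# own_tree is a hypothetical helper name.
own_tree() {
    dir=$1
    user=$2
    group=$3
    chown -R "$user:$group" "$dir" && chmod -R 755 "$dir"
}
```

For the user in this post, that would be own_tree /usr/local/hadoop yushan "$(id -gn yushan)", which uses the user's primary group. If a separate group is wanted and does not exist yet, it can be created first with groupadd.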
Do not run hadoop namenode -format more than once.
If we did format more than once, we can delete the /usr/local/hadoop/tmp folder and format again.
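The delete-and-format recovery can be wrapped in one step. This is a sketch that assumes the Hadoop daemons are stopped and that losing everything stored in HDFS is acceptable; reformat_namenode is a hypothetical helper name:

```shell
# Sketch: wipe the Hadoop tmp directory and format the NameNode again.
# reformat_namenode is a hypothetical helper; this destroys all HDFS data.
reformat_namenode() {
    tmp_dir=$1
    # Refuse an empty argument so rm -rf never runs on a bad path.
    [ -n "$tmp_dir" ] || return 1
    rm -rf "$tmp_dir" || return 1
    if command -v hadoop >/dev/null 2>&1; then
        hadoop namenode -format
    else
        echo "hadoop not found on PATH; run 'hadoop namenode -format' manually" >&2
    fi
}
```

For the tmp directory configured in this post: reformat_namenode /usr/local/hadoop/tmp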
Start Hadoop
Run start-all.sh
To see the Java processes, run jps
We should see 5 Hadoop processes, plus Jps itself
2701 SecondaryNameNode
2989 Jps
2793 JobTracker
2550 DataNode
2940 TaskTracker
2410 NameNode
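To check the list without reading it by eye, the jps output can be piped through a small filter. A sketch, with missing_daemons as a hypothetical helper name and the five daemon names taken from the output above:

```shell
# Sketch: read jps output on stdin and print every expected daemon that is absent.
# missing_daemons is a hypothetical helper name.
missing_daemons() {
    jps_output=$(cat)
    for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
        # Compare against the second column so that NameNode does not
        # match SecondaryNameNode.
        if ! printf '%s\n' "$jps_output" |
            awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
            echo "$daemon"
        fi
    done
}
```

Usage: jps | missing_daemons prints nothing when all five daemons are running.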
Auto Start Hadoop
To run Hadoop as user "yushan" and start it automatically when the machine boots, before any user login or ssh login, we need to add the commands to /etc/rc.local, because this file runs before user login:
#Start Hadoop at machine startup (before login) as user yushan
su yushan -c '/usr/local/hadoop/bin/start-all.sh'
If we want Hadoop to start when user "yushan" logs in instead, just put the above script into /etc/profile.d/hadoop.sh
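On some distributions /etc/rc.local ends with an exit 0 line, and anything placed after it never runs. A sketch that inserts the command ahead of that line, or appends it if there is no such line; add_boot_command is a hypothetical helper name:

```shell
# Sketch: insert a command into an rc.local-style file ahead of a final "exit 0".
# add_boot_command is a hypothetical helper name.
add_boot_command() {
    rc_file=$1
    cmd=$2
    # Nothing to do if the exact line is already present.
    if grep -qxF -- "$cmd" "$rc_file"; then
        return 0
    fi
    tmp_file="$rc_file.tmp"
    awk -v cmd="$cmd" '
        # Print the command once, just before the first "exit 0" line.
        !added && /^exit 0[[:space:]]*$/ { print cmd; added = 1 }
        { print }
        # If the file has no "exit 0" line, append the command at the end.
        END { if (!added) print cmd }
    ' "$rc_file" > "$tmp_file" && cat "$tmp_file" > "$rc_file" && rm -f "$tmp_file"
}
```

As root: add_boot_command /etc/rc.local "su yushan -c '/usr/local/hadoop/bin/start-all.sh'"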
Stop Hadoop
Run stop-all.sh
Warning
If a warning such as "HADOOP_HOME is deprecated" is raised, go back to the environment config in hadoop-env.sh and add the new lines
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_HOME=/usr/local/hadoop
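If the warning still appears with these lines in place, one more setting may help. This assumes Hadoop 1.x, whose launcher scripts print the warning only when HADOOP_HOME is set and HADOOP_HOME_WARN_SUPPRESS is empty; treat it as a suggestion to try rather than part of the original steps:

```shell
# Assumption: Hadoop 1.x, which skips the "HADOOP_HOME is deprecated"
# warning when this variable is non-empty.
export HADOOP_HOME_WARN_SUPPRESS=1
```

The line can go in hadoop-env.sh or next to the other exports in the shell environment.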