Sunday, May 8, 2016

HADOOP src code

Import the Hadoop source into Eclipse
From the Hadoop source root, first build the maven plugin module:
cd hadoop-maven-plugins/
mvn install
Then go back to the source root and generate the Eclipse project files:
cd ..
mvn eclipse:eclipse -DskipTests
This takes a while the first time, because Maven has to download all the dependencies.



/**********************************************************
 * NameNode serves as both directory namespace manager and
 * "inode table" for the Hadoop DFS.  There is a single NameNode
 * running in any DFS deployment.  (Well, except when there
 * is a second backup/failover NameNode.)
 *
 * The NameNode controls two critical tables:
 *   1)  filename->blocksequence (namespace)
 *   2)  block->machinelist ("inodes")
 *
 * The first table is stored on disk and is very precious.
 * The second table is rebuilt every time the NameNode comes
 * up.
 *
 * 'NameNode' refers to both this class as well as the 'NameNode server'.
 * The 'FSNamesystem' class actually performs most of the filesystem
 * management.  The majority of the 'NameNode' class itself is concerned
 * with exposing the IPC interface and the http server to the outside world,
 * plus some configuration management.
 *
 * NameNode implements the ClientProtocol interface, which allows
 * clients to ask for DFS services.  ClientProtocol is not
 * designed for direct use by authors of DFS client code.  End-users
 * should instead use the org.apache.nutch.hadoop.fs.FileSystem class.
 *
 * NameNode also implements the DatanodeProtocol interface, used by
 * DataNode programs that actually store DFS data blocks.  These
 * methods are invoked repeatedly and automatically by all the
 * DataNodes in a DFS deployment.
 *
 * NameNode also implements the NamenodeProtocol interface, used by
 * secondary namenodes or rebalancing processes to get partial namenode's
 * state, for example partial blocksMap etc.
 **********************************************************/

Everything we need to use as a client is just FileSystem, which is implemented for HDFS by DistributedFileSystem.
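
For example, client code only ever touches FileSystem. This is just a small sketch (the hdfs:// address and the path are made-up values; with an hdfs:// default filesystem, FileSystem.get() hands back a DistributedFileSystem under the hood):

// Minimal client-side sketch: everything goes through the FileSystem API.
// The fs.default.name value and the path are made up for illustration.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FsExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    conf.set("fs.default.name", "hdfs://localhost:9000");  // assumed NameNode address

    FileSystem fs = FileSystem.get(conf);                  // DistributedFileSystem for hdfs://
    Path p = new Path("/tmp/hello.txt");

    FSDataOutputStream out = fs.create(p);                 // namespace op on NameNode, data to DataNodes
    out.writeBytes("hello hdfs\n");
    out.close();

    FSDataInputStream in = fs.open(p);                     // again, looks like a plain local call
    byte[] buf = new byte[(int) fs.getFileStatus(p).getLen()];
    in.readFully(buf);
    System.out.println(new String(buf, "UTF-8"));
    in.close();
    fs.close();
  }
}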

All communication between the NameNode and the DataNodes (and the client) goes through RPC, so to the caller every function looks like it runs locally, but under the hood the call is carried out by a proxy that ships it over the network.
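
Here is a toy sketch of the same org.apache.hadoop.ipc.RPC pattern, based on the 1.x-era API (PingProtocol, PingServer and the port number are invented for illustration, it is not the real ClientProtocol):

// Toy example of the Hadoop IPC pattern the NameNode/DataNode use.
// PingProtocol, PingServer and port 12345 are made up for illustration.
import java.net.InetSocketAddress;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.ipc.RPC;
import org.apache.hadoop.ipc.Server;
import org.apache.hadoop.ipc.VersionedProtocol;

interface PingProtocol extends VersionedProtocol {
  long versionID = 1L;
  String ping(String msg);                 // to the caller this looks like a local method
}

public class PingServer implements PingProtocol {
  public String ping(String msg) { return "pong: " + msg; }

  public long getProtocolVersion(String protocol, long clientVersion) {
    return versionID;
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Server side: expose this object's methods over IPC, like the NameNode does.
    Server server = RPC.getServer(new PingServer(), "0.0.0.0", 12345, conf);
    server.start();

    // Client side: RPC.getProxy() builds a dynamic proxy; calling ping() on it
    // actually serializes the call and sends it to the server.
    PingProtocol proxy = (PingProtocol) RPC.getProxy(PingProtocol.class,
        PingProtocol.versionID, new InetSocketAddress("localhost", 12345), conf);
    System.out.println(proxy.ping("hello"));

    RPC.stopProxy(proxy);
    server.stop();
  }
}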


ClientProtocol and DatanodeProtocol



public interface ClientProtocol extends VersionedProtocol {

// create rpc server


    this.server = RPC.getServer(this, socAddr.getHostName(),
        socAddr.getPort(), handlerCount, false, conf, namesystem
        .getDelegationTokenSecretManager());



//also starts a web server
startHttpServer(conf);

httpServer = new HttpServer("hdfs", infoHost, infoPort,
              infoPort == 0, conf,
              SecurityUtil.getAdminAcls(conf, DFSConfigKeys.DFS_ADMIN))


Communications:


/**********************************************************************
 * Protocol that a DFS datanode uses to communicate with the NameNode.
 * It's used to upload current load information and block reports.
 *
 * The only way a NameNode can communicate with a DataNode is by
 * returning values from these functions.
 *
 **********************************************************************/
@KerberosInfo(
    serverPrincipal = DFSConfigKeys.DFS_NAMENODE_USER_NAME_KEY,
    clientPrincipal = DFSConfigKeys.DFS_DATANODE_USER_NAME_KEY)
public interface DatanodeProtocol extends VersionedProtocol {


  /**
   * sendHeartbeat() tells the NameNode that the DataNode is still
   * alive and well.  Includes some status info, too.
   * It also gives the NameNode a chance to return
   * an array of "DatanodeCommand" objects.
   * A DatanodeCommand tells the DataNode to invalidate local block(s), 
   * or to copy them to other DataNodes, etc.
   */
The DataNode calls sendHeartbeat() on the NameNode; the NameNode replies with a DatanodeCommand[] array, and the DataNode does its work by acting on the returned commands.
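
Roughly what the DataNode's service loop looks like (a simplified paraphrase of the 1.x DataNode's offerService(); the field and helper names here are approximate, not copied verbatim):

// Simplified sketch of the DataNode side of the heartbeat exchange.
// "namenode" is the DatanodeProtocol proxy obtained via RPC; names are paraphrased.
while (shouldRun) {
  long now = System.currentTimeMillis();
  if (now - lastHeartbeat >= heartBeatInterval) {
    // Tell the NameNode we are alive and report current capacity/usage.
    DatanodeCommand[] cmds = namenode.sendHeartbeat(dnRegistration,
        data.getCapacity(), data.getDfsUsed(), data.getRemaining(),
        xmitsInProgress.get(), getXceiverCount());
    lastHeartbeat = now;

    // The NameNode can only "talk back" through these return values:
    // each command may ask us to delete blocks, replicate them elsewhere, etc.
    if (cmds != null) {
      for (DatanodeCommand cmd : cmds) {
        processCommand(cmd);   // e.g. DNA_INVALIDATE, DNA_TRANSFER, ...
      }
    }
  }
  // ... block reports and received-block notifications also happen in this loop ...
}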