Tuesday, May 10, 2016

HDFS and MapReduce introduction

HDFS
Hadoop Distributed File System

NameNode: only one (the master)
DataNode: many (the workers)

NameNode
holds the file system metadata: file names, directories, file properties (timestamps, replication factor, permissions), and the DataNode locations of each block
  • receives client requests
  • maintains the file system directory structure
  • manages the mapping from files to blocks and from blocks to DataNodes
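
The two mappings the NameNode manages can be pictured with a small in-memory model. This is only a sketch for illustration: `ToyNameNode`, `FileMeta`, the block ids, and the DataNode names are all made up here and are not Hadoop classes or APIs.

```python
from dataclasses import dataclass, field


@dataclass
class FileMeta:
    """Per-file properties kept by the toy NameNode (hypothetical structure)."""
    replication: int
    permission: str
    block_ids: list = field(default_factory=list)  # file -> blocks


class ToyNameNode:
    def __init__(self):
        self.files = {}            # path -> FileMeta
        self.block_locations = {}  # block id -> list of DataNode names

    def add_file(self, path, replication=3, permission="rw-r--r--"):
        self.files[path] = FileMeta(replication, permission)

    def add_block(self, path, block_id, datanodes):
        self.files[path].block_ids.append(block_id)
        self.block_locations[block_id] = list(datanodes)

    def locate(self, path):
        """Answer a client request: which DataNodes hold each block of path?"""
        return [(b, self.block_locations[b]) for b in self.files[path].block_ids]


nn = ToyNameNode()
nn.add_file("/logs/a.txt")
nn.add_block("/logs/a.txt", "blk_1", ["dn1", "dn2", "dn3"])
nn.add_block("/logs/a.txt", "blk_2", ["dn2", "dn3", "dn4"])
print(nn.locate("/logs/a.txt"))
```

Note that the toy NameNode only stores metadata; the file contents themselves never pass through it.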

DataNode
  • stores the file data
  • files are split into blocks (128 MB by default), which are stored on the local hard drive
  • also stores a checksum of the data
  • for fault tolerance, each block is replicated on multiple DataNodes
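
The block splitting, checksums, and copies described above can be sketched in a few lines. This is a simplified illustration, not how HDFS is implemented: the function names are invented, CRC32 over a whole block is an assumption made for brevity, and the round-robin placement is a stand-in for the real, rack-aware policy.

```python
import zlib

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB default from the notes above


def block_count(file_size, block_size=BLOCK_SIZE):
    """Number of blocks needed for file_size bytes (the last block may be partial)."""
    return (file_size + block_size - 1) // block_size


def split_blocks(data, block_size):
    """Yield (block_bytes, crc32) pairs for an in-memory byte string."""
    for i in range(0, len(data), block_size):
        chunk = data[i:i + block_size]
        yield chunk, zlib.crc32(chunk)


def place_replicas(block_index, datanodes, replication=3):
    """Toy round-robin placement (assumption; real placement is rack-aware)."""
    n = len(datanodes)
    return [datanodes[(block_index + k) % n] for k in range(min(replication, n))]


# A 300 MB file needs three 128 MB blocks (128 + 128 + 44).
blocks_needed = block_count(300 * 1024 * 1024)

# Tiny block size so the split is easy to see.
blocks = list(split_blocks(b"abcdefghij", 4))

# A reader can detect corruption by recomputing the checksum.
first_chunk, first_crc = blocks[0]
intact = zlib.crc32(first_chunk) == first_crc

replicas = place_replicas(1, ["dn1", "dn2", "dn3", "dn4"])
```

With several copies of every block, losing one DataNode does not lose the data: the block can still be read from another replica.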

Secondary NameNode

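  • background helper program that assists the NameNode; it is not a standby NameNode
  • periodically takes a snapshot (checkpoint) of the HDFS metadata by merging the NameNode's fsimage with its edit log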
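
In real HDFS the periodic snapshot is a checkpoint: the saved namespace image (fsimage) is merged with the log of changes made since then (the edit log). The idea can be sketched as below. The dictionary image, the `(op, path)` tuples, and the `checkpoint` function are hypothetical simplifications, not the actual on-disk formats.

```python
def checkpoint(fsimage, edit_log):
    """Apply logged operations to a copy of the image and return the new image."""
    new_image = dict(fsimage)
    for op, path in edit_log:
        if op == "create":
            new_image[path] = {}
        elif op == "delete":
            new_image.pop(path, None)
        else:
            raise ValueError(f"unknown op: {op}")
    return new_image


fsimage = {"/a.txt": {}}
edits = [("create", "/b.txt"), ("delete", "/a.txt")]
new_image = checkpoint(fsimage, edits)
```

After a checkpoint the edit log can start over from empty, which keeps NameNode restarts fast because fewer edits have to be replayed.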




MapReduce

JobTracker: only one

  • receives job requests from users
  • assigns the job's tasks to TaskTrackers
  • monitors the running status of the TaskTrackers

TaskTracker: many

  • runs the tasks assigned to it by the JobTracker
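
The division of work between the single JobTracker and the many TaskTrackers can be sketched as a toy scheduler. Everything here is an assumption for illustration: `ToyJobTracker`, the task id format, and the round-robin assignment are invented, and `report_done` stands in for the status reports a real TaskTracker sends.

```python
from collections import defaultdict


class ToyJobTracker:
    def __init__(self, trackers):
        self.trackers = list(trackers)
        self.assignments = defaultdict(list)  # TaskTracker name -> task ids
        self.status = {}                      # task id -> "running" | "done"

    def submit(self, job_id, num_tasks):
        """Split a job into tasks and hand them out round-robin."""
        for i in range(num_tasks):
            task_id = f"{job_id}_t{i}"
            tracker = self.trackers[i % len(self.trackers)]
            self.assignments[tracker].append(task_id)
            self.status[task_id] = "running"

    def report_done(self, task_id):
        """Called when a TaskTracker finishes a task (stand-in for a status report)."""
        self.status[task_id] = "done"

    def job_finished(self, job_id):
        prefix = job_id + "_"
        return all(s == "done" for t, s in self.status.items() if t.startswith(prefix))


jt = ToyJobTracker(["tt1", "tt2"])
jt.submit("job1", 3)
finished_before = jt.job_finished("job1")

# Each TaskTracker "runs" its tasks and reports back.
for tracker, tasks in jt.assignments.items():
    for task_id in tasks:
        jt.report_done(task_id)

finished_after = jt.job_finished("job1")
```

Because there is only one JobTracker, all scheduling and monitoring state lives in one place, just as all file system metadata lives in the single NameNode.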
