Yushan Lu's Blog: HDFS and MapReduce introduction

Tuesday, May 10, 2016

HDFS and MapReduce introduction

HDFS
Hadoop Distributed File System

NameNode: only one
DataNode: many

NameNode
Holds the file system metadata: file names, directories, file properties (time, replication, permission), and the DataNode locations of each block

Receives user requests
Maintains the file system directory structure
Manages the mapping between files and blocks, and between blocks and DataNodes
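
The two mappings above can be pictured as a pair of lookup tables. This is only a toy sketch: the class ToyNameNode, its methods, and the block and DataNode names are hypothetical and are not the real HDFS data structures or API.

```python
# Toy model of NameNode metadata. Every name here is hypothetical;
# this is not how HDFS is actually implemented.

class ToyNameNode:
    def __init__(self):
        self.file_to_blocks = {}      # file path -> ordered list of block ids
        self.block_to_datanodes = {}  # block id -> list of DataNode names

    def add_file(self, path, block_ids):
        # Record which blocks make up the file, in order.
        self.file_to_blocks[path] = list(block_ids)

    def report_block(self, block_id, datanode):
        # A DataNode reports that it stores a replica of block_id.
        self.block_to_datanodes.setdefault(block_id, []).append(datanode)

    def locate(self, path):
        # Answer a user request: which DataNodes hold each block of the file?
        return [(b, self.block_to_datanodes.get(b, []))
                for b in self.file_to_blocks[path]]


nn = ToyNameNode()
nn.add_file("/logs/app.log", ["blk_1", "blk_2"])
nn.report_block("blk_1", "datanode-a")
nn.report_block("blk_1", "datanode-b")
nn.report_block("blk_2", "datanode-c")
print(nn.locate("/logs/app.log"))
```

The point of the sketch is that the NameNode only answers "where is the data"; the file contents themselves never pass through these tables.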

DataNode

Stores the file data
Files are split into blocks (default 128 MB) that are stored on the local hard drive
Also stores a checksum of the data
For safety, each block is kept in multiple copies (replicas; the default replication factor is 3)
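
As a rough illustration of block splitting, checksums, and replicas, here is a minimal sketch. It makes several assumptions: zlib.crc32 stands in for the checksum HDFS actually computes, the round-robin placement is invented for the demo (real HDFS placement is rack-aware), and the function and DataNode names are hypothetical.

```python
import zlib

BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the default block size noted above


def block_count(file_size, block_size=BLOCK_SIZE):
    # Ceiling division: the last block may be only partly filled.
    return (file_size + block_size - 1) // block_size


def split_into_blocks(data, block_size):
    # Return (chunk, checksum) pairs, one per block.
    # zlib.crc32 is a stand-in for the checksum HDFS really uses.
    return [(data[i:i + block_size], zlib.crc32(data[i:i + block_size]))
            for i in range(0, len(data), block_size)]


def place_replicas(block_index, datanodes, replication=3):
    # Hypothetical round-robin placement across the available DataNodes.
    replication = min(replication, len(datanodes))
    return [datanodes[(block_index + r) % len(datanodes)]
            for r in range(replication)]


print(block_count(300 * 1024 * 1024))  # a 300 MB file needs 3 blocks

blocks = split_into_blocks(b"abcdefghij", block_size=4)  # tiny blocks for the demo
print([chunk for chunk, _ in blocks])

# A reader can detect corruption by recomputing the checksum of each chunk.
print(all(zlib.crc32(chunk) == checksum for chunk, checksum in blocks))

print(place_replicas(0, ["dn1", "dn2", "dn3", "dn4"]))
```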

Secondary NameNode

Background helper program that monitors HDFS status
Periodically grabs a snapshot (checkpoint) of the HDFS metadata
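
One way to picture the periodic snapshot is as a checkpoint: take the last saved image of the namespace, replay the edits logged since then, and save the result as the new image. The sketch below assumes that model. The set-based image, the edit tuples, and the checkpoint function are hypothetical simplifications, not the real on-disk formats.

```python
def checkpoint(image, edit_log):
    # Apply the logged edits to a copy of the last image and return the new image.
    new_image = set(image)
    for op, path in edit_log:
        if op == "create":
            new_image.add(path)
        elif op == "delete":
            new_image.discard(path)
    return new_image


image = {"/a.txt"}                                    # last saved snapshot
edits = [("create", "/b.txt"), ("delete", "/a.txt")]  # changes logged since then
new_image = checkpoint(image, edits)
print(sorted(new_image))
```

After a checkpoint the edit log can start over from empty, which keeps it from growing without bound.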

MapReduce

JobTracker: only one

Receives user job requests
Allocates the job's tasks to TaskTrackers
Monitors TaskTracker running status

TaskTracker: many

Runs the tasks assigned by the JobTracker
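
The division of work between the JobTracker and the TaskTrackers can be sketched with a toy word count. Everything here is an assumption made for illustration: the class names, the round-robin assignment, and the fact that the reduce step runs inline inside the toy JobTracker (in Hadoop, reduce tasks also run on TaskTrackers).

```python
from collections import Counter


class ToyTaskTracker:
    def __init__(self, name):
        self.name = name

    def run_map(self, split):
        # Map task: count the words in one input split.
        return Counter(split.split())


class ToyJobTracker:
    def __init__(self, trackers):
        self.trackers = trackers

    def run_word_count(self, splits):
        partials = []
        assignments = []
        for i, split in enumerate(splits):
            # Round-robin assignment is an assumption, not Hadoop's scheduler.
            tracker = self.trackers[i % len(self.trackers)]
            assignments.append((i, tracker.name))
            partials.append(tracker.run_map(split))
        # Reduce step, done inline for brevity: merge the per-split counts.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return dict(total), assignments


jt = ToyJobTracker([ToyTaskTracker("tt1"), ToyTaskTracker("tt2")])
counts, assignments = jt.run_word_count(["a b a", "b c", "a"])
print(counts)
print(assignments)
```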
