hadoop - Measure throughput at DataNode


I want to measure the throughput at each DataNode by measuring the time taken by each read/write operation. It is confusing to read through the millions of functions and figure out what is happening. Could someone list the series of calls made while reading/writing a block of data? I am using version 1.0.1. Alternatively, if there is already an API that measures this at the DataNode, I could use that information.

The important classes to study for measuring throughput are FSDataOutputStream for writes and FSDataInputStream for reads.

File read: the first thing a node does when reading a file is call open() on the FileSystem object. At that point you know the node will begin reading shortly, and you can place code after that call returns to prepare the measurements. Calling open() on HDFS instantiates a DistributedFileSystem, which communicates with the NameNode to collect the block locations (sorted according to their proximity to the calling node). Finally, the DistributedFileSystem object returns an FSDataInputStream (which "sees" reading a file), which in turn wraps a DFSInputStream (which "sees" reading blocks and handles failure). Your measurements should be scoped within the read() and close() calls on the FSDataInputStream.
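
For example, a minimal client-side sketch (not DataNode-internal instrumentation) that times the read()–close() window could look like the following; the command-line path argument and the 64 KB buffer size are arbitrary choices for illustration:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TimedRead {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);       // resolves to DistributedFileSystem on HDFS
            Path path = new Path(args[0]);              // file to read, passed on the command line

            FSDataInputStream in = fs.open(path);       // NameNode lookup + stream creation
            byte[] buffer = new byte[64 * 1024];
            long bytesRead = 0;
            long start = System.nanoTime();             // start the clock after open() returns

            int n;
            while ((n = in.read(buffer)) > 0) {         // actual block reads from the DataNodes
                bytesRead += n;
            }
            in.close();

            long elapsedNs = System.nanoTime() - start; // measurement scoped to read()..close()
            double mbPerSec = (bytesRead / 1e6) / (elapsedNs / 1e9);
            System.out.printf("read %d bytes in %.3f s (%.2f MB/s)%n",
                    bytesRead, elapsedNs / 1e9, mbPerSec);
        }
    }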

File write: the node calls create() on the FileSystem. Various checks are made at that point covering file permissions, availability etc., and upon successful completion it returns an FSDataOutputStream object, which wraps a DFSOutputStream. The same concept applies: one sees the continuous write, the other handles the coherency of the replication factor (i.e. 1 write = 3 writes) and failure. As with the read, your measurements should be scoped within the write() and close() calls on the FSDataOutputStream.
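
A corresponding sketch for the write side, scoping the measurement to write()–close(); the 128 MB payload and 64 KB chunk size are arbitrary test values:

    import java.io.IOException;
    import java.util.Arrays;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class TimedWrite {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path(args[0]);              // destination file

            byte[] chunk = new byte[64 * 1024];
            Arrays.fill(chunk, (byte) 'x');
            long totalBytes = 128L * 1024 * 1024;       // 128 MB test payload (arbitrary)

            FSDataOutputStream out = fs.create(path);   // permission/existence checks happen here
            long start = System.nanoTime();             // measure write()..close() only

            for (long written = 0; written < totalBytes; written += chunk.length) {
                out.write(chunk);                       // pipelined to the replica DataNodes
            }
            out.close();                                // close() waits for the pipeline to flush

            long elapsedNs = System.nanoTime() - start;
            double mbPerSec = (totalBytes / 1e6) / (elapsedNs / 1e9);
            System.out.printf("wrote %d bytes in %.3f s (%.2f MB/s)%n",
                    totalBytes, elapsedNs / 1e9, mbPerSec);
        }
    }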

In order to do this globally for all nodes in the cluster, you would need to override these methods as part of the Hadoop distribution you share across the cluster.
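
As a rough illustration of the overriding idea (not Hadoop's actual internals), a delegating stream that accumulates the time spent in each read() call might look like the sketch below; in a real patch the equivalent timing would go inside DFSInputStream/DFSOutputStream in the source tree you deploy, and the class and field names here are made up:

    import java.io.FilterInputStream;
    import java.io.IOException;
    import java.io.InputStream;

    public class TimedInputStream extends FilterInputStream {
        private long totalNanos = 0;    // accumulated time spent inside read()
        private long totalBytes = 0;    // accumulated bytes returned by read()

        public TimedInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            long start = System.nanoTime();
            int n = super.read(b, off, len);
            totalNanos += System.nanoTime() - start;
            if (n > 0) {
                totalBytes += n;
            }
            return n;
        }

        /** Throughput observed so far, in MB/s. */
        public double throughputMBps() {
            return totalNanos == 0 ? 0.0 : (totalBytes / 1e6) / (totalNanos / 1e9);
        }
    }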

