In the Hadoop MapReduce programming model, when processing files, is it mandatory to keep the files in the HDFS file system, or can the files be kept in other file systems while still getting the benefit of the MapReduce programming model?
Mappers read their input data through an implementation of InputFormat. Most implementations descend from FileInputFormat, which reads data from the local machine or from HDFS. (By default, data is read from HDFS, and the results of the MapReduce job are stored in HDFS as well.) You can write a custom InputFormat when you want your data to be read from an alternative data source that is not HDFS. For example, TableInputFormat reads data records directly from HBase, and DBInputFormat accesses data in relational databases. You could even imagine a system where data is streamed to each machine over the network on a particular port; the InputFormat would read the data from the port and parse it into individual records for mapping.
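To make the alternative-source idea concrete, here is a minimal sketch of wiring DBInputFormat into a job so that mappers read rows from a relational table instead of HDFS files. The table name, columns, JDBC connection details, and the LogRecord class are assumptions for illustration, not part of the original answer:

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.db.DBConfiguration;
import org.apache.hadoop.mapreduce.lib.db.DBInputFormat;
import org.apache.hadoop.mapreduce.lib.db.DBWritable;

public class DbInputExample {

    // Hypothetical record type mapping one row of a "logs" table.
    public static class LogRecord implements Writable, DBWritable {
        long id;
        String message;

        public void readFields(ResultSet rs) throws SQLException {
            id = rs.getLong("id");
            message = rs.getString("message");
        }
        public void write(PreparedStatement ps) throws SQLException {
            ps.setLong(1, id);
            ps.setString(2, message);
        }
        public void readFields(DataInput in) throws IOException {
            id = in.readLong();
            message = in.readUTF();
        }
        public void write(DataOutput out) throws IOException {
            out.writeLong(id);
            out.writeUTF(message);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Connection details are placeholders, not a real database.
        DBConfiguration.configureDB(conf,
            "com.mysql.jdbc.Driver",
            "jdbc:mysql://dbhost:3306/mydb",
            "dbuser", "dbpass");
        Job job = Job.getInstance(conf, "db-input-example");
        job.setJarByClass(DbInputExample.class);
        // Read input from the database instead of HDFS.
        job.setInputFormatClass(DBInputFormat.class);
        DBInputFormat.setInput(job, LogRecord.class,
            "logs",            // table name (hypothetical)
            null,              // optional WHERE conditions
            "id",              // ORDER BY column
            "id", "message");  // columns to fetch
        // Mapper, reducer, and output settings would be configured as usual.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

DBInputFormat.setInput tells the framework which DBWritable class to materialize rows into and which columns to fetch; everything downstream of the mapper works exactly as it would with file-based input.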
In your case, however, the data lives in an ext4 filesystem on one or more servers. To access it conveniently within Hadoop, you would have to copy it into HDFS first. That way you also benefit from data locality when the file chunks are processed in parallel.
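As a sketch of that copy step, the snippet below uses Hadoop's FileSystem API to push a local file into HDFS; both paths are placeholders, and the CLI equivalent would be hadoop fs -put:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CopyToHdfs {
    public static void main(String[] args) throws Exception {
        // Picks up core-site.xml from the classpath to locate the cluster.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        // Paths are placeholders; this mirrors `hadoop fs -put` on the CLI.
        fs.copyFromLocalFile(new Path("/data/local/logs.txt"),
                             new Path("/user/hadoop/logs.txt"));
        fs.close();
    }
}
```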
I suggest reading the Yahoo! tutorial on this topic for more detailed information. For collecting log files for MapReduce processing, also take a look at Flume.