mapreduce - Hadoop - Multiple Files from Record Reader to Map Function


I have implemented a custom CombineFileInputFormat to create splits for map tasks, each made up of a group of files. I created a solution that passes every file of the split through the record reader, and everything works fine.

Now I'm trying to pass the whole set of files from the record reader to the map function. This is my RecordReader:

    public class MultiImagesRecordReader extends RecordReader<Text[], BytesWritable[]> {
        private long start = 0;
        private long end = 0;
        private int pos = 0;
        private BytesWritable[] value;
        private Text[] key;
        private CombineFileSplit split;
        private Configuration conf;
        private FileSystem fs;
        private static boolean recordsRead;

        public MultiImagesRecordReader(CombineFileSplit split, TaskAttemptContext context, Integer index) throws IOException {
            this.split = split;
            this.conf = context.getConfiguration();
        }

        @Override
        public void initialize(InputSplit genericSplit, TaskAttemptContext context) throws IOException, InterruptedException {
            start = split.getOffset(0);
            end = start + split.getLength();
            recordsRead = false;
            this.pos = (int) start;
            fs = FileSystem.get(conf);
            value = new BytesWritable[split.getNumPaths()];
            key = new Text[split.getNumPaths()];
        }

        @Override
        public boolean nextKeyValue() throws IOException, InterruptedException {
            if (recordsRead == true) {
                System.out.println("Sono nel next true" + InetAddress.getLocalHost());
                return false;
            } else {
                recordsRead = true;
                System.out.println("Sono nel next false" + InetAddress.getLocalHost());
                for (int i = 0; i

With this code, the map function correctly receives the arrays of keys and values, but it receives them repeatedly. I expected the map function to be called once; instead it is called several times. What am I doing wrong?

As you probably know, map() in the Mapper class is called once for each record that your reader supplies through getCurrentKey() and getCurrentValue(), until all key-value pairs in the split are exhausted. As I understand it, your map function is being called repeatedly for the same key-value pair, when it should be called only once per pair. This means that your record reader is returning the same record (key-value pair) more than once. I have also implemented a custom CombineFileInputFormat and record reader. You can see their general form and implementation in a single project.
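To make the relationship between the reader and map() concrete, here is a minimal sketch in plain Java with no Hadoop dependency. The names SimpleRecordReader, OneShotReader and OneShotReaderSketch are hypothetical stand-ins invented for illustration, not Hadoop classes; only the method names nextKeyValue(), getCurrentKey() and getCurrentValue() mirror the real RecordReader contract. The loop in countMapCalls imitates what Mapper.run() does: one map() call for every time nextKeyValue() returns true.

```java
// Hypothetical stand-in for the part of Hadoop's RecordReader contract
// discussed above; it is NOT the real org.apache.hadoop class.
interface SimpleRecordReader<K, V> {
    boolean nextKeyValue();
    K getCurrentKey();
    V getCurrentValue();
}

// Hands out all files of a split as ONE record, then reports end of input.
class OneShotReader implements SimpleRecordReader<String[], byte[][]> {
    private final String[] names;     // assumed: one file name per file in the split
    private final byte[][] contents;  // assumed: the bytes of each file
    // Instance field, so every reader object tracks its own state.
    private boolean recordsRead = false;

    OneShotReader(String[] names, byte[][] contents) {
        this.names = names;
        this.contents = contents;
    }

    @Override
    public boolean nextKeyValue() {
        if (recordsRead) {
            return false;   // the single record was already handed out
        }
        recordsRead = true; // first call: expose the one combined record
        return true;
    }

    @Override
    public String[] getCurrentKey() {
        return names;
    }

    @Override
    public byte[][] getCurrentValue() {
        return contents;
    }
}

class OneShotReaderSketch {
    // Imitates the driver loop: map() runs once per true from nextKeyValue().
    static int countMapCalls(SimpleRecordReader<String[], byte[][]> reader) {
        int calls = 0;
        while (reader.nextKeyValue()) {
            String[] keys = reader.getCurrentKey();
            byte[][] values = reader.getCurrentValue();
            if (keys.length == values.length) {
                calls++;    // stands in for map(keys, values, context)
            }
        }
        return calls;
    }

    public static void main(String[] args) {
        OneShotReader reader = new OneShotReader(
                new String[] {"a.png", "b.png", "c.png"},
                new byte[][] {{1}, {2}, {3}});
        System.out.println("map calls: " + countMapCalls(reader)); // prints "map calls: 1"
    }
}
```

With a single reader instance, map() runs exactly once. One thing worth checking in the real job: if the input format wraps the reader in Hadoop's CombineFileRecordReader, that wrapper constructs a separate reader for each file in the split, so a reader that emits the whole split on its first call would produce one map() call per file. This is a possible explanation under that assumption, not a confirmed diagnosis of the code in the question.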

