java - Custom RecordReader initialize not called


I recently started messing around with Hadoop and wrote my own InputFormat to handle PDFs.

For some reason, the initialize method in my custom RecordReader class is never called. (I checked this with a sysout, because I don't have a debugging environment set up.)

I am running Hadoop 2.2.0 on Windows 7 32-bit. I make my calls with yarn jar, as hadoop jar is bugged under Windows...

    import ...

    public class PDFInputFormat extends FileInputFormat<Text, Text> {

        @Override
        public RecordReader<Text, Text> getRecordReader(InputSplit arg0,
                JobConf arg1, Reporter arg2) throws IOException {
            return new PDFRecordReader();
        }

        public static class PDFRecordReader implements RecordReader<Text, Text> {

            private FSDataInputStream fileIn;
            public String fileName = null;
            HashSet<String> hset = new HashSet<String>();

            private Text key = null;
            private Text value = null;

            private byte[] output = null;
            private int position = 0;

            @Override
            public Text createValue() {
                int endpos = -1;
                for (int i = position; i < output.length; i++) {
                    if (output[i] == (byte) '\n') {
                        endpos = i;
                    }
                }
                if (endpos == -1) {
                    return new Text(Arrays.copyOfRange(output, position, output.length));
                }
                return new Text(Arrays.copyOfRange(output, position, endpos));
            }

            @Override
            public void initialize(InputSplit genericSplit, TaskAttemptContext job)
                    throws IOException, InterruptedException {
                System.out.println("initialize called");
                FileSplit split = (FileSplit) genericSplit;
                Configuration conf = job.getConfiguration();

                Path file = split.getPath();
                FileSystem fs = file.getFileSystem(conf);
                fileIn = fs.open(split.getPath());

                fileName = split.getPath().getName().toString();

                System.out.println(fileIn.toString());

                PDDocument docum = PDDocument.load(fileIn);

                ByteArrayOutputStream boss = new ByteArrayOutputStream();
                OutputStreamWriter ow = new OutputStreamWriter(boss);

                PDFTextStripper stripper = new PDFTextStripper();
                stripper.writeText(docum, ow);
                ow.flush();

                output = boss.toByteArray();
            }
        }
    }

I figured this out last night, and it might help someone else:

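The RecordReader I implemented is the old, deprecated Hadoop interface (org.apache.hadoop.mapred.RecordReader). It does not actually declare an initialize method, which explains why the framework never calls it automatically.

Extending the RecordReader class in org.apache.hadoop.mapreduce instead lets you override that class's initialize method, and the framework calls it before reading any records.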
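To make the reason concrete, here is a minimal sketch that runs without Hadoop on the classpath. Every type in it (OldStyleReader, NewStyleReader, OldPdfReader, NewPdfReader, InitializeDemo) is a hypothetical stand-in I made up for illustration, not a Hadoop class. The only point it tries to show is that a driver calls just the methods that belong to the contract it knows about, so an extra initialize method on an old-style reader is simply ignored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the old-style contract: there is no initialize() here.
interface OldStyleReader {
    boolean next(StringBuilder value);
}

// Hypothetical stand-in for the new-style contract: initialize() is part of it.
abstract class NewStyleReader {
    abstract void initialize(String split);
    abstract boolean nextKeyValue();
    abstract String getCurrentValue();
}

class OldPdfReader implements OldStyleReader {
    final List<String> log = new ArrayList<>();
    private String text; // stays null because nobody calls initialize()

    // Not declared by OldStyleReader, so the old-style driver never invokes it.
    void initialize(String split) {
        log.add("initialize");
        text = split;
    }

    @Override
    public boolean next(StringBuilder value) {
        log.add("next");
        if (text == null) {
            return false; // nothing was loaded, so there are no records
        }
        value.append(text);
        text = null;
        return true;
    }
}

class NewPdfReader extends NewStyleReader {
    private String[] lines = new String[0];
    private int position = -1;

    @Override
    void initialize(String split) {
        lines = split.split("\n"); // one record per line
        position = -1;
    }

    @Override
    boolean nextKeyValue() {
        if (position + 1 >= lines.length) {
            return false;
        }
        position++;
        return true;
    }

    @Override
    String getCurrentValue() {
        return lines[position];
    }
}

class InitializeDemo {
    // Old-style driver: it only knows about next().
    static List<String> runOld(OldStyleReader reader) {
        List<String> records = new ArrayList<>();
        StringBuilder value = new StringBuilder();
        while (reader.next(value)) {
            records.add(value.toString());
            value.setLength(0);
        }
        return records;
    }

    // New-style driver: it calls initialize() once before reading records.
    static List<String> runNew(NewStyleReader reader, String split) {
        reader.initialize(split);
        List<String> records = new ArrayList<>();
        while (reader.nextKeyValue()) {
            records.add(reader.getCurrentValue());
        }
        return records;
    }

    public static void main(String[] args) {
        System.out.println(runOld(new OldPdfReader()));
        System.out.println(runNew(new NewPdfReader(), "line one\nline two"));
    }
}
```

In this sketch the old-style reader yields no records, because its text field is never filled in, while the new-style reader yields both lines. In real Hadoop code the same split shows up in the job setup: as far as I understand it, a reader built on the mapreduce API has to be returned from an InputFormat that also comes from the mapreduce API.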

