java - Hadoop mapreduce with input size ~ 2MB slow


I tried to distribute a calculation using Hadoop.

I am using SequenceFile input and output files, and custom Writables.

The input is a list of triangles; at most it is around 2 MB, but mostly around 50 KB. The intermediate values and the output are a map(int, double) in a custom Writable. Is this the bottleneck?
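For context, the custom Writable serializes that map as an entry count followed by the (int, double) pairs, along the lines of this simplified sketch (the field layout here is illustrative, not my exact class):

    import java.io.DataInput;
    import java.io.DataOutput;
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;
    import org.apache.hadoop.io.Writable;

    // Simplified sketch of a Writable holding a map(int, double).
    public class StationWritable implements Writable {

        private int id;
        private final Map<Integer, Double> values = new HashMap<Integer, Double>();

        public int getId() {
            return id;
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(id);
            // Entry count first, then the (key, value) pairs.
            out.writeInt(values.size());
            for (Map.Entry<Integer, Double> e : values.entrySet()) {
                out.writeInt(e.getKey());
                out.writeDouble(e.getValue());
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            id = in.readInt();
            values.clear();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                values.put(in.readInt(), in.readDouble());
            }
        }
    }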

The problem is that it is much slower than the single-machine version without Hadoop. On top of that, increasing the number of nodes from 2 to 10 does not speed it up.

One possibility is that I do not get enough mappers because of the small input size. I experimented with changing mapreduce.input.fileinputformat.split.maxsize, but it just got worse, not better.
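For reference, the property can be set on the job configuration like this (the 8 KB cap and the job name are illustrative values, not the exact ones from my experiments):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    Configuration conf = new Configuration();
    // Cap each input split so even a ~50 KB SequenceFile can be divided
    // among several mappers (8192 bytes is an illustrative value).
    conf.setLong("mapreduce.input.fileinputformat.split.maxsize", 8192L);

    Job job = Job.getInstance(conf, "triangle-job");
    job.setInputFormatClass(SequenceFileInputFormat.class);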

I am using Hadoop 2.2.0 locally, and Elastic MapReduce on Amazon.

Did I overlook something? Or is this just the kind of task that should be done without Hadoop?

Thank you.

    public void map(IntWritable triangleIndex, TriangleWritable triangle, Context context)
            throws IOException, InterruptedException {
        // Run the numeric kernel on one triangle; it may yield no stations.
        StationWritable[] stations = kernel.newton(triangle.getPoints());
        if (stations != null) {
            for (StationWritable station : stations) {
                context.write(new IntWritable(station.getId()), station);
            }
        }
    }

    public class TriangleWritable implements Writable {

        private final float[] points = new float[9];

        @Override
        public void write(DataOutput d) throws IOException {
            for (int i = 0; i < 9; i++) {
                d.writeFloat(points[i]);
            }
        }

        @Override
        public void readFields(DataInput d) throws IOException {
            // Read the nine floats back in the order they were written.
            for (int i = 0; i < 9; i++) {
                points[i] = d.readFloat();
            }
        }
    }

If the processing really is that complex, then you should be able to benefit from Hadoop. The common issue with small files is that Hadoop runs a single Java process per file, and the overhead of starting many processes slows the job down. In your case that does not seem to apply. More likely you have the opposite problem: only one mapper is trying to process your input, and at that point it doesn't matter how big your cluster is. Using the input split looks like the right approach, but because your use case is specialised and deviates significantly from the norm, you may need to tweak a number of components to get the best performance.

So you should be able to get the benefits of Hadoop MapReduce, but it will probably take significant tuning and custom input handling.
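As an illustrative sketch of what that custom input handling could look like, one option is a SequenceFileInputFormat subclass that overrides computeSplitSize, so the split size is decoupled from the HDFS block size (the 64 KB target is just an example value, and splits will still snap to SequenceFile sync points):

    import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;

    // Force small splits so a ~2 MB SequenceFile is spread over several
    // mappers instead of one.
    public class SmallSplitSequenceFileInputFormat<K, V>
            extends SequenceFileInputFormat<K, V> {

        private static final long TARGET_SPLIT_BYTES = 64 * 1024;

        @Override
        protected long computeSplitSize(long blockSize, long minSize, long maxSize) {
            // Ignore the block size and aim for the fixed target, while
            // still respecting an explicitly configured maximum.
            return Math.min(TARGET_SPLIT_BYTES, maxSize);
        }
    }

You would wire it in with job.setInputFormatClass(SmallSplitSequenceFileInputFormat.class); whether the extra mappers pay off depends on how expensive kernel.newton() is per triangle compared to the task startup overhead.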

That said, Hadoop will rarely (never?) be faster than a purpose-built solution. Its value is as a generic tool that can be used to distribute and solve many different problems without having to write a purpose-built solution for each.

