java - Improving performance of protocol buffers


I am writing an application that needs to read (deserialize) millions of messages from a single file as quickly as possible.

What the application does is essentially get one message from the file, do some work, and then throw the message away. Each message is made up of ~100 fields (not all of them are always parsed, but I need them all because the user of the application can decide which fields he wants to work on).

At the moment the application consists of a loop that, on each iteration, just reads one message with a readDelimitedFrom() call.

Is there a way to optimize this for my case (splitting into multiple files, etc.)? In addition, because of the number of messages and the size of each message, I have to gzip the file (which is quite effective in reducing the size, since the field values are quite repetitive), although this reduces performance.

If CPU time is your bottleneck (which is unlikely if you are loading directly from HDD with a cold cache, but could be the case in other scenarios), then here are some ways you can improve throughput:

  • If possible, use C++ rather than Java, and reuse the same message object for each iteration of the loop. This reduces the amount of time spent on memory management, because the same memory will be reused every time.

  • Instead of using readDelimitedFrom(), construct a single CodedInputStream and use it to read multiple messages, like so:

      // Do this once:
      CodedInputStream cis = CodedInputStream.newInstance(input);

      // Then read each message like so:
      int limit = cis.pushLimit(cis.readRawVarint32());
      builder.mergeFrom(cis);
      cis.popLimit(limit);
      cis.resetSizeCounter();

    (A similar approach works in C++.)

  • Use Snappy or LZ4 compression rather than gzip. These algorithms still achieve reasonable compression ratios but are optimized for speed. (LZ4 is probably better, though Snappy was developed by Google with Protobufs in mind, so you may want to test both on your data set.)

  • Consider using Cap'n Proto rather than Protocol Buffers. Unfortunately, there is no Java version yet, but EDIT: there is now a Java implementation, and implementations in many other languages as well. In the languages it supports, it appears to be quite a bit faster. (Disclosure: I am the author of Cap'n Proto. I am also the author of Protocol Buffers v2, which is the version Google released as open source.)
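
To make the framing concrete: delimited protobuf streams store each message as a base-128 varint length followed by that many bytes, which is what the readRawVarint32() / pushLimit() loop above consumes. The following is a minimal sketch of that framing using only the Java standard library, so it runs without protobuf on the classpath. The class and method names (DelimitedFraming, writeFrame, readLength, readAll) are hypothetical, and the UTF-8 strings merely stand in for serialized messages. It wraps the gzip stream once, buffers it, and reuses one byte array across iterations, in the spirit of the suggestions above.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class DelimitedFraming {

    /** Writes a base-128 varint length prefix followed by the payload bytes. */
    static void writeFrame(OutputStream out, byte[] payload) throws IOException {
        int n = payload.length;
        while ((n & ~0x7F) != 0) {
            out.write((n & 0x7F) | 0x80); // low 7 bits, continuation bit set
            n >>>= 7;
        }
        out.write(n);
        out.write(payload);
    }

    /** Reads a varint length prefix; returns -1 on a clean end of stream. */
    static int readLength(InputStream in) throws IOException {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.read();
            if (b == -1) {
                if (shift == 0) return -1; // no more frames
                throw new EOFException("truncated length prefix");
            }
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                if (result < 0) throw new IOException("negative frame length");
                return result;
            }
        }
        throw new IOException("malformed varint");
    }

    /** Reads every frame from a gzip stream, reusing one buffer for all frames. */
    static List<String> readAll(InputStream raw) throws IOException {
        List<String> results = new ArrayList<>();
        byte[] buf = new byte[64]; // grown on demand, never reallocated per message
        try (InputStream in = new BufferedInputStream(new GZIPInputStream(raw), 1 << 16)) {
            int len;
            while ((len = readLength(in)) != -1) {
                if (len > buf.length) buf = new byte[Math.max(len, buf.length * 2)];
                int off = 0;
                while (off < len) {
                    int r = in.read(buf, off, len - off);
                    if (r == -1) throw new EOFException("truncated frame");
                    off += r;
                }
                // A real application would hand buf[0..len) to the protobuf parser here.
                results.add(new String(buf, 0, len, StandardCharsets.UTF_8));
            }
        }
        return results;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (OutputStream gz = new GZIPOutputStream(bytes)) {
            writeFrame(gz, "first".getBytes(StandardCharsets.UTF_8));
            writeFrame(gz, "second".getBytes(StandardCharsets.UTF_8));
        }
        System.out.println(readAll(new ByteArrayInputStream(bytes.toByteArray())));
    }
}
```

Whether this kind of buffer reuse helps in practice depends on where the time actually goes, so it is worth profiling before and after, as the answer's first sentence suggests.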

