Avro is my current project. It’s a slightly different take on data serialization.
Most data serialization systems, like Thrift and Protocol Buffers, rely on code generation, which can be awkward with dynamic languages and datasets. For example, many folks write MapReduce programs in languages like Pig and Python, and generate datasets whose schema is determined by the script that generates them. One of the goals for Avro is to permit such applications to achieve high performance without forcing them to run external compilers.
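To make the "no external compiler" idea concrete, here is a toy sketch in plain Python. It is not the Avro library and not its API: the `PageView` schema, `encode_record`, and the other helper names are made up for illustration, and only two primitive types are handled. It does follow Avro's binary encoding for those types (zigzag varints for longs, length-prefixed UTF-8 for strings), with the schema being ordinary JSON that a script could build at runtime.

```python
import json

# Hypothetical schema, produced at runtime as JSON by the script that owns the data.
SCHEMA = json.loads("""
{"type": "record", "name": "PageView",
 "fields": [{"name": "url", "type": "string"},
            {"name": "hits", "type": "long"}]}
""")


def encode_long(n):
    """Zigzag-encode a signed 64-bit integer, then emit it as a base-128 varint."""
    z = (n << 1) ^ (n >> 63)  # maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ...
    out = bytearray()
    while z & ~0x7F:
        out.append((z & 0x7F) | 0x80)  # low 7 bits, continuation bit set
        z >>= 7
    out.append(z)
    return bytes(out)


def encode_string(s):
    """Encode a string as its UTF-8 byte length (as a long) followed by the bytes."""
    data = s.encode("utf-8")
    return encode_long(len(data)) + data


# Only the two primitive types used above; a real implementation covers many more.
ENCODERS = {"long": encode_long, "string": encode_string}


def encode_record(schema, record):
    """Walk the schema's fields in order and concatenate each encoded value."""
    return b"".join(
        ENCODERS[field["type"]](record[field["name"]])
        for field in schema["fields"]
    )


payload = encode_record(SCHEMA, {"url": "a", "hits": 3})
```

Because the encoder interprets the schema as data, a script can change the schema and keep writing records without regenerating or recompiling any classes.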
A few early Avro benchmarks are now in. A month ago, Johan Oskarsson (of Last.fm) ran his serialization size benchmark using Avro. And today, Sharad Agarwal (my Avro collaborator) ran an existing Java serialization benchmark using Avro, and the initial results look decent. Curiously, the results for Avro’s generic (no code generation) and specific (generated classes) APIs diverged significantly and unexpectedly, even though the two share much of their implementation. This suggests that both might be easily improved.
May 12, 2009 at 12:18 pm
Is a benchark more like a loanshark or an aardvark?
May 12, 2009 at 12:27 pm
Typo fixed. Thanks, Anne!