Thursday, 7 October 2010

Getting started with Protocol Buffers – and introducing Protocol Buffers Workbench

So I’ve fallen out of love with Xml, and decided that Google Protocol Buffers is a better serialization format for a key (and voluminous) data structure in our application (as has Twitter, incidentally). For various reasons, I decided to write my own Reader and Writer rather than go with the existing implementations. The first thing I needed to do then, was to understand how the Protocol Buffers encoding works.

Google have published some very good documentation, both for the message definition language, and the encoding, but nothing beats seeing it in action. For that there’s a Protocol Buffers compiler, protoc.exe which, amongst other things, will handle encoding and decoding messages (get it from the Protocol Buffers website - choose “Protocol Buffers compiler – Windows Binary”). The only thing is, protoc.exe is a command-line tool, and not very friendly if you’re just wanting to fiddle around. So I’ve summoned all my WPF powers, and created a graphical wrapper around it. Allow me to introduce Protocol Buffers Workbench:

image

In brief

Using the Workbench is easy:

  1. Install it using the ClickOnce installer.
  2. Put your message definition in the first column, and, from the drop-down pick the message that will be at the root of your document
  3. Enter your message in the second column, then hit the encode button (“>”). The encoded message is shown in the third column.

Or, if you have an encoded message, you can paste it into the third column and hit decode (“<”) to have the message translated into text format. The Paste button above the third column is particularly useful if you want to decode protocol buffers messages stored in binary columns in Sql Server. If you view such columns in Sql Management Studio they are displayed like “0x0A1B2C4D…”: the Paste button will recognise text in the clipboard in that format and paste it as binary data.

If you’re wondering about the “C#” toggle beneath the binary editor: that will encode the binary data as a C# byte array – particularly useful if you’re wanting to seed your unit tests with authentic protocol buffer bytes!

And that’s it. Unless you want to know more about defining message types, and writing messages in text message format – in which case, read on.

Message Types

The first thing to understand are the message types. Message types are a bit like an Xml Schema. But whereas you can understand Xml just fine without a Schema, having the message types to hand is essential when parsing a Protocol Buffers message. That's because, in a bid to squeeze the output as small as possible, all the wordy field identifiers and even field type information is stripped out. All that's left for each piece of data is a field number, and a few bits telling you how big it is.

Message types look like this:

enum Binding { 
   PAPERBACK = 1;
   HARDBACK = 2;
} 
 
message Author { 
   required string name = 1; 
   optional string location = 2; 
} 
 
message Book { 
   required string title = 1; 
   required Binding binding = 2; 
   repeated Author author = 3; 
   optional int32 number_of_pages = 4; 
   optional double weight = 5; 
}

So for each field in a message, you define if it must be present ("required"), it may be present ("optional"), or if it can appear any number of times, including never ("repeated"). You then give its type - and you have a range to choose from, including enums and your own custom message types. Finally, you assign it a field number: this is the critical part, as this is all that will distinguish between the different pieces of data in the resulting messages.

Messages in Text Message Format

It took me a bit of puzzling to work out how to write messages in text format. For some reason, though Google have defined a text format for Protocol Buffer Messages, they don't appear to have documented it any where.

Here's an example, using the message types defined above:

title: "The Holy Bible"
binding: HARDBACK
author: {
  name: "Moses"
  location: "Sinai Peninsula"
}
author: {
   name: "Paul"
   location: "Tarsus"
}
author: {
   name: "Luke"
}
number_of_pages: 1723
weight: 0.225

As you can see, it has some similarities to JSON, but not many. Field names aren’t enclosed in strings, for example, and they aren’t terminated by commas.

5 comments:

Anonymous said...

>> For various reasons, I decided to write my own Reader and Writer rather than go with the existing implementations

a glimpse into Sam's mind...
(StackTrace)
...
if((CodeReuseReason == DeveloperMindState.Boring) || (!LicenseAcceptable))

...

Daniel15 said...

I'm surprised there's only one comment on this post. This is great! Thanks :)

Thirdvalve said...

Any code samples that illustrate reading and writing messages in text format?  This is neglected in Google's documentation.

noname said...

It is no where mentioned, how to do RPC with protocol buffers?
There are other projects for this work, which I don't want.
And it's own rpc is perhaps depricated as they write on their site.
I don't want to write lot's of code to create socket, connect etc etc.
Any Idea how to do that?

Samuel Jack said...

Sorry, it's been a while since I've done anything with Protocol buffers, so I'm not sure what's out there to help you best.

Post a Comment