Friday 20 November 2009

Highlights of ‘Microsoft Project Code Name “M”: The Data and Modeling Language’

I’m having great fun watching the Microsoft PDC 2009 session videos, and blogging the highlights for future reference. In case you want to jump into the video, I’ve included some time-checks in brackets (0:00). Read the rest of the series here.

Note that there is currently a problem with the encoding of the video linked to below which means you can’t seek to an arbitrary offset. Hopefully that will be fixed soon.

In a 45 minute lunch-time talk, Don Box (once he’d finished munching his lunch) and Jeff Pinkston gave an overview of the “M” language (quotes, or air-quotes if you’re talking, are apparently very important to the PDC presenters invoking the name “M”, because it is only a code-name, and the lawyers get angsty if people aren’t reminded of that).

“M” is a language for data (1:26). Since last year, Microsoft have combined the three components MGraph, MGrammar and MSchema into this one language “M” that can be used for data interchange, data parsing and data schema definition. It will be released under the Open Specification Promise (OSP) which effectively means that anybody can implement it freely.

As Don said, “M lives at the intersection of text avenue and data street” (2:56). This means that not only is M a language for working with data, but M itself can be represented in text or in the database.

At PDC last year, Microsoft demonstrated “M” being used as an abstraction layer above T-SQL, and also as a language for defining Grammar specifications. This year, they are introducing “M” as a way of writing Entity Framework data models (4:00).

Getting into the detail, Don started by describing how M can be used to write parsers: functions that turn text into structured data (5:02). You use M to define your language in terms of Token rules and Syntax rules (6:02): Token rules describe the patterns that sequences of characters should match, and Syntax rules describe sequences of tokens. Apparently there are a few innovations in the “M” language: it uses a GLR parser and it can have embedded token sets to allow, for example, XML and JSON to be used in the same language (6:40) [sorry I can’t make it clearer than that - Don rushed through this part].

Then came a demo (8:15) in which Don and Jeff showed how a language can be interactively defined in Intellipad, with Intellipad even doing real-time syntax checking on any sample input text you provide it with.

Intellipad example

Notice from the screenshot how you can define pattern variables in your syntaxes (and also in tokens) and then use these in expressions to structure the output of your language: in syntax Tweet for example, the token matching HashTag is assigned to variable h, and then the result of the syntax (the part following the =>) is emitted under the name “Hash”. Integration with Visual Studio has also been improved to make it easy to load a language, use it to parse some input, and then work with the structured output in your code.
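To give a rough feel for the token-then-syntax idea in plain C#, here is a minimal sketch that uses a regular expression in place of an “M” grammar. It is not “M” and not the code from the demo: the TweetParser class, the hashtag pattern and the key/value output shape are all my own assumptions, chosen only to mirror the HashTag token and the “Hash” output name mentioned above.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TweetParser
{
    // Stand-in for a token rule (assumed pattern): '#' followed by word characters.
    private static readonly Regex HashTag = new Regex(@"#(\w+)");

    // Stand-in for a syntax rule: each matched token is bound to a variable
    // and emitted in the structured output under the name "Hash".
    public static List<KeyValuePair<string, string>> Parse(string tweet)
    {
        var output = new List<KeyValuePair<string, string>>();
        foreach (Match match in HashTag.Matches(tweet))
        {
            string h = match.Groups[1].Value;
            output.Add(new KeyValuePair<string, string>("Hash", h));
        }
        return output;
    }
}
```

Calling TweetParser.Parse on a tweet containing two hashtags yields two “Hash” entries, which is the text-to-structured-data step a parser performs.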

Next Don talked a bit about typing in “M” (22:10). “M” uses structural typing. Think of this as being a bit like duck typing: if two types have members with the same names and the same types then they are equivalent. M has a number of intrinsic types (logical, number, text, etc.) and a number of “data compositors” – ways of combining types together – like collections, sequences and records.
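C# itself is nominally typed, so there is no direct equivalent, but anonymous types give a small taste of the structural idea: two anonymous objects declared in the same assembly with the same property names and types, in the same order, share one compiler-generated type. The sketch below is only an analogy in C#, not a description of how “M” works, and the class and property names are made up for illustration.

```csharp
public static class StructuralTypingDemo
{
    // Two values with the same "shape" (same member names and types, same order).
    public static bool SameShapeMeansSameType()
    {
        var first = new { Name = "a", Count = 1 };
        var second = new { Name = "b", Count = 2 };
        return first.GetType() == second.GetType();
    }

    // Changing the type of a member gives a different shape, hence a different type.
    public static bool DifferentShapeMeansDifferentType()
    {
        var first = new { Name = "a", Count = 1 };
        var other = new { Name = "a", Count = 1.0 };
        return first.GetType() != other.GetType();
    }
}
```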

Don followed this up with a demo of “Quadrant” (there go the air-quotes again) (25:36), showing how this can be used to deploy schemas defined in “M” to the database. “M” is tightly integrated into “Quadrant”: you can type in “M” expressions and it will execute them directly against your database, showing the results in pretty tables (34:50).

Don finished off by talking about how M can be used for defining EDM models (34:19): many people prefer describing models in text rather than in a designer, especially since performance in the EDM designer suffers with large models.

Scott Hanselman interviews James Bach

To enliven my journey to work this morning I listened to Scott Hanselman’s interview with James Bach, an international consultant on software testing. They were talking about James’s new book, Secrets of a Buccaneer-Scholar.

James has led an interesting life. He was kicked out of home when he was 14, so moved into a motel room with his computer. There he taught himself Assembly language programming. When he was 16, he dropped out of high school and started a career as a video game programmer.

Listen to the interview to find out how he made the journey to tester, speaker, writer, and proponent of Exploratory testing.

Thursday 19 November 2009

Highlights of “Data Programming and Modeling for the Microsoft .NET Developer”

Don Box and Chris Anderson gave a very watchable presentation, Data Programming and Modeling for the Microsoft .NET Developer. This is an overview of how we .Net developers have done data access in the past, and how we will be doing it in the future.

Chris Anderson started with a reminder of the dark ages of data access in .Net, SqlConnection and IDataReader (3:40). Then he showed how an Entity Framework data model could be layered on top of the database. Entity Framework provides something called an EntityConnection which works like SqlConnection, but in terms of the entities in your model, not database tables. You can write queries in something called Entity SQL, which allows you to “.” your way through object relationships without using joins (see 7:48). Most often though, Entities are accessed using the ObjectContext, which gives strongly-typed access to the entities and permits LINQ queries over them (11:18).
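As an illustration of “dotting” through relationships instead of writing joins, here is a minimal sketch using LINQ to Objects over in-memory classes. It does not use Entity Framework or the model from the talk: the Customer and Order classes and the BigSpenders method are hypothetical, and with a real ObjectContext the query would run against entity sets rather than a plain list.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical model classes, standing in for generated entities.
public class Order
{
    public decimal Total { get; set; }
}

public class Customer
{
    public string Name { get; set; }
    public List<Order> Orders { get; set; }
}

public static class NavigationDemo
{
    // The relationship is followed with c.Orders; no explicit join is written.
    public static List<string> BigSpenders(IEnumerable<Customer> customers, decimal threshold)
    {
        var query = from c in customers
                    where c.Orders.Sum(o => o.Total) > threshold
                    orderby c.Name
                    select c.Name;
        return query.ToList();
    }
}
```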

Attention then turned to the way we define our databases and models. Traditionally we would start with the database, and build an Entity Framework model on top of it. As of Entity Framework v4 we will be able to define the model first, and have Entity Framework generate the database from it (14:20). But we can go further than this. Using a CTP of an API that will be released after .Net 4, it’s possible to define a model entirely in code using Plain-Old-CLR-Objects (POCO), and then generate a database from this (19:44). But which approach is best? Chris provided this helpful slide:

WhichApproachToModelling

Don Box then took over (33:40) to talk about the OData Protocol. This is a new name for the protocol used by ADO.Net Data Services (formerly known as Astoria). It is based on the Atom publishing format and it provides REST-based access to data. As well as supporting querying (sorting, filtering, projection, etc.) it also supports updates to the data.

OData picture
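To make the querying side of OData a little more concrete, here is a small sketch that builds a query URL using the $filter, $orderby and $top query options. The ODataQuery helper is my own invention rather than part of any Microsoft API, and the service address, entity set and property names in the usage note are placeholders.

```csharp
using System;
using System.Collections.Generic;

public static class ODataQuery
{
    // Builds a query URL of the form {serviceRoot}/{entitySet}?$filter=...&$orderby=...&$top=...
    public static string Build(string serviceRoot, string entitySet,
                               string filter, string orderBy, int top)
    {
        var options = new List<string>
        {
            "$filter=" + Uri.EscapeDataString(filter),   // filtering
            "$orderby=" + Uri.EscapeDataString(orderBy), // sorting
            "$top=" + top                                // limit the number of results
        };
        return serviceRoot.TrimEnd('/') + "/" + entitySet + "?" + string.Join("&", options);
    }
}
```

For example, ODataQuery.Build("http://example.org/service.svc", "Products", "Price gt 10", "Name desc", 5) produces a URL that asks a (hypothetical) service for the first five products costing more than 10, sorted by name in descending order.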

Don demoed how SharePoint 2010 supports this format (37:35). He showed how it makes use of the Entity Data Model to provide meta-data about the structure of the data (39:00). Excel 2010 has support for querying data in this format (39:40). Naturally .Net applications can query this kind of data (40:45), but there is also an API that makes it easy to write services that provide data in this format (45:00). According to Don, “OData is the new ODBC”!

In the last ten minutes, Don talked about the connection that all this has with the “M” language – how M can be used to create the Entity Model for example.

Future Directions for C# and Visual Basic

Luca Bolognese opened his session Future Directions for C# and Visual Basic by announcing that the strategy for future development of C# and Visual Basic is one of co-evolution. As he said, the previous strategy, where each language was independent and one would get new features that the other didn’t, was “maximising unsatisfaction”. Now there is one compiler team responsible for both languages, and any big new features added to one will get added to the other. VS 2010 already brings the two languages much closer together, and this will increase in the future.

In the first half of the presentation, Luca talked about the three big trends in current languages: Declarative, Dynamic and Concurrent.

LanguageTrends

In a demo (starting at 6:20 in the video) Luca created a CSV file parser. He showed (12:08) how writing the code in declarative style (using LINQ) not only makes it easier to read, it also makes it easier to parallelize. In fact, it is as simple as adding AsParallel() to the middle of the query (15:18). The Task Parallel Library (part of .Net 4) makes it possible to parallelize imperative code (for loops, etc.) but with much greater potential for bugs like unsynchronized collection access (16:20). Luca then went on to demonstrate the dynamic language features in C# 4, and the DynamicObject base class (24:38).
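The declarative CSV idea can be sketched in a few lines. This is not Luca’s demo code: the CsvTotals class, the comma-only splitting (no quoted fields) and the choice of summing a single column are my own simplifying assumptions. The point is simply that the query reads the same with or without the AsParallel() call.

```csharp
using System.Globalization;
using System.Linq;

public static class CsvTotals
{
    // Sums one numeric column of simple comma-separated lines.
    public static decimal SumColumn(string[] lines, int column)
    {
        return lines
            .AsParallel() // remove this line and the query still works, just sequentially
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split(','))
            .Select(fields => decimal.Parse(fields[column], CultureInfo.InvariantCulture))
            .Sum();
    }
}
```

Because the query only describes what to compute and shares no mutable state between steps, the runtime is free to split the work across threads, which is exactly the kind of bug-prone bookkeeping you would otherwise write by hand in an imperative loop.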

Then he turned to the future, but not without a disclaimer that there were no promises of anything he talked about actually shipping. It seems, however, that Microsoft are pretty firmly committed to the first item he mentioned: rewriting of the C# compiler in C# and the VB.Net compiler in VB.Net, and opening up the black box so that we can make use of the lexer, parser, code generator, etc. From what he said later on, I gather that most of the team are currently involved in completing this work.

Luca demonstrated (36:00) how in just 100 lines of code he could use the new APIs to create a refactoring that would re-order the parameters of a method (and take care of the call site). Previously, anyone wanting to do something like this would have first needed to write a lexer and a parser, but these will now be provided for us.

RefactoringExample

The last demonstration was of something Luca called “resumable methods”. These appear to be something akin to async workflows in F#. By prefixing an expression with the “yield” keyword, Luca indicated to the compiler that he wanted the method to be called asynchronously (50:30). The compiler then takes care of rewriting the code so that execution resumes at the next statement once the asynchronous execution of the first statement completes. The benefit of this is that the thread can be used for something else in the meantime. By getting the compiler to do the rewriting we can avoid a whole lot of ugly code (see the video at 46:24).

ResumableMethodsImplementation

One other thing that Luca mentioned as being considered by the team is support for immutability (52:41). He said that they had considered 4 or 5 different designs but hadn’t yet settled on one that was exactly right. Part of the problem is that so much of the language is affected: not just the types themselves, but parameters, fields, etc.
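On the resumable methods described above: the syntax Luca showed was explicitly a prototype, so I won’t try to reproduce it. The async/await feature that later shipped in C# 5 has a very similar shape, though, and whether it matches the demoed design in detail is an assumption on my part. A minimal sketch, with a made-up method name:

```csharp
using System.Threading.Tasks;

public static class ResumableDemo
{
    // The compiler rewrites this method into a state machine: at the await,
    // the thread is released, and execution resumes at the next statement
    // once the awaited task completes.
    public static async Task<int> AddAfterDelayAsync(int a, int b)
    {
        await Task.Delay(10); // stands in for a slow asynchronous operation
        return a + b;
    }
}
```

The calling code stays linear and readable, while the callback plumbing it would otherwise need is generated by the compiler.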

If you want more on this, read Sasha Goldshtein’s account of Luca’s talk.

Wednesday 18 November 2009

PDC Day 2 Keynote round-up

Reading the Twitter stream before the Keynote of PDC Day 2, the general consensus was that the keynote of Day 1 was rather dull. Much like last year, it seems that Day 1 was for the Suits, whereas Day 2 was for the Geeks.

And were the geeks thrilled today. Steven Sinofsky, after demonstrating a whole lot of luscious hardware (including a Server-replacement laptop, and another laptop so thin that it seemed to disappear when turned edge-on) announced that all fully-paid up attendees would be given a free Acer laptop. The Gu then went on to announce the Beta of Silverlight 4, and Kurt DelBene flicked the switch on the Office 2010 Beta.

These announcements have been covered in detail elsewhere, so I’ll leave you with the best links I’ve found.

Thursday 12 November 2009

PDC 2009 @ Home

Can you believe it? A year has flown by already since I jetted off to LA, reclined to the soothing sound of Scott Gu's Keynote, and all-round indulged in a general geek-out at Microsoft PDC 2008! And already Microsoft PDC 2009 is upon us.

Windows 7, which we gawped at for the first time twelve months ago, is now gracing our desktops; Windows Azure is almost ready to go live; and after a good 52 weeks of head-scratching, developers are just beginning to work out what Oslo is and isn't good for.

What with our baby being born and Bankers busting our economy, I'm not going to be present in person at the PDC this year. But the Directors at Paragon have very kindly granted us all a couple of days to be present in spirit by taking advantage of the session videos that Microsoft generously post online within 24 hours of the event.

So which sessions will I be watching?

Top of my list are the PDC specialties, the sessions usually flagged up with the magic word "Future" in their title. What developer doesn't go to PDC with an ear greedy for announcements of shiny new features for their favourite platform?

The other thing PDC is famous for is its “deep-dives”: talks by the architects and developers of the various technologies on show, laying bare the inner workings of their creations. This year I’ll probably focus on WPF and Silverlight.

As well as watching these sessions, I hope to find some time over the next week for blogging about my discoveries. So why don’t you follow along?

Monday 2 November 2009

A Fiddler plug-in for inspecting WCF Binary encoded messages

If ever you need to debug the interaction between a Web Service and its clients, Microsoft’s Fiddler is the tool to use - this includes WCF Services so long as they're using an HTTP transport. The only thing Fiddler won't do is decode messages that are sent using WCF's proprietary Binary encoding - until today, that is: at lunch time, I took advantage of Fiddler's neat extensibility mechanism and created a rough-and-ready Inspector that will translate binary messages from gobbledegook to plain XML for your debugging pleasure.

You can download the plug-in and source from MSDN Code Gallery. To use it, just drop the plug-in in the Inspectors folder of your Fiddler installation. Once you've reloaded Fiddler, switch to the Inspectors tab and look for WCF Binary.

WCFBinaryFiddlerPlugin

Implementation Notes

  • There’s a very helpful page on the Fiddler site which tells you how to build Inspectors in .Net.
  • Fiddler gives each Inspector the raw bytes of each message, and the Inspector can do with them what it likes. Here’s how I decode a WCF Binary encoded message:
using System;
using System.Runtime.Serialization;
using System.ServiceModel.Channels;

...

private static readonly BufferManager _bufferManager = BufferManager.CreateBufferManager(int.MaxValue, int.MaxValue);

...

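// Decodes a WCF Binary encoded message and returns it as readable text.
private string GetWcfBinaryMessageAsText(byte[] encodedMessage)
{
    // The binding element supplies an encoder that understands WCF's binary format.
    var bindingElement = new BinaryMessageEncodingBindingElement();
    var factory = bindingElement.CreateMessageEncoderFactory();
    // ReadMessage takes the raw bytes plus the BufferManager that manages the message buffer.
    var message = factory.Encoder.ReadMessage(new ArraySegment<byte>(encodedMessage), _bufferManager);
    // Message.ToString() returns a text (XML) rendering of the message.
    return message.ToString();
}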