Friday, October 31, 2008

PDC2008.Last;

I reported on the first half of PDC2008 the other day. Today I reflect on the last half of the conference.

Wednesday's keynote reviewed work being done by Microsoft Research, a division that's involved in pure research. They're an impressive group. Started in 1991, they have grown to over 850 PhD researchers in six locations around the world. Each summer they have 1000 graduate student interns working on projects. 25% of Computer Science PhD graduates in the United States have worked for Microsoft Research by the time they finish their degree. 15% of the division's budget is granted to universities. 30% of papers read at peer-reviewed conferences are submitted by Microsoft Research fellows.

One of the most interesting items they presented was a highly modified surface device. (A surface device is a multi-touch-sensitive tabletop display.) This particular device was configured so that if you put another semi-opaque surface over it, something like a piece of tissue paper or frosted plastic, a different image would be seen on the secondary screen. One example was a picture of an animal on the primary display: hold a piece of paper over it and text about the animal appeared. Pretty cool stuff!

Another really cool piece of software technology presented in one of the sessions on Wednesday was the Concurrency Analysis Platform, or CAP for short. This is a low-level library used during the testing cycle that inserts itself between the application and the operating environment and replaces the thread scheduler. There are implementations for both Win32 and managed code. This lets testing tools be built that explicitly control the scheduler. Think about this for a second. The scheduler is no longer non-deterministically interrupting your code, but rather can be controlled in a deterministic manner. Very, very cool.

CHESS is an automated tool built on this framework that analyzes a project and finds the points where schedule interleaving may make a difference. Without this analysis, it would have to run each thread with each line interrupted by the scheduler. This problem's scale is $n^{nk}$, where n is the number of threads and k is the number of lines. With the double exponent, relatively small numbers cause this to explode to more permutations than there are estimated atoms in the universe. By analyzing the code, the problem space can be reduced to $(n^2 k)^c \cdot n^n$, where c is a small number, like 2 or 3, and n and k are as defined previously. This becomes a much more tractable problem space.
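To get a feel for the difference between n^(n*k) and (n^2*k)^c * n^n, here is a small Python calculation. The values of n, k and c are ones I picked for illustration; they are not from the session.

```python
def unbounded_schedules(n: int, k: int) -> int:
    """Scale of the problem with no analysis: n^(n*k)."""
    return n ** (n * k)


def reduced_schedules(n: int, k: int, c: int) -> int:
    """Scale after analysis, with the small constant c: (n^2 * k)^c * n^n."""
    return (n * n * k) ** c * n ** n


# Illustrative values only: 4 threads, 50 lines each, c = 2.
n, k, c = 4, 50, 2
full = unbounded_schedules(n, k)
reduced = reduced_schedules(n, k, c)

# Print the huge number as a power of ten rather than 121 digits.
print(f"unbounded: about 10^{len(str(full)) - 1}")  # about 10^120
print(f"reduced:   {reduced:,}")                    # 163,840,000
```

A commonly quoted estimate for the number of atoms in the universe is around 10^80, so even four short threads blow past it, while the reduced space is something a machine can actually work through.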

Within this reduced space, the tool runs the code with each permutation of schedule interleaving to detect assertion failures, deadlocks, livelocks and data races. When a failure is found, the problematic interleaving can be captured so the problem can be reproduced directly and consistently. Over my career, this could have saved me literally weeks of debugging time. During the session, they presented a couple of real-life case studies where this tool was used to find and fix very obscure, rare threading bugs in non-trivial code bases. They have also used it on several projects to find and fix bugs that had not yet been experienced in the field. Obviously, using a tool like this throughout the development cycle has the potential to significantly increase the reliability of threaded applications.
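The idea of taking over the scheduler and trying every interleaving can be sketched in a few lines of Python. This is a toy model, not CAP or CHESS and not their API: the "threads" are generators, each `yield` marks a point where the scheduler may switch, and the names `worker`, `run` and `failing` are my own.

```python
from itertools import permutations


def worker(state):
    """A non-atomic increment: read the counter, then write it back."""
    local = state["counter"]       # step 1: read
    yield                          # the scheduler may switch threads here
    state["counter"] = local + 1   # step 2: write


def run(schedule):
    """Run two workers in exactly the order given by `schedule`."""
    state = {"counter": 0}
    threads = [worker(state), worker(state)]
    for thread_id in schedule:
        next(threads[thread_id], None)  # advance that thread by one step
    return state["counter"]


# Every distinct ordering of two steps from thread 0 and two from thread 1.
schedules = sorted(set(permutations([0, 0, 1, 1])))

# A schedule fails if an increment is lost (the counter should end at 2).
failing = [s for s in schedules if run(s) != 2]

print(len(schedules), "schedules,", len(failing), "lose an update")
print("replay", failing[0], "->", run(failing[0]))
```

Because the schedule is just data, a failing one such as `(0, 1, 0, 1)` can be saved and replayed to get the same lost update every time, which is the property that makes this approach so attractive for debugging.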

Thursday marked the end of PDC2008. We had the same number of break-out sessions but ended early since there was no keynote speaker and only a short break for lunch. As I attended the sessions, it seemed like they had saved some of the best material for last.

First up are two new things coming out of the research arm: CodeContracts and Pex. These two independent but complementary technologies are designed to improve code quality.

Inspired by Bertrand Meyer's Design by Contract work and rooted in the earlier Microsoft Research project Spec#, CodeContract is a .NET library to enforce method and class contracts. Calls to the CodeContract class are put at the beginning of methods to indicate requirements (pre-conditions) and expectations (post-conditions). As far as the build process is concerned, these become part of the method signature, so warnings and errors are emitted during build for cases where static analysis can detect calling code that violates the contract. Further, parts of classes can be marked as invariant, allowing the same checks on derived classes without re-declaring the constraints. Lastly, these same conditions can be placed on interfaces. This allows the checks to automatically be applied to all implementing classes without the class having to explicitly declare them.
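For readers who haven't seen Design by Contract, here is a rough sketch of pre-conditions and post-conditions in Python. It is not the CodeContract API: the `contract` decorator and the `integer_sqrt` example are hypothetical, and the checks only happen at run time, so the build-time static analysis described above is not reproduced.

```python
import functools
import math


def contract(requires=None, ensures=None):
    """Hypothetical decorator: `requires` checks the arguments before the
    call, `ensures` checks the result after it."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if requires is not None and not requires(*args, **kwargs):
                raise ValueError(f"pre-condition failed for {func.__name__}")
            result = func(*args, **kwargs)
            if ensures is not None and not ensures(result):
                raise AssertionError(f"post-condition failed for {func.__name__}")
            return result
        return wrapper
    return decorate


@contract(requires=lambda x: x >= 0, ensures=lambda r: r >= 0)
def integer_sqrt(x):
    """Largest integer r such that r * r <= x."""
    return math.isqrt(x)


print(integer_sqrt(17))  # 4
try:
    integer_sqrt(-1)     # violates the pre-condition
except ValueError as error:
    print(error)         # pre-condition failed for integer_sqrt
```

The point of stating the contract next to the method is that the caller's obligation (a non-negative argument) and the method's promise (a non-negative result) are both visible in one place and both checked.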

Pex is a static code analyzer which works without CodeContracts but will use the additional information contained in them if they exist. It provides additional information about the warnings emitted by the CodeContract as well as the facility to automatically generate test cases based on method signatures and branches within the code that it has analyzed. Basically, it maximizes code coverage in the target class while minimizing the number of test cases needed for that coverage.

The last session I attended was a well-presented introduction to the newish language F#. It is a functional language[1] in the same way C++ is object-oriented. The purpose of C++ is to write object-oriented code while not excluding procedural code. Similarly, F#'s purpose is to write functional code while not excluding an object-oriented paradigm. F# has been around for a couple of years as a research language and has matured to the point where it's going to become a fully supported language in the near future.

The presenter started with a common, simple math problem and implemented it in F# using a typical procedural approach. It was basically line for line how it'd be written in C# with F# syntax. He then rewrote it in a slightly more mathematical way and then again using a fully functional approach.

Following this he presented a more realistic problem of downloading a CSV file from Yahoo's financial web site, slicing the data apart, running it through a function and emitting some results. He leveraged existing classes of the .NET framework for the data gathering and used a bit of F# code to do the data manipulation and glue things together. The first implementation had a single thread of execution. He finished with a highly impressive finale: with very minor changes to about three lines of code, he transformed this into a multi-threaded app that gave the same results in significantly less time.
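I don't have the presenter's F# code, but the shape of that last step can be sketched in Python. Everything here is an assumption for illustration: the tickers and prices are made-up strings standing in for the downloaded files, and `average_close` is a hypothetical stand-in for the function he ran over the data. The point is only that a side-effect-free function can be mapped sequentially or in parallel with a very small change.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

# Made-up sample data standing in for downloaded CSV files; not real quotes.
SAMPLE_FILES = {
    "AAA": "Date,Close\n2008-10-27,10.0\n2008-10-28,12.0\n",
    "BBB": "Date,Close\n2008-10-27,20.0\n2008-10-28,26.0\n",
}


def average_close(csv_text):
    """Pure function: parse one CSV text and return its mean closing price."""
    rows = csv.DictReader(io.StringIO(csv_text))
    closes = [float(row["Close"]) for row in rows]
    return sum(closes) / len(closes)


symbols = sorted(SAMPLE_FILES)
texts = [SAMPLE_FILES[symbol] for symbol in symbols]

# Single-threaded version.
sequential = list(map(average_close, texts))

# Multi-threaded version: only the mapping call changes.
with ThreadPoolExecutor() as pool:
    parallel = list(pool.map(average_close, texts))

print(dict(zip(symbols, parallel)))  # {'AAA': 11.0, 'BBB': 23.0}
print(sequential == parallel)        # True
```

This sketch leaves out the download itself, so it shows the size of the change rather than the speed-up; in CPython the gain from threads would come mainly from overlapping I/O waits such as the downloads.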

All in all, this was a great conference with a lot of really interesting information. Videos of the sessions are available at microsoftpdc.com/.

1. One of the major tenets of functional languages is side-effect free methods. This eliminates many dependencies on order of execution, in turn making threading a much easier problem.

2 comments:

Anonymous said...

A little rectification on Pex:

Pex is a runtime analysis tool. It does an analysis similar to automated white box testing, which is also called dynamic symbolic execution in the literature.

This has a very important implication: Pex is a zero-noise tool. When it finds a bug, it has actually executed it.

Harley Pebley said...

Peli,

Yes, thanks for the clarification.

Pex uses static analysis to compute optimal input parameters to maximize code coverage and then actually calls the methods with those parameters, reporting on errors. As the documentation says, "Don't run Pex on code that launches real missiles."