Description Language

The key to making Signpost work is the description language. Effectively, we remove the variability of keyword searching by providing a behavioral representation of a bug that we can match against at run time. It's all done using events and properties.

There are two sections on this page: the syntax of the description language, and examples and justification of its use.

Syntax

A bug is a sequence of events. At the moment, Signpost can only use function calls as events, but that has more to do with the runtime monitoring than the actual language. As we shall see, it's really easy to add additional events to the description language.
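One way to picture "a bug is a sequence of events" is as an in-order match over a recorded call trace. The sketch below is only an illustration, not Signpost's matcher: the function name `occurs_in_order` and the sample trace are hypothetical, and it assumes that unrelated calls may occur between the matched ones.

```python
from typing import Iterable, Sequence


def occurs_in_order(pattern: Sequence[str], trace: Iterable[str]) -> bool:
    """Return True if every name in `pattern` appears in `trace`, in order.

    Assumption: other calls are allowed between the matched ones.
    """
    remaining = iter(trace)
    # `name in remaining` consumes the iterator up to the first match,
    # so each later name is only searched for after the previous one.
    return all(name in remaining for name in pattern)


# Hypothetical call trace captured at run time.
trace = ["main", "foo", "helper", "bar", "exit"]
print(occurs_in_order(["foo", "bar"], trace))  # True
print(occurs_in_order(["bar", "foo"], trace))  # False
```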

Events are described by properties. For a function, there are three top-level properties: the this pointer, the parameters and the return.

Each property can also have sub-properties. For the function event, the property/sub-property tree looks like...

 

We write these events and properties using textual tags. The syntax is very straightforward.

Property/Sub-Property    Syntax
Function                 FN
This Pointer             THISPTR
Parameters               PARAM
Return                   RET
Parent                   P
Member                   M
Value                    V
Type                     T

Some properties may have values associated with them (e.g. function or parameter names). A value is enclosed in square brackets after the tag: a simple string gives an exact match, and certain tags also accept expressions for matching.

Example                    Meaning
FN[bar] or FN[foo::bar]    The function's name, including the class if applicable
PARAM[flags]               The parameter's name
P[class person]            The type of the parent class
M[name]                    The name of the member variable
V[1]                       The explicit value of a parameter, return or member
V[<=123]                   A value of a parameter, return or member that is less than or equal to 123
V[C"abc"]                  The value of a parameter, return or member contains the substring "abc"
V[#8]                      Compares the value of a parameter, return or member to the bit mask of 8
T[int]                     The type: a simple string that can be used as an additional matching characteristic
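To make the value forms above concrete, here is a minimal sketch of how the text inside `V[...]` might be evaluated against an observed value. It is not taken from Signpost: the function name `value_matches` is hypothetical, only the four forms shown in the examples are handled, and the bit mask form is assumed to mean "all bits of the mask are set in the value".

```python
def value_matches(expr: str, actual) -> bool:
    """Test an observed value against the text found inside V[...]."""
    expr = expr.strip()
    if expr.startswith("<="):
        # V[<=123]: numeric less-than-or-equal comparison.
        return int(actual) <= int(expr[2:])
    if expr.startswith('C"') and expr.endswith('"'):
        # V[C"abc"]: substring test on the textual form of the value.
        return expr[2:-1] in str(actual)
    if expr.startswith("#"):
        # V[#8]: bit mask test. Assumption: every bit of the mask must be set.
        mask = int(expr[1:])
        return (int(actual) & mask) == mask
    # V[1]: plain string, exact match.
    return str(actual) == expr


print(value_matches("1", 1))               # True
print(value_matches("<=123", 124))         # False
print(value_matches('C"abc"', "xxabcxx"))  # True
print(value_matches("#8", 12))             # True (12 has bit 8 set)
```

Comparing on the textual form keeps the exact-match case independent of the parameter's type; a real matcher would more likely use the `T[...]` tag to decide how to compare.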

Finally, connectors are used to string events and properties together.

Connector      Syntax    Example of use                                 Meaning
Followed by    ->        FN[foo]->FN[bar]                               Function foo followed by function bar
With           =>        FN[foo]=>PARAM[x]                              Function foo with a parameter named x
Has            >>        FN[foo]=>PARAM[x]>>V[2]                        Function foo's parameter x has a value of 2
And            &&        FN[foo]=>(PARAM[x]>>V[2] && PARAM[y]>>V[3])    Function foo's parameter x has a value of 2 and parameter y has a value of 3
Or             ||        FN[foo]=>PARAM[x]>>(V[2] || V[3])              Function foo's parameter x can have a value of 2 or 3
Not            !         FN[doe]->!(FN[ray])->FN[me]                    Function doe followed by function me, with no call to function ray in between

This all builds up into quite a descriptive language for such a small number of tokens.
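Because the token set is so small, a description can be split into tags, connectors and parentheses with a single regular expression. The sketch below is an illustration rather than Signpost's parser: the names `TOKEN` and `tokenize` are hypothetical, tags without a bracketed value (such as a bare `RET`) are assumed to be legal, and a `]` inside a value is not supported.

```python
import re

# One alternative per token kind; whitespace between tokens is skipped.
TOKEN = re.compile(r"""
    (?P<tag>[A-Z]+)\[(?P<value>[^\]]*)\]   # tag with a value, e.g. FN[foo]
  | (?P<bare>[A-Z]+)                       # tag without a value (assumed legal)
  | (?P<conn>->|=>|>>|&&|\|\||!)           # connectors
  | (?P<paren>[()])                        # grouping
  | (?P<space>\s+)
""", re.VERBOSE)


def tokenize(description: str) -> list:
    """Split a description string into TAG, CONN and PAREN tuples."""
    tokens = []
    pos = 0
    while pos < len(description):
        m = TOKEN.match(description, pos)
        if m is None:
            raise ValueError(
                f"unexpected text at offset {pos}: {description[pos:]!r}")
        pos = m.end()
        if m.group("space"):
            continue
        if m.group("tag"):
            tokens.append(("TAG", m.group("tag"), m.group("value")))
        elif m.group("bare"):
            tokens.append(("TAG", m.group("bare"), None))
        elif m.group("conn"):
            tokens.append(("CONN", m.group("conn")))
        else:
            tokens.append(("PAREN", m.group("paren")))
    return tokens


for token in tokenize("FN[foo]=>(PARAM[x]>>V[2] && PARAM[y]>>V[3])"):
    print(token)
```

A flat token list like this is only a first step; turning it into a tree that respects grouping and connector precedence would need a small parser on top.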

Examples and justification

The most common question about the description language is how useful it can be. In particular, is it easy to write, and does it cover the types of bugs found in the knowledge bases? There are a few deficiencies in the language that I'll address at the end of this section, but first I'll show a number of examples of encoding a knowledge base article's behavior, along with some information on coverage. Each of the following is a full image of a knowledge base article pulled from Microsoft's MSDN, with the relevant information highlighted and explained, and the full description shown in the lower gray box.




As can be seen, encoding articles is quite straightforward. This was by design: the intention is for helpdesk staff or other programmers to be able to describe bugs quickly and intuitively, without resorting to an overly restrictive or overly descriptive syntax. Using the sample of knowledge base articles (see kb survey), we attempted to encode a random 10% of the articles, noting the incompatibilities. Both Win32 and MFC articles were attempted, to get a feel for the difference between procedural and object-oriented code.

Win32 Article Sample (32 articles)                              %
Encoded OK                                                      47
Environment defect (e.g. IDE, debugger, etc)                    3
Incomplete article details                                      3
Not enough detail in runtime data to match against              3
Need to count string length                                     3
Need to use regex                                               3
Need to store a value                                           9
Need to use thread information                                  3
Misfiled article (i.e. should be INFO or DOC article)           25


MFC Article Sample (59 articles)                                %
Encoded OK                                                      34
Environment defect (e.g. IDE, debugger, etc)                    10
Compiler/linker defect                                          10
Not enough detail in runtime data to match against              7
Missing code (incomplete behavior, code has to be added)        10
Need to count string length                                     2
Need to use regex                                               7
Need to store a value                                           8
Need to trap assignment                                         2
Need to use “function calls function”                           5
Need to ensure object reference                                 5
Misfiled article (i.e. should be INFO or DOC article)           2

At first glance we have an appalling rate of encoding. However, if we remove misfiled articles, environment defects and compiler/linker defects (because these types of bugs do not produce runtime code), a more respectable 62% of Win32 articles and 61% of MFC articles could be effectively encoded and successfully matched against.

These figures were derived using strict rules of encoding: if a full description could not be written and successfully used to match against the article's bug, the article was categorized as needing a particular language extension. My experience, however, suggests that some of these articles can be successfully matched using an incomplete or overly complex description (e.g. multiple "OR" connectors). Future versions of the description language should be able to use regular expressions in the description, store values, and have syntax to describe the operating system, the modules loaded and their versions. This should raise the descriptive capabilities to over 70-80% and reduce the possibility of false positives, but further work and studies are needed.

 

 
