Archive for 2009
Architecture Diagrams in Structure101 are mapped to the physical code by patterns associated with each cell in the diagram. This enables the visual specification of rules that can then be applied to a specific version of your code so that Structure101 can overlay any violations on the diagram and let you discover the offending code items.
When you let Structure101 create the diagrams for you, and stick to changes like adjusting the layering, expanding packages, or moving packages to new parents, etc., you can mostly leave it to Structure101 to handle the patterns. However, occasionally you may see some behavior that seems odd unless you understand how patterns interact. And of course you may want to take advantage of the more advanced capabilities.
A key aspect that is not obvious at first is the “most specific pattern” rule. This says that where a physical class maps to more than one cell, the Architecture Diagram associates the class with the cell that has the “most specific” pattern.
Here’s an example diagram which was automatically generated from part of the Findbugs code-base:

… and here is the same diagram showing the pattern associated with each cell:

If I have a code-base that happens to contain a class edu.umd.cs.findbugs.classfile.impl.ClassFactory, then that class matches the pattern associated with the cell called ClassFactory. However it also matches the patterns for both ancestor cells “impl” and “classfile”. Since the ClassFactory cell’s pattern is the most specific (no wildcards), that is the cell to which Structure101 associates our physical class. And since a class is associated with at most one cell, that class is not associated with either ancestor cell, even though it matches their (less specific) patterns.
As expected, here are the associated items that Structure101 reports for the ClassFactory cell:

… and the parent “impl” cell:

Now I delete the 3 inner cells so that the diagram looks like this (the patterns for the remaining cells are unaltered):

Since the cells with the more specific patterns are gone, the classes that were associated with them now match the next most specific pattern – the parent “impl” cell:

The reverse can be puzzling: I click on the “expand” button, the child cells are added to the diagram (which I expect), but the list of items associated with the parent is suddenly empty (which I don’t).
Why would we do it this way? We figured it was the best way to be future proof (the wildcards handle any added/removed classes) while at the same time supporting refactoring (e.g. if I move a class to a new package, the most specific pattern goes with it, but all the other classes still match to the original parent).
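The “most specific pattern” rule described above can be sketched in a few lines of Python. This is only an illustration, not how Structure101 implements it: the pattern syntax (shell-style `*` wildcards via `fnmatch`), the `CELLS` mapping and the `specificity` scoring (count of non-wildcard characters) are all assumptions made for the example.

```python
from fnmatch import fnmatchcase

# Hypothetical cell -> pattern mapping, loosely modelled on the Findbugs example.
CELLS = {
    "classfile": "edu.umd.cs.findbugs.classfile.*",
    "impl": "edu.umd.cs.findbugs.classfile.impl.*",
    "ClassFactory": "edu.umd.cs.findbugs.classfile.impl.ClassFactory",
}


def specificity(pattern):
    # Assumption: the more literal (non-wildcard) characters, the more specific.
    return sum(1 for ch in pattern if ch not in "*?")


def cell_for(class_name, cells):
    """Return the single cell a class is associated with, or None."""
    matches = [name for name, pat in cells.items() if fnmatchcase(class_name, pat)]
    if not matches:
        return None
    # A class matches several cells but is associated with at most one:
    # the cell whose pattern is the most specific.
    return max(matches, key=lambda name: specificity(cells[name]))


factory = "edu.umd.cs.findbugs.classfile.impl.ClassFactory"
print(cell_for(factory, CELLS))

# Delete the inner cell: the class falls back to the next most specific pattern.
without_inner = {k: v for k, v in CELLS.items() if k != "ClassFactory"}
print(cell_for(factory, without_inner))
```

With all three cells present the class lands in the ClassFactory cell. Once that cell is removed it falls back to “impl”, which mirrors the behavior shown in the diagrams.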
“Go” is a new systems programming language created by Google. Syntax is based on C++, and it compiles (like greased lightning apparently) – even has a Printf()! But beyond trivial similarities it is a very different beast:
- Interfaces replace class inheritance, but unlike Java interfaces, no explicit reference to the interface is required – as long as a type provides the methods named in the interface, it implements the interface.
- Garbage collection.
- Arrays are first class citizens – you don’t need to (can’t) use pointers to access the elements – instead you use [], and “slices”.
- A “slice” is a section of an array and behaves very much like an array – you can create a slice of a slice.
- Strings are first class and immutable, so memory allocation is not an issue.
- “Maps” are hash tables and are also first class.
- Threading is part of the language (no RTOS needed). “Goroutines” run concurrently and communicate via “channels”.
- No header files (that would have ruled it out of my book). “Packages” are used to group and reference (“import”) stuff.
- Reflection.
- Type conversion is explicit, no function overloading (that’s an odd one – presume it’s to speed up compile time – expect a lot of awkward function names), no user-defined operators.
Google reckon it’s “fun” to use. Compared to C++ that’s presumably a no-brainer – I’d say Java programmers would probably take to it as easily. Assuming it does what it says on the can.
From a dependency management point of view, I don’t much like the implicit (read “hidden”) implementation of interfaces. But I like the absorption of concurrency into the language – it should allow modelling of references that disappear into the OS in C++. But it’s not ready for prime time yet – it’ll be a while before we go building a Go backend for Structure101…
Herbert Simon’s parable of the watchmakers was constructed to convey his belief that complex systems will evolve from simple systems much more rapidly if there are stable intermediate forms present in that evolutionary process than if they are not present.
Arthur Koestler built on this in his 1967 book “The Ghost in the Machine”, in the process coining the term holon to denote something that is simultaneously a whole and a part, depending on how you look at it. Here is Mark Edwards explaining this duality:
Every identifiable unit of organization, such as a single cell in an animal or a family unit in a society, comprises more basic units (mitochondria and nucleus, parents and siblings) while at the same time forming a part of a larger unit of organization (a muscle tissue and organ, community and society). A holon, as Koestler devised the term, is an identifiable part of a system that has a unique identity, yet is made up of sub-ordinate parts and in turn is part of a larger whole.
Importantly, Koestler further described holons as
… autonomous, self-reliant units that possess a degree of independence and handle contingencies without asking higher authorities for instructions. These holons are also simultaneously subject to control from one or more of these higher authorities. The first property ensures that holons are stable forms that are able to withstand disturbances, while the latter property signifies that they are intermediate forms, providing a context for the proper functionality for the larger whole. [Summary text from wikipedia]
Though the terminology is different, I am sure the key tenets of Koestler’s principles will resonate with most people in software. Certainly, the importance of meaningful wholes (within the context of a wider system) is well recognized, and reflected in established principles such as Single Responsibility and Reuse Release Equivalency. Similarly, most would agree that the ability to withstand disturbances is hugely desirable, though we generally talk about this in terms of agility (or its converse fragility). The one aspect that might jar a little is the reference to higher authorities – I’ll revisit this later in the context of Emergent Design.
Koestler also introduced the term holarchy to denote a hierarchy of holons. As I suggested in my previous post on this subject area, I rather feel that, mostly, today’s software thinking tends to buy Koestler’s notions on holons but fall down on holarchy. Specifically, we tend to pay little or no attention to the world of complexity between the low-level coding constructs (classes, methods) and the unit of deployment (jar, dll).
Just as one example of this, see Bob Martin’s Principles of OO Development. He describes five principles that apply to the class level, and six that operate at the unit of deployment. Nothing in between. Similarly, and related, there are lots and lots of (not necessarily very useful) metrics that measure aspects of classes and methods, but there is an almost complete vacuum at the (what Booch would have called) “class cluster” level. One of the very few exceptions to this is DMS (distance from the main sequence) and related stability metrics for Java packages (based on Martin’s Stable Dependencies and Stable Abstractions Principles). However, and somewhat amusingly, it would seem that these metrics only came into being because of confusion over Martin’s use of the term “package” (apparently, he actually intended this to denote unit of deployment)…
The situation changes instantly if we embrace hierarchy, holarchy. I do not see this as anything particularly radical, rather just a generalizing of existing principles. However, the ramifications could be quite far reaching. In the next two posts, I will explain for example how holarchy opens the door to automated visualization and holistic measurement.
The parable of the two watchmakers was introduced by Nobel Prize winner Herbert Simon to describe the complex relationship of subsystems and their larger wholes.
There once were two watchmakers, named Hora and Tempus, who made very fine watches. The phones in their workshops rang frequently and new customers were constantly calling them. However, Hora prospered while Tempus became poorer and poorer. In the end, Tempus lost his shop. What was the reason behind this?
The watches consisted of about 1000 parts each. The watches that Tempus made were designed such that, when he had to put down a partly assembled watch, it immediately fell into pieces and had to be reassembled from the basic elements. Hora had designed his watches so that he could put together sub-assemblies of about ten components each, and each sub-assembly could be put down without falling apart. Ten of these sub-assemblies could be put together to make a larger sub-assembly, and ten of the larger sub-assemblies constituted the whole watch.
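To put rough numbers on the difference, here is a small back-of-the-envelope sketch in Python. The model is my assumption, not part of the parable as told above: every part placement is interrupted with some probability `P`, and an interruption scraps only the assembly currently in progress. Under that model, the expected number of placement attempts needed to get `n` uninterrupted placements in a row is `((1 - p) ** -n - 1) / p`.

```python
def expected_placements(parts_per_assembly, p_interrupt):
    """Expected placement attempts to finish one assembly of n parts,
    when each attempt is interrupted with probability p and an
    interruption scraps the unfinished assembly."""
    q = 1.0 - p_interrupt
    return (q ** -parts_per_assembly - 1.0) / p_interrupt


P = 0.01  # assumed chance that the phone rings during any one placement

# Tempus: one monolithic assembly of 1000 parts.
tempus = expected_placements(1000, P)

# Hora: 100 small sub-assemblies, 10 larger ones and 1 final watch,
# each built from 10 components.
hora = (100 + 10 + 1) * expected_placements(10, P)

print(f"Tempus: {tempus:,.0f} placements per watch")
print(f"Hora:   {hora:,.0f} placements per watch")
print(f"Ratio:  {tempus / hora:,.0f}x")
```

With these assumed inputs Tempus needs well over a thousand times as many placements per finished watch as Hora. The exact ratio is very sensitive to the chosen interruption probability, so treat it as an illustration of the shape of the effect rather than a measurement.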
I am reasonably sure that most software people reading this little parable would be inclined to nod. For sure, modularity is and always has been a hugely desirable trait in our attempts at software development and design.
In fact, though, I would suggest that the overwhelming majority of software projects today follow the example of Tempus, who lost his shop, rather than Hora. Why?
Because, mostly, we only pay attention to aspects of modularity and component-ness at two levels of granularity: low-level code (classes, methods) at one end of the spectrum, and unit of deployment (jar, dll) at the other. Everything in between we tend to treat as a largely amorphous blob comprising hundreds or even thousands of interacting entities. Even in those cases where we do have meaningful abstractions/layers between the low-level code and the unit of deployment, these are generally invisible and unmeasured. In this context it is hardly surprising that they will tend to degrade over time.
Simon’s parable was one of the key drivers behind Koestler’s theory of holons and holarchy. I will follow up on this – and its (to my mind) huge relevance to software thinking today – in a future post.
Code is like traveling: the less baggage the better. No bags is bliss, a little backpack hardly noticeable. Chunky wheelie bag: bearable but irksome. But several chunky wheelie bags, and it starts to get … logistically challenging. Not to mention increased risk of hernia.
Often, of course, some amount of baggage is unavoidable. If you are embarking on an expedition to the North Pole, for instance, you would be well advised to take a decent supply of warm underwear.
Pretty much all code has baggage, but some code has more baggage than other code. Ask a developer what they would prefer to tackle – implement a little standalone utility, or write something that sits astride all the obscure notions and constructs emitted by others – and I’m pretty sure I know which one most would pick.

Successful baggage limitation
Of course, you can’t code up a (meaningful) system without some number of building blocks. So even in a perfectly architected and layered system, you inevitably accumulate some baggage as you move up the stack. The trick, though, is to try and minimize this (while also hiding away the details of the contents).
This is hardly new or revolutionary: much of software theory is specifically dedicated to strategies that help us to avoid excessive coupling and so promote modularity. That said, I do rather like the baggage metaphor and am inclined to see minimizing baggage as a primary goal, with e.g. re-usability a side-effect, rather than the other way around.
So what do I mean by baggage? Very informally, I’m thinking of this as “stuff you need to know about” to implement another bit of stuff. In this sense, baggage is a universal aspect of software development, entirely independent of e.g. programming language or framework.
Where a code-base is written in a statically typed language like Java, it is relatively easy for static analysis to detect most of the baggage automatically, for instance A carries B baggage because class A extends class B and/or method A.foo() calls method B.bar(). And tools like Structure101 for Java exploit this to provide visualization and analysis of the baggage landscape.
It is important to understand, however, that there are always likely to be blind spots in such tools. A highly specific example I came across recently was where class X emitted a convoluted (highly X-specific) string that got passed to class Y which contained custom code to parse that string. In a (simplistic) static analysis of the code (and in the absence of a class Z that wraps the string in some form), Y does not depend on X (or Z). Conceptually, however, Y is most definitely carrying X baggage.
Other (tool-specific) blind spots may become apparent, so to speak, if we look beyond the confines of the immediate code-base. For example, consider a piece of code that constructs and executes a gruesomely contorted SQL statement. Static analysis that only looks at the code reveals a dependency on say javax.sql.* but misses the additional baggage that arises from intimate knowledge of the database schema. The same kinds of issue arise if we are using e.g. internal DSLs as part of a wider solution.

Baggage landscape
Does this invalidate the use of static analysis tools as some have argued (see previous thread, especially the comments)? Well, strictly speaking, I guess it is a percentages game and depends to a large extent on the specifics of the analysis engine and project in question. When it comes to blind spots outside the code-base (as in the database example above), the key factor to my mind is their contribution to overall system complexity. In the typical database scenario, I would tentatively suggest that this is generally marginal (assuming that the relevant code is suitably compartmentalized). As for within the code-base, clearly, the higher the correlation between reported and conceptual baggage, the greater the utility of the tool. In the case of Java (and statically typed languages in general), I would say the correlation is extremely strong (though my viewpoint here may be rather predictable, given I am one of the guys behind Structure101). There is also the question of whether accurate subset visibility is preferable to no visibility at all…

Whose bag is it anyway?
When playing the percentages game, however, it is important not to confuse the baggage that you are genuinely carrying with other suitcases that just happen to be in the same space. This is the distinction between static and runtime views of the world. I’ll paraphrase the issue here as: I’ll worry about my baggage and let others worry about theirs.
For example, if I were given the job of coding up java.util.ArrayList (an array-style container implementation), my baggage would be (broadly) just my interface (java.util.List) and members (instances of java.lang.Object). At runtime, someone may use an ArrayList to hold a collection of Foo instances; so when the list’s get() method is invoked, the returned object is in fact an instance of Foo. But that does not mean that my ArrayList is in any way conceptually dependent on their Foo. This is their baggage, not mine.
A similar nugget in the Java space is reflection (and e.g. dependency injection a la Spring), often seen as a gap in static analysis tools in the sense that some dependencies are missed. However, this is really just the same issue as the list of Foos above. At coding time, all I need to know is that (say) some input string will be a class name that I can use to instantiate an appropriate implementation of something or other (often involving a cast to an interface that does get picked up by static analysis). The rest is runtime, someone else’s baggage.
That said, there is a scenario where reflection can be used to induce a blind spot wilfully. For example, I know that the object I am getting is a Foo and that I will invoke its bar() method, but I deliberately do this using reflection rather than casting. The baggage is there whichever approach I choose but in one case (typed invocation) the baggage is transparent while in the other (reflection) it is obfuscated. There is a danger that a blind adoption of rules and metrics around baggage measurement may, in extreme circumstances, encourage some team members to adopt the obfuscation approach. To “game the system”. I think here that there would be a static analysis counter-measure – namely to control access to reflection – but obviously the better approach is to address any such dysfunctionality at source…
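The obfuscation effect is easy to reproduce in miniature. The sketch below uses Python rather than Java, with `getattr` standing in for Java reflection, and a deliberately simplistic “analyzer” built on the standard `ast` module. The `run` functions and the `bar` method are hypothetical names chosen to echo the Foo/bar() example above.

```python
import ast

# Typed-style invocation: the call to bar() is visible in the source.
TYPED = """
def run(foo):
    return foo.bar()
"""

# Reflection-style invocation: same baggage, but the method name is just a string.
REFLECTIVE = """
def run(obj):
    return getattr(obj, "bar")()
"""


def called_attributes(source):
    """Return the names of attributes that are called directly, e.g. x.bar()."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute):
            names.add(node.func.attr)
    return names


print(called_attributes(TYPED))       # {'bar'}
print(called_attributes(REFLECTIVE))  # set()
```

Both snippets carry the same conceptual baggage, but only the first one shows up in this (simplistic) analysis. A possible counter-measure would be to flag uses of `getattr` itself, which is the Python analogue of controlling access to reflection.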

Invisible baggage?
In this sense, dynamic languages are (of course) really just an extension of the reflection paradigm. The baggage is still there – it’s just a heck of a lot harder (though not necessarily impossible) to detect. This means that there tends to be way less tooling support, but also, and more importantly, it may be much more difficult for the developers to understand their baggage situation. Interestingly, this has led some to question whether dynamic languages can scale to larger code-bases and teams because of a finite “complexity budget”. For an overview of some of the issues here, see this post by Ted Neward.

Shit happens
Finally, if everyone pays attention to their baggage, does that mean that the system is guaranteed to work? No, of course not. When I check in my bags at the airport, I should ensure that they are securely closed and suitably labeled. That in itself, however, is absolutely no guarantee they will be there at the other end for me to collect (though it should at least make life easier for the airport’s baggage management system and so help to make the desired outcome more probable). The one and only thing I can be sure of is that any screwups will not be my fault. Seems to me that this is the essence of good software: lots and lots of well-defined, self-contained, autonomous units doing their own job faithfully and keeping fingers crossed that others do the same…
Here is an interesting use case.
I am currently working on a reuse project. We have a large legacy Java application that we are trying to farm for implementations of some high level functions in a new application. To do this we are identifying the top-level classes that provide the initial entry point(s) to the desired high level functionality and then trying to discover all of the classes in the old system needed to support the identified top-level classes.
I have been doing this manually by going to the collaboration perspective in Structure101, selecting the “go to suppliers” option of a top-level class, and then manually drilling down through all of the classes the top-level class uses (that is, for every class the top-level class uses, I select “go to suppliers” and find out all the classes that class uses, etc. etc. etc.), tracking the needed classes as I go. This is not feasible to do given the size of the project.
Is there any way I can get Structure101 to basically give me the transitive closure of all the classes used by the identified top-level classes, preferably as plain old ASCII text? Structure101 seems to have already computed all the information I need, I just cannot figure out a non-manual way of getting the information.
As it happens, there is no first class support in Structure101 for this specific feature. However, it is do-able by leveraging other features and model options. Here’s how you would go about it.
First step is to set Overview granularity in the project properties. With this setting, the model stops at the outer class level but still takes account of all the member-level dependencies. So if Foo.x() calls Bar.y(), the model shows Foo and Bar and a “uses” dependency from Foo to Bar.
Second, tagging. Select a top-level class A in e.g. the Composition perspective, right click and choose Tag / Used by selected / Indirectly. Tag adornments (little blue dots) will appear on all the classes that A uses directly or indirectly (transitive closure of A). Then right click again and choose Tag / Selected so that A is also tagged. Repeat this for other top-level classes.
Now you have all the classes tagged, so all you need to do is export the tag list. Unfortunately, Structure101 does not have a button for that (grrrr) so we have to find another way of getting there…
From the main menu choose Tag / Invert item tags followed by Tag / Hide tagged. You have just subsetted the model to contain only those classes that you are interested in. Now all we have to do is get them into one table where we can right click and choose Copy / Copy all. Easiest for this is probably to switch to the Slice perspective, choose the “Outer class” level, and then you’ll likely see a single main cluster. Select this (actually it will get automatically selected for you). Hey presto, the table bottom left (Items tab of the Graph Contents viewer) contains the full list so a right-click should seal the deal.
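For anyone who has the dependency data outside the tool, the same result can be computed directly. The Python sketch below is not a Structure101 API: the member-level edge list and all class names are made up for illustration. It first rolls member-level dependencies up to outer classes (roughly what the Overview granularity setting does) and then walks the “uses” graph to collect the transitive closure of the chosen top-level classes as plain text.

```python
from collections import deque


def outer_class(member):
    # "pkg.Foo$Inner.x()" -> "pkg.Foo" (assumes Java-style naming)
    owner = member.rsplit(".", 1)[0]
    return owner.split("$", 1)[0]


def roll_up(member_edges):
    """Collapse member-level 'uses' edges into an outer-class-level graph."""
    uses = {}
    for src, dst in member_edges:
        a, b = outer_class(src), outer_class(dst)
        uses.setdefault(a, set())
        uses.setdefault(b, set())
        if a != b:
            uses[a].add(b)
    return uses


def transitive_closure(roots, uses):
    """Return the roots plus every class they use, directly or indirectly."""
    seen = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for supplier in uses.get(current, ()):
            if supplier not in seen:
                seen.add(supplier)
                queue.append(supplier)
    return sorted(seen)


# Hypothetical member-level dependencies.
MEMBER_EDGES = [
    ("app.ReportService.run()", "app.ReportBuilder.build()"),
    ("app.ReportService.run()", "util.Dates.today()"),
    ("app.ReportBuilder.build()", "model.Report.create()"),
    ("app.ReportBuilder$Step.apply()", "util.Strings.pad()"),
    ("model.Report.create()", "util.Dates.today()"),
    ("app.Unrelated.main()", "util.Strings.pad()"),
]

needed = transitive_closure(["app.ReportService"], roll_up(MEMBER_EDGES))
print("\n".join(needed))
```

The printed list contains the top-level class and everything it needs, one class per line, and leaves out classes such as `app.Unrelated` that are not reachable from it.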