Graphviz on steroids

December 3rd, 2008 By Ian Sutton

We just released our new generic jobbie, Structure101g.

If you already know Structure101 for Java or Structure101 for C/C++, you probably already have a good idea of what Structure101g might be. This is for those (and there are many, oh so many) who do not.

Graphviz is a wonderful tool that can be used to create graph visualizations of stuff. All you do is stick in a logical graph model via a text file and out pops a nice picture. For the seminal example, see this view of the Unix family tree (and here is the corresponding text input file). Visit their gallery for lots more examples.

Occasionally the input is written by hand, but mostly it is generated by some piece of code that parses the domain-specific artifacts. For example, Graphviz is widely used to obtain subset pictures in code-base scenarios, e.g. the set of files in a directory and the includes/imports relationships between those files.
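To make the "generated by some piece of code" bit concrete, here is a minimal Python sketch that scans C-style sources for local includes and emits Graphviz DOT text. The function name `includes_to_dot` and the file contents are made up for illustration, and the sources are inlined as strings rather than read from a real directory.

```python
import re

# Matches C-style local includes such as: #include "util.h"
INCLUDE_RE = re.compile(r'^\s*#\s*include\s+"([^"]+)"', re.MULTILINE)


def includes_to_dot(sources):
    """Turn {file name: file text} into Graphviz DOT text, one edge per include."""
    lines = ["digraph includes {"]
    for name in sorted(sources):
        lines.append(f'    "{name}";')
        for target in INCLUDE_RE.findall(sources[name]):
            lines.append(f'    "{name}" -> "{target}";')
    lines.append("}")
    return "\n".join(lines)


# Hypothetical directory contents, inlined so the sketch is self-contained.
example_sources = {
    "main.c": '#include <stdio.h>\n#include "util.h"\n#include "parser.h"\n',
    "parser.c": '#include "parser.h"\n#include "util.h"\n',
    "util.h": "",
    "parser.h": '#include "util.h"\n',
}

dot_text = includes_to_dot(example_sources)
print(dot_text)
```

Saving the printed text to a file (say `includes.dot`) and running `dot -Tpng includes.dot -o includes.png` should produce the picture. System headers in angle brackets are deliberately skipped here, which is one way of keeping the picture to a subset.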

Although it is generally totally wonderful, Graphviz (like other tools of its ilk) has one big weakness: graph visualizations do not scale. Subset pictures work fine (as in the includes example above) but there is no way to get a meaningful visualization of all the files across all directories.

The key in Structure101g, as in all Structure101 products, is to view the big model through the prism of hierarchy. Divide & conquer - use slices as a mechanism to get both subset views and “big picture” views. The other key differentiators are a rich browsing and analysis environment rather than a bitmap (or SVG) image, and some nice stuff around plumbing so that end users can create models interactively without leaving the UI.

Ok, so far so product pitch. Here is the important stuff:

  • To display data from a particular domain, Structure101g needs a meta-description (xml) of the entity and relationship types in that domain. We call this a flavor.
  • In nearly all real-world cases, a flavor has an associated runner: this is the piece of code that parses the domain-specific artifacts (or perhaps just some glue on top of an existing parser).
  • At time of writing, there are flavor/runner implementations for the domains OSGi/Eclipse (bundles), Maven (POMs), Ant (targets and properties in a build.xml file), XSL (stylesheets), and Web (html pages and associated images, scripts, stylesheets, etc.).
  • Graphviz is completely free to all and sundry.
  • Structure101g is completely free for the above flavor/runner pairs, but we (Headway) have control-freakish tendencies so you need to talk to us about making a flavor generally available (either for free or commercially) or buy a domain license for proprietary usage.
  • Graphviz does lots of different graph types and layouts. Structure101g is hard-wired to directed graphs with hierarchical top-down layout.

For more info and background, see this demo or visit the Structure101g home page.

Structure101, Visualization

Software erosion and package tangles

December 3rd, 2008 By Ian Sutton

My recent post on architectural erosion in the findbugs code-base was generally well received, but there were some skeptical voices.

In a comment, Emeric questioned whether cyclic dependencies at the package level are anything more than a smell (if that). Itay Maman was a little more forthright, offering a little series of posts arguing that I was peddling myths, tangled packages are the norm (so they must be okay), and all static analysis is in any case completely pointless.

In both cases, they homed in exclusively on the rather narrow issue of package tangles, while also ignoring the time dimension, and in this sense I think both rather missed the point (though perhaps some more than others).

As I said in the opening paragraph of the original post, the key for me is levels of abstraction above the raw code: architectural components within a code-base if you like. In the case of findbugs, there are several instances where you can see that an architectural decision was made, only for this to become blurred and ultimately lost over time. In all the early releases (e.g. 0.8.6), and surely not by accident, the ba component does not use the findbugs component. In 0.8.8, a rogue dependency creeps in. If you follow the full series of snapshots, you will see that this back dependency steadily rises from an initial weight of 2 code-level references (that could be easily reversed out) to the point where the interdependency is deeply entrenched in the code. Other examples are the blurring of the relationship between config and findbugs, and the attempt to interface off the dependencies on the specific parser library (asm or bcel).

It is not in the least bit surprising to me that this form of erosion happens over time for the simple reason that it is generally invisible. How can we rationalize about that which we cannot see (or measure, or define)?

Enter Structure101. This is based on the simple principle that, in order for design items to be first class citizens in the code-base, we need to be able to see them and, especially, the interactions between them.

Note that I am using terms like “architectural component” and “design item”, not “package”. In general, I am loath to assume that there is necessarily a one-to-one correspondence between the Java package hierarchy on the one hand, and the “design hierarchy” on the other - for sure, there is absolutely nothing in the language specification to say that this must be so. I can say with confidence that this correlation exists for the Structure101 code-base (because I co-own it) and the Spring code-base (because they make a lot of noise about it), but I think this is a dubious assumption in general. For findbugs, however, I think this little leap-of-faith was reasonable given the clarity of the package diagrams in the early releases.

Now suppose that you do have a code-base with a formal design hierarchy but one that does not correspond to the package structure. If you point Structure101 at that code-base, the initial (default) views will leave you completely cold because you are looking at the interactions between arbitrary subsets of code. You don’t care about these in the slightest, and quite rightly so. However, that does not mean that the tool is of no use - you can use transformations to map the code so that the resulting hierarchy does mirror the design. This scenario is actually quite common, for instance where the first level of breakout is managed via separate IDE projects / jar files and the logical package view results in (unintended) package name collisions.

With that background in place, let’s take a closer look at some of the dissenting voices.

@Emeric
But what are your arguments to say that 2 or 3 interdependents packages are a “blob” which you imply is bad ?
Don’t these packages deserve a name in their own and an isolation of comprehension even if they have cyclic dependencies ?
I think they initially deserve the separation, even with cyclic dependencies.

This does not seem unreasonable and is in fact a good fit with the tangle of findbugs, config and filter in 0.8.8. Here is the raw package diagram (note that I’m excluding io and anttask as noise):

So let’s transform this model to introduce a new architectural component - controller - as the union of all three of these. Here’s what we get:

This gives us a much better view of the architectural components at this level in the code-base, and shows just the one (clearly) rogue dependency from ba back to the controller. Needless to say, if we drill into the controller component, we will see that it contains package tangles…
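For those who like to see the mechanics, here is a minimal Python sketch of what such a transformation amounts to: re-key each package-level edge by the component its endpoints belong to, and drop the edges that end up inside a single component. This is an illustration of the idea, not how Structure101 implements transformations. The function name `apply_transformation` is invented, and the only weight taken from the findbugs analysis is the 2 references from ba back to findbugs. Every other number, and the graph package edge, is made up.

```python
from collections import Counter


def apply_transformation(weights, component_of):
    """Re-key package edges by component and drop edges internal to a component."""
    merged = Counter()
    for (source, target), weight in weights.items():
        src = component_of.get(source, source)
        dst = component_of.get(target, target)
        if src != dst:
            merged[(src, dst)] += weight
    return merged


# Only the ba -> findbugs weight of 2 comes from the post; the rest are made up.
package_edges = {
    ("findbugs", "config"): 5,
    ("config", "findbugs"): 3,
    ("findbugs", "filter"): 4,
    ("filter", "findbugs"): 1,
    ("findbugs", "ba"): 40,
    ("ba", "findbugs"): 2,
    ("ba", "graph"): 12,
}
controller = {"findbugs": "controller", "config": "controller", "filter": "controller"}

component_edges = apply_transformation(package_edges, controller)
for (source, target), weight in sorted(component_edges.items()):
    print(f"{source} -> {target} ({weight})")
```

The four edges inside the tangle disappear from this level of the picture, which is exactly why the one remaining back dependency from ba to controller stands out.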

… but it is essentially up to you (the user, team, …) whether or not you choose to care about this. Indeed, you can formally capture those things you do care about (and, by implication, those you do not) using architecture diagrams. Note however that the XS (excessive complexity) metric will always punish tangles at higher levels in the code (in that it measures distance from a structural ideal) - more on this in a bit.

Let’s now take another look at the most recent version (1.3.5). We could apply the same principle here, and transform all the packages involved in the tangle into a single “architectural component”, though this time it’s harder to think of a name. Let’s go with … errr … blob.

And, hey presto, we have an acyclic graph at the first level of breakout. However, 99% of the code-base is now located within blob so we have not really achieved terribly much in terms of architectural divide & conquer. Also, needless to say, the package breakout within the blob component is still essentially anarchic.

This leads on to the final point about package tangles. There is always a simple way to fix any package tangle: just merge all the classes into a single package. Here is another view of the blob component (ok, so you’ll need very good eyesight for this one) but this time I tweaked the transformations to strip out all the sub-packaging.

This one is way too “fat” (642 classes and 6,830 dependencies) to do as a nice diagram so I switched to a matrix view and set the cell size to 1 pixel. In many ways, I prefer this view because it is a much more accurate representation of how the code really is (a bunch of classes), as opposed to the raw package view which is basically just showing arbitrary subsets. That said, this view doesn’t actually help me to understand the code-base in any way, shape, or form. Instead, it makes me think of Jonathan Edwards’ fine quote that “the human mind can not grasp the complexity of a moderately sized program, much less the monster systems we build today”.

This aspect of structure - the dichotomy between tangles and fat - is important when it comes to measurement, since it is clearly insufficient to take account of one without the other. The XS metric does factor in both sides of this coin - I do not claim that it is perfect, but I do rather suspect it is the best we have.

@Emeric
I agree that cyclic dependencies is a bad smell, except perhaps when there is a large dependency [in one direction] and a small backward dependency [in the other].

I agree that cyclic dependencies between packages is merely a smell, but I would argue that cyclic dependencies between architectural components is more than that. Where the backward dependency is small, I would see this as probably indicating good abstractions but with some rogue code that should ideally be fixed some time (but see also Keep A Lid On It). Heavy interdependency between architectural components is essentially the same state as no architectural components…

@Emeric
May I ask you opinion on the possibility to refactor, in the future, findbugs with Java Modules and friends packages.

I think I have addressed this already in the sense that the design (module, component, …) hierarchy is not necessarily the same as the package hierarchy (friends or no friends) but please follow up if I’m missing something here.

In his “Mythbusting” series, Itay hauls out the scattergun and sprays it around pretty indiscriminately. I was concerned that it would take me a long time to respond to all the various points, but actually I am delighted to say that others have already done a far better job than ever I could have done, so for the most part I will just refer you to the meaty comments sections. However, there is one aspect I would like to follow up on.

Here is the very first line in the very first post:

@Itay
(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

This is later fleshed out a little more (in a comment):

@Itay
It all boils down to the fact that we don’t have a single-, absolute-, objective-, metric of software quality (we all wish we had). Hence, we are constantly looking for approximation techniques. We must be careful not to mistake the approximation for the real thing.

This standpoint is thoroughly defensible, and reminds me of a number of conversations I have had with customers about the Structure101 metrics. The general feeling here, however, has consistently been that metrics are critical in terms of bridging the gap between technical staff (who instinctively understand why a particular activity is needed) and management staff (who need things like line charts and red and blue bars to be able to justify such activities further up the food chain). The two most interesting metrics here are XS (mentioned above) and number-of-architectural-violations. The former is all predefined and set in stone (though you can tweak the thresholds) while the latter is totally in the control of the team because it is calculated based on the architecture diagrams that the team (not the tool) defines. Some customers use one, some the other, and some both. I think it is correct to say that all are careful and none would claim for a second that these are absolute measures of perfection.

Had this - being careful with stats and metrics - been the essence of Itay’s posts, I think he would have caused less of a storm (though perhaps the storm was always his objective). Instead, Itay chose to mostly bury the “being careful…” bit and lead with sweeping statements such as “Dependency analysis is largely useless”. Shame…

Complexity, Emergent Design

Software erosion in pictures - Findbugs

November 27th, 2008 By Ian Sutton

My particular area of interest in software these days is the importance of levels of abstraction above the raw code. In Java, the most natural place for these to manifest themselves is through the package structure (though this is certainly not the only possibility).

Recently I used Structure101 to do some analysis on the evolution of the findbugs code-base, and was rather stunned at what I saw. Here is the root dependency graph for the first public release (0.7.2) back in March 04.

This diagram shows us the top-level packages in the code-base and the interactions between them (the numbers beside the arrows denote the number of code-level references).
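As a rough illustration of where those numbers come from, here is a minimal Python sketch that rolls class-to-class references up to the top-level packages under a root and counts them. The root `org.example`, the class names and the helper names are all hypothetical, and counting one per reference is an assumption about how the weights are derived.

```python
from collections import Counter


def top_level(class_name, root):
    """Return the first package segment below `root`, or None if outside it."""
    prefix = root + "."
    if not class_name.startswith(prefix):
        return None
    rest = class_name[len(prefix):].split(".")
    # A class sitting directly in the root package has no sub-package.
    return rest[0] if len(rest) > 1 else None


def package_weights(references, root):
    """Count class-level references between distinct top-level packages."""
    weights = Counter()
    for source, target in references:
        src, dst = top_level(source, root), top_level(target, root)
        if src and dst and src != dst:
            weights[(src, dst)] += 1
    return weights


# Hypothetical references, one (user, used) pair per code-level reference.
references = [
    ("org.example.findbugs.Engine", "org.example.ba.Analyzer"),
    ("org.example.findbugs.Engine", "org.example.ba.Rule"),
    ("org.example.ba.Analyzer", "org.example.graph.Node"),
    ("org.example.ba.Analyzer", "org.example.ba.Rule"),   # same package, ignored
    ("org.example.findbugs.Engine", "java.util.List"),    # outside the root, ignored
]

weights = package_weights(references, "org.example")
for (source, target), weight in sorted(weights.items()):
    print(f"{source} -> {target} ({weight})")
```

References within a package and references to code outside the root are left out, since neither would show up as an arrow between top-level packages.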

With just a little knowledge about what findbugs does (it is a static analysis tool that scans bytecode for potential bugs), it is easy to rationalize about how this code-base is internally architected:

  • graph is a re-usable (baggage-less) data structure to model the control flow within a method body
  • visitclass wraps a bytecode parser with a visitor pattern to shield the parser implementation from the interpretation of the parser callbacks
  • ba (bug analyzer?) is the bit where specific rules (policies, strategies, …) are implemented
  • findbugs is the controller that drives the interactions between the other components

The image below again shows the top-level breakout, but this time several releases later in Oct 04 (0.8.6).

Although the code-base has grown significantly, it is still absolutely possible to rationalize about the architecture and where the new pockets of code (annotations, xml, config, etc) fit into the “big picture”. The only apparent blemish is that io is now disconnected (perhaps dead code).

The first significant imperfection creeps in in April 05 (0.8.7).

The relationship between config and findbugs has become blurred. Since both packages are dependent on each other, it is no longer clear that either of these in isolation represents a meaningful or useful abstraction, and it may make more sense to think about the relevant “component” as being the union of the two.

Skip forward just a month…

… and the confusion has spread (0.8.8). The filter package has been added but it too has a 2-way dependency with the findbugs package, so it seems reasonable to say that the whole world of controller and config (incl. filtering) is in essence a blob where the individual packages do not really contribute anything in isolation.

There is also a rogue dependency here from ba to findbugs. This is clearly contrary to the original architectural intent. The weight is just 2, so this would have been very easy to reverse out had it been spotted.

If we fast-forward a year (1.0.0), however, we see that this rogue dependency has become entrenched (the weight has increased from 2 to 99).

Moreover, more and more packages are being pulled into the tangle such that it is hard or impossible to talk about these as meaningful entities in their own right. For example, what is the point of a util package if it contains code that depends on the findbugs package?

Nevertheless, we can still see evidence of meaningful architectural decisions. For example, the bcel and asm packages are presumably wrappers for the BCEL and ASM bytecode engineering libraries that, together with the classfile package, enable an element of plug&play in terms of which library actually gets used for the analysis.

However, moving on to Nov 07 (1.3.0)…

… we see that these too have been sucked into the tangle. From now on, it seems, all testing, deployment etc. will need to include both.

And here is the most recent snapshot from September 08 (1.3.5):

This diagram doesn’t help any more - nearly all the higher-level abstractions appear to have eroded away. Moreover, a peek under the hood reveals that there is a large code-level tangle involving 43% of all the classes and spanning 33 packages - this implies that the interdependency has become deeply entrenched in the code. Shame…
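For the curious, here is one way the extent of a tangle could be measured. This minimal Python sketch treats a tangle as a strongly connected component with more than one member and finds them with Tarjan's algorithm. That definition and the function name `find_tangles` are assumptions, not necessarily how Structure101 computes its figures, and the little graph is made up.

```python
def find_tangles(graph):
    """Return strongly connected components with more than one node (Tarjan)."""
    index = {}
    lowlink = {}
    stack = []
    on_stack = set()
    tangles = []

    def visit(node):
        index[node] = lowlink[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for neighbour in graph.get(node, ()):
            if neighbour not in index:
                visit(neighbour)
                lowlink[node] = min(lowlink[node], lowlink[neighbour])
            elif neighbour in on_stack:
                lowlink[node] = min(lowlink[node], index[neighbour])
        if lowlink[node] == index[node]:
            # `node` is the root of a component: pop its members off the stack.
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            if len(component) > 1:
                tangles.append(component)

    for node in list(graph):
        if node not in index:
            visit(node)
    return tangles


# Hypothetical dependency graph: each key uses the items in its list.
dependencies = {
    "findbugs": ["config", "filter", "ba"],
    "config": ["findbugs"],
    "filter": ["findbugs"],
    "ba": ["graph", "findbugs"],
    "graph": [],
}

tangles = find_tangles(dependencies)
tangled = sum(len(t) for t in tangles)
print(f"{tangled} of {len(dependencies)} items are in a tangle")
```

The recursion keeps the sketch short. For a graph the size of a real code-base it would need to be rewritten iteratively to stay within Python's recursion limit.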

For a quick view of the full history, I did up a little animated gif showing the “progression” through all 27 releases. If you are interested in something meatier, see the “Structure101 in a Nutshell Part 1” presentation on the Headway Take a Tour page.

Complexity, Emergent Design

Reconstructing blog

November 25th, 2008 By Ian Sutton

Just a brief and largely uninspiring note to say that (a) this blog has moved from its old location at chris.headwaysoftware.com and (b) I have been plucked from the joys of comfortable anonymity to join the fray. In practice, this means that there will be two of us ruminating on the wonders of software structure rather than just Chris.

I know, you can hardly wait…

Uncategorized

Structure101 Takes Home Jolt Productivity Award

March 11th, 2008 By Chris Chedgey

Just got back from SD West and unpacked our Jolt Productivity Award. Huge credit and thanks to all the users that provided the stream of feature suggestions that has contributed to making Structure101 such a great product. And congratulations to the product and development team at Headway who spent so many tortuous hours discussing and honing each new feature (sometimes politely!) so that the product stayed usable as it got deeper.

Structure101

Structure101 named 2008 JOLT finalist …

January 22nd, 2008 By Chris Chedgey

… in the Design and Modeling category. Yeah!

Press release

Uncategorized

Structure101 3.1 - Software Architecture Sandboxing

December 13th, 2007 By Chris Chedgey

Just released, Version 3.1 adds lots of new stuff to the Architecture perspective to make it much easier to discover the current structure and move classes or packages around to define a preferred architecture.

First thing is a simple expand and collapse button on each cell. So for example you can ask Structure101 to create a high-level architecture diagram from the current code-base - no need to worry about how deep to make the initial diagram since you can now expand and collapse cells once you have the initial diagram.

Let’s say you get the following diagram:

Architecture Diagram 1

This shows a layering violation from the struts2 package to the dispatcher package. Click the “+” to find out what is being used at the next level of detail.

Architecture Diagram 2

So something in struts2 is using something in dispatcher.controller. Expand both cells…

Architecture Diagram 3

And we see there is a single class-to-class dependency causing the violation. Also, since the Dispatcher class is below the other classes in the dispatcher.controller layering, we know that the other classes use it, but it does not use the other classes. And the layering in the struts2 package indicates that we can move Dispatcher to the indicated level to fix the violation without creating new ones. Just drag and drop Dispatcher to create this:

Architecture Diagram 4

Collapsing the 2 top-level cells by clicking the “-” icons shows the original layering but with the underlying code rearranged so there is no longer an architecture violation.

Architecture Diagram 5

Other stuff that has been added in this version:

  • Once you have moved classes or packages, you can get a list of all the moved items.
  • You can convert the list of moved items into transformations on the underlying model. This lets you iterate on a restructuring job - use an architecture diagram as a kind of “Sandbox” until you have some set of changes you’re happy with, then convert them into transformations and start sandboxing the next set of changes.
  • You can now apply strict layering to a diagram. This means that cells can only use cells immediately below them, not below that.
  • You can show any dependencies (i.e. not just violations) either on the whole diagram or on selected cells.
  • You can drag items onto an architecture diagram from the dependency breakout.
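The difference between ordinary and strict layering can be sketched in a few lines of Python. This is only an illustration of the rule as described above, under the simplifying assumption of one cell per layer. The function name `layering_violations` and the cell names are invented, and nothing here reflects how Structure101 actually stores or checks its diagrams.

```python
def layering_violations(layers, dependencies, strict=False):
    """List dependencies that break a top-to-bottom layering.

    layers: cell names ordered from top (index 0) to bottom.
    dependencies: (user, used) pairs between those cells.
    """
    level = {name: i for i, name in enumerate(layers)}
    violations = []
    for user, used in dependencies:
        distance = level[used] - level[user]  # positive means "used" is lower
        if distance <= 0:
            violations.append((user, used, "uses a cell that is not below it"))
        elif strict and distance > 1:
            violations.append((user, used, "skips a layer"))
    return violations


# Hypothetical three-layer diagram and dependencies.
layers = ["ui", "service", "storage"]
dependencies = [
    ("ui", "service"),
    ("service", "storage"),
    ("ui", "storage"),       # fine normally, a violation under strict layering
    ("storage", "service"),  # upward dependency, always a violation
]

relaxed = layering_violations(layers, dependencies)
strict = layering_violations(layers, dependencies, strict=True)
print(relaxed)
print(strict)
```

With these inputs the relaxed check reports only the upward dependency, while the strict check also flags ui reaching past service down to storage.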

Final note, if you are using the architecture diagrams to drive your developers, you should choose the expansion level of cells consciously before publishing them to the project repository. Their code changes will be checked against the visible levels only.

Architecture, Structure101

Spring 2.5 Architecture Diagrams

November 1st, 2007 By Chris Chedgey

I have updated the architecture diagrams for the just-released Spring 2.5. Any new or changed packages are highlighted (since 2.0.6). The diagrams are also online - if you pointed your IDE plugin at these after my previous entry, you will be seeing the updated diagrams in your IDE already, and any compile time messages about architecture violations will be based on the new versions.

Here’s the new top-level architecture:

Top

And here are the internals of the larger subsystems:

org.springframework.aop:

Aop

org.springframework.beans:

Beans

org.springframework.jdbc:

Jdbc

org.springframework.jms:

Jms

org.springframework.orm:

Orm

org.springframework.web:

Web

Architecture, Dependency Management, Emergent Design, Java, Spring, Structure101

Structure101 V3 Released, Adds Architecture Control for Teams

October 17th, 2007 By Chris Chedgey

Released today, the new version 3 capabilities make Structure101 a nicely rounded architectural control solution in addition to the previous structural analysis and complexity measurement capabilities.

For example:

  • You can now define layering constraints on your code-base using simple, intuitive architecture block diagrams
  • Communicate these architecture diagrams to the development team through IDE plug-ins
  • Developers get warned immediately if they make code changes that are inconsistent with the architecture
  • RSS activity feeds let you know if new architecture violations make it into a build

Also, there’s a new online demo (about 13 minutes, with audio (me!)) and the version 3 Help is available online.

Full press release

Enjoy!

Architecture, Emergent Design, Structure101, Visualization

A Periodic Table of Visualization Methods

August 14th, 2007 By Chris Chedgey

A lot of work went into this. A "periodic table" of visualization methods for data, information, concepts, strategy, metaphors, process and structure.

Here’s a screen shot - be sure and visit the original if you’re interested - when you mouse over each cell, you get an example of the corresponding visualization method.

Periodic_table

I didn’t see any of the visualization techniques used by Structure101 for visualizing software dependencies and architectural layers. The table is more focused on business processes, though Data Flow (Df), Entity-relationship (E) and Flowchart (Fl) diagrams are there.

Visualization