Archive for the 'Complexity' Category

Software erosion and package tangles


My recent post on architectural erosion in the findbugs code-base was generally well received, but there were some skeptical voices.

In a comment, Emeric questioned whether cyclic dependencies at the package level are anything more than a smell (if that). Itay Maman was a little more forthright, offering a little series of posts arguing that I was peddling myths, tangled packages are the norm (so they must be okay), and all static analysis is in any case completely pointless.

In both cases, they homed in exclusively on the rather narrow issue of package tangles, while also ignoring the time dimension, and in this sense I think both rather missed the point (though perhaps some more than others).

As I said in the opening paragraph of the original post, the key for me is levels of abstraction above the raw code: architectural components within a code-base if you like. In the case of findbugs, there are several instances where you can see that an architectural decision was made, only for this to become blurred and ultimately lost over time. In all the early releases (e.g. 0.8.6), and surely not by accident, the ba component does not use the findbugs component. In 0.8.8, a rogue dependency creeps in. If you follow the full series of snapshots, you will see that this back dependency steadily rises from an initial weight of 2 code-level references (that could be easily reversed out) to the point where the interdependency is deeply entrenched in the code. Other examples are the blurring of the relationship between config and findbugs, and the attempt to interface off the dependencies on the specific parser library (asm or bcel).

It is not in the least bit surprising to me that this form of erosion happens over time for the simple reason that it is generally invisible. How can we reason about that which we cannot see (or measure, or define)?

Enter Structure101. This is based on the simple principle that, in order for design items to be first class citizens in the code-base, we need to be able to see them and, especially, the interactions between them.

Note that I am using terms like “architectural component” and “design item”, not “package”. In general, I am loath to assume that there is necessarily a one-to-one correspondence between the Java package hierarchy on the one hand, and the “design hierarchy” on the other – for sure, there is absolutely nothing in the language specification to say that this must be so. I can say with confidence that this correlation exists for the Structure101 code-base (because I co-own it) and the Spring code-base (because they make a lot of noise about it), but I think this is a dubious assumption in general. For findbugs, however, I think this little leap-of-faith was reasonable given the clarity of the package diagrams in the early releases.

Now suppose that you do have a code-base with a formal design hierarchy but one that does not correspond to the package structure. If you point Structure101 at that code-base, the initial (default) views will leave you completely cold because you are looking at the interactions between arbitrary subsets of code. You don’t care about these in the slightest, and quite rightly so. However, that does not mean that the tool is of no use – you can use transformations to map the code so that the resulting hierarchy does mirror the design. This scenario is actually quite common, for instance where the first level of breakout is managed via separate IDE projects / jar files and the logical package view results in (unintended) package name collisions.
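
To make the idea of a transformation concrete, here is a minimal sketch of mapping classes onto design-level components by package prefix. It is not how Structure101 implements transformations; the class name ComponentMapper, the rule format, and the com.example package names are all assumptions made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrative sketch only; this is not how Structure101 implements transformations.
final class ComponentMapper {
    // Ordered "package prefix -> component" rules; the first matching rule wins.
    private final Map<String, String> rules = new LinkedHashMap<>();

    ComponentMapper rule(String packagePrefix, String component) {
        rules.put(packagePrefix, component);
        return this;
    }

    // Maps a fully qualified class name to its design-level component.
    String componentOf(String className) {
        for (Map.Entry<String, String> rule : rules.entrySet()) {
            // Append "." so that "com.example.web" does not match "com.example.webapp".
            if (className.startsWith(rule.getKey() + ".")) {
                return rule.getValue();
            }
        }
        return "(unmapped)";
    }

    public static void main(String[] args) {
        // Hypothetical packages: two of them are folded into one component.
        ComponentMapper mapper = new ComponentMapper()
                .rule("com.example.web", "presentation")
                .rule("com.example.rest", "presentation")
                .rule("com.example.orders", "domain");
        System.out.println(mapper.componentOf("com.example.rest.OrderResource"));
        System.out.println(mapper.componentOf("com.example.orders.Order"));
    }
}
```

Once every class has a component, the dependency graph can be redrawn between components rather than between packages, which is the view that matters if the packages themselves are arbitrary.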

With that background in place, let’s take a closer look at some of the dissenting voices.

@Emeric
But what are your arguments to say that 2 or 3 interdependents packages are a “blob” which you imply is bad ?
Don’t these packages deserve a name in their own and an isolation of comprehension even if they have cyclic dependencies ?
I think they initially deserve the separation, even with cyclic dependencies.

This does not seem unreasonable and is in fact a good fit with the tangle of findbugs, config and filter in 0.8.8. Here is the raw package diagram (note that I’m excluding io and anttask as noise):

So let’s transform this model to introduce a new architectural component – controller – as the union of all three of these. Here’s what we get:

This gives us a much better view of the architectural components at this level in the code-base, and shows just the one (clearly) rogue dependency from ba back to the controller. Needless to say, if we drill into the controller component, we will see that it contains package tangles…

… but it is essentially up to you (the user, team, …) whether or not you choose to care about this. Indeed, you can formally capture those things you do care about (and, by implication, those you do not) using architecture diagrams. Note however that the XS (excessive complexity) metric will always punish tangles at higher levels in the code (in that it measures distance from a structural ideal) – more on this in a bit.

Let’s now take another look at the most recent version (1.3.5). We could apply the same principle here, and transform all the packages involved in the tangle into a single “architectural component”, though this time it’s harder to think of a name. Let’s go with … errr … blob.

And, hey presto, we have an acyclic graph at the first level of breakout. However, 99% of the code-base is now located within blob so we have not really achieved terribly much in terms of architectural divide & conquer. Also, needless to say, the package breakout within the blob component is still essentially anarchic.

This leads on to the final point about package tangles. There is always a simple way to fix any package tangle: just merge all the classes into a single package. Here is another view of the blob component (ok, so you’ll need very good eyesight for this one) but this time I tweaked the transformations to strip out all the sub-packaging.

This one is way too “fat” (642 classes and 6,830 dependencies) to do as a nice diagram so I switched to a matrix view and set the cell size to 1 pixel. In many ways, I prefer this view because it is a much more accurate representation of how the code really is (a bunch of classes), as opposed to the raw package view which is basically just showing arbitrary subsets. That said, this view doesn’t actually help me to understand the code-base in any way, shape, or form. Instead, it makes me think of Jonathan Edwards’ fine quote that “the human mind can not grasp the complexity of a moderately sized program, much less the monster systems we build today”.

This aspect of structure – the dichotomy between tangles and fat – is important when it comes to measurement, since it is clearly insufficient to take account of one without the other. The XS metric does factor in both sides of this coin – I do not claim that it is perfect, but I do rather suspect it is the best we have.

@Emeric
I agree that cyclic dependencies is a bad smell, except perhaps when there is a large dependency [in one direction] and a small backward dependency [in the other].

I agree that cyclic dependencies between packages are merely a smell, but I would argue that cyclic dependencies between architectural components are more than that. Where the backward dependency is small, I would see this as probably indicating good abstractions but with some rogue code that should ideally be fixed some time (but see also Keep A Lid On It). Heavy interdependency between architectural components is essentially the same state as no architectural components…

@Emeric
May I ask you opinion on the possibility to refactor, in the future, findbugs with Java Modules and friends packages.

I think I have addressed this already in the sense that the design (module, component, …) hierarchy is not necessarily the same as the package hierarchy (friends or no friends) but please follow up if I’m missing something here.

In his “Mythbusting” series, Itay hauls out the scattergun and sprays it around pretty indiscriminately. I was concerned that it would take me a long time to respond to all the various points, but actually I am delighted to say that others have already done a far better job than ever I could have done, so for the most part I will just refer you to the meaty comments sections. However, there is one aspect I would like to follow up on.

Here is the very first line in the very first post:

@Itay
(Disclaimer: as always, the issue of reasoning about software quality is largely a matter of personal judgment)

This is later fleshed out a little more (in a comment):

@Itay
It all boils down to the fact that we don’t have a single-, absolute-, objective-, metric of software quality (we all wish we had). Hence, we are constantly looking for approximation techniques. We must be careful not to mistake the approximation for the real thing.

This standpoint is thoroughly defensible, and reminds me of a number of conversations I have had with customers about the Structure101 metrics. The general feeling here, however, has consistently been that metrics are critical in terms of bridging the gap between technical staff (who instinctively understand why a particular activity is needed) and management staff (who need things like line charts and red and blue bars to be able to justify such activities further up the food chain). The two most interesting metrics here are XS (mentioned above) and number-of-architectural-violations. The former is all predefined and set in stone (though you can tweak the thresholds) while the latter is totally in the control of the team because it is calculated based on the architecture diagrams that the team (not the tool) defines. Some customers use one, some the other, and some both. I think it is correct to say that all are careful and none would claim for a second that these are absolute measures of perfection.

Had this – being careful with stats and metrics – been the essence of Itay’s posts, I think he would have caused less of a storm (though perhaps the storm was always his objective). Instead, Itay chose to mostly bury the “being careful…”  bit and lead with sweeping statements such as “Dependency analysis is largely useless”. Shame…

 

Software erosion in pictures – Findbugs


My particular area of interest in software these days is the importance of levels of abstraction above the raw code. In Java, the most natural place for these to manifest themselves is through the package structure (though this is certainly not the only possibility).

Recently I used Structure101 to do some analysis on the evolution of the findbugs code-base, and was rather stunned at what I saw. Here is the root dependency graph for the first public release (0.7.2) back in March 04.

This diagram shows us the top-level packages in the code-base and the interactions between them (the numbers beside the arrows denote the number of code-level references).
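
As a rough sketch of what those edge weights mean, the following counts code-level references between packages. The input format (pairs of fully qualified class names) and the class name PackageGraph are assumptions for illustration, not something taken from Structure101, and the sketch groups by immediate package rather than by top-level package.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: aggregates class-level references into weighted package-level edges.
final class PackageGraph {
    // Package part of a fully qualified class name ("a.b.C" -> "a.b").
    static String packageOf(String className) {
        int lastDot = className.lastIndexOf('.');
        return lastDot < 0 ? "" : className.substring(0, lastDot);
    }

    // Each reference is {fromClass, toClass}; the result maps "from -> to" to its weight.
    static Map<String, Integer> edgeWeights(List<String[]> references) {
        Map<String, Integer> weights = new TreeMap<>();
        for (String[] ref : references) {
            String from = packageOf(ref[0]);
            String to = packageOf(ref[1]);
            if (!from.equals(to)) { // references inside one package are not edges
                weights.merge(from + " -> " + to, 1, Integer::sum);
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        // Hypothetical class names, made up for the example.
        List<String[]> refs = List.of(
                new String[] {"app.findbugs.Driver", "app.ba.Analysis"},
                new String[] {"app.findbugs.Report", "app.ba.Analysis"},
                new String[] {"app.ba.Analysis", "app.graph.Node"},
                new String[] {"app.ba.Analysis", "app.ba.Helper"});
        System.out.println(edgeWeights(refs));
    }
}
```

A weight of 2 on an edge therefore means just two references in the code, which is why a low-weight back dependency is cheap to reverse out early on.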

With just a little knowledge about what findbugs does (it is a static analysis tool that scans bytecode for potential bugs), it is easy to reason about how this code-base is internally architected:

  • graph is a re-usable (baggage-less) data structure to model the control flow within a method body
  • visitclass wraps a bytecode parser with a visitor pattern to shield the parser implementation from the interpretation of the parser callbacks
  • ba (bug analyzer?) is the bit where specific rules (policies, strategies, …) are implemented
  • findbugs is the controller that drives the interactions between the other components

The image below again shows the top-level breakout, but this time several releases later in Oct 04 (0.8.6).

Although the code-base has grown significantly, it is still absolutely possible to reason about the architecture and where the new pockets of code (annotations, xml, config, etc) fit into the “big picture”. The only apparent blemish is that io is now disconnected (perhaps dead code).

The first significant imperfection creeps in in April 05 (0.8.7).

The relationship between config and findbugs has become blurred. Since both packages are dependent on each other, it is no longer clear that either of these in isolation represents a meaningful or useful abstraction, and it may make more sense to think about the relevant “component” as being the union of the two.

Skip forward just a month…

… and the confusion has spread (0.8.8). The filter package has been added but it too has a 2-way dependency with the findbugs package, so it seems reasonable to say that the whole world of controller and config (incl. filtering) is in essence a blob where the individual packages do not really contribute anything in isolation.

There is also a rogue dependency here from ba to findbugs. This is clearly contrary to the original architectural intent.  The weight is just 2, so this would have been very easy to reverse out had it been spotted.

If we fast-forward a year (1.0.0), however, we see that this rogue dependency has become entrenched (the weight has increased from 2 to 99).

Moreover, more and more packages are being pulled into the tangle such that it is hard or impossible to talk about these as meaningful entities in their own right. For example, what is the point of a util package if it contains code that depends on the findbugs package?

Nevertheless, we can still see evidence of meaningful architectural decisions. For example, the bcel and asm packages are presumably wrappers for the BCEL and ASM bytecode engineering libraries that, together with the classfile package, enable an element of plug&play in terms of which library actually gets used for the analysis.

However, moving on to Nov 07 (1.3.0)…

… we see that these too have been sucked into the tangle. From now on, it seems, all testing, deployment etc. will need to include both.

And here is the most recent snapshot from September 08 (1.3.5):

This diagram doesn’t help any more – nearly all the higher-level abstractions appear to have eroded away. Moreover, a peek under the hood reveals that there is a large code-level tangle involving 43% of all the classes and spanning 33 packages – this implies that the interdependency has become deeply entrenched in the code. Shame…
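
A tangle in this sense is a group of items that all depend on each other, directly or indirectly. In graph terms that is a strongly connected component with more than one member. The sketch below finds such groups with Tarjan's algorithm; it is an illustration under that assumption, not Structure101's implementation, and the class name TangleFinder is made up.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch only: reports groups of mutually dependent items (tangles) using Tarjan's algorithm.
final class TangleFinder {
    private final Map<String, List<String>> uses; // item -> items it depends on
    private final Map<String, Integer> index = new HashMap<>();
    private final Map<String, Integer> lowLink = new HashMap<>();
    private final Deque<String> stack = new ArrayDeque<>();
    private final Set<String> onStack = new HashSet<>();
    private final List<Set<String>> tangles = new ArrayList<>();
    private int counter = 0;

    private TangleFinder(Map<String, List<String>> uses) {
        this.uses = uses;
    }

    static List<Set<String>> find(Map<String, List<String>> uses) {
        TangleFinder finder = new TangleFinder(uses);
        for (String item : new TreeSet<>(uses.keySet())) { // sorted for a stable result
            if (!finder.index.containsKey(item)) {
                finder.visit(item);
            }
        }
        return finder.tangles;
    }

    // Recursive depth-first visit; fine for a sketch, but deep graphs may need an iterative form.
    private void visit(String item) {
        index.put(item, counter);
        lowLink.put(item, counter);
        counter++;
        stack.push(item);
        onStack.add(item);
        for (String target : uses.getOrDefault(item, List.of())) {
            if (!index.containsKey(target)) {
                visit(target);
                lowLink.put(item, Math.min(lowLink.get(item), lowLink.get(target)));
            } else if (onStack.contains(target)) {
                lowLink.put(item, Math.min(lowLink.get(item), index.get(target)));
            }
        }
        if (lowLink.get(item).equals(index.get(item))) { // item is the root of a component
            Set<String> component = new TreeSet<>();
            String member;
            do {
                member = stack.pop();
                onStack.remove(member);
                component.add(member);
            } while (!member.equals(item));
            if (component.size() > 1) { // a single item on its own is not a tangle
                tangles.add(component);
            }
        }
    }
}
```

Calling TangleFinder.find with a map from each class (or package) to the items it uses returns one set per tangle. Dividing the size of the largest set by the total number of classes gives the kind of percentage quoted above.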

For a quick view of the full history, I did up a little animated gif showing the “progression” through all 27 releases. If you are interested in something meatier, see the “Structure101 in a Nutshell Part 1” presentation on the Headway Take a Tour page.

 

Package design matters – Part 1


Java packages are often used like file-system folders to organize source. But source files differ from “normal” files in that they are highly inter-dependent. Considering this interdependence as a package hierarchy evolves can have significant productivity benefits.

Packages as Folders

Java packages provide an ideal way to organize code into a scalable, hierarchical structure that helps us find specific code.

In this sense, packages can be used like folders in a file-system:

  • We place files with something in common in the same folder.
  • When a folder grows too big and we find we’re having trouble finding files, we split the folder into sub-folders according to some criteria that makes sense to us.
  • We share files by placing them in a common area on the company network, in which case the structure evolves according to the varying criteria of different people.
  • We often have trouble deciding which folder a file best belongs in, and make an arbitrary decision.

Often packages are only used as a kind of filesystem equivalent.  However, the package hierarchy can also be used to reinforce the intended design and associated development activities.

Packages as Design Abstractions

Source files differ from other files typically stored in the file-system:

  • They depend on the detailed contents of other source files.
  • Are created and edited in groups of multiple files.
  • Are subject to a high number of relatively small changes.
  • Are edited by a team of developers rather than individuals.
  • Should be reusable on future projects.
  • Should be easy to change without impacting other files too widely.
  • Must support the defined deployment environment.
  • Are subject to QA, version control and release processes.

The aim of a package design should be to support these characteristics.  For example, the design could explicitly support Martin’s “Reuse/Release Equivalence Principle (REP)” (article, book) whereby packages are developed, built, tested and released against released versions of the packages upon which they depend.

Design is not something that happens once at the start of the project – it is an activity that spans the life of an application or product.  This fact has become explicit with the iterative and Agile development models.  As the code-level design continues, the package-level design emerges.
Unfortunately, the emergent design is often invisible and so forgotten.  Not only does the original design degrade, but the overall structure tends to become excessively complex.  As the supporting structure dissolves, development activities become harder and the cost of each new feature increases.

This principle of emergent design is important here. Clearly when I have a project of 50 classes in a half-dozen packages, the overhead of sub-optimal package dependencies isn’t going to slaughter me. But if my project is going to grow to 5000 classes, then putting in minimal effort from the start can save huge effort when things get more complicated.

In part 2 I’ll take a closer look at cyclic package dependencies and why they matter.

 

Martin Fowler – the design pseudo-graph


How much effort should you put into controlling the structure of your code-base? A nice article by Martin Fowler.

"The problem with no-design, is that by not putting effort into the design, the code base deteriorates and becomes harder to modify, which lowers the productivity, which is the gradient of the line. Good design keeps its productivity more constant so at some point (the design payoff line) it overtakes the cumulative functionality of the no-design project and will continue to do better."

http://www.martinfowler.com/bliki/DesignStaminaHypothesis.html

 

DevX review of Structure101


"Getting your arms (and eyes) around large, complex code bases has never been easy, but Structure101 from Headway Software may just be the elegant solution to this age-old problem. Find out how this visual design tool analyzes your enterprise projects and lets you zone in on issues quickly and gracefully."  Full Story by Derek Lane.

 

Complexity Debt – don’t “fix it”, “keep a lid on it”


So you just discovered that your code-base has racked up a whole load of complexity debt. This may explain why progress seems so painfully slow lately. You briefly think of suggesting a major complexity-reducing refactoring effort. This will delay the next release significantly, but foreshorten the time to the following releases. Plus a cleaner, simpler code-base will make the world a nicer happier place, right?

But you don’t suggest this. You’re human and self-preservation is an instinct. Precisely because of the recent slow progress, there is a lot of disappointment on the whole product delivery front at the minute. Suggesting another big delay doesn’t feel like the best career move just now.

Luckily there is another, more subtle way to get to that happier place without climbing out on long limbs over thin ice.

Don’t repay the debt in one big painful bang – just keep a lid on it. And watch it begin to dissipate as though by magic.

You use personality, charisma, leadership and/or donuts to convince your team that henceforth, they will not add any more complexity debt to the code base. Now watch what happens…

If I need to add to a method with a CC of 20 (where the threshold is say 15) and I add a couple of new paths, then I temporarily increase the complexity from 20 to 22. Uh oh, I said I wouldn’t do that. No problem – I’m working on the method already, so I have a good handle on what it does. I just extract a suitable lump into a new method with a nice helpful name and bingo, I have 2 methods each within tolerance instead of 1 over. The 2 methods are simpler and easier to understand and maintain than the 1 before, and the overall code-base debt just went down a bit. Well, I feel good about this.

But wait. That one new method pushes the containing class over the class-level complexity threshold. Again, I refactor the class while its workings are in my head already (perhaps I use move field or extract class). Again, if the class was previously over-threshold, then I probably just reduced the overall debt a bit more.

The same will happen when anyone tries to add to any overly-complex package. And as the XS framework sets thresholds at every level of design breakout, the developers are relieved of the temptation to “hide” complexity by pushing it up or down the hierarchy. The code-base becomes truly less complex, without anyone really trying.
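
The rule the team signs up to can be stated as a simple check: an item may stay over threshold, but it may not get worse. Here is a minimal sketch of that check, assuming you already have a complexity number per item before and after a change. The class name LidCheck and the input maps are hypothetical; this is not the XS calculation itself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of a "keep a lid on it" check; the complexity numbers come from whatever tool you use.
final class LidCheck {
    // Returns the items that are over the threshold now AND more complex than in the baseline.
    static List<String> violations(Map<String, Integer> baseline,
                                   Map<String, Integer> current,
                                   int threshold) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : new TreeMap<>(current).entrySet()) {
            int now = entry.getValue();
            int before = baseline.getOrDefault(entry.getKey(), 0); // new items start from zero
            if (now > threshold && now > before) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> baseline = Map.of("Parser.parse", 20);
        // Adding two paths to a method that is already over the threshold of 15 breaks the rule...
        System.out.println(violations(baseline, Map.of("Parser.parse", 22), 15));
        // ...while extracting a method brings both halves within tolerance.
        System.out.println(violations(baseline,
                Map.of("Parser.parse", 12, "Parser.parseHeader", 10), 15));
    }
}
```

Note that an untouched method with a CC of 20 does not fail the check. The existing debt is tolerated, and only growth is refused.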

This is cool enough to be named – how about “KALOI” for “keep a lid on it”.

KALOI is supported by Structure101 and there is more explicit support in the pipeline. More on this later.

 

Structure101 v2 goes GA today


Additions let you see complete slices of a code-base at any level, home in on structural complexity, view dependency graphs in matrix form, and map code items and groups (like tangles) through different hierarchies, slices and perspectives (more download).

 

Tracking complexity debt


Un-monitored, the complexity of a code-base increases with its size. JBoss and Struts are perfect examples. However, monitoring complexity helps you keep complexity debt under control, or even down to zero.

If you publish the last couple of years’ worth of releases of your project to a Structure101 repository, you’ll probably see something like this in the Structure101 Tracker web application:

jboss over time

Structure101 matches the amount of XS to the lines of code that cause it. Unless someone pays attention to it, the same team will tend to code-in a consistent degree of complexity debt as they go – in this case they’re running at a not-unusual but alarming 80%.

Struts shows a similar chart running at about 57% XS:

struts over time

I was in with the Prime Carrier guys yesterday. They’ve been tracking their excess complexity for several months now and it really shows.  Here’s the chart for one of their projects:

Prime Carrier project 1 over time

As you can see, XS is hugging the zero line even as the size of the code-base grows. This is why we call it “excess” (XS) – it really is excessive and totally avoidable. Occasionally a cyclic dependency (tangle) or fat class may get into the build, but it’s flagged and usually fixed during the next iteration.

They recently released a new version of another product under severe deadline pressure and consciously decided to pick up a bit of short-term debt. As you can see, they’re already paying it back before it incurs too much “interest”:

Prime Carrier project 2 over time

These are a great bunch of hard-nosed developers who are keeping the code-base clear of debt so they can focus on the principal – customer requirements.

 

Manage complexity like debt


Ben Hosking writes in Managing Complexity – The aim of Designing Code that:

The most important part of design is managing complexity

I like the simplicity of that. What happens if you don’t manage complexity? Well, it starts to cost. Talking at OOPSLA 2004, Ward Cunningham (Mr. Wiki) compared complexity with debt:

“Manage complexity like debt,” Cunningham told attendees. Using this analogy, he likened skipping designs to borrowing money; dealing with maintenance headaches like incurring interest payments; refactoring, which is improving the design of existing code, like repaying debt; and creating engineering policies like devising financial policies.

In an interview with Bill Venners (Artima), Andy Hunt (Pragmatic Programmer) extends the analogy concisely:

“But just like real debt, it doesn’t take much to get to the point where you can never pay it back, where you have so many problems you can never go back and address them.”

It’s a lovely metaphor. But it does break down in one place. Project managers don’t get a pile of bills through the door every month. Even if they wanted to, they can’t rip them open, sum them up, compare them against income and outgoings and discover just how fragged they are, or even hell, that they can afford loads more debt!

Well it’s not quite that bad. We can at least measure and sum up the complexity of items at different levels of design breakout (methods, classes, packages, subsystems and projects).  We may not be able to put a hard complexity number on the tipping point (insolvency), but we can give you a number. With this you can compare projects, monitor trends that show where it’s getting more or less complex, and discover which items at what level are causing the trend.

For example here is the home page for the Structure101 Tracker web application showing the sizes and over-complexity of several projects:

Tracker

Now, correlate XS with the depth of furrow on team leaders’ foreheads, and you’ve really got something to go on…

 

CAT-scan a code-base


Structure101 v2 goes beta today. With it you can walk through the code-base in slices from the class-level, to the package-level and up through the design levels, spotting tangles and seeing how far they have spread.

This is a snag of the Slice perspective with the slice selector highlighted:

Sliceselector

You can now see dependency graphs as matrices, which tend to be better for very large graphs (like slices). A value in a cell indicates a dependency from the column item to the row item. Here’s the equivalent of the tangle shown as a diagram above – as a matrix (highlighted) it now fits in on the screen:

Smallmatrix

And here is a much bigger slice of all the classes in the code-base grouped by parent package (the orange areas).

Bigmatrix

Even zoomed way out, it is possible to pick out some patterns on the matrix. The rows and columns are ordered so that as far as possible items only use items below or to the right, so any dots (dependencies) above the diagonal indicate cyclic dependencies. Horizontal lines indicate heavily used items, vertical lines indicate items that use a lot of other items.
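
The reading rules for the matrix can be sketched in a few lines. This follows the convention described above (a cell marks a dependency from the column item to the row item, and a well-ordered item only uses items further down the list). The class name DependencyMatrix and the sample data are assumptions for illustration, and the ordering is taken as given rather than computed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Sketch: builds a dependency matrix for a given item order and lists the dots above the diagonal.
final class DependencyMatrix {
    // cell[row][col] is true when the item in column col depends on the item in row row.
    static boolean[][] build(List<String> order, Map<String, List<String>> uses) {
        int n = order.size();
        boolean[][] cell = new boolean[n][n];
        for (int col = 0; col < n; col++) {
            for (String used : uses.getOrDefault(order.get(col), List.of())) {
                int row = order.indexOf(used);
                if (row >= 0) { // ignore dependencies on items outside the matrix
                    cell[row][col] = true;
                }
            }
        }
        return cell;
    }

    // Returns "user -> used" for each dot above the diagonal (row < col),
    // i.e. each dependency that points back up the ordering.
    static List<String> aboveDiagonal(List<String> order, boolean[][] cell) {
        List<String> result = new ArrayList<>();
        for (int row = 0; row < order.size(); row++) {
            for (int col = row + 1; col < order.size(); col++) {
                if (cell[row][col]) {
                    result.add(order.get(col) + " -> " + order.get(row));
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up data: ba depends back on findbugs, against the ordering.
        List<String> order = List.of("findbugs", "ba", "graph");
        Map<String, List<String>> uses = Map.of(
                "findbugs", List.of("ba", "graph"),
                "ba", List.of("graph", "findbugs"));
        System.out.println(aboveDiagonal(order, build(order, uses)));
    }
}
```

With this layout, a row with many marked cells is a heavily used item (a horizontal line), and a column with many marked cells is an item that uses a lot of others (a vertical line).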

Version 2 lets you “tag” (mark) code-level items (like methods and classes), and any higher-level item (like a package) that contains tagged items is shown as tagged. This lets you tag items in one slice and then see how it maps to other slices and hierarchies. For example, you could tag a big class-level tangle in the Slice perspective and then go to the Composition perspective to see how the tangle is distributed across the package design – it would look like this:

Taggedhierarchy