Archive for the 'Dependency Management' Category

Mapping Architecture Diagrams to Code – the “most specific pattern” rule


Architecture Diagrams in Structure101 are mapped to the physical code by patterns associated with each cell in the diagram. This enables the visual specification of rules that can then be applied to a specific version of your code so that Structure101 can overlay any violations on the diagram and let you discover the offending code items.

When you let Structure101 create the diagrams for you, and stick to changes like adjusting the layering, expanding packages, or moving packages to new parents, you can mostly leave it to Structure101 to handle the patterns. However, occasionally you may see some behavior that seems odd unless you understand how patterns interact. And of course you may want to take advantage of the more advanced capabilities.

A key aspect that is not obvious at first is the “most specific pattern” rule. This says that where a physical class maps to more than one cell, the Architecture Diagram associates the class with the cell that has the “most specific” pattern.

Here’s an example diagram which was automatically generated from part of the Findbugs code-base:

18-11-2009 15-48-38

… and here is the same diagram showing the pattern associated with each cell:

18-11-2009 16-16-02

If I have a code-base that happens to contain a class edu.umd.cs.findbugs.classfile.impl.ClassFactory, then that class matches the pattern associated with the cell called ClassFactory. However it also matches the patterns for both ancestor cells “impl” and “classfile”. Since the ClassFactory cell’s pattern is the most specific (no wildcards), that is the cell to which Structure101 associates our physical class. And since a class is associated with at most one cell, that class is not associated with either ancestor cell, even though it matches their (less specific) patterns.
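To make the rule concrete, here is a minimal sketch of such a lookup. It is not Structure101's implementation: the pattern syntax (a dotted name in which `*` matches any run of characters), the three pattern strings, and the specificity measure (the count of non-wildcard characters) are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Sketch of a "most specific pattern" lookup; not Structure101's actual code. */
class MostSpecificPattern {

    /** Assumed pattern syntax: a dotted name in which '*' matches any run of characters. */
    static boolean matches(String pattern, String className) {
        // Quote the literal parts and turn each '*' into the regex ".*".
        String regex = ("\\Q" + pattern + "\\E").replace("*", "\\E.*\\Q");
        return className.matches(regex);
    }

    /** Assumed specificity measure: the number of non-wildcard characters. */
    static int specificity(String pattern) {
        return pattern.replace("*", "").length();
    }

    /** Returns the one cell that owns the class, or empty if no pattern matches. */
    static Optional<String> owningCell(Map<String, String> patternByCell, String className) {
        return patternByCell.entrySet().stream()
                .filter(e -> matches(e.getValue(), className))
                .max((a, b) -> Integer.compare(specificity(a.getValue()),
                                               specificity(b.getValue())))
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        // Hypothetical patterns for the three nested cells discussed above.
        Map<String, String> cells = new LinkedHashMap<>();
        cells.put("classfile", "edu.umd.cs.findbugs.classfile.*");
        cells.put("impl", "edu.umd.cs.findbugs.classfile.impl.*");
        cells.put("ClassFactory", "edu.umd.cs.findbugs.classfile.impl.ClassFactory");

        String cls = "edu.umd.cs.findbugs.classfile.impl.ClassFactory";
        System.out.println(owningCell(cells, cls)); // the wildcard-free pattern wins

        cells.remove("ClassFactory");               // delete the inner cell
        System.out.println(owningCell(cells, cls)); // the class falls back to "impl"
    }
}
```

The sketch does not define what happens when two matching patterns are equally specific; a real tool would need a rule for that case.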

As expected, here are the associated items that Structure101 reports for the ClassFactory cell:

18-11-2009 16-51-41

… and the parent “impl” cell:

18-11-2009 16-55-15

Now I delete the 3 inner cells so that the diagram looks like this (the patterns for the remaining cells are unaltered):

18-11-2009 17-11-59

Since the cells with the more specific patterns are gone, the classes that were associated with them now match the next most specific pattern – the parent “impl” cell:

18-11-2009 17-17-54

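The reverse case can be puzzling: I click on the “expand” button and the child cells are added to the diagram (which I expect), but the list of items associated with the parent is suddenly empty (which I don’t). The new child cells carry more specific patterns, so they claim the classes that the parent held a moment ago.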

Why would we do it this way? We figured it was the best way to be future-proof (the wildcards handle any added/removed classes) while at the same time supporting refactoring (e.g. if I move a class to a new package, the most specific pattern goes with it, but all the other classes still match to the original parent).

 

Travelin’ lite (the only way to fly)


Too much baggage

Code is like traveling: the less baggage the better. No bags is bliss, a little backpack hardly noticeable. Chunky wheelie bag: bearable but irksome. But several chunky wheelie bags, and it starts to get … logistically challenging. Not to mention increased risk of hernia.

Often, of course, some amount of baggage is unavoidable. If you are embarking on an expedition to the North Pole, for instance, you would be well advised to take a decent supply of warm underwear.

Pretty much all code has baggage, but some code has more baggage than other code. Ask a developer what they would prefer to tackle – implement a little standalone utility, or write something that sits astride all the obscure notions and constructs emitted by others – and I’m pretty sure I know which one most would pick.

Successful baggage limitation

Of course, you can’t code up a (meaningful) system without some number of building blocks. So even in a perfectly architected and layered system, you inevitably accumulate some baggage as you move up the stack. The trick, though, is to try and minimize this (while also hiding off the details of the contents).

This is hardly new or revolutionary: much of software theory is specifically dedicated to strategies that help us to avoid excessive coupling and so promote modularity. That said, I do rather like the baggage metaphor and am inclined to see minimizing baggage as a primary goal, with e.g. re-usability a side-effect, rather than the other way around.

So what do I mean by baggage? Very informally, I’m thinking of this as “stuff you need to know about” to implement another bit of stuff. In this sense, baggage is a universal aspect of software development, entirely independent of e.g. programming language or framework.

Where a code-base is written in a strongly typed language like Java, it is relatively easy for static analysis to detect most of the baggage automatically, for instance A carries B baggage because class A extends class B and/or method A.foo() calls method B.bar(). And tools like Structure101 for Java exploit this to provide visualization and analysis of the baggage landscape.

It is important to understand, however, that there are always likely to be blind spots in such tools. A highly specific example I came across recently was where class X emitted a convoluted (highly X-specific) string that got passed to class Y which contained custom code to parse that string. In a (simplistic) static analysis of the code (and in the absence of a class Z that wraps the string in some form), Y does not depend on X (or Z). Conceptually, however, Y is most definitely carrying X baggage.
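A stripped-down version of that situation might look like the sketch below. The string format and the method names are invented for illustration; only the X/Y naming comes from the example above. Note that Y compiles without any reference to X, so a purely type-based analysis reports no dependency between them.

```java
/** Sketch of baggage that type-based static analysis cannot see. */
class HiddenBaggage {

    /** Hypothetical producer: emits a string in a private, X-specific format. */
    static class X {
        static String describe(String name, int retries) {
            return "name=" + name + ";retries=" + retries;
        }
    }

    /** Hypothetical consumer: never mentions X, yet must know X's format to parse it. */
    static class Y {
        static int retriesFrom(String encoded) {
            for (String field : encoded.split(";")) {
                String[] keyValue = field.split("=", 2);
                if (keyValue.length == 2 && keyValue[0].equals("retries")) {
                    return Integer.parseInt(keyValue[1]);
                }
            }
            throw new IllegalArgumentException("no retries field in: " + encoded);
        }
    }

    public static void main(String[] args) {
        // Only this wiring code refers to both classes.
        String message = X.describe("job-1", 3);
        System.out.println(Y.retriesFrom(message));
    }
}
```

If X changes its separator from `;` to `,`, Y breaks at runtime even though nothing in the type system connects the two.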

Other (tool-specific) blind spots may become apparent, so to speak, if we look beyond the confines of the immediate code-base. For example, consider a piece of code that constructs and executes a gruesomely contorted SQL statement. Static analysis that only looks at the code reveals a dependency on say javax.sql.* but misses the additional baggage that arises from intimate knowledge of the database schema. The same kinds of issue arise if we are using e.g. internal DSLs as part of a wider solution.

Baggage landscape

Does this invalidate the use of static analysis tools as some have argued (see previous thread, especially the comments)? Well, strictly speaking, I guess it is a percentages game and depends to a large extent on the specifics of the analysis engine and project in question. When it comes to blind spots outside the code-base (as in the database example above), the key factor to my mind is their contribution to overall system complexity. In the typical database scenario, I would tentatively suggest that this is generally marginal (assuming that the relevant code is suitably compartmentalized). As for within the code-base, clearly, the higher the correlation between reported and conceptual baggage, the greater the utility of the tool. In the case of Java (and strongly typed languages in general), I would say the correlation is extremely strong (though my viewpoint here may be rather predictable, given I am one of the guys behind Structure101). There is also the question of whether accurate subset visibility is preferable to no visibility at all…

Whose bag is it anyway?

When playing the percentages game, however, it is important not to confuse the baggage that you are genuinely carrying with other suitcases that just happen to be in the same space. This is the distinction between static and runtime views of the world. I’ll paraphrase the issue here as: I’ll worry about my baggage and let others worry about theirs.

For example, if I were given the job of coding up java.util.ArrayList (an array-style container implementation), my baggage would be (broadly) just my interface (java.util.List) and members (instances of java.lang.Object). At runtime, someone may use an ArrayList to hold a collection of Foo instances; so when the list’s get() method is invoked, the returned object is in fact an instance of Foo. But that does not mean that my ArrayList is in any way conceptually dependent on their Foo. This is their baggage, not mine.
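The same point in code, as a sketch: the container below is a deliberately simplified stand-in for java.util.ArrayList (the class names and the growth policy are my own choices here), and it compiles without any knowledge of the caller's Foo.

```java
import java.util.Arrays;

/** Sketch: the container's baggage is Object; Foo belongs to the caller. */
class ContainerBaggage {

    /** Simplified array-backed container, not the real java.util.ArrayList. */
    static class TinyArrayList {
        private Object[] elements = new Object[4];
        private int size;

        void add(Object element) {
            if (size == elements.length) {
                elements = Arrays.copyOf(elements, size * 2); // grow when full
            }
            elements[size++] = element;
        }

        Object get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index + ", size " + size);
            }
            return elements[index];
        }

        int size() {
            return size;
        }
    }

    /** The caller's type; TinyArrayList never refers to it. */
    static class Foo {
        String bar() {
            return "bar";
        }
    }

    public static void main(String[] args) {
        TinyArrayList list = new TinyArrayList();
        list.add(new Foo());
        Foo foo = (Foo) list.get(0); // the cast, and so the Foo baggage, lives in the caller
        System.out.println(foo.bar());
    }
}
```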

A similar nugget in the Java space is reflection (and e.g. dependency injection a la Spring), often seen as a gap in static analysis tools in the sense that some dependencies are missed. However, this is really just the same issue as the list of Foos above. At coding time, all I need to know is that (say) some input string will be a class name that I can use to instantiate an appropriate implementation of something or other (often involving a cast to an interface that does get picked up by static analysis). The rest is runtime, someone else’s baggage.

That said, there is a scenario where reflection can be used to induce a blind spot wilfully. For example, I know that the object I am getting is a Foo and that I will invoke its bar() method, but I deliberately do this using reflection rather than casting. The baggage is there whichever approach I choose but in one case (typed invocation) the baggage is transparent while in the other (reflection) it is obfuscated. There is a danger that a blind adoption of rules and metrics around baggage measurement may, in extreme circumstances, encourage some team members to adopt the obfuscation approach. To “game the system”. I think here that there would be a static analysis counter-measure – namely to control access to reflection – but obviously the better approach is to address any such dysfunctionality at source…
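Here is a small sketch of the two approaches side by side, using only standard java.lang.reflect calls. The Foo class and the method names typedCall and reflectiveCall are illustrative. Both methods need exactly the same knowledge (the object has a bar() method that returns a String), but only the first exposes it to the compiler and to static analysis.

```java
import java.lang.reflect.Method;

/** Sketch: the same baggage, once transparent and once obfuscated. */
class ReflectionBaggage {

    static class Foo {
        public String bar() {
            return "bar";
        }
    }

    /** Typed invocation: the dependency on Foo.bar() is visible in the types. */
    static String typedCall(Object target) {
        return ((Foo) target).bar();
    }

    /** Reflective invocation: the same knowledge of bar(), hidden in a string literal. */
    static String reflectiveCall(Object target) throws ReflectiveOperationException {
        Method method = target.getClass().getMethod("bar");
        return (String) method.invoke(target);
    }

    public static void main(String[] args) throws ReflectiveOperationException {
        Object target = new Foo();
        System.out.println(typedCall(target));
        System.out.println(reflectiveCall(target));
    }
}
```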

Invisible baggage?

In this sense, dynamic languages are (of course) really just an extension of the reflection paradigm. The baggage is still there – it’s just a heck of a lot harder (though not necessarily impossible) to detect. This means that there tends to be way less tooling support, but also, and more importantly, it may be much more difficult for the developers to understand their baggage situation. Interestingly, this has led some to question whether dynamic languages can scale to larger code-bases and teams because of a finite “complexity budget”. For an overview of some of the issues here, see this post by Ted Neward.

Shit happens

Finally, if everyone pays attention to their baggage, does that mean that the system is guaranteed to work? No, of course not. When I check in my bags at the airport, I should ensure that they are securely closed and suitably labeled. That in itself, however, is absolutely no guarantee they will be there at the other end for me to collect (though it should at least make life easier for the airport’s baggage management system and so help to make the desired outcome more probable). The one and only thing I can be sure of is that any screwups will not be my fault. Seems to me that this is the essence of good software: lots and lots of well-defined, self-contained, autonomous units doing their own job faithfully and keeping fingers crossed that others do the same…

 

Package design matters – Part 1


Java packages are often used like file-system folders to organize source. But source files differ from “normal” files in that they are highly inter-dependent. Considering this interdependence as a package hierarchy evolves can have significant productivity benefits.

Packages as Folders

Java packages provide an ideal way to organize code into a scalable, hierarchical structure that helps us find specific code.

In this sense, packages can be used like folders in a file-system:

  • We place files with something in common in the same folder.
  • When a folder grows too big and we find we’re having trouble finding files, we split the folder into sub-folders according to some criteria that make sense to us.
  • We share files by placing them in a common area on the company network, in which case the structure evolves according to the varying criteria of different people.
  • We often have trouble deciding which folder a file best belongs in, and make an arbitrary decision.

Often packages are used only as a kind of file-system equivalent. However, the package hierarchy can also be used to reinforce the intended design and associated development activities.

Packages as Design Abstractions

Source files differ from other files typically stored in the file-system:

  • They depend on the detailed contents of other source files.
  • Are created and edited in groups of multiple files.
  • Are subject to a high number of relatively small changes.
  • Are edited by a team of developers rather than individuals.
  • Should be reusable on future projects.
  • Should be easy to change without impacting other files too widely.
  • Must support the defined deployment environment.
  • Are subject to QA, version control and release processes.

The aim of a package design should be to support these characteristics.  For example, the design could explicitly support Martin’s “Reuse/Release Equivalence Principle (REP)” (article, book) whereby packages are developed, built, tested and released against released versions of the packages upon which they depend.

Design is not something that happens once at the start of the project – it is an activity that spans the life of an application or product.  This fact has become explicit with the iterative and Agile development models.  As the code-level design continues, the package-level design emerges.
Unfortunately, the emergent design is often invisible and so forgotten.  Not only does the original design degrade, but the overall structure tends to become excessively complex.  As the supporting structure dissolves, development activities become harder and the cost of each new feature increases.

This principle of emergent design is important here. Clearly when I have a project of 50 classes in a half-dozen packages, the overhead of sub-optimal package dependencies isn’t going to slaughter me. But if my project is going to grow to 5000 classes, then putting in minimal effort from the start can save huge effort when things get more complicated.

In part 2 I’ll take a closer look at cyclic package dependencies and why they matter.

 

Spring Framework 2.1 M3 Architecture


Here are some architecture diagrams for Spring Framework 2.1 M3 (released yesterday). You can point the (free) structure101 plug-in at these and get IDE warnings if your customizations break Juergen’s architecture.

Here is the top level breakout of org.springframework:

Springarchitecture

Structure101 created this from the physical code-base. All the cells in the diagram use only lower-level cells. With such a clean structure, I did no further editing of the diagram, other than to adjust the level of nesting.
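The rule that each cell uses only lower-level cells can be checked mechanically. The sketch below is an illustration of the idea, not how Structure101 does it, and the layer names (ui, service, data) are hypothetical rather than Spring's actual modules.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch of a layering check: a cell may only use cells in lower layers. */
class LayeringCheck {

    /** Returns "from -> to" for every dependency that points to a higher layer. */
    static List<String> violations(List<String> layersTopDown, Map<String, List<String>> uses) {
        List<String> found = new ArrayList<>();
        for (Map.Entry<String, List<String>> entry : uses.entrySet()) {
            int from = indexOrFail(layersTopDown, entry.getKey());
            for (String target : entry.getValue()) {
                int to = indexOrFail(layersTopDown, target);
                if (to < from) { // the target sits above the source: an upward dependency
                    found.add(entry.getKey() + " -> " + target);
                }
            }
        }
        return found;
    }

    private static int indexOrFail(List<String> layers, String cell) {
        int index = layers.indexOf(cell);
        if (index < 0) {
            throw new IllegalArgumentException("unknown cell: " + cell);
        }
        return index;
    }

    public static void main(String[] args) {
        // Hypothetical layers, listed from top to bottom.
        List<String> layers = Arrays.asList("ui", "service", "data");

        Map<String, List<String>> uses = new LinkedHashMap<>();
        uses.put("ui", Arrays.asList("service", "data"));
        uses.put("service", Arrays.asList("data"));
        uses.put("data", Arrays.asList("ui")); // breaks the layering

        System.out.println(violations(layers, uses));
    }
}
```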

Below is a further breakout of some of the larger modules.

org.springframework.aop:

Springaop

org.springframework.beans:

Springbeans

org.springframework.jdbc:

Springjdbc

org.springframework.jms:

Springjms

org.springframework.orm:

Springorm

org.springframework.web:

Springweb

You can view these online here (I’ll update later today). If you have a Spring-based project, you can install the structure101 Eclipse or IntelliJ plug-in (free from here) and point it at the Spring project in the online repository (use this url: http://www.structure101.com/java/data in the plug-in properties). The diagrams will then be visible inside your IDE, any existing violations (i.e. upward dependencies you have created) will be flagged, and you will be warned if and when you make code changes that are inconsistent with the layering.

This is a new-ish feature – please email me directly and let me know how you got on or if you have questions.

 

Code Organization Guidelines for Large Code Bases


In an excellent on-line presentation Juergen Hoeller gives rationale and guidelines for controlling the structure of large, evolving code-bases. Juergen is the chief architect of the Spring framework, which as I have previously pointed out is structurally almost perfect. This didn’t happen by accident.

If you don’t have time to go through the 88 minute presentation, here is a nice synopsis by Mike Nereson.

 

Eclipse Plugin (OSGi) Visualization


If you are going mad trying to figure out the dependencies between lots of Eclipse plug-ins, or work with other large OSGi systems, you may be interested in this.

We’ve had a few people looking for an Eclipse/OSGi backend for Structure101, and with all the hype about OSGi lately, we decided to lift the lid on it.  Here is an early version that you can download. If you’re an Eclipse user, just point it at your plugins directory to see the same kind of views, hierarchies and slices that you get with the Java version.

Below is a random screen shot of my Eclipse plug-ins to give you the idea (click for the full-size image).

It’s pretty rough at the moment – the download page indicates where we’ve made some arbitrary decisions on the model structure and where we think it’s probably not quite right. I’d love some comments from OSGi heads on if/how you’d like us to change it to make it more finished. If you think you’d find it useful, talk to Paul about an extended license key.

 

Jar Hell


A lot of jars can contribute to (and mask) the logical package/class structure. Here’s how to make sense of the whole mess using Structure101.
1. View your project in the Logical hierarchy
2. Tag the classes or packages you’re interested in
3. Switch to the Jar Hierarchy and see which jars contain tagged items.

Do this in reverse to figure out how code from specific jars maps to (and interacts with) other code in the logical structure.
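If you want a feel for what the package-to-jar mapping involves, the question "which jars contribute to this package?" can be approximated with the standard java.util.jar API. This is only a rough sketch: the class name, the three-way Coverage result, and the decision to count sub-packages as part of the package are my assumptions, not a description of Structure101's model.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Enumeration;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Sketch: how much of a jar's class content belongs to a given package? */
class JarPackageScan {

    enum Coverage { NONE, PARTIAL, ALL }

    /** Compares the classes under the package (sub-packages included) with all classes in the jar. */
    static Coverage coverage(File jar, String packageName) {
        String prefix = packageName.replace('.', '/') + "/";
        int classes = 0;
        int inPackage = 0;
        try (JarFile jarFile = new JarFile(jar)) {
            Enumeration<JarEntry> entries = jarFile.entries();
            while (entries.hasMoreElements()) {
                String name = entries.nextElement().getName();
                if (!name.endsWith(".class")) {
                    continue; // skip directories, manifests and resources
                }
                classes++;
                if (name.startsWith(prefix)) {
                    inPackage++;
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        if (inPackage == 0) {
            return Coverage.NONE;
        }
        return inPackage == classes ? Coverage.ALL : Coverage.PARTIAL;
    }

    public static void main(String[] args) {
        // Usage: java JarPackageScan <package> <jar> [<jar> ...]
        for (int i = 1; i < args.length; i++) {
            System.out.println(args[i] + ": " + coverage(new File(args[i]), args[0]));
        }
    }
}
```

ALL and PARTIAL correspond loosely to the solid and faded tags in the walkthrough that follows.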

Here is an example from the JBoss code-base.

In the Composition perspective, select the Package hierarchy, and tag the package org.jboss.security – a blue dot indicates the package is "tagged".

Taggedpackage

Stay in the Composition perspective, but switch to the Jar hierarchy. Any jars that contribute to org.jboss.security are tagged (with a blue dot).

Taggedjars1

Taggedjars2

The solid tags on jboss-srp-client.jar and jboss-srp.jar indicate that all of the contents of each jar are tagged – i.e. are in the package org.jboss.security. The faded blue dots indicate that only part of the jar is tagged – you can drill down to find out what other stuff is in the jar – in this case, contributions to the package org.jboss.crypto:

Jardrilldown

Working the other way, tag jboss-srp.jar and switch back to the Package hierarchy to see how the code in that jar contributes to the package structure:

Packagedrilldown

The dependency diagram shows how the classes in the tagged jar interact with the rest of the package:

Classdiagram_1

There are times when understanding how different hierarchies relate can be just what the doctor ordered.

 

DevX review of Structure101


"Getting your arms (and eyes) around large, complex code bases has never been easy, but Structure101 from Headway Software may just be the elegant solution to this age-old problem. Find out how this visual design tool analyzes your enterprise projects and lets you zone in on issues quickly and gracefully."  Full Story by Derek Lane.

 

Spring 2’s architecture – A single dependency cycle slipped in


The Spring guys have let a single dependency cycle into their architecture. A very small flaw, but it’s a perfect example of why you need to check your code-base at different levels to keep it truly tangle-free.

I did a quick analysis of the Spring Framework some time back and sure enough found their claims of a cycle-free architecture to be correct – a pleasure to behold!

The recent announcement of Spring 2.0 rc4 prompted me to point Structure101 version 2 at it and check that they were keeping up the high standard they had set themselves. The Structure101 “notables” quickly took me to the org.springframework.aop package which contains the following tangle:

Springtangle_1

Ok, this is not exactly a fatal flaw, far from it, but it surprised me because I know that Juergen keeps an eye on this stuff. Then I took a look at the leaf package slice (leaf packages being packages that contain classes), and guess what? Not a single tangle.  It is only when you look at the slice one level up that the tangle is apparent:

Springtangle2

Taking a look at the leaf packages contained by these 2 packages:

Springtangle3

(The package names are relative to org.springframework.aop). The dependency between the tagged packages (blue dots) is the one causing the problem. Overlaying the parent package boundaries on this graph, you can see why it is that, although the package diagram is acyclic, dependencies between the parent packages go in both directions, making them cyclically dependent.

Springtangle4

I presume they check only at the flat-package level, which is why this one slipped through the net.
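The effect is easy to reproduce on a toy graph. In the sketch below the package names (p.one, p.two, q.one) are hypothetical, not the actual Spring packages; the leaf-level graph is acyclic, yet rolling each dependency up to its parent package produces a cycle.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** Sketch: an acyclic leaf-package graph can still be cyclic one level up. */
class PackageRollup {

    static String parent(String pkg) {
        int dot = pkg.lastIndexOf('.');
        return dot < 0 ? pkg : pkg.substring(0, dot);
    }

    /** Lifts each leaf-level edge to the parent packages, dropping edges inside one parent. */
    static Map<String, Set<String>> rollUp(Map<String, Set<String>> leafDeps) {
        Map<String, Set<String>> parentDeps = new TreeMap<>();
        for (Map.Entry<String, Set<String>> entry : leafDeps.entrySet()) {
            String from = parent(entry.getKey());
            for (String target : entry.getValue()) {
                String to = parent(target);
                if (!from.equals(to)) {
                    parentDeps.computeIfAbsent(from, k -> new TreeSet<>()).add(to);
                }
            }
        }
        return parentDeps;
    }

    /** Depth-first search for a cycle in a directed graph. */
    static boolean hasCycle(Map<String, Set<String>> deps) {
        Set<String> done = new HashSet<>();
        Set<String> onPath = new HashSet<>();
        for (String node : deps.keySet()) {
            if (visit(node, deps, done, onPath)) {
                return true;
            }
        }
        return false;
    }

    private static boolean visit(String node, Map<String, Set<String>> deps,
                                 Set<String> done, Set<String> onPath) {
        if (onPath.contains(node)) {
            return true;  // we came back to a node on the current path
        }
        if (!done.add(node)) {
            return false; // already fully explored from an earlier start
        }
        onPath.add(node);
        for (String next : deps.getOrDefault(node, Collections.<String>emptySet())) {
            if (visit(next, deps, done, onPath)) {
                return true;
            }
        }
        onPath.remove(node);
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical leaf packages: p.one -> q.one -> p.two
        Map<String, Set<String>> leafDeps = new TreeMap<>();
        leafDeps.put("p.one", new TreeSet<>(Collections.singleton("q.one")));
        leafDeps.put("q.one", new TreeSet<>(Collections.singleton("p.two")));

        System.out.println("leaf level cyclic:   " + hasCycle(leafDeps));
        System.out.println("parent level cyclic: " + hasCycle(rollUp(leafDeps)));
    }
}
```

At the parent level the two edges become p -> q and q -> p, which is exactly the kind of tangle a check on the flat leaf-package graph alone cannot see.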

 

Tracking complexity debt


Un-monitored, the complexity of a code-base increases with its size. JBoss and Struts are perfect examples. However, monitoring complexity helps you keep complexity debt under control, or even down to zero.

If you publish the last couple of years’ worth of releases of your project to a Structure101 repository, you’ll probably see something like this in the Structure101 Tracker web application:

Jbossxschart

jboss over time

Structure101 matches the amount of XS to the lines of code that cause it. Unless someone pays attention to it, the same team will tend to code-in a consistent degree of complexity debt as they go – in this case they’re running at a not-unusual but alarming 80%.

Struts shows a similar chart running at about 57% XS:

Strutsxschart

struts over time

I was in with the Prime Carrier guys yesterday. They’ve been tracking their excess complexity for several months now and it really shows.  Here’s the chart for one of their projects:

Pc1

Prime Carrier project 1 over time

As you can see, XS is hugging the zero line even as the size of the code-base grows. This is why we call it “excess” (XS) – it really is excessive and totally avoidable. Occasionally a cyclic dependency (tangle) or fat class may get into the build, but it’s flagged and usually fixed during the next iteration.

They recently released a new version of another product under severe deadline pressure and consciously decided to pick up a bit of short-term debt. As you can see, they’re already paying it back before it incurs too much “interest”:

Pc2

Prime Carrier project 2 over time

These are a great bunch of hard-nosed developers who are keeping the code-base clear of debt so they can focus on the principal – customer requirements.