Architecting the Build Process

Posted by buildmeister on April 14th, 2006
Filed under:  deployment  release  build  process 

Introduction

In a large number of software development projects, build processes are still not systematically implemented. Instead they are usually created in an ad hoc fashion and evolve over time with limited effort and resources. If there is an Architect or Buildmeister involved in defining the build process, then they will usually implement the process based on past experience or instinct. Although some guidance and information is available on implementing build tools and environments (on the Internet or in technical books), there is still very little industry-standard guidance, and there are few techniques, that can be applied to assess the relative merits of different build processes. Consequently, it is difficult to say whether any one build process is good or bad. Build processes also tend to get "stale" very quickly and often remain untouched because no one understands them or dares risk changing them. In most projects, the only time that a build process will be changed is on certain key events, such as:

  • the initiation of a new project
  • a fundamental change in system architecture
  • a need to increase a project's delivery rate
  • a requirement to conform to a regulatory compliance mandate

Whatever the reason, it can be really difficult to know where to start, whether you are creating a new build process or systematically assessing the capability of your current build process to see how you could improve it. This article is intended to help with this dilemma by describing how best to architect a typical build process and what a "good" build process might look like. Obviously your own build process will also have some technology-specific parameters; however, the contents of this article should be generic enough for most environments.

SCM foundation

Any build process is wholly dependent on the environment in which it is implemented, and in particular on the version control or Software Configuration Management (SCM) environment. Without control over the versions and baselines in your project you will have little chance of implementing a successful build process. One of the first things that you should do to create an effective SCM environment is to make sure you have drawn up an SCM Plan. Such a plan does not necessarily have to be a large tome (unless you are required to prove some adherence to a software process standard such as CMMI or SPICE); rather it needs to effectively convey the following information:

  • The definition of the roles on your project (such as Configuration Manager, Build Manager, Release/Deployment Manager) and who is responsible for them.
  • The set of tools to be used (such as CVS, Subversion, ClearCase, Make or Ant) and their configuration; for example the SCM server infrastructure.
  • The naming conventions for any metadata that is to be held in the SCM tools, i.e. branch, baseline or tag names, workspace names and so on.
  • The project directory structure; for example the composition of source, build and intermediate directories.
  • The project branching strategy; for example how on-going releases will be managed, whether build or release branches will be used and so on.
  • The definition of the change request management process, i.e., what types of changes are to be captured (such as enhancements or defects) and how they are to be related to builds and releases.
  • The definition of the release and deployment process, i.e., how releases are packaged and deployed to their run-time environments.

Although the definition of a lot of this information might seem like overkill for defining a build or release process, it is very difficult to construct a reliable and automated build process without it. For example, if the project’s baselines and branches are not created according to some convention, then how are you expected to automate the creation of them or hook into them for further automation?
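To illustrate why a naming convention matters for automation, here is a minimal Python sketch. The convention itself (PROJECT_major.minor_BLsequence) and the function names are assumptions made up for this example, not something your SCM tool prescribes. Once names follow a pattern, a script can check them and derive the next one.

```python
import re

# Hypothetical convention: <PROJECT>_<major>.<minor>_BL<sequence>,
# for example "PAYMENTS_1.2_BL7". Adapt the pattern to your own SCM Plan.
BASELINE_PATTERN = re.compile(
    r"^(?P<project>[A-Z][A-Z0-9]*)_(?P<major>\d+)\.(?P<minor>\d+)_BL(?P<seq>\d+)$"
)


def parse_baseline(name):
    """Return the parts of a baseline name, or None if it breaks the convention."""
    match = BASELINE_PATTERN.match(name)
    if match is None:
        return None
    return {
        "project": match.group("project"),
        "major": int(match.group("major")),
        "minor": int(match.group("minor")),
        "seq": int(match.group("seq")),
    }


def next_baseline(name):
    """Derive the name of the next baseline on the same release line."""
    parts = parse_baseline(name)
    if parts is None:
        raise ValueError(f"baseline name does not follow the convention: {name!r}")
    return f"{parts['project']}_{parts['major']}.{parts['minor']}_BL{parts['seq'] + 1}"
```

A build script could call next_baseline on the latest baseline before labelling a new build, and reject any hand-made name for which parse_baseline returns None.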

Incremental build functions

In Defining The Build Process, I described a number of possible build functions such as Compilation, Unit Testing and Packaging. There is usually a minimal set of these functions that you will need to implement for a "useful" build process. The rest you can implement when time allows. Doing so gives you the ability to plan and increase your build capability incrementally over time. An example of incrementally implementing build functions in three phases is illustrated in the diagram below:

[Incremental Build Functions]

Note that implementing a build function means that it is an automated process, not a manual one. There is little point in creating a wide-ranging and functionally rich build process if it consists of many manual and error-prone steps. However, there is a caveat to this. A common misconception in software development is that automation is good and everything that possibly can be automated should be. Although this is true to a degree, automating everything does take time, and the question that you need to ask yourself is how often the task that you are automating is going to be run. If it is going to be run 100 times, then great, the effort of automating it will be worth it; however, if it is going to be run just two or three times, then is the level of investment in automation really worth it? As an example take the Deployment function. If you have a complex deployment process that requires an application to be installed in many different environments then it is essential to script it. However, if your deployment process is simply the installation of a pre-packaged application - such as a Windows install file - then you should probably not spend too much time worrying about how to automate its deployment, and instead let users (such as Testers) browse to a known location and install it themselves.
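The "is it worth automating?" question can be put into rough numbers. The following Python sketch is only a back-of-the-envelope model under simple assumptions: times are whole minutes, every run costs the same, and the function names and figures are invented for illustration.

```python
def automation_pays_off(runs, manual_minutes, automated_minutes, setup_minutes):
    """True if automating saves time over the expected number of runs."""
    manual_total = runs * manual_minutes
    automated_total = setup_minutes + runs * automated_minutes
    return automated_total < manual_total


def break_even_runs(manual_minutes, automated_minutes, setup_minutes):
    """Smallest number of runs at which automation becomes cheaper, or None."""
    saving_per_run = manual_minutes - automated_minutes
    if saving_per_run <= 0:
        return None  # automation never recovers its set-up cost
    return setup_minutes // saving_per_run + 1
```

For example, a task that takes 30 minutes by hand and 2 minutes once scripted, with a day (480 minutes) spent writing the script, breaks even at break_even_runs(30, 2, 480), which is 18 runs. If you expect only two or three runs, the script is hard to justify on time alone.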

Project rhythm

One of the most important decisions that you will need to make on your project is how often you will create an Integration or Release Build. There are at least three standard build schedules in use today:

In general, the more often you build, the more integration problems you will discover at an early stage. Although most organizations recognise this basic premise, it is still amazing how many do not carry out at least a Daily Build and Smoke Test. This is usually because they do not know or communicate their development and delivery rate, and so it is hard to say when the optimal time to carry out a build is. To be able to define which build schedule would be more appropriate you can start by discovering your "project rhythm".

To some degree your release schedule will already be defined by project and customer expectations. However, internal to the project you will have greater control over how changes are integrated and on what schedule. There is no pre-defined schedule; you need to find your own “project rhythm” that suits you best, typically by looking at patterns such as how long the build takes, how often developers can sensibly deliver and so on. If you are adopting a Continuous Integration approach, this can mean building many times a day: maybe every 20 minutes or every hour. With more traditional forms of development this can mean building once a day (maybe as part of a nightly build). Your “project rhythm” will also be determined by any dependencies that exist between components or projects and how long the actual build process takes from beginning to end.

The types of questions that you can ask to help determine your own “project rhythm” are as follows:

  • Are there any dependencies between internal components? If so, do the components have to be built in any specific order?
  • Are there any dependencies on the output of external projects or components? If so, how will these outputs be made available?
  • How many developers are working on the project? How often are they expected to deliver changes into the integration area?
  • What are the inputs for the developer’s Private Builds? Will they build against the complete source code structure or against a partial structure and pre-built binaries from the Integration or Release Build?
  • How long does it take to execute a full Integration or Release Build (including workspace population, compilation, unit-testing and internal deployment)?
  • How often are Integration Builds expected to be carried out? Where will the outputs be stored?
  • How often are Release Builds expected to be carried out? Where will the outputs be stored?

This is not an exhaustive list of questions but it should be enough to help you determine the set-up of your initial automated build and release environment. In the next section, I will answer some of these questions in the context of a working example.
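The first question in the list, whether components have to be built in a specific order, is one that a script can answer once the dependencies are written down. A minimal Python sketch follows, using the standard library graphlib module (Python 3.9 or later). The component names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical components: each maps to the set of components it depends on.
DEPENDENCIES = {
    "common": set(),
    "validation": {"common"},
    "billing": {"common", "validation"},
    "web": {"billing"},
}


def build_order(dependencies):
    """Return a build order in which every component follows its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as error:
        # error.args[1] holds the list of components that form the cycle.
        raise ValueError(f"circular dependency between components: {error.args[1]}")
```

Calling build_order(DEPENDENCIES) yields common, validation, billing, web. A circular dependency is reported as an error rather than silently producing a build that can never succeed.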

One important point to note is that any one project might have multiple frequencies, typically one for each scale of integration or type of build as illustrated in the diagram below:

[Build Frequency]

In this diagram as the scale of integration progresses builds are carried out less frequently. This is normally because each kind of build has a different type and level of visibility, and hence a more appropriate frequency to go with it as is illustrated in the table below:

Build type                          Consumers                     Frequency between builds
Private Build                       Developers                    minutes/hours
Component Integration Build         Individual Development Team   hours/days
Multi-Component Integration Build   Multiple Development Teams    days/weeks
System Integration Build            System Release Management     weeks/months
Release Build                       Customer and/or end-user      months/quarters

As the table shows, each type and scale of build has a direct consumer, and the consumer needs to be involved in deciding what the build frequency is. It should be as frequent as possible, but must also stay within the tolerance level of the consumer. For example, System Release Management might not be able to tolerate a build per day for each component, or customers might not be able to accept a new release as often as monthly or even quarterly without significantly disrupting their own environment. The important thing is to plan and identify when the different types of builds will typically happen in your own environment.

Build componentization

The level of communication and control that a continual, repeatable build process can give to a project team cannot be overstated. However, if your build process takes too long you will effectively be delaying this feedback until the build has completed (with either success or failure). Exactly what "too long" means only you can define; however, by assessing and finding your "project rhythm" as described in the section above you should at least be able to begin to decide on this. You should also be able to easily (re)build composite parts of your complete application if and when necessary, for example if you need to implement and release a customer patch or hotfix. In order to meet both of these requirements you should componentize your build process where possible. At the application level, build componentization will typically be aligned with your high-level system architecture; however, there are a number of other areas where build componentization can be implemented, as follows:

  • decompose your build process into build functions and steps
    There are typically many areas where you could thread your build process - running different, unrelated parts in parallel. To do so you can first break your build process down into discrete functions as discussed in Defining The Build Process. A typical build function should be applicable to any software component, for example Compilation of the Credit Card Validation component. You can then break down those functions into individual steps or commands, for example the step to compile include files and the step to compile source files. When you have achieved this you can either script the execution of these steps in parallel yourself or use a build control/automation tool to help achieve this.
  • stage previously built objects to create a build pipe-line
    Often software development teams build everything - the complete code base - at each build. This obviously means that each build will take a predictable but maximum amount of time. Although such a process might still be required for Release Builds - when you are deploying the build to external customers - for ongoing development and integration purposes this can significantly slow things down. One of the ways to mitigate this is to create a build pipeline and stage the outputs of certain build components. You can then consume them as binaries rather than rebuilding them from scratch. This requires careful management and composition but can also help in other areas such as re-use and the prevention of developer workspace pollution.
  • execute unit testing in phases
    If you have a comprehensive suite of unit tests - and I sincerely hope you have - then executing the complete suite may take a significant amount of time and increase the total build time. One of the ways to mitigate this is to execute unit testing in phases. For example, you could execute a core set of unit tests during the day so that development gains essential feedback and can progress, and then execute the complete suite of tests overnight.
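As a rough sketch of the first technique, scripting the parallel execution of unrelated build steps yourself, the following Python uses the standard concurrent.futures module. The function name and the idea of passing each step as a zero-argument callable are assumptions made for this example. In a real build each callable would typically wrap a compiler or test command.

```python
from concurrent.futures import ThreadPoolExecutor


def run_steps_in_parallel(steps, max_workers=4):
    """Run independent build steps concurrently; return {step name: result}.

    `steps` maps a step name to a zero-argument callable. An exception
    raised by any step propagates to the caller when its result is read.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit every step first so that they can all run at the same time.
        futures = {name: pool.submit(step) for name, step in steps.items()}
        # Then wait for each step and collect its result.
        return {name: future.result() for name, future in futures.items()}
```

This only helps for steps that really are unrelated. Steps that depend on each other's outputs still need to run in sequence, which is one reason to break the process down into discrete functions and steps first.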

Following some of these techniques as opposed to always carrying out a full rebuild might introduce a degree of risk; however, I believe that this risk is worth taking in the short term in order to rapidly progress development. Also, I prefer to only rebuild if strictly necessary - certainly at the component level. If a component has not changed then consume the previously built version rather than re-building it. Your customers will often make you do this too. Although you might feel that it is as quick and more reliable to do a complete build, try giving a complete set of re-built binaries to a customer when only a minor change has been made!
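One possible way to decide whether a component "has not changed" is to fingerprint its source files and compare the result with the fingerprint recorded when the staged binaries were built. The Python sketch below assumes that approach. The function names and the dictionary of staged fingerprints are hypothetical, and an SCM tool's own baseline comparison may well be a better source of truth.

```python
import hashlib
from pathlib import Path


def fingerprint(source_dir):
    """Hash the relative path and contents of every file under source_dir."""
    digest = hashlib.sha256()
    root = Path(source_dir)
    # Sort the files so that the result does not depend on directory order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
    return digest.hexdigest()


def needs_rebuild(source_dir, staged_fingerprints, component):
    """True if the component's sources differ from those of the staged build."""
    return staged_fingerprints.get(component) != fingerprint(source_dir)
```

A component with no recorded fingerprint is always rebuilt, which errs on the side of safety.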

Infrastructure consolidation

In Defining the Build Process, I discussed the typical components of a build infrastructure. Implementing and administering such an infrastructure can be costly and time consuming for an individual project. Therefore, from an organizational point of view it makes sense to consolidate and share this infrastructure across many projects. If your organization already has a central or virtual SCM team then such consolidation should be relatively straightforward to agree, implement and administer. However, one of the concerns of such an implementation for an individual project is ensuring that sufficient resource is available when the project needs to execute its builds, i.e. there is no guarantee that other projects haven't already completely loaded the build infrastructure. In order to mitigate this, a number of strategies can be implemented:

  • implement consolidation at the business unit level - although some organizations agree with the concepts of build infrastructure consolidation they prefer to implement it at the organization or business unit level. By doing so they ensure that any resources are at least being utilised by the same nominal cost centre.
  • implement a mixture of dedicated and shared infrastructure - in order to guarantee that a minimum number of build servers are available you can allocate one or more dedicated servers to a project. When you subsequently execute a build, it will be executed on these servers first, and the shared pool of servers will then be checked to see if resources are available. If so, the build load will be spread to these servers, thus reducing total build time.

In order to be able to implement these techniques you will need access to technology that manages load balancing and sharing across servers. There are many hardware and software solutions to achieve this as well as build-specific solutions like IBM Rational BuildForge or Electric Cloud ElectricAccelerator. Note that as well as consolidating the actual build server infrastructure, it can also be cost effective to consolidate build server tools. However, since build processes are project-specific you will need to ensure that projects have sufficient access rights to the tools.
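The "dedicated first, then shared" policy described above can be expressed in a few lines. This Python sketch is purely illustrative: the function and server names are invented, and it does not reflect how BuildForge, ElectricAccelerator or any other product actually schedules work.

```python
def allocate_servers(wanted, dedicated, shared_free):
    """Pick build servers for a job: dedicated servers first, then free shared ones.

    wanted      -- number of servers the build could usefully spread across
    dedicated   -- names of servers reserved for this project
    shared_free -- names of shared-pool servers that are currently idle
    """
    chosen = list(dedicated[:wanted])
    remaining = wanted - len(chosen)
    if remaining > 0:
        # Top up from the shared pool only if it has idle capacity.
        chosen.extend(shared_free[:remaining])
    return chosen
```

With one dedicated server and a busy shared pool, allocate_servers(3, ["d1"], []) still returns ["d1"], so the project is never left without somewhere to build. It simply builds more slowly.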

Model-view-controller design

Model-view-controller (MVC) is a well known software design pattern that separates an application's data model, user interface and control logic into three distinct components in order that modifications to one component can be made with minimal impact to the others. Adopting such an approach to the design of your build scripts enables you to raise the level of abstraction and visibility of your scripts and make maintenance easier. An example of how you could achieve this is as follows:

  • Model - the core logic of your build scripts, usually the discrete build functions that you wish to carry out, e.g. steps to compile an application.
  • Controller - an overriding beginning to end process that calls the individual build script functions, e.g. process to build a complete application.
  • View - any mechanism or tool that presents the build script functions and process so that you can execute, view and report on them, e.g. a tool such as CruiseControl or IBM Rational BuildForge.

As well as designing your build scripts in such a way, you should also aim to separate out configurable data from the core logic of the model. For example, instead of hard-coding values in your scripts such as directory names, environment variables and debugging parameters, place them in separate configuration files.

Adopting this type of approach will raise the abstraction of your build process so that it is visible to and executable by many members of your development team - not just buildmeisters - and will also allow you to re-use build functions in different processes.
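To make the separation concrete, here is a small Python sketch of the same idea. Everything in it is an assumption for illustration: the "compile" and "package" command names are placeholders rather than real tools, and in practice the configuration would live in its own file instead of a dictionary.

```python
# Configuration: data kept apart from logic (in practice, a separate file).
CONFIG = {"source_dir": "src", "output_dir": "build", "debug": False}


# Model: discrete build functions. Each returns the command it would run.
def compile_component(config, component):
    flags = ["-g"] if config["debug"] else []
    return ["compile", *flags, f"{config['source_dir']}/{component}",
            "-o", f"{config['output_dir']}/{component}"]


def package_component(config, component):
    return ["package", f"{config['output_dir']}/{component}"]


# Controller: the beginning-to-end process that calls the model functions.
def build_application(config, components):
    commands = []
    for component in components:
        commands.append(compile_component(config, component))
        commands.append(package_component(config, component))
    return commands


# View: one possible presentation of the process, here a plain-text report.
def render_plan(commands):
    return "\n".join(" ".join(command) for command in commands)
```

Because the controller only calls model functions, the same compile_component could be reused by a different process, such as a developer's Private Build, and a change to a directory name touches only the configuration.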

Build metrics

Most people use the build process as the primary mechanism for capturing project metrics; for example, when executing a build you might gather metrics on unit tests passed, code coverage or lines of code. However, very few people actually capture metrics about the build process itself, such as how many builds have passed or failed, the total/average time for builds and so on. Examples of the types of metrics that can be captured are as follows:

  • Number of passed/failed builds per project
  • Build confidence - the ratio of successful to failed builds
  • Average total build time per project
  • Components reused in a build
  • Change tasks and/or Defects implemented in each build

Some of these metrics will be easy to capture, some more difficult. In particular, "Components reused in a build" is not a trivial metric to capture; however, I believe this metric is vital if you develop shared or reusable components. Being able to illustrate how these components have been used will go a long way towards justifying the cost of preparing/packaging components for reuse.

I'm sure there are lots of other metrics for the build process that you could think of. However, the point with any set of metrics is to define a small set, automate their capture and continually assess them. If people are used to seeing certain metrics then it gets so much easier to recognize bad "smells" and to consequently do something to address them.
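The simpler metrics in the list above can be computed from very little data. The Python sketch below assumes that each build leaves behind a record of its project, outcome and duration. The BuildRecord type and its field names are hypothetical and would need mapping onto whatever your build tool actually logs.

```python
from dataclasses import dataclass


@dataclass
class BuildRecord:
    project: str
    passed: bool
    duration_seconds: float


def build_metrics(records, project):
    """Summarise pass/fail counts, build confidence and average build time."""
    own = [r for r in records if r.project == project]
    passed = sum(1 for r in own if r.passed)
    failed = len(own) - passed
    return {
        "passed": passed,
        "failed": failed,
        # Ratio of successful to failed builds; undefined when nothing has failed.
        "confidence": passed / failed if failed else None,
        "average_seconds": sum(r.duration_seconds for r in own) / len(own) if own else None,
    }
```

Running a function like this at the end of every build, and publishing the result where the team can see it, is one way to automate the capture and make the trend visible.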

Summary

To summarize, architecting a build process is not fundamentally difficult; however, there are a few key concepts that you should consider if you are either implementing a new build process or refining an existing one. Most people tend to get involved straight away in the technology aspects of the build process; however, there are other non-technology aspects that you should also consider and which I have discussed here. One of the keys to achieving a good build process is to continually assess, refactor and refine it just as you would any other software development artifact. Also, try to involve other members of the wider development team in its definition. For example, ask Testers how they would like the build process to help them reduce testing time and cycles, or ask Project Managers what kinds of metrics would be useful for them. Raising the profile of the build process in such a way will also make it easier to justify investment in time and resources when you need to change it. Finally, for a pragmatic list of tips that can help you define a complete, consistent and reproducible build process see here.
