The following is an attempt to define standards for a build environment for myGrid.
From this it therefore appears that myGrid should consist of a number of modules which are largely independent, or whose dependencies are clearly enumerated.
Although it's not a specific requirement, it appears at the moment that ant is likely to be the main build tool. This will work so long as we only develop in Java. My experience suggests that ant is clunky when used for other languages.
The best way to address these issues is to define a single unified directory structure which all of the projects should follow. Each project should also design a single build file with standard entry points (targets). These individual build files can then be directed "from above" by a top level build file to perform a unified build. Here I call the individual build files "module build files", and the top level build file "the main build file".
This addresses a number of requirements. The projects maintain independence in their build environment, but can still be addressed in a unified way. It will also mean that if external users want to build several different subsystems of myGrid, but not the whole thing, then they have a unified interface to do so. This clearly also addresses subsystem build.
I considered suggesting just a unified build, and not also a unified directory structure. However, there are several reasons for arguing against this. Firstly, simple familiarity will make things easier for developers working on more than one package. Secondly, some tasks needed for the unified build will need to operate directly on the directory structure, rather than through the module build files. For example, driving a "javadoc" target through module build files would result in a series of javadoc directories, which would not be hyperlinked. If we want these hyperlinks we have to invoke Javadoc on all of the source files simultaneously (and hope the machine has enough memory). I can see no way around this other than a unified directory structure.
From a secondary, non-build point of view, this will also have an advantage when using tools. While developing the "ontologytools" package, for instance, we were all using Emacs-JDE, and I was able to write a "project file" which set up all the options appropriately. It would be easy to share this. The same would be true wherever two or more developers shared an IDE (see footnote 1).
The main disadvantage with this system is that we will end up with considerable code duplication between the module build files. It may be possible to get around this by providing a default module build file, with appropriate hooks and settable properties. On the other hand, it may not. My suspicion is that it won't.
The second disadvantage is that the compiler will not just work out dependencies for us, and we will have to build modules in the correct order. This is addressed later.
There are any number of potential directory structures, but most of the projects that I have seen are broadly similar. The main consequences of the directory structure relate to the build files, and hopefully no one will care enough about it to find it worth disagreeing.
The basic directory structure that I am proposing for myGrid is shown in the following table. I have somewhat mixed up the final directory structure and the directory structure in the CVS. Those directories marked with † should be present in the CVS; those without will be generated directories.
| Directory or File Name | Usage |
|---|---|
| ext/ | All external dependencies should be found here |
| Licenses/† | All the license documents should be placed here. There may be quite a few of these, as external dependencies will need them. |
| doc/† | Top Level documentation which applies to all modules. Would probably include stuff like this document, general help documentation. Also probably an "AUTHORS" document. |
| javadoc/ | Unified Javadoc repository, created from all of the source. |
| build/ | Used for temporary files during the build. What this includes will depend on the "build mode". |
| build/test-results/ | JUnit test results should go in here |
| build/classes/ | Generated class files. |
| lib/ | For jar files. What this includes will depend on the "build mode". |
| samples† | Examples which run and do cute things |
| module-n† | The various modules that we need to do things. |
Below this directory structure we will have the various modules which actually do things.
| Directory or File Name | Usage |
|---|---|
| src/ | All source code should be placed in here. By source we mean anything that is needed to generate the functional software, and nothing else. This would exclude documentation, and also "source" code that is actually built from something else. |
| build/ | The place where everything that is generated during the build process goes, which is not required for the final software. Obvious candidates would include .class files. Less obvious would be generated .java files. |
| lib/ | The place where all generated files needed to run the system will go. In general this should include only jar files. |
| ext/ | When packaged stand-alone, external dependencies should go here. These should be copied out from the top level ext/ directory. |
| samples/ | Examples which run and do cute things |
There will be a number of standard build targets that each module build file will need.
| Target | Depends | Usage |
|---|---|---|
| compile | none | Compile everything that needs to be compiled for functional code. Not documentation. Compile means generate .class files, and not .jar files. |
| doc | none | Generate any documentation which needs generating. |
| javadoc | none | Generate javadoc. |
| jar | compile | Package everything up into .jars in the lib/ directory |
| dist | jar | Generate all of the distribution files, both zip and tar.gz |
| test | compile | Run complete test sets on files in build/ directory. |
| clean | none | Delete everything that is generated. This should return the system to what appears out of the CVS. |
| deploy | compile | Create the web apps structure within the build |
| install | deploy | Install webapps to tomcat, using the admin upload client. There are also reload, remove, and list targets using the ant tasks shipped with tomcat |
A couple of these targets could probably do with some further subdivision, in particular "clean", which probably needs a separate "clean_classes", which I've found very useful during development. Likewise "dist" could depend on "dist-zip" and "dist-tgz".
I think we will need two different build modes. The first will be "modular", which will build a module on its own, within its own directory structure. The second will be "unified", which will build the whole system in a unified manner. Essentially this means redirecting the class path and target directories from "module-n/build" to the top level "build" directory, and so on.
I'm not convinced we need this at the moment. It might be nice to have a distribution which presents myGrid in a single build environment, and it would probably be reasonably easy to achieve. But it might not be, and it's probably not worth a lot of hassle.
Either way it can be achieved with a unified use of ant properties.
It may be nice to give individual developers the option of altering certain properties of the ant files from within the individual module files. I use this facility to set up various ant properties, for instance setting "build.compiler" to jikes (which is faster during development), and "build.compiler.emacs" to true, which makes jikes emit emacs-parsable error messages (which is the other reason I use jikes, because it gives column numbers). I normally achieve this by loading a properties file from ".ant/module-name.props". With this technique you can also subvert the build in fairly major ways, which can be very useful under some circumstances (for instance, keeping source in one backed-up directory, and sending all generated files somewhere else entirely). This is easy to support in the build files.
We have not got an agreed standard for generating documentation. As with IDEs, it's probably not a good idea to try to get one. My own feeling is that we should include both the source of the documentation, and a unified presentation format, which will be PDF. Sub-directories should be created where this is appropriate for the format (LaTeX?, docbook or linuxdoc, and so on). Once we have a specified build platform, we should also include documentation builds in the main build, although, for end user convenience, pre-built copies should be in the download!
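The override behaviour described above can be illustrated outside ant. The following is only a sketch: the class name `DeveloperOverrides` and the `build.dir` property are hypothetical, and the properties text is passed in as strings rather than read from ".ant/module-name.props". Ant itself treats properties as immutable, so there the same effect comes from loading the developer's file before the defaults are defined.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

/** Hypothetical helper: layers a developer's private settings over module defaults. */
final class DeveloperOverrides {

    /**
     * Returns a property set in which any key present in developerOverrides
     * wins, and every other key falls back to moduleDefaults.
     */
    static Properties merge(String moduleDefaults, String developerOverrides) {
        try {
            Properties defaults = new Properties();
            defaults.load(new StringReader(moduleDefaults));

            // Passing 'defaults' to the constructor makes it the fallback table.
            Properties merged = new Properties(defaults);
            merged.load(new StringReader(developerOverrides));
            return merged;
        } catch (IOException e) {
            // StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
    }
}
```

A developer who sets only "build.compiler" and "build.compiler.emacs" would still inherit every other default unchanged, which is what keeps this kind of override safe to offer.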
In one sense it would be really nice to have an automated top level build file, which does everything based purely on the directory names below it.
I am not convinced that this is possible at the moment. As noted previously, we have to build modules in the correct order because of dependencies between them. At first sight this is a disadvantage of the recursive build strategy. However it has one advantage. We have to be explicit about these dependencies, so we will know what they are, and when they are inappropriate. We will also not be able to support circular dependencies (a depends on b, b depends on a), because one of them will have to be built first. My own feeling is that if two modules are intertwined in this way, they should actually be one module.
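To make the ordering argument concrete, here is a minimal sketch of how a build order could be derived from an explicit dependency list, and how a circular dependency shows up as an error. The class name `ModuleOrder` and the module names used in the usage note are hypothetical; nothing here is part of ant.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper: orders modules so that dependencies are built first. */
final class ModuleOrder {

    /**
     * deps maps each module name to the modules it depends on.
     * The result lists every module after all of its dependencies.
     * Throws IllegalStateException if the dependencies are circular.
     */
    static List<String> buildOrder(Map<String, List<String>> deps) {
        List<String> order = new ArrayList<>();
        Set<String> done = new HashSet<>();
        Set<String> inProgress = new HashSet<>();
        for (String module : deps.keySet()) {
            visit(module, deps, done, inProgress, order);
        }
        return order;
    }

    private static void visit(String module, Map<String, List<String>> deps,
                              Set<String> done, Set<String> inProgress,
                              List<String> order) {
        if (done.contains(module)) {
            return; // already placed in the order
        }
        if (!inProgress.add(module)) {
            // We came back to a module we are still in the middle of visiting.
            throw new IllegalStateException("Circular dependency involving " + module);
        }
        for (String dep : deps.getOrDefault(module, Collections.emptyList())) {
            visit(dep, deps, done, inProgress, order);
        }
        inProgress.remove(module);
        done.add(module);
        order.add(module); // all dependencies are already in the list
    }
}
```

Given a hypothetical module "workflow" depending on "core" and "ontologytools", with "ontologytools" depending on "core", the result builds "core" first and "workflow" last. If "a" depends on "b" and "b" on "a", the call fails rather than guessing, which matches the view that such modules should really be one module.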
If we cannot support an automated top level build, we should probably have an automated warning system which checks that the list of modules the build system has is the same as the directories that exist. That way we will know if we have missed something, but will not have to find a way to define dependencies automatically.
The module build files should be considered part of the test suite run from the top level build file, which should check that each module build file implements all the targets that it should.
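Such a warning system could be as simple as comparing two sets of names. The sketch below is one possible shape, not a proposal for the actual implementation: the class name `ModuleListCheck` is hypothetical, and it assumes a directory counts as a module if it contains a file called `build.xml` (ant's default build file name), which this document does not itself mandate.

```java
import java.io.File;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper: compares the declared module list with the directories on disk. */
final class ModuleListCheck {

    /** Sub-directories of root that contain a build.xml (assumed marker of a module). */
    static Set<String> modulesOnDisk(File root) {
        Set<String> found = new TreeSet<>();
        File[] children = root.listFiles();
        if (children == null) {
            return found; // root is missing or is not a directory
        }
        for (File child : children) {
            if (child.isDirectory() && new File(child, "build.xml").isFile()) {
                found.add(child.getName());
            }
        }
        return found;
    }

    /** Modules present on disk that the top level build does not know about. */
    static Set<String> notInBuild(Set<String> declared, Set<String> onDisk) {
        Set<String> extra = new TreeSet<>(onDisk);
        extra.removeAll(declared);
        return extra;
    }

    /** Modules the top level build lists that have no directory on disk. */
    static Set<String> notOnDisk(Set<String> declared, Set<String> onDisk) {
        Set<String> gone = new TreeSet<>(declared);
        gone.removeAll(onDisk);
        return gone;
    }
}
```

If either result is non-empty the top level build would print a warning and carry on; it only needs to tell us that the module list has drifted, not to fix it.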
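One way to check a module build file against the standard targets is to read it as plain XML and compare the target names with the list in the table above. This is a sketch under stated assumptions: the class name `TargetCheck` is hypothetical, the build file is passed in as a string for simplicity, and only the presence of a target is checked, not its "depends" list or what it does.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

/** Hypothetical helper: reports which standard targets a module build file lacks. */
final class TargetCheck {

    /** The standard targets from the table of module build targets. */
    static final List<String> REQUIRED = Arrays.asList(
            "compile", "doc", "javadoc", "jar", "dist",
            "test", "clean", "deploy", "install");

    /** Returns the required target names that do not appear in the given build file. */
    static Set<String> missingTargets(String buildXml) {
        Set<String> missing = new TreeSet<>(REQUIRED);
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(buildXml)));
            NodeList targets = doc.getElementsByTagName("target");
            for (int i = 0; i < targets.getLength(); i++) {
                Element target = (Element) targets.item(i);
                missing.remove(target.getAttribute("name"));
            }
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse build file", e);
        }
        return missing;
    }
}
```

An empty result means the module build file at least declares every standard entry point. A real check would read each module's build file from disk and report the module name alongside any missing targets.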
Some services will require publication of web documents on tomcat. There does not appear to be an automatic way of doing this at the moment. I suggest all web documents go in "ROOT/module-n-site" where "module-n" is the name of the module. This should avoid name clashes.
At the current time there appears to be no easy solution to the problem of file permissions. The prepare target should try to create the directory, and if that fails (as it will under Unix), the developer will have to create that directory by hand as root, and then do a chown. This should be a one-off setup cost.
Axis doesn't seem to provide an easy install client, as tomcat does, so jar files will have to be copied into the "AXIS_HOME/lib" directory. We can avoid problems here with a few rules. No jar files should be placed in lib unless they are created by the build (in which case they should be named after the module), or unless they are external dependencies identified by the gather target. This way, even if one module overwrites a file placed by another, we can guarantee that it's the same file. For a unified build even this should not be necessary, as the files should all be up to date.
File permission problems also arise here. The lib directory will have to be readable and writable by the user running ant. This is not great, but I can see no other solution.
There is an issue here of who does what, and who is responsible for maintenance of which bits. My own feeling is that this should be as follows...
The top level build file should be the responsibility of Work Package 8. They should also generate a canonical module build file which other work packages can steal. They should also be responsible for investigating the possibility of a unified module build file with appropriate hooks.
If we can not do unified module build files, individual module developers should be responsible entirely for these.
There are a number of outstanding issues at the moment.
There are a number of requirements that a machine will need before the build will work. These are...
I believe the build system suggested fulfils all of the requirements that I have stated at the beginning of the document. In this version the requirements have come largely from my head, and so are a point for discussion.
The build system also does not appear to be that complex, so should be easy enough to implement. The system described here as the module build files was implemented initially in a few hours by Sean for oiled (which is now in the "ontologytools" section of the CVS), and has probably had a couple more hours' work by myself and Angus during subsequent development.
The module build system is also fairly similar to other standard pieces of software like xerces and xalan, and bears a significant resemblance to the GNU autoconf/automake style of build, so should be easy to understand.
The top level build system should also not be too complex, mostly just invoking module ant files in the appropriate order. It may be slightly more complex if we go for a "unified build" mode.
1 It might be worth doing a survey of the developers to find out who uses what IDE. If there are a few main candidates, then it may be worth providing resources to support one or two within myGrid. Our experience with the ontologytools suggested that this saved time and effort over everyone having to do it independently. I am not suggesting here mandating a single IDE, which is likely to be greeted with a stony silence from many quarters. Including me, unless it's the one I use, of course.
-- RichCawley - 22 Oct 2002