The following summarizes my experience of tuning with development teams whose own tuning experience varied widely. These summaries may provide you with a broad outline for creating your tuning strategy. I've categorized the teams into five levels, ranging from no experience to fully competent. Note that these are summaries, not actual reports, so no summary corresponds to any single development team I have worked with.
At level 1, the development team has no tuning experience or strategy. We start with a half-day to full-day presentation of the overall architecture of the system, focused entirely on performance: analysing the architecture for bottlenecks. (The obvious potential bottlenecks are easy to spot: the connecting lines on the diagrams, the single-threaded components, the components with many connecting lines attached, and so on.) Refer the team to chapter 13 of Java Performance Tuning, which discusses many generic architecture and design issues that affect performance, with an emphasis on shared resources. In addition, chapter 10 on multi-threading and chapter 12 on distributed computing may be relevant.
Inform the team that they need to define performance targets. These targets should be detailed enough to cover all identified bottlenecks and performance requirements. Initially, the targets can simply take the form of a list of the major user activities, each with an expected response time. Refer the team to chapter 1 of Java Performance Tuning, which covers setting performance targets.
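As an illustration only, targets for the major user activities could be captured in code so that test results can be checked against them automatically. The sketch below assumes each target is expressed as a maximum response time in milliseconds per activity; the class name `PerformanceTargets` and the activity names and values are hypothetical placeholders, not figures from any real project.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// A minimal sketch: each major user activity maps to a maximum acceptable
// response time in milliseconds. Activity names and values are hypothetical.
final class PerformanceTargets {
    private final Map<String, Long> maxMillis = new LinkedHashMap<>();

    // Registers a target; returns this so targets can be chained.
    PerformanceTargets target(String activity, long maxResponseMillis) {
        if (maxResponseMillis <= 0) {
            throw new IllegalArgumentException("target must be positive: " + maxResponseMillis);
        }
        maxMillis.put(activity, maxResponseMillis);
        return this;
    }

    // True if a measured response time is within the activity's target.
    // Fails loudly for activities with no target, so gaps in the targets are noticed.
    boolean meets(String activity, long measuredMillis) {
        Long limit = maxMillis.get(activity);
        if (limit == null) {
            throw new IllegalStateException("no target defined for: " + activity);
        }
        return measuredMillis <= limit;
    }

    public static void main(String[] args) {
        PerformanceTargets targets = new PerformanceTargets()
            .target("login", 2_000)   // hypothetical: log in within 2 seconds
            .target("search", 3_000); // hypothetical: search results within 3 seconds
        System.out.println(targets.meets("login", 1_500));  // prints true
        System.out.println(targets.meets("search", 4_200)); // prints false
    }
}
```

Failing on an activity with no defined target is a deliberate choice here: it makes it obvious when a measured activity has slipped through without a target.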
Finally, ensure that the team will address the identified design and architecture issues, including any hardware or operating system configuration changes that may be identified. Ensure also that application performance can be measured against the given performance targets. The team must develop a test environment that represents the running system. This test-bed should support testing the application at different loads, including a low load and a fully scaled load representing maximum expected usage. Refer the team to chapters 1, 2 and 14 of Java Performance Tuning, which discuss performance measurement techniques at both the Java and the operating system level.
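A test-bed of this kind can start very simply. The following is a minimal sketch, assuming a single `Runnable` stands in for one user activity: the hypothetical `LoadTest.run` method executes it from a configurable number of concurrent threads and collects every response time, so the same activity can be measured at a low load and at a higher load. A real test-bed would need realistic data and user activity patterns, and usually a dedicated load-testing tool; the user counts shown are arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// A minimal load-test sketch: one Runnable stands in for a user activity.
final class LoadTest {
    static volatile long sink; // keeps the demo activity's result observable

    // Runs `activity` from `users` concurrent threads, `iterations` times per
    // thread, and returns all response times in nanoseconds, sorted ascending.
    static List<Long> run(Runnable activity, int users, int iterations) {
        ExecutorService pool = Executors.newFixedThreadPool(users);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int u = 0; u < users; u++) {
                futures.add(pool.submit(() -> {
                    List<Long> times = new ArrayList<>(iterations);
                    for (int i = 0; i < iterations; i++) {
                        long start = System.nanoTime();
                        activity.run();
                        times.add(System.nanoTime() - start);
                    }
                    return times;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get()); // waits for each simulated user to finish
            }
            Collections.sort(all);
            return all;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("activity failed", e.getCause());
        } finally {
            pool.shutdownNow();
        }
    }

    // Nearest-rank percentile of an ascending list; p must be in (0, 100].
    static long percentile(List<Long> sorted, double p) {
        if (sorted.isEmpty() || p <= 0 || p > 100) {
            throw new IllegalArgumentException("need samples and 0 < p <= 100");
        }
        int rank = (int) Math.ceil(p * sorted.size() / 100.0);
        return sorted.get(rank - 1);
    }

    public static void main(String[] args) {
        // Hypothetical activity: a small CPU-bound task standing in for a real request.
        Runnable activity = () -> {
            long sum = 0;
            for (int i = 0; i < 100_000; i++) {
                sum += i;
            }
            sink = sum;
        };
        // From a low load up to a hypothetical maximum expected load.
        for (int users : new int[] {1, 10, 50}) {
            List<Long> times = run(activity, users, 20);
            System.out.printf("users=%d samples=%d p90=%d us%n",
                users, times.size(), percentile(times, 90) / 1_000);
        }
    }
}
```

Reporting a percentile (here the nearest-rank 90th percentile) rather than only an average makes it easier to see whether a response-time target is met for most requests at each load level.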
At level 2, the team has resolved the issues raised at level 1. Now that the team has well-defined performance targets and can accurately measure performance across application boundaries, the actual application performance shortfall can be determined. Once it has been, the design and architecture should be revisited, because the measurement results may highlight aspects that were not previously noticed. Refer the team to chapter 13 of Java Performance Tuning, which discusses many generic architecture and design issues that affect performance.
The team should try out various compilers and, if applicable, various VMs to determine whether a different configuration improves the performance of the application. Because these changes should not affect the code in any significant way, this should be a low-cost, high-impact tuning measure. (Note that different compilers and VMs sometimes do require code changes, which may rule them out.) Refer the team to chapter 3 of Java Performance Tuning, which covers the differences between VMs and compilers.
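When comparing VMs, it helps to record exactly which configuration produced each set of results. A minimal sketch, assuming results are labelled with a simple key/value map: the hypothetical `VmConfiguration.describe` method gathers standard system properties and runtime information for the running VM, including any command-line options passed to it. Note that this identifies the VM only, not the compiler that produced the class files, which would have to be recorded separately.

```java
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;

// Records which VM configuration produced a set of performance results.
final class VmConfiguration {
    static Map<String, String> describe() {
        Map<String, String> config = new LinkedHashMap<>();
        // Standard system properties identifying the Java runtime and platform.
        String[] keys = {"java.version", "java.vm.name", "java.vm.vendor",
                         "java.vm.version", "os.name", "os.arch"};
        for (String key : keys) {
            config.put(key, System.getProperty(key, "unknown"));
        }
        Runtime rt = Runtime.getRuntime();
        config.put("availableProcessors", String.valueOf(rt.availableProcessors()));
        config.put("maxHeapBytes", String.valueOf(rt.maxMemory()));
        // Options passed to the VM on the command line (e.g. heap or GC settings).
        config.put("vmArguments",
            String.join(" ", ManagementFactory.getRuntimeMXBean().getInputArguments()));
        return config;
    }

    public static void main(String[] args) {
        describe().forEach((k, v) -> System.out.println(k + " = " + v));
    }
}
```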
The team also needs to develop code tuning expertise. A prerequisite is that the test-bed is ready for use, and that developers can run tests, make changes quickly, and easily re-run the tests. Assuming this is in place, code profiling and tuning (sometimes called micro-tuning) can begin. The first tuning activity should be single-user tuning, which introduces the team to tuning application code. Refer the team to chapter 2 of Java Performance Tuning, which discusses performance measurement techniques and profiling for Java code.
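For quick single-user comparisons before and after a change, a small timing helper can be useful alongside a profiler. The sketch below is an illustration under simplifying assumptions rather than a proper benchmarking method: the hypothetical `SingleUserTimer.bestOf` runs a task a few times unmeasured to give the JIT compiler a chance to compile it, then reports the fastest of several measured runs. The two string-building methods are hypothetical "before" and "after" versions. For rigorous microbenchmarks, a harness such as JMH (the OpenJDK microbenchmark harness) handles warm-up and dead-code elimination far more carefully.

```java
import java.util.function.Supplier;

// A simple single-user timing helper for before/after comparisons.
final class SingleUserTimer {
    // Holds the last result so the JIT is less likely to discard the work as dead code.
    static volatile Object sink;

    // Runs `task` `warmups` times without timing, then `runs` timed times,
    // and returns the fastest elapsed time in nanoseconds.
    static long bestOf(Supplier<?> task, int warmups, int runs) {
        if (warmups < 0 || runs < 1) {
            throw new IllegalArgumentException("need warmups >= 0 and runs >= 1");
        }
        for (int i = 0; i < warmups; i++) {
            sink = task.get();
        }
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            sink = task.get();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best;
    }

    // Hypothetical "before" version: repeated String concatenation in a loop.
    static String joinWithConcat(int n) {
        String s = "";
        for (int i = 0; i < n; i++) {
            s += i;
        }
        return s;
    }

    // Hypothetical "after" version: a single StringBuilder.
    static String joinWithBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        long before = bestOf(() -> joinWithConcat(5_000), 5, 10);
        long after = bestOf(() -> joinWithBuilder(5_000), 5, 10);
        System.out.printf("concat: %d us, builder: %d us%n", before / 1_000, after / 1_000);
    }
}
```

Checking that the before and after versions produce the same result, as well as timing them, guards against a "speedup" that silently changes behaviour.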
As the developers become more comfortable with the tuning procedures, each subsequent tuning activity goes faster. At this stage, initial proof-of-concept tuning takes from half an hour to half a day per activity, depending on how difficult the bottleneck is to track down and how easy it is to work out how to remove it. Refer the team to the performance checklists in chapters 4 to 12 of Java Performance Tuning, which give summary lists of techniques for speeding up Java code.
At level 2, a proof of concept for removing each bottleneck is preferred to completely fixing and releasing the changes. This way, expertise in identifying and fixing bottlenecks is gained quickly, without spending too much time on associated standard development activities such as integration, documentation and testing. Proof-of-concept bottleneck removal consists of using profilers to identify a bottleneck, making simplified changes that may improve performance at that bottleneck only for a specialized set of activities, and then proceeding to the next bottleneck.
Multi-user testing can be difficult to set up, and the team needs to be aware that incorrectly simulated data or user activity patterns can invalidate the testing completely: tuning effort could be targeted at the wrong parts of the system and would be entirely wasted.
A multi-user test can typically take a full day to run and analyse. Furthermore, the analysis often identifies several factors that may need to be varied, many of them independently. For this reason, even simple multi-user performance tuning can take several weeks. My usual approach is to analyse and identify possible problems, target the factors most likely to provide an immediate performance boost, and then re-test once with those factors altered appropriately. Refer the team to chapters 1 and 13 of Java Performance Tuning, which discuss multi-user tuning.
Now that the architecture and configuration are deemed adequate, the team should target bottlenecks in the Java code using their code tuning expertise. The tuning strategy is straightforward and is detailed in chapters 1 and 2 of Java Performance Tuning. Individual improvements are likely to come from a standard set of techniques, listed in the performance checklists of chapters 4 to 12. In addition, the main body of text in chapters 4 to 12 provides many examples of tuning Java code, and it is likely that one or more of them will be directly applicable.
It is unusual to come across an application requiring level 4 tuning. Sometimes these are specialist applications such as graphical games or programs performing intensive numerical calculations. More often, isolated subsections of an application require more intensive tuning. Tuning according to level 3 still applies. However, profiling is often of little use at level 4. Instead, the loops, data structures and algorithms used must be targeted more intensively, and performance improvements are gained gradually over many successive changes. The downsides of level 4 tuning are that it can take a lot of time, and it can leave the code much more complex and difficult to maintain.
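As a hedged illustration of the kind of change level 4 tuning involves, the sketch below applies two common loop transformations to a hypothetical calculation: hoisting a loop-invariant expression out of the loop, and traversing a two-dimensional array row by row so that each inner loop walks one contiguous row array. The class and method names are hypothetical. Modern JIT compilers may already perform some of these optimizations automatically, so each such change should be measured rather than assumed to help.

```java
// Two versions of the same hypothetical calculation:
// the sum over all elements of (value * scale + scale * offset).
final class LoopTuning {
    // Before: walks the array column by column, jumping between row arrays on
    // every step, and recomputes the loop-invariant product scale * offset each time.
    static long sumScaledOriginal(int[][] m, long scale, long offset) {
        long total = 0;
        for (int col = 0; col < m[0].length; col++) {
            for (int row = 0; row < m.length; row++) {
                total += m[row][col] * scale + scale * offset;
            }
        }
        return total;
    }

    // After: the invariant is computed once, and each row array is walked
    // sequentially, which is usually friendlier to the CPU cache.
    static long sumScaledTuned(int[][] m, long scale, long offset) {
        final long shift = scale * offset; // hoisted loop invariant
        long total = 0;
        for (int[] row : m) {
            for (int value : row) {
                total += value * scale + shift;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        int[][] m = {{1, 2, 3}, {4, 5, 6}};
        System.out.println(sumScaledOriginal(m, 2, 10)); // prints 162
        System.out.println(sumScaledTuned(m, 2, 10));    // prints 162
    }
}
```

Both methods assume a rectangular array and return the same value. Integer arithmetic is used so that reordering the additions cannot change the result; the same would not hold for floating-point sums, where a reordered loop can produce slightly different answers.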
Chapters 4 to 12 of Java Performance Tuning describe lesser-used performance tuning techniques, and provide examples applying many of them. Two chapters, 7 and 11, specifically consider the detailed changes to loops, data structures and algorithms needed for level 4 tuning. In addition, chapter 14 covers tuning the operating system, which begins to matter more at level 4.
At level 5, the application has adequate performance. If it is a shrink-wrapped application, feedback must be examined to identify any parts of the application with inadequate performance, and these should be addressed in future versions. In enterprise systems, performance should be monitored continually so that any degradation can be promptly identified and addressed. Chapters 13 and 14 of Java Performance Tuning provide details on monitoring and maintaining the performance of running systems.
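Continual monitoring can be as simple as tracking a moving average of response times and raising an alert when it drifts above an agreed threshold. The sketch below assumes response times are already being measured in milliseconds; the class `ResponseTimeMonitor`, the window size and the threshold are hypothetical, and a production system would more likely feed such measurements into an existing monitoring tool.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Tracks a moving average of recent response times and flags degradation.
final class ResponseTimeMonitor {
    private final int windowSize;
    private final long alertMillis;
    private final Deque<Long> window = new ArrayDeque<>();
    private long windowTotal;

    ResponseTimeMonitor(int windowSize, long alertMillis) {
        if (windowSize < 1) {
            throw new IllegalArgumentException("windowSize must be >= 1");
        }
        this.windowSize = windowSize;
        this.alertMillis = alertMillis;
    }

    // Records one response time. Returns true when the window is full and the
    // average of the most recent windowSize samples exceeds the threshold.
    synchronized boolean record(long responseMillis) {
        window.addLast(responseMillis);
        windowTotal += responseMillis;
        if (window.size() > windowSize) {
            windowTotal -= window.removeFirst(); // drop the oldest sample
        }
        return window.size() == windowSize && average() > alertMillis;
    }

    // Average of the samples currently in the window (0 if none yet).
    synchronized double average() {
        return window.isEmpty() ? 0.0 : (double) windowTotal / window.size();
    }

    public static void main(String[] args) {
        ResponseTimeMonitor monitor = new ResponseTimeMonitor(3, 100); // hypothetical values
        long[] samples = {50, 60, 70, 300};
        for (long ms : samples) {
            System.out.println(ms + " ms -> alert=" + monitor.record(ms));
        }
    }
}
```

Waiting until the window is full before alerting avoids false alarms from the first few samples after start-up; averaging over a window, rather than reacting to single slow requests, trades detection speed for fewer spurious alerts.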