Continuous integration
Martin Fowler & Matthew Foemmel
© Copyright Martin Fowler, All Rights Reserved
Original link: http://martinfowler.com/articles/continuousintegration.html
Chinese translation: http://gigix.topcool.net/download/continuousintegration.pdf
Translator's note: On January 23, 2002, we had the honor of hearing Mr. Martin Fowler in an online exchange hosted by UMLChina. During the exchange, Martin Fowler recommended this article, "Continuous Integration", to all Chinese software developers. On first reading I could feel its weight, and Lin Xing of AgileChina also praised it: "The idea is very good. A master is a master." It then took me a week to finish translating the article, which I now share with readers.
Since this is Mr. Fowler's gift to all Chinese software developers, I would never presume to keep it to myself. Anyone may reprint this article anywhere, but please keep it intact when doing so, including the title, copyright notice, source links, translator ... In short, please do not change or add anything when reprinting. Also, if you send me an e-mail when you reprint it, I will be very happy.
Now, please enjoy this wonderful article.
An important part of any software development process is getting reliable builds of the software. Despite its importance, we are still often surprised to find it not being done. In this article we discuss the process that Matt (Matthew Foemmel) put in place on a major ThoughtWorks project, a process that is increasingly used throughout our company. It stresses a fully automated and reproducible build, including testing, that runs many times a day. This allows each developer to integrate daily, which reduces integration problems.
ThoughtWorks has open-sourced CruiseControl, a tool for automating continuous integration. We also offer consulting on CruiseControl, Ant, and getting continuous integration going. For more information, please contact Josh Mackenzie (Jmackenz@thoughtworks.com).
This article covers the following topics:
- The benefits of continuous integration
- The more frequently you integrate, the better
- What is a successful build?
- Single source point
- Automated build scripts
- Self-testing code
- The master build
- Checking in
- Summary
Software development is full of "best practices" that are often talked about but seem to be rarely done. One of the most basic and valuable is a fully automated build and test process that lets a team build and test its software many times a day. The "daily build" is also an often-discussed idea: McConnell recommends it as a best practice in his book "Rapid Development", and it has long been known as a feature of Microsoft's development approach. We agree with the XP community, however, that a daily build is only the minimum. A fully automated process that lets you build several times a day is both achievable and well worth the effort.
We use the term "Continuous Integration", which comes from one of the practices of XP (Extreme Programming). However, we recognize that the practice has been around for a long time and is used by plenty of people who would never consider XP for their work. We have been using XP as a touchstone for our software development process, and it has strongly influenced our terminology and practices. Even so, you can use continuous integration without using any other part of XP; in fact, we believe continuous integration is an essential part of any competent software development activity. Making an automated daily build work requires the following:
- Keep all the source code in a single place, from which anyone can obtain the current sources (and previous versions).
- Automate the build process so that anyone can build the system from the sources with a single command.
- Automate the tests so that anyone can run a complete suite of system tests with a single command.
- Make sure anyone can get the latest executable, one you are confident is the best so far.
All of this takes discipline. We have found that introducing it into a project takes considerable energy. We have also found, however, that once it is in place, keeping it running takes little effort.
The benefits of continuous integration
The hardest thing to convey about continuous integration is that it fundamentally changes the whole development pattern, and that change is hard to see if you have never worked in an environment that practices it. In fact, most people do experience this atmosphere when they work alone, because then they only need to integrate with themselves. For many people, team development simply comes with certain problems that are part of the territory. Continuous integration reduces these problems, in exchange for a certain amount of discipline.
The fundamental benefit of continuous integration is that it eliminates the developer "bug hunts" of the past. These were needed because one person's work stepped on someone else's without either of them realizing what had happened, and the bugs followed. Such bugs are the hardest to find, because the problem lies not in one person's area but in the interaction between two people's work. Time makes the problem worse: bugs that surface during integration were often introduced weeks or even months earlier. As a result, developers spend a great deal of time and effort during integration tracking down their root causes.
With continuous integration, the vast majority of such bugs show up on the same day they are introduced. Furthermore, because little has changed that day, the location of the error can be found quickly. If you cannot find the bug, you can simply keep the offending code out of the product, so the worst that happens is that you don't add the feature that brought the bug with it. (Of course, you may want the feature more than you hate the bug, but at least this way it is an informed choice.)
Now, continuous integration cannot guarantee that you catch every integration bug. Its ability to catch bugs depends on testing, and, as everyone knows, testing cannot prove the absence of errors. The key point is that continuous integration catches enough bugs, early enough, to be worth its cost.
The net result is higher productivity, because less time is spent chasing integration bugs. Although we don't know of anyone who has studied this approach scientifically, the practical evidence that it works is strong. Continuous integration can dramatically cut the time spent in "integration hell"; in fact, it can turn hell into a piece of cake.
The more frequently you integrate, the better
At the heart of continuous integration is a counter-intuitive point: it is better to integrate often than to integrate rarely. For those who practice continuous integration this is obvious, but for those who never have, it seems to contradict direct experience. If you integrate only occasionally (less than once a day), integration is painful and costs a lot of time and energy. We often hear people say, "On a project this big, you can't do a daily build." This is in fact a very mistaken view.
Yet many projects do practice continuous integration. On a project with 50,000 lines of code, we integrate more than 20 times a day, and Microsoft does daily builds on projects with tens of millions of lines of code.
This is possible because the effort of integration is proportional to the square of the interval between integrations. Although we have no measurements to back this up, it means that integrating once a week does not take five times as long as integrating once a day, but roughly 25 times as long. So if integration is painful for you, that may be a sign that you should integrate more frequently. Done well, more frequent integration should reduce your pain and save you a great deal of time.
The key to this is automation. Most of integration can, and should, be done automatically: getting the sources, compiling, linking, and testing can all be automated. At the end you should get a simple indication of whether the build worked: "yes" or "no". If yes, the integration is done; if no, you should be able to easily undo the last change and return to the previous successful build. No thought should be needed to get a working build.
With an automated process like this, you can build as often as you like. The only limit is the time the build itself takes. (Translator's note: compared with the time needed to hunt bugs, this time is negligible.)
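Taking the rough square law at face value (the authors stress it is an estimate, not a measurement), and writing E(t) for the effort of one integration after an interval of t working days, with k a hypothetical constant, the 25× figure is simple arithmetic:

```latex
E(t) = k\,t^{2}
\qquad
\frac{E(5)}{E(1)} = \frac{k \cdot 5^{2}}{k \cdot 1^{2}} = 25
\qquad
5 \cdot E(1) = 5k \;<\; E(5) = 25k
```

The last comparison covers the same week of work: five daily integrations cost about 5k in total, against 25k for a single weekly one.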
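A minimal sketch of the "yes or no" verdict and of finding the build to fall back to. `BuildHistory`, `record`, `verdict` and `lastGoodBuild` are hypothetical names invented for illustration; a real setup would get this information from its build tool or CI server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical record of master builds; all names are illustrative. */
final class BuildHistory {
    /** One finished build: its number and whether every step passed. */
    static final class Result {
        final int number;
        final boolean success;

        Result(int number, boolean success) {
            this.number = number;
            this.success = success;
        }
    }

    private final List<Result> results = new ArrayList<>();

    void record(int number, boolean success) {
        results.add(new Result(number, success));
    }

    /** The single yes/no answer for the most recent build ("no" if none yet). */
    String verdict() {
        if (results.isEmpty()) return "no";
        return results.get(results.size() - 1).success ? "yes" : "no";
    }

    /** Most recent successful build to fall back to, or -1 if there is none. */
    int lastGoodBuild() {
        for (int i = results.size() - 1; i >= 0; i--) {
            if (results.get(i).success) return results.get(i).number;
        }
        return -1;
    }
}
```

For example, after build 41 passes and build 42 fails, `verdict()` answers "no" and `lastGoodBuild()` points back to 41.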
What is a successful build?
One important thing to decide is what counts as a successful build. It seems simple, but it is remarkable how muddled this can become. Martin Fowler once reviewed a project and asked whether it did a daily build; the answer was yes. Fortunately, Ron Jeffries was also there, and he asked: "What do you do about build errors?" The answer was: "We send an e-mail." In fact, the project had not had a successful build in months. That is not a daily build; it is a daily build attempt.
We are quite strict about what we mean by a successful build:
- All the latest sources are checked out of the configuration management system.
- Every file is compiled from scratch.
- The resulting object files (Java class files, in our case) are linked into an executable system.
- The system is started and the test suite (about 150 test classes, in our case) is run against it.
- If all these steps run without error or human intervention, and every test passes, we have a successful build.
Most people think "compile and link = build". At the least, we believe a build should also include starting the application and running some simple tests against it (McConnell calls this a "smoke test": switch it on and see whether smoke comes out). Running a more thorough set of tests greatly increases the value of continuous integration, so we prefer to do that as well.
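These criteria amount to a pipeline in which every step must succeed without manual help. The sketch below assumes each step can be expressed as a JDK `BooleanSupplier`; the class name `BuildPipeline` and the step names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical sketch of the "successful build" rule: steps run in order, and
 * the build succeeds only if every one of them passes.
 */
final class BuildPipeline {
    private final List<String> names = new ArrayList<>();
    private final List<BooleanSupplier> steps = new ArrayList<>();
    private String failedStep = null;

    BuildPipeline step(String name, BooleanSupplier action) {
        names.add(name);
        steps.add(action);
        return this;
    }

    /** Runs the steps in order and stops at the first failure or exception. */
    boolean run() {
        failedStep = null;
        for (int i = 0; i < steps.size(); i++) {
            boolean ok;
            try {
                ok = steps.get(i).getAsBoolean();
            } catch (RuntimeException e) {
                ok = false; // an unexpected error also counts as a failed build
            }
            if (!ok) {
                failedStep = names.get(i);
                return false;
            }
        }
        return true;
    }

    /** Name of the step that broke the last run, or null if it succeeded. */
    String failedStep() {
        return failedStep;
    }
}
```

A master build along these lines might register steps named checkout, compile, package, start-server and bvt; the first `false` or unexpected exception makes the whole build unsuccessful, and `failedStep()` says where it stopped.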
Single source point
To integrate every day, any developer must be able to get all the current sources easily. In the past, if we wanted to integrate, we had to go around the whole development center asking every programmer for their new code, copy it over, and then figure out where to put it... Nothing is worse than that. The standard is simple: anyone should be able to take a clean machine, connect it to the LAN, and with a single command get every source file needed to start building the system.
The obvious solution is to use a configuration management (source code control) system as the source of all code. Configuration management systems are usually designed to work over a network and come with tools that let developers get the sources easily. They also provide version management, so you can easily find previous versions of a file. Cost is not an issue either: CVS is an excellent open-source configuration management tool.
All source files should be kept in the configuration management system. "All" is often more than people think: it also includes build scripts, properties files, database schema DDL, installation scripts, and anything else needed to build on a clean machine. All too often we have seen the code under control, but some other vital file missing.
Try to keep everything in a single source tree in the configuration management system. Sometimes people use separate projects in the configuration management system for different components. The trouble with this is that people then have to remember which versions of which components work with which versions of other components. In some situations you do have to separate the sources, but this happens much less often than you might think. You can build multiple components from a single source tree; such issues should be handled by the build scripts, not by changing the storage structure.
Automated build scripts
If you are writing a small program with a dozen or so files, building the application may be just a single command: javac *.java. Bigger projects need more. Your files may be spread over many directories, and you need to make sure the resulting object code ends up in the right place; besides compiling, there may be link steps; you may have code generated from other files, which must be generated before compiling; and tests need to be run automatically.
A big build often takes time, and if you have made only a small change you don't want to redo all these steps. So a good build tool analyzes which parts need to be rebuilt. The common way to do this is to compare the dates of the source and object files and recompile only when the source file is newer. Dependencies then get tricky: if one object file changes, the object files that depend on it may also need to be rebuilt. The compiler may handle this for you, or it may not.
Depending on your needs, you may want to build different things: a system with or without test code, or with different sets of tests; some components may be built on their own. A build script should let you choose different build targets for different situations.
Once you go beyond a simple command line, scripts usually carry the load. You might use shell scripts or a more sophisticated scripting language such as Perl or Python. But soon it makes sense to use an environment designed specifically for building, such as the make tools on Unix.
In our Java development we quickly found we needed a more serious solution. Matt spent a good deal of time developing a build tool for enterprise Java development called Jinx. Recently, however, we have switched to the open-source build tool Ant (http://jakarta.apache.org/ant/index.html). Ant's design is very similar to Jinx's, and it supports compiling Java files and packaging them into JARs. It is also easy to write extensions to Ant, which lets us do other tasks within the build.
Many people use IDEs, and most IDEs include some form of build management. However, their build files are specific to the IDE, often fragile, and need the IDE to work. IDE users set up their own project files and use them for individual development, but our main build is done with Ant, and the master build runs on a server using Ant.
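A sketch of the date-comparison strategy, including the tricky part: anything that depends on a rebuilt target is rebuilt too. The inputs (timestamps as `long` values and a dependency map) are assumptions for illustration; tools such as make or Ant obtain them from the file system and from the build declarations. `StaleTargets` is a hypothetical name.

```java
import java.util.*;

/** Hypothetical timestamp-based rebuild analysis with dependency propagation. */
final class StaleTargets {
    /**
     * @param sourceTime last-modified time of each target's source file
     * @param targetTime last-modified time of each compiled target (absent = never built)
     * @param dependsOn  target -> targets it depends on
     * @return every target that must be rebuilt, sorted by name
     */
    static SortedSet<String> compute(Map<String, Long> sourceTime,
                                     Map<String, Long> targetTime,
                                     Map<String, Set<String>> dependsOn) {
        // Invert the graph: for each target, which targets depend on it?
        Map<String, Set<String>> dependents = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : dependsOn.entrySet()) {
            for (String dep : e.getValue()) {
                dependents.computeIfAbsent(dep, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Directly stale: never built, or source newer than the compiled target.
        SortedSet<String> stale = new TreeSet<>();
        Deque<String> work = new ArrayDeque<>();
        for (Map.Entry<String, Long> e : sourceTime.entrySet()) {
            Long built = targetTime.get(e.getKey());
            if (built == null || e.getValue() > built) {
                if (stale.add(e.getKey())) work.add(e.getKey());
            }
        }
        // Propagate: anything depending on a rebuilt target is rebuilt as well.
        while (!work.isEmpty()) {
            String t = work.poll();
            for (String d : dependents.getOrDefault(t, Collections.<String>emptySet())) {
                if (stale.add(d)) work.add(d);
            }
        }
        return stale;
    }
}
```

If `Util`'s source is newer than its class file, `Util` is stale; `Order` depends on `Util` and `Report` on `Order`, so both are rebuilt too, while an unrelated `Log` is left alone.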
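Choosing between targets usually means declaring which targets depend on which, as make rules or Ant's `depends` attribute do. The resolver below is a hypothetical sketch of that idea, not Ant's implementation: requesting a target schedules its prerequisites first, each only once, and rejects cycles. Target names such as `jar` and `bvt` are invented examples.

```java
import java.util.*;

/** Hypothetical build-target resolver in the spirit of make/Ant targets. */
final class TargetPlan {
    private final Map<String, List<String>> depends = new LinkedHashMap<>();

    TargetPlan target(String name, String... prerequisites) {
        depends.put(name, Arrays.asList(prerequisites));
        return this;
    }

    /** Execution order for the requested goal: depth-first, duplicates skipped. */
    List<String> plan(String goal) {
        List<String> order = new ArrayList<>();
        visit(goal, order, new HashSet<String>());
        return order;
    }

    private void visit(String name, List<String> order, Set<String> inProgress) {
        if (order.contains(name)) return; // already scheduled
        if (!depends.containsKey(name)) {
            throw new IllegalArgumentException("unknown target " + name);
        }
        if (!inProgress.add(name)) {
            throw new IllegalStateException("dependency cycle at " + name);
        }
        for (String pre : depends.get(name)) visit(pre, order, inProgress);
        inProgress.remove(name);
        order.add(name);
    }
}
```

With `compile`, `compile-tests` (depends on `compile`), `jar` (depends on `compile`) and `bvt` (depends on `jar` and `compile-tests`), `plan("jar")` builds the system without test code (`[compile, jar]`), while `plan("bvt")` also compiles and runs the tests (`[compile, jar, compile-tests, bvt]`).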
Self-testing code
Getting a program to compile is not enough. Although compilers for strongly typed languages can point out many problems, even a successful compile lets many errors through. To help track these errors down, we put a lot of emphasis on automated testing, another practice advocated by XP.
XP divides tests into two categories: unit tests and acceptance tests (also called functional tests). Unit tests are written by developers and usually test a single class or a small group of classes. Acceptance tests are usually written by the customer or an outside test group, with help from the developers, and test the whole system end to end. We use both kinds of test and automate them as much as possible.
As part of the build we run a suite of tests called the "BVT" (Build Verification Tests). All the tests in the BVT must pass before we can declare the build successful. All XP-style unit tests belong to the BVT. Since this article is about the build process, "tests" here mostly means the BVT. Keep in mind that there is a second line of testing besides the BVT (Translator's note: the functional tests), so don't judge the overall testing and QA effort by the BVT alone. In fact, our QA team never sees code that has not passed the BVT, because they work only on successful builds.
The basic principle is that when developers write code, they also write tests for it. When they finish a task, they check in not only the production code but also its tests. Those who follow XP closely use the "test first" style of programming: you shouldn't write any code until you have a test that fails. So, to add a new feature to the system, you first write a test that can pass only once the feature is implemented, and then your job is to make that test pass.
We write the tests in Java, the same language we develop in, so writing tests is much like writing code. We use JUnit (http://www.junit.org/) as the framework for organizing and writing tests. JUnit is a simple framework that lets you quickly write tests, organize them into suites, and run the suites interactively or in batch mode. (JUnit is the Java member of the xUnit family, which has versions for almost every language.)
While writing software, developers usually run a subset of the unit tests after every compile. This actually makes them more productive, because the tests help find logic errors in the code they are working on. Instead of reaching for the debugger, you only need to look at the changes made since you last ran the tests. Those changes should be small, so the bug is easy to find. Not everyone strictly follows the XP "test first" style, but the benefit of writing tests at the same time as the code is clear. It not only makes each person more productive; the BVT built from these tests is also more likely to catch errors in the system. Because the BVT runs several times a day, any problem it detects is relatively easy to locate, for the same reason: only a small amount of code has changed, so we can look for the bug within that range. Finding an error by examining the changed code is far more effective than tracing through the whole running system.
Of course, you can't expect tests to find everything. As is often said, tests cannot prove the absence of errors. However, perfection is not the only point at which a good BVT pays off. Imperfect tests, run frequently, are much better than perfect tests that never get written.
A related question is whether developers should write tests for their own code. People often say developers should not test their own code, because it is too easy to overlook mistakes in your own work. Although this is true, the self-testing process requires tests to be turned around into the code base quickly, and the value of that fast turnaround exceeds the value of separate testers. So for the BVT we rely on developer-written tests, but there are also acceptance tests written independently.
Another important part of self-testing is using feedback, a key value of XP, to improve the quality of the tests. Here the feedback comes from bugs that escaped the BVT. The rule is: you may not fix a bug until you have added a corresponding failing test to the BVT. That way, every time you fix a bug, you also add a test that ensures the BVT won't let it slip past again. Furthermore, this test should lead you to think of other tests that need to be written to strengthen the BVT.
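To show what "write tests, organize them into suites, run them in batch mode" means without depending on JUnit itself, here is a toy xUnit-style runner. It is deliberately not JUnit's API; `MiniSuite`, `TestBody` and `check` are hypothetical names, and a real project would use JUnit.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy xUnit-style suite for illustration only; NOT the JUnit API. */
final class MiniSuite {
    /** A test body signals failure by throwing. */
    interface TestBody {
        void run() throws Exception;
    }

    private final List<String> names = new ArrayList<>();
    private final List<TestBody> tests = new ArrayList<>();
    private final List<String> failures = new ArrayList<>();

    MiniSuite add(String name, TestBody body) {
        names.add(name);
        tests.add(body);
        return this;
    }

    /** Batch mode: run every test, collect failures, report overall pass/fail. */
    boolean runAll() {
        failures.clear();
        for (int i = 0; i < tests.size(); i++) {
            try {
                tests.get(i).run();
            } catch (Throwable t) { // assertion errors and exceptions both count
                failures.add(names.get(i) + ": " + t);
            }
        }
        return failures.isEmpty();
    }

    List<String> failures() {
        return failures;
    }

    /** Minimal assertion helper used inside test bodies. */
    static void check(boolean condition, String message) {
        if (!condition) throw new AssertionError(message);
    }
}
```

Running a suite with one passing and one failing test makes `runAll()` return `false`, and `failures()` names the failing test, which is the kind of report a batch run inside the build needs.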
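A hypothetical illustration of the rule "no bug fix without a failing test in the BVT". Suppose a line-counting helper under-counted text whose last line has no trailing newline. The regression test is written first and fails against the old code; the one-line fix then makes it pass, and the test stays in the BVT. `LineCounter` and `LineCounterTests` are invented for this example.

```java
/** Hypothetical helper that once had a bug which escaped the BVT. */
final class LineCounter {
    /** Counts lines; a final line without '\n' still counts. */
    static int countLines(String text) {
        if (text.isEmpty()) return 0;
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == '\n') count++;
        }
        // The fix: the old version returned `count` here and missed this last line.
        if (text.charAt(text.length() - 1) != '\n') count++;
        return count;
    }
}

/** Tests that join the BVT; the last check pins down the escaped bug. */
final class LineCounterTests {
    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }

    static void runAll() {
        check(LineCounter.countLines("") == 0, "empty text");
        check(LineCounter.countLines("a\nb\n") == 2, "trailing newline");
        // Regression test: the old code returned 1 here.
        check(LineCounter.countLines("a\nb") == 2, "no trailing newline");
    }
}
```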
The master build
Build automation makes a lot of sense for individual developers, but where it really shines is in producing a master build for the whole team. We have found that a master build process brings the whole team together and makes it easier to find integration problems early.
The first step is to choose a machine to run the master build. We chose a computer called "Trebuchet" (we played Age of Empires a lot), a four-processor server that is essentially dedicated to building. (In the early days, when a full build took a long time, this horsepower was essential.)
The build process is a Java class that runs all the time. When no build is going on, the process waits, checking the code repository every few minutes. If nobody has checked in any code since the last build, it keeps waiting. If there is new code in the repository, it starts building.
The first stage of the build is to check out the code from the repository. StarTeam provides a very good Java API, so hooking into the repository was straightforward. The daemon looks at the repository as of five minutes before the current time and sees whether anyone checked in code during the last five minutes. If so, it considers it safe to check out the code as of five minutes ago (this avoids checking out in the middle of someone's check-in).
The daemon checks out all the code into a directory on Trebuchet. Once the checkout is complete, it invokes the Ant script in that directory, and Ant takes over to do a complete build from all the sources. The Ant script handles the entire compilation and divides the resulting class files into six JARs, which are deployed to the EJB server. When Ant has finished compiling and deploying, the build daemon starts the EJB server with the new JARs and runs the BVT suite. If all the tests pass, we have a successful build. The build daemon then goes back into StarTeam and labels all the checked-out sources with a build number. Next, the daemon checks whether anyone checked in during the build. If so, it starts another build; if not, the daemon returns to its loop and waits for the next check-in.
At the end of the build, the build daemon sends an e-mail reporting the build's status to every developer who checked in code included in that build. It is considered bad form to leave the building after checking in code before you have received that e-mail.
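One possible reading of the five-minute rule, written as a sketch: the daemon builds the repository as it stood five minutes ago, so a check-in still in progress is not half-captured, and it only builds if some check-in falls between the previous snapshot and the new one. This interpretation, the `QuietPeriod` class and its method names are assumptions, not StarTeam's API or the original daemon's code.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.List;

/** Hypothetical quiet-period rule for deciding what (and whether) to check out. */
final class QuietPeriod {
    static final Duration WINDOW = Duration.ofMinutes(5);

    /** Snapshot time to check out: five minutes before now. */
    static Instant snapshot(Instant now) {
        return now.minus(WINDOW);
    }

    /**
     * A build is due if some check-in happened after the previous build's
     * snapshot and no later than the new snapshot.
     */
    static boolean buildDue(List<Instant> checkIns, Instant lastSnapshot, Instant now) {
        Instant snap = snapshot(now);
        for (Instant t : checkIns) {
            if (t.isAfter(lastSnapshot) && !t.isAfter(snap)) return true;
        }
        return false;
    }
}
```

With the last snapshot at 10:00 and a check-in at 10:01, a poll at 10:03 does not build yet (its snapshot would be 9:58), while a poll at 10:07 builds the sources as of 10:02.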
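The cycle just described (check out, build, test, label on success, and go round again if someone checked in meanwhile) can be sketched as follows. `Repository` and `Build` are hypothetical interfaces standing in for the StarTeam API and the Ant/EJB/BVT run; they are not real library APIs, and the sleep-and-poll loop is left out so the logic can be tested directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical build-daemon cycle; interfaces are placeholders, not real APIs. */
final class BuildDaemon {
    interface Repository {
        long latestChange();                  // id of the newest check-in
        void checkout(long change);           // full checkout as of that change
        void label(long change, int buildNo); // tag the checked-out sources
    }

    interface Build {
        boolean compileDeployAndTest();       // stands for Ant build + deploy + BVT
    }

    private final Repository repo;
    private final Build build;
    private long lastBuilt;
    private int nextBuildNo = 1;
    final List<String> log = new ArrayList<>();

    BuildDaemon(Repository repo, Build build, long lastBuilt) {
        this.repo = repo;
        this.build = build;
        this.lastBuilt = lastBuilt;
    }

    /** Builds while new check-ins exist; returns when the repository is idle. */
    void runUntilIdle() {
        long change;
        while ((change = repo.latestChange()) > lastBuilt) {
            repo.checkout(change);
            int buildNo = nextBuildNo++;
            if (build.compileDeployAndTest()) {
                repo.label(change, buildNo);
                log.add("build " + buildNo + " ok at change " + change);
            } else {
                log.add("build " + buildNo + " FAILED at change " + change);
            }
            lastBuilt = change; // a failed change is not retried until something new arrives
        }
    }
}
```

Treating a failed change as already built means the daemon waits for a new check-in (presumably the fix) instead of rebuilding the same broken sources in a tight loop; that is a design choice of this sketch, not something the article specifies.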
The daemon writes a log of all its steps to an XML log file. A servlet running on Trebuchet lets anyone inspect the log to see the state of the build (see Figure 1).
Figure 1: The build servlet running on Trebuchet
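The article does not describe the daemon's XML format. As a stand-in, the JDK's `java.util.logging.XMLFormatter` already writes log records as XML, so a sketch can lean on it. `XmlBuildLog` and the logger name are hypothetical; a real daemon would more likely use a `FileHandler` writing to a file that the status servlet reads.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.logging.Logger;
import java.util.logging.StreamHandler;
import java.util.logging.XMLFormatter;

/** Hypothetical XML build log built on the JDK's own XMLFormatter. */
final class XmlBuildLog {
    static String logSteps(String... steps) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        StreamHandler handler = new StreamHandler(out, new XMLFormatter());
        Logger logger = Logger.getLogger("build.daemon.sketch");
        logger.setUseParentHandlers(false); // keep records off the console
        logger.addHandler(handler);
        try {
            for (String step : steps) logger.info(step); // one <record> per step
        } finally {
            logger.removeHandler(handler);
            handler.close(); // flushes and writes the closing </log> tag
        }
        return new String(out.toByteArray(), StandardCharsets.UTF_8);
    }
}
```

`logSteps("checkout ok", "compile ok")` returns an XML document with one `<record>` per step, each carrying a `<message>` element, closed by `</log>`.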
The screen shows whether a build is currently running and, if so, when it started. Down the left is a history of all builds, successful or not. Clicking a build shows its details: whether it compiled, the test results, what changes were made, and so on.
We have found that many developers keep a regular eye on this page, because it gives them a sense of where the project is heading and shows them what has changed as people check in. Sometimes we put other project news on this page, but this has to be done in moderation.
It is important that developers can simulate the master build on their own machines. That way, if an integration error occurs, developers can investigate and debug it locally without tying up the master build process. Developers can also run the build locally before checking in, reducing the chance that the master build will fail.
An important question here is whether the master build should be a clean build (built entirely from the sources) or an incremental build. Incremental builds are much faster, but they increase the risk of errors caused by something not getting recompiled, and they carry the risk that we cannot recreate a build. Our builds are quite fast (about 15 minutes for 200,000 lines of code), so we are happy to do a clean build every time. However, some teams prefer incremental builds most of the time, with a regular clean build (at least daily) in case those strange problems crop up.
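For teams that take the incremental route, the policy in the last sentence can be made explicit. This is a hypothetical sketch (the authors themselves build clean every time); `BuildModePolicy` and the one-day threshold are illustrative assumptions.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical clean-vs-incremental policy: clean at least daily or on request. */
final class BuildModePolicy {
    enum Mode { CLEAN, INCREMENTAL }

    static final Duration MAX_AGE_OF_CLEAN_BUILD = Duration.ofDays(1);

    /**
     * @param lastCleanBuild time of the last clean build, or null if there was none
     * @param cleanRequested true when someone asks for a clean build after an odd failure
     */
    static Mode choose(Instant lastCleanBuild, Instant now, boolean cleanRequested) {
        if (cleanRequested || lastCleanBuild == null) return Mode.CLEAN;
        Duration age = Duration.between(lastCleanBuild, now);
        return age.compareTo(MAX_AGE_OF_CLEAN_BUILD) >= 0 ? Mode.CLEAN : Mode.INCREMENTAL;
    }
}
```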
Checking in
Using an automated build means developers follow a certain rhythm as they develop software, and the most important part of that rhythm is that they integrate regularly. We have seen organizations that do a daily build, but whose developers rarely check in code. If developers check in only once every few weeks, what is the point of a daily build? The principle we follow is that every developer checks in code at least once a day.
Before starting a new task, developers should first synchronize with the configuration management system, that is, update the source code on their local machine. Writing code on top of out-of-date sources only brings trouble and confusion.
The developer then works on the task, updating whatever files need changing. A developer may integrate when a task is finished or partway through it, but all the tests must pass in order to integrate. The first step of integration is to resynchronize the developer's local files with the code repository: any files that changed in the repository are copied into the developer's working directory, and the configuration management system warns the developer of any conflicts. The developer then builds the synchronized working set and runs the BVT against it, and the BVT must pass.
Now the developer can commit the new files to the repository. After committing, the developer needs to wait for the master build. If the master build succeeds, the check-in succeeded. If the master build fails, the developer can fix the problem locally: if the fix is simple, it can be committed directly; if it is more complicated, the developer needs to back out the change, resynchronize the working directory, get things working locally, and then commit again.
Some processes force check-ins to happen one at a time. In that case there is a build token, which only one developer can hold at a time. The developer takes the token, resynchronizes the files, commits the changes, and then releases the token. This ensures that only one developer updates the repository between builds. However, we have found that we rarely run into trouble without a build token, so we don't use one. Often several people commit code that goes into the same master build, but this rarely causes a build failure, and such failures are usually easy to fix.
We also leave it to each developer to judge how careful to be before checking in, based on her own estimate of how likely an integration error is. If she thinks an integration error is likely, she does a local build before checking in; if she thinks one is unlikely, she simply checks in. If she is wrong, she will find out when the master build runs, and then she must back out her changes and find out what went wrong. As long as errors are easy to find and easy to fix, such errors are acceptable.
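The check-in steps above can be summarized as a small procedure. `Workspace` is an invented interface, not a real version-control API; each method stands for a manual or scripted step. For brevity the sketch follows only the back-out path after a failed master build; as the text says, a simple fix could instead be committed directly.

```java
/** Hypothetical outline of one check-in attempt; Workspace is not a real API. */
final class CheckIn {
    interface Workspace {
        void resync();                 // pull others' changes, report conflicts
        boolean localBuildAndBvt();    // build the synchronized set and run the BVT
        void commit();
        boolean masterBuildPassed();   // wait for the master build's verdict
        void backOut();                // withdraw this change from the repository
    }

    enum Outcome { NOT_COMMITTED, SUCCEEDED, BACKED_OUT }

    static Outcome integrate(Workspace ws) {
        ws.resync();
        if (!ws.localBuildAndBvt()) return Outcome.NOT_COMMITTED; // fix locally first
        ws.commit();
        if (ws.masterBuildPassed()) return Outcome.SUCCEEDED;
        ws.backOut(); // then resync, get things working locally, and try again
        return Outcome.BACKED_OUT;
    }
}
```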
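If a team did want a build token, a one-permit `java.util.concurrent.Semaphore` models it directly: whoever holds the permit may resync and commit, and everyone else waits. This is a hypothetical in-process sketch (a real token would have to be shared across developers' machines), and the article's team chose not to use one at all.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical build token: only the holder may resync and commit. */
final class BuildToken {
    private final Semaphore token = new Semaphore(1); // one permit = one token

    /** Takes the token, runs the resync-and-commit step, and always gives it back. */
    void withToken(Runnable resyncAndCommit) {
        token.acquireUninterruptibly(); // wait until no one else holds the token
        try {
            resyncAndCommit.run();
        } finally {
            token.release();
        }
    }
}
```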
Summary
A disciplined, automated build process is essential to a well-controlled project. Many software gurus say so, but we have found that such a process is still rare in the field.
The key is to automate everything and to integrate so often that errors are found as early as possible. As a result, people are more willing to change things whenever they need to, because they know that if a change causes an integration error, it will be easy to find and fix. Once you have these benefits, you will find you can't bear to give them up.