Interactivity in the Generation of Test Cases with Evolutionary Computation
Publication
Currently under review.
Abstract
TO-DO
Additional material
Introduction to EvoSuite
EvoSuite is a popular search-based test generation tool for Java classes. It relies on evolutionary computation to evolve test cases, either independently or as part of whole test suites. Further information on the main features, execution and extension of this tool can be found on its webpage.
We use EvoSuite to implement a proof of concept in which a tester interacts with the search to provide a subjective readability assessment of test suites. The tester is asked to evaluate the readability of the most promising candidates (i.e., those with the highest coverage), with the aim of guiding the search towards more readable test suites. To this end, we have made some modifications to EvoSuite. First, a new class has been added:
- InteractiveEvaluationSecondaryObjective: this class represents a new secondary objective that considers readability scores assigned by testers through interactivity. It contains two main methods:
  - setReadability(): the candidates selected for evaluation are minimized (to remove irrelevant statements and improve their appearance) and shown to the tester, and the readability scores provided by the tester are assigned to their respective candidates.
  - compareChromosomes(): this method allows comparing two candidates based on their readability scores.
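As a rough illustration of the comparison step, the following sketch orders two candidates by the score the tester assigned. It is not the code of the proof of concept: `ScoredCandidate` and `ReadabilityComparison` are hypothetical stand-ins for EvoSuite's classes, and the sign convention (a negative result means the first candidate is preferred) is an assumption.

```java
/** Hypothetical stand-in for a candidate; not EvoSuite's Chromosome class. */
final class ScoredCandidate {
    final int id;
    final double readabilityValue; // score assigned by the tester

    ScoredCandidate(int id, double readabilityValue) {
        this.id = id;
        this.readabilityValue = readabilityValue;
    }
}

final class ReadabilityComparison {
    /**
     * Compares two candidates by readability score.
     * Assumed convention: a negative result means the first candidate is
     * preferred, so the candidate with the higher score sorts first.
     */
    static int compareChromosomes(ScoredCandidate first, ScoredCandidate second) {
        return Double.compare(second.readabilityValue, first.readabilityValue);
    }
}
```

With scores 8 and 6, the candidate scored 8 is preferred regardless of argument order, and equal scores leave the tie unresolved.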
Also, the main classes and methods of EvoSuite that have been modified are listed below:
- TestSuiteSecondaryObjective: this class now incorporates the possibility to select the new secondary objective InteractiveEvaluationSecondaryObjective for the execution.
- Chromosome: this class, which represents a candidate, has a new attribute called readabilityValue that stores the value associated with its readability.
- GeneticAlgorithm: the changes to this class mainly correspond to the following two methods:
  - updateSecondaryCriterion(): this method now enables or disables the new secondary objective based on the properties selected to adjust the interaction time and the frequency of the interactions.
  - sortPopulation(): when the new secondary objective is enabled and there are ties between candidates, a subset of the most promising candidates is selected for evaluation before the sorting process takes place.
- ReplacementFunction: when the new secondary objective is enabled and there is a tie in the replacement process between children and their parents, the best child and its best parent are shown to the tester before the replacement takes place.
- Properties: some new properties have been added to customize the interaction schedule and the information integration strategy:
  - When_to_revise: Point of the search at which the tester is asked to revise candidates: when sorting, when replacing, or in both cases.
  - Max_times_sort/Max_times_replace: Maximum number of times the user is willing to interact during the search when sorting and replacing the population, respectively.
  - Percentage_to_revise: Maximum percentage of candidates in the population that the user is willing to review in one interaction. This option only takes effect when the number of tied candidates exceeds this percentage. In that case, the excess candidates are discarded at random.
  - Revise_after_percentage_of_coverage: Percentage of all the goals that the best candidate found so far has to cover before the new secondary objective is enabled. This can be useful to prevent revisions that come too early.
  - Frequency: Number of generations that have to be produced between one interaction and the next one.
  - Revisit_candidates: This option specifies whether candidates that have already been scored are shown to the tester again if they reappear later in the search. When this option is true, the tester is informed of the score previously assigned to the candidate and decides whether to keep or change it. Otherwise, already scored candidates are not shown to the tester and are directly assigned their former score. This is achieved thanks to an archive where the candidates selected for readability assessment are saved once minimized.
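The Revisit_candidates behaviour can be pictured as a small cache keyed by the minimized test suite. The sketch below is illustrative only: `ReadabilityArchive`, `scoreFor` and the `tester` callback are hypothetical names rather than EvoSuite's API, and a minimized suite is represented simply by its source text.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiFunction;

/** Hypothetical archive of readability scores, keyed by minimized test suite text. */
final class ReadabilityArchive {
    private final Map<String, Double> scoresByMinimizedCode = new HashMap<>();
    private final boolean revisitCandidates; // mirrors the Revisit_candidates option

    ReadabilityArchive(boolean revisitCandidates) {
        this.revisitCandidates = revisitCandidates;
    }

    /**
     * Returns the score for a minimized suite. The tester callback receives the
     * suite and its previous score (null if it was never scored) and returns
     * the score to store.
     */
    double scoreFor(String minimizedCode, BiFunction<String, Double, Double> tester) {
        Double previous = scoresByMinimizedCode.get(minimizedCode);
        if (previous != null && !revisitCandidates) {
            return previous; // reuse the former score without asking again
        }
        double score = tester.apply(minimizedCode, previous);
        scoresByMinimizedCode.put(minimizedCode, score);
        return score;
    }
}
```

When the option is false, a repeated suite never reaches the tester. When it is true, the callback is invoked again with the earlier score so that the tester can keep or change it.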
Step-by-step illustrative execution
1. Start of the search until secondary objective activation
The search starts with the new secondary objective disabled. Ideally, the fitness of the candidates (i.e., their coverage of goals) increases overall with each new generation. When the coverage of the best candidate is greater than or equal to Revise_after_percentage_of_coverage, the secondary objective is enabled. In our example, this happens at the 15th generation.
2. Selection of candidates
In our example, 26 out of 50 candidates share the same best fitness. However, according to Percentage_to_revise, only 3 of them can be selected for revision. These three candidates are therefore chosen at random.
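The capped random selection can be sketched as follows. This is a minimal illustration under stated assumptions: `RevisionSelection` and `selectForRevision` are hypothetical names, the percentage is taken as an integer, and the limit is rounded down. A setting of 6% is used here only because it yields the limit of 3 for a population of 50. The actual value used in the example is not stated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class RevisionSelection {
    /**
     * Returns the tied candidates to show to the tester. If there are more
     * tied candidates than the percentage allows, the excess is discarded
     * at random.
     */
    static <T> List<T> selectForRevision(List<T> tied, int populationSize,
                                         int percentageToRevise, Random random) {
        // Assumed rounding: integer division, i.e. rounded down.
        int limit = populationSize * percentageToRevise / 100;
        if (tied.size() <= limit) {
            return new ArrayList<>(tied); // nothing to discard
        }
        List<T> shuffled = new ArrayList<>(tied);
        Collections.shuffle(shuffled, random);
        return new ArrayList<>(shuffled.subList(0, limit));
    }
}
```

Passing a seeded `Random` makes the choice reproducible, which can help when replaying an interactive session.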
3. Minimization
The selected candidates are minimized. In our example, two of the three candidates minimize to exactly the same test suite. Therefore, they are grouped together and only two test suites are shown to the tester.
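Grouping candidates whose minimized versions coincide amounts to indexing them by the minimized text. The sketch below assumes that each minimized suite is available as a string and uses hypothetical names (`MinimizationGroups`, `groupByMinimizedCode`). It is not taken from the proof of concept.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MinimizationGroups {
    /**
     * Groups candidate ids by the text of their minimized test suite.
     * Each key of the result is one suite to show to the tester, and the
     * value lists every candidate that minimizes to it.
     */
    static Map<String, List<Integer>> groupByMinimizedCode(Map<Integer, String> minimizedById) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>(); // keeps first-seen order
        for (Map.Entry<Integer, String> entry : minimizedById.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                  .add(entry.getKey());
        }
        return groups;
    }
}
```

With three candidates of which two share a minimized suite, the result has two entries, so the tester reviews two suites instead of three.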
4. Interaction with the tester for readability assignment
At this point, EvoSuite is prepared for an interaction. The two minimized versions are:
The execution is paused to wait for the tester’s feedback. In our example, the second minimized test suite is given a better score (8) than the first one (6):
5. Information integration and sorting
The readability scores are transferred to their respective candidate test suites, including those in the same minimization group. These candidates are then re-ordered in the population from highest to lowest readability score. In our example, the test suite with id=119 has the best score (legibilityValue=8) and is therefore placed at the first position.
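The integration step can be sketched in two parts: copying each group's score to all of its members, and ordering the scored candidates from highest to lowest score. The class and method names are hypothetical, as are the candidate ids 101 and 102 in the usage note. Only id 119 and the scores 8 and 6 come from the example.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ScoreIntegration {
    /** Copies the score given to each minimized suite to every candidate in its group. */
    static Map<Integer, Double> propagate(Map<String, List<Integer>> groups,
                                          Map<String, Double> scoreByCode) {
        Map<Integer, Double> scoreById = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            Double score = scoreByCode.get(group.getKey());
            if (score == null) {
                continue; // this suite was not evaluated
            }
            for (Integer id : group.getValue()) {
                scoreById.put(id, score);
            }
        }
        return scoreById;
    }

    /** Orders candidate ids from highest to lowest score; equal scores keep their order. */
    static List<Integer> sortByReadability(Map<Integer, Double> scoreById) {
        List<Integer> ids = new ArrayList<>(scoreById.keySet());
        ids.sort(Comparator.comparing((Integer id) -> scoreById.get(id)).reversed());
        return ids;
    }
}
```

For a group containing candidates 101 and 102 scored 6 and a group containing candidate 119 scored 8, the resulting order is 119, 101, 102.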
6. Evolution continues until next interaction
The search is resumed, checking at the configured Frequency (10 generations) whether new interactions are needed to break ties. In our example, no ties are produced at the 25th generation, so the next interaction happens at the 35th generation. Since the maximum number of interactions (Max_times_sort) has not been exhausted, a new interaction is prepared. As in step 3, the three selected candidate solutions are minimized. In this case, each test suite has a different minimized version.
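One possible reading of this schedule is sketched below: the objective is enabled once the coverage threshold is reached, a tie check becomes due every Frequency generations from then on, and only checks that actually find ties count towards the maximum number of interactions. `InteractionSchedule` and its methods are hypothetical, and the interpretation itself is an assumption derived from the example rather than from the implementation.

```java
final class InteractionSchedule {
    private final double reviseAfterCoverage; // fraction of goals in [0, 1]
    private final int frequency;              // generations between checks
    private final int maxTimes;               // maximum number of interactions
    private int nextCheck = -1;               // -1 while the objective is disabled
    private int interactionsDone = 0;

    InteractionSchedule(double reviseAfterCoverage, int frequency, int maxTimes) {
        this.reviseAfterCoverage = reviseAfterCoverage;
        this.frequency = frequency;
        this.maxTimes = maxTimes;
    }

    /** Returns true when ties should be looked for in this generation. */
    boolean isCheckDue(int generation, double bestCoverage) {
        if (nextCheck < 0 && bestCoverage >= reviseAfterCoverage) {
            nextCheck = generation; // enable the secondary objective
        }
        return nextCheck >= 0 && generation >= nextCheck && interactionsDone < maxTimes;
    }

    /** Records the outcome of a check and schedules the next one. */
    void recordCheck(int generation, boolean tiesFound) {
        nextCheck = generation + frequency;
        if (tiesFound) {
            interactionsDone++; // an interaction with the tester took place
        }
    }
}
```

With a frequency of 10, a threshold first reached at generation 15, no ties at generation 25 and a hypothetical maximum of two interactions, this schedule produces interactions at generations 15 and 35, as in the walkthrough.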
7. Second interaction with the tester for readability assignment
The three minimized test suites are shown to the tester, who assigns new readability scores:
8. Final test suite returned after the search
No additional interactions are scheduled, so the evolutionary algorithm ends the search. The returned test suite corresponds to the candidate solution with the highest coverage and the best readability score (the 3rd candidate test suite in the last interaction). Note that EvoSuite adds some test cases at the end of the search to cover additional goals, using an archive of (test case, test goal) pairs taken from any individual within the population. Final statistics are computed, showing that including the readability assessment does not impede coverage improvement.