EvoSuite-interactive-example

Interactivity in the Generation of Test Cases with Evolutionary Computation

Publication

Aurora Ramírez, Pedro Delgado-Pérez, Kevin Valle-Gómez, Inmaculada Medina-Bulo, José Raúl Romero. Interactivity in the Generation of Test Cases with Evolutionary Computation. IEEE Congress on Evolutionary Computation. June 2021.

Abstract

Test generation is a costly but necessary testing activity to increase the quality of software projects. Automated testing tools based on evolutionary computation principles constitute an appealing modern approach to support testing tasks. However, these tools still find difficulties to detect certain types of plausible faults in real-world projects. Besides, recent studies have shown that, in general, automatically-generated tests do not resemble those manually written and, consequently, testers are reluctant to adopt them. We observe two key issues, namely the opacity of the process and the lack of cooperation with the tester, currently hampering the acceptance of automated results. Based on these findings, we explore in this paper how the interaction between current tools and expert testers would help address the test case generation problem. More specifically, we identify a number of interaction opportunities related to the object-oriented test case design driven to boost their readability and detection power. Using EvoSuite as base implementation, we present a proof of concept focused on the possibility to integrate readability assessment of the most promising test suites into a genetic algorithm.

Additional material

Introduction to EvoSuite

EvoSuite is a popular search-based test generation tool for Java classes, which uses evolutionary computation to create test cases, independently or as part of whole test suites. Further information on the main features, execution and extension of this tool can be found on its webpage. We use EvoSuite to implement a proof of concept that lets a tester interact with the search by providing subjective readability assessments of test suites. In this proof of concept, the tester is asked to evaluate the readability of the most promising candidates (i.e., those with the highest coverage) in order to guide the search towards more readable test suites. To do this, we have modified some EvoSuite classes. First, a new class has been added:

InteractiveEvaluationSecondaryObjective: this class represents a new secondary objective that considers readability scores assigned by testers by means of interaction. It contains two main methods:
- setReadability(): the candidates selected for evaluation are minimized (to remove irrelevant statements and improve their appearance) and shown to the tester; the readability scores provided by the tester are then assigned to their respective candidates.
- compareChromosomes(): this method allows comparing two candidates based on their readability scores.
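The comparison idea behind compareChromosomes() can be sketched as follows. This is a minimal, self-contained illustration and not the code in this repository: the classes ReadabilityComparison and ReadableCandidate are hypothetical, and the sketch assumes that a higher score means a more readable suite and that a negative result means the first candidate is preferred.

```java
import java.util.Comparator;

// Hypothetical sketch; none of these classes belong to EvoSuite.
final class ReadabilityComparison {
    // Comparator convention (assumption): a negative result means `first` is preferred.
    static final Comparator<ReadableCandidate> MORE_READABLE_FIRST =
            (first, second) -> Double.compare(second.readabilityValue, first.readabilityValue);

    public static void main(String[] args) {
        ReadableCandidate a = new ReadableCandidate(1, 6.0);
        ReadableCandidate b = new ReadableCandidate(2, 8.0);
        // Prints a positive number: b (score 8) is preferred over a (score 6).
        System.out.println(MORE_READABLE_FIRST.compare(a, b));
    }
}

// Stand-in for a candidate test suite carrying a tester-assigned score.
final class ReadableCandidate {
    final int id;
    double readabilityValue; // higher means more readable (assumption)

    ReadableCandidate(int id, double readabilityValue) {
        this.id = id;
        this.readabilityValue = readabilityValue;
    }
}
```

Because the score only acts as a secondary objective, a comparison like this would be consulted only when two candidates are tied on coverage.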

Also, the main classes and methods of EvoSuite that have been modified are listed below:

TestSuiteSecondaryObjective: this class now incorporates the possibility to select the new secondary objective InteractiveEvaluationSecondaryObjective for the execution.
Chromosome: a new property, called readabilityValue, has been added to this class (representing a candidate test suite) to store a value associated with its readability.
GeneticAlgorithm: the changes to this class mainly correspond to the following two methods:
- updateSecondaryCriterion(): this method now allows enabling or disabling the new secondary objective based on the added properties to adjust the interaction schedule (these properties are explained below).
- sortPopulation(): when the new secondary objective is enabled and ties between candidates are detected, a subset of the most promising candidates is selected for evaluation before the sorting process takes place.
ReplacementFunction: when the new secondary objective is enabled and a tie happens in the replacement process between children and their parents, the best child and its best parent are shown to the tester before the replacement takes place.
Properties: six properties have been added to customize the interaction schedule and the information integration strategy (see the paper for details about these components):
- When_to_revise: Sorting, replacement or in both cases.
- Max_times_sort/Max_times_replace: Maximum number of times the user is willing to interact during the search when sorting and replacing the population, respectively.
- Percentage_to_revise: Percentage of candidates in the population that the user is willing to review at most in one interaction. This option is only taken into account when the number of tied candidates surpasses this percentage. In that case, some candidates are discarded at random to reduce the number of candidates to review.
- Revise_after_percentage_of_coverage: Percentage of all the goals that the best candidate found so far has to cover before enabling the new secondary objective. This can be useful to avoid premature revisions.
- Revise_frequency: Number of generations that have to be executed between one interaction and the next one.
- Revisit_candidates: This option specifies whether candidates that have already been scored will be shown to the tester again if they reappear later in the search. When this option is true, the tester is informed of the score previously assigned to the candidate and decides whether to keep or change it. Otherwise, already scored candidates are not shown to the tester and are directly assigned their former score. This is achieved thanks to an archive where the candidates selected for readability assessment are saved once minimized.
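To make the interplay of the scheduling properties more concrete, the following sketch shows one possible way to decide whether an interaction should take place when sorting. It is an illustration under stated assumptions, not the logic implemented in GeneticAlgorithm: the class InteractionSchedule and its methods are hypothetical, coverage is expressed as a fraction in [0, 1], and Revise_frequency is read as the minimum number of generations that must elapse since the previous interaction.

```java
// Hypothetical sketch of an interaction schedule; field names mirror the
// properties described above, but none of this is EvoSuite code.
final class InteractionSchedule {
    enum WhenToRevise { SORTING, REPLACEMENT, BOTH }

    private final WhenToRevise whenToRevise;
    private final int maxTimesSort;
    private final double reviseAfterCoverage; // fraction of goals in [0, 1]
    private final int reviseFrequency;        // generations between interactions

    private int sortInteractionsDone = 0;
    private int lastInteractionGeneration = -1; // -1: no interaction yet

    InteractionSchedule(WhenToRevise whenToRevise, int maxTimesSort,
                        double reviseAfterCoverage, int reviseFrequency) {
        this.whenToRevise = whenToRevise;
        this.maxTimesSort = maxTimesSort;
        this.reviseAfterCoverage = reviseAfterCoverage;
        this.reviseFrequency = reviseFrequency;
    }

    // Returns true only if every scheduling condition allows an interaction.
    boolean shouldInteractOnSort(int generation, double bestCoverage, boolean tieDetected) {
        if (whenToRevise == WhenToRevise.REPLACEMENT) return false; // sorting not selected
        if (!tieDetected) return false;                             // nothing to break
        if (bestCoverage < reviseAfterCoverage) return false;       // too early in the search
        if (sortInteractionsDone >= maxTimesSort) return false;     // interaction budget spent
        boolean firstInteraction = lastInteractionGeneration < 0;
        return firstInteraction || generation - lastInteractionGeneration >= reviseFrequency;
    }

    // Call after the tester has been consulted during sorting.
    void recordSortInteraction(int generation) {
        sortInteractionsDone++;
        lastInteractionGeneration = generation;
    }

    public static void main(String[] args) {
        InteractionSchedule s = new InteractionSchedule(WhenToRevise.SORTING, 3, 0.88, 10);
        System.out.println(s.shouldInteractOnSort(10, 0.80, true)); // false: coverage below 88%
        System.out.println(s.shouldInteractOnSort(15, 0.90, true)); // true
        s.recordSortInteraction(15);
        System.out.println(s.shouldInteractOnSort(20, 0.90, true)); // false: only 5 generations elapsed
    }
}
```

The coverage values used in main are made up for illustration. Keeping the counters inside the schedule object is only one design option; it keeps the decision in a single place that the sorting step can query.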

Step-by-step illustrative execution

Here, we describe the execution of the interactive search step by step. For this illustrative example, we select the class ATM from the tutorial package. EvoSuite is configured to evolve a population of 30 test suites for 50 generations; the rest of its parameters are left at their default values. The interactive properties are configured as follows:

Property	Value
When_to_revise	Sorting
Max_times_sort	3
Percentage_to_revise	10%
Revise_after_percentage_of_coverage	88%
Revise_frequency	10
Revisit_candidates	False

Next, the search steps in which the interactivity has some influence are described, showing the internal state of the objects and the solutions generated at each moment:

1. Start of the search until secondary objective activation

The search starts with the new secondary objective disabled. The fitness of the candidates (i.e., their coverage of goals) is expected to increase overall with each new generation. When the coverage of the best candidate is greater than or equal to Revise_after_percentage_of_coverage, the secondary objective is enabled. In our example, this happens at the 15th generation.

2. Selection of candidates

In our example, 26 out of 30 candidates share the same best fitness. However, according to Percentage_to_revise (10% of a population of 30), only 3 of them can be selected for revision. These three candidates are chosen at random.
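The arithmetic and the random choice in this step can be sketched as below. This is an illustrative snippet rather than the repository's code: the class ReviewSelection and its methods are hypothetical, the limit is computed with integer (floor) division, which is an assumption about rounding, and the random seed is arbitrary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical helper; not part of EvoSuite.
final class ReviewSelection {
    // Maximum number of candidates to review in one interaction.
    // Floor rounding is an assumption of this sketch.
    static int maxToReview(int populationSize, int percentage) {
        return populationSize * percentage / 100;
    }

    // Keeps all tied candidates if they fit within the limit;
    // otherwise discards some at random until `limit` remain.
    static <T> List<T> pickForReview(List<T> tied, int limit, Random rng) {
        List<T> copy = new ArrayList<>(tied);
        if (copy.size() <= limit) return copy;
        Collections.shuffle(copy, rng);
        return new ArrayList<>(copy.subList(0, limit));
    }

    public static void main(String[] args) {
        List<Integer> tiedIds = new ArrayList<>();
        for (int id = 1; id <= 26; id++) tiedIds.add(id); // 26 tied candidates
        int limit = maxToReview(30, 10);
        System.out.println(limit);                                               // 3
        System.out.println(pickForReview(tiedIds, limit, new Random(42)).size()); // 3
    }
}
```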

3. Minimization

The selected candidates are minimized. In our example, two of the three candidates yield exactly the same minimized test suite. Therefore, they are grouped together and only two test suites are shown to the tester.
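Grouping candidates that share a minimized form can be sketched with a map keyed by the minimized text. The snippet below is an assumption-laden illustration: MinimizationGroups is a hypothetical class, the minimization itself is not reproduced (the minimized suites are passed in as plain strings), and the candidate ids are invented.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of EvoSuite.
final class MinimizationGroups {
    // Key: minimized test suite text. Value: ids of the candidates that minimize to it.
    static Map<String, List<Integer>> group(Map<Integer, String> minimizedById) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : minimizedById.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                  .add(entry.getKey());
        }
        return groups;
    }

    public static void main(String[] args) {
        Map<Integer, String> minimized = new LinkedHashMap<>();
        minimized.put(1, "suite-A"); // placeholder for minimized source text
        minimized.put(2, "suite-B");
        minimized.put(3, "suite-B"); // same minimized form as candidate 2
        // Three candidates collapse into two groups, so two suites would be shown.
        System.out.println(group(minimized).size()); // 2
    }
}
```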

4. Interaction with the tester for readability assignment

At this point, EvoSuite is prepared for an interaction. The two minimized versions are:

First candidate test suite: See as text file, or download as Java file
Second candidate test suite: See as text file, or download as Java file

The execution is paused to wait for the tester’s feedback. In our example, the second minimized test suite is given a better score (8) than the first one (6).

5. Information integration and sorting

The readability scores are transferred to their respective candidate test suites, including those in the same minimization group. These candidates are then re-ordered in the population from highest to lowest readability score. In our example, the test suite with id=119 has the best score (readabilityValue=8) and is therefore placed in the first position.
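The transfer of scores to every member of a minimization group, followed by the re-ordering, can be sketched as follows. The classes ScoreIntegration and ScoredSuite are hypothetical and the ids are invented; the sketch only assumes what is stated above, namely that candidates sharing a minimized form receive the same score and that tied candidates are ordered from highest to lowest score.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch; none of these classes belong to EvoSuite.
final class ScoreIntegration {
    // Copies the tester's score to every candidate with the same minimized form.
    static void assignScores(List<ScoredSuite> reviewed, Map<String, Double> scoreByMinimizedForm) {
        for (ScoredSuite suite : reviewed) {
            Double score = scoreByMinimizedForm.get(suite.minimizedForm);
            if (score != null) suite.readabilityValue = score;
        }
    }

    // Orders tied candidates from highest to lowest readability (stable sort).
    static void sortByReadability(List<ScoredSuite> tied) {
        tied.sort(Comparator.comparingDouble((ScoredSuite s) -> s.readabilityValue).reversed());
    }

    public static void main(String[] args) {
        List<ScoredSuite> tied = new ArrayList<>();
        tied.add(new ScoredSuite(1, "suite-A"));
        tied.add(new ScoredSuite(2, "suite-B"));
        tied.add(new ScoredSuite(3, "suite-B")); // same group as candidate 2

        Map<String, Double> scores = new HashMap<>();
        scores.put("suite-A", 6.0);
        scores.put("suite-B", 8.0);

        assignScores(tied, scores);
        sortByReadability(tied);
        System.out.println(tied.get(0).id); // 2: first candidate of the best-scored group
    }
}

// Stand-in for a candidate test suite and its minimized form.
final class ScoredSuite {
    final int id;
    final String minimizedForm;
    double readabilityValue; // 0 until the tester assigns a score

    ScoredSuite(int id, String minimizedForm) {
        this.id = id;
        this.minimizedForm = minimizedForm;
    }
}
```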

6. Evolution continues until next interaction

The search is resumed, checking at the configured Revise_frequency (10 generations) whether new interactions are needed to break ties. In our example, a new tie to break arises at the last iteration. Since the maximum number of interactions (Max_times_sort) has not been exhausted, a new interaction is prepared. As in step 3, the three selected candidate solutions are minimized. In this case, each test suite has a different minimized version.

7. Second interaction with the tester for readability assignment

The three minimized test suites are shown to the tester, who assigns new readability scores:

First candidate test suite: See as text file, or download as Java file
Second candidate test suite: See as text file, or download as Java file
Third candidate test suite: See as text file, or download as Java file

8. Final test suite returned after the search

No additional interactions are scheduled, so the evolutionary algorithm ends the search. The returned test suite (text file, Java file) corresponds to the candidate solution with the highest coverage and the best readability score (the 3rd candidate test suite in the last interaction). Note that EvoSuite adds some test cases to cover additional goals at the end of the search, using an archive of pairs (test case, test goal) taken from any individual within the population. In this execution, the final test suite includes three additional test cases to cover goals not reached by the best test suite in the final generation (the 3rd candidate previously shown). Final statistics are also computed, showing that including the readability assessment does not impede coverage improvement.