Current revision as of 19:03, 4 May 2021


Interactivity in the Generation of Test Cases with Evolutionary Computation

Publication

Aurora Ramírez, Pedro Delgado-Pérez, Kevin Valle-Gómez, Inmaculada Medina-Bulo, José Raúl Romero. Interactivity in the Generation of Test Cases with Evolutionary Computation. IEEE Congress on Evolutionary Computation (https://cec2021.mini.pw.edu.pl/). June 2021.

Abstract

Test generation is a costly but necessary testing activity to increase the quality of software projects. Automated testing tools based on evolutionary computation principles constitute an appealing modern approach to support testing tasks. However, these tools still find difficulties to detect certain types of plausible faults in real-world projects. Besides, recent studies have shown that, in general, automatically-generated tests do not resemble those manually written and, consequently, testers are reluctant to adopt them. We observe two key issues, namely the opacity of the process and the lack of cooperation with the tester, currently hampering the acceptance of automated results. Based on these findings, we explore in this paper how the interaction between current tools and expert testers would help address the test case generation problem. More specifically, we identify a number of interaction opportunities related to the object-oriented test case design driven to boost their readability and detection power. Using EvoSuite as base implementation, we present a proof of concept focused on the possibility to integrate readability assessment of the most promising test suites into a genetic algorithm.

Additional material
Introduction to EvoSuite

EvoSuite is a popular search-based test generation tool for Java classes that uses evolutionary computation to create test cases, either independently or as part of whole test suites. Further information on the main features, execution and extension of this tool can be found on its webpage (https://www.evosuite.org/evosuite/). We use EvoSuite to implement a proof of concept that interacts with a tester to incorporate a subjective readability assessment of test suites. In this proof of concept, the tester is asked to evaluate the readability of the most promising candidates (i.e., those with the highest coverage), with the aim of guiding the search towards more readable test suites. To do this, we have modified some EvoSuite classes. First, a new class has been added:

  • InteractiveEvaluationSecondaryObjective: this class represents a new secondary objective that takes into account the readability scores that testers assign interactively. It contains two main methods:
    • setReadability(): the candidates selected for evaluation are minimized (to remove irrelevant statements and improve their appearance) and shown to the tester; the readability scores that the tester provides are then assigned to their respective candidates.
    • compareChromosomes(): this method compares two candidates based on their readability scores.
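
As a rough illustration of how a readability-based secondary objective can break ties, the following minimal sketch orders candidates by coverage first and by readability score second. The classes Candidate and ReadabilityObjectiveSketch are hypothetical stand-ins written for this page; they are not EvoSuite's Chromosome or SecondaryObjective classes, and the score scale is an assumption.

```java
import java.util.Comparator;

// Hypothetical stand-in for a candidate test suite; this is not EvoSuite's Chromosome class.
final class Candidate {
    final int id;
    final double coverage;      // primary fitness: fraction of goals covered
    final int readabilityValue; // score given by the tester (scale is an assumption)

    Candidate(int id, double coverage, int readabilityValue) {
        this.id = id;
        this.coverage = coverage;
        this.readabilityValue = readabilityValue;
    }
}

final class ReadabilityObjectiveSketch {
    // Plays the role described for compareChromosomes(): the higher score ranks first.
    static final Comparator<Candidate> BY_READABILITY =
            (a, b) -> Integer.compare(b.readabilityValue, a.readabilityValue);

    // Coverage decides first; readability is consulted only when coverage is tied.
    static final Comparator<Candidate> POPULATION_ORDER =
            Comparator.comparingDouble((Candidate c) -> c.coverage)
                    .reversed()
                    .thenComparing(BY_READABILITY);
}
```

Sorting a list with POPULATION_ORDER keeps the best-covering candidates at the front and uses the tester's scores only among candidates with equal coverage.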

Also, the main classes and methods of EvoSuite that have been modified are listed below:

  • TestSuiteSecondaryObjective: this class can now select the new secondary objective InteractiveEvaluationSecondaryObjective for the execution.
  • Chromosome: a new property, called readabilityValue, has been added to this class (which represents a candidate test suite) to store a value associated with its readability.
  • GeneticAlgorithm: the changes to this class mainly correspond to the following two methods:
    • updateSecondaryCriterion(): this method now enables or disables the new secondary objective based on the properties added to adjust the interaction schedule (these properties are explained below).
    • sortPopulation(): when the new secondary objective is enabled and ties between candidates are detected, a subset of the most promising candidates is selected for evaluation before the sorting process takes place.
  • ReplacementFunction: when the new secondary objective is enabled and a tie happens in the replacement process between children and their parents, the best child and its best parent are shown to the tester before the replacement takes place.
  • Properties: six properties have been added to customize the interaction schedule and the information integration strategy (see the paper for details about these components):
    • When_to_revise: Whether the interaction takes place during sorting, during replacement, or in both cases.
    • Max_times_sort/Max_times_replace: Maximum number of times the user is willing to interact during the search when sorting and when replacing the population, respectively.
    • Percentage_to_revise: Maximum percentage of candidates in the population that the user is willing to review in a single interaction. This option is only taken into account when the number of tied candidates surpasses this percentage. In that case, some candidates are discarded at random to reduce the number of candidates to review.
    • Revise_after_percentage_of_coverage: Percentage of all the goals that the best candidate found so far has to cover before the new secondary objective is enabled. This can be useful to prevent revisions too early in the search.
    • Revise_frequency: Number of generations that have to be executed between one interaction and the next one.
    • Revisit_candidates: This option specifies whether candidates that have already been scored will be shown to the tester again if they reappear later in the search. When this option is true, the tester is informed of the score previously assigned to the candidate and decides whether to preserve or change it. Otherwise, candidates that have already been scored are not shown to the tester and are directly assigned the former value. This is possible thanks to an archive where the candidates selected for readability assessment are saved once minimized.
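
To make the interaction schedule more concrete, the sketch below shows one possible way the properties Max_times_sort, Revise_after_percentage_of_coverage and Revise_frequency could gate an interaction during sorting. The class InteractionSchedule and its methods are hypothetical and are not part of EvoSuite; in particular, interpreting Revise_frequency as "at least that many generations since the previous interaction" is an assumption of this sketch.

```java
// Hypothetical holder for three of the interaction properties listed above;
// this is not EvoSuite's Properties class.
final class InteractionSchedule {
    private final int maxTimesSort;           // Max_times_sort
    private final double reviseAfterCoverage; // Revise_after_percentage_of_coverage, e.g. 0.88
    private final int reviseFrequency;        // Revise_frequency, in generations

    private int interactionsDone = 0;
    private int lastInteractionGeneration = -1; // -1 means no interaction has happened yet

    InteractionSchedule(int maxTimesSort, double reviseAfterCoverage, int reviseFrequency) {
        this.maxTimesSort = maxTimesSort;
        this.reviseAfterCoverage = reviseAfterCoverage;
        this.reviseFrequency = reviseFrequency;
    }

    // True when a tie found at this generation may be shown to the tester.
    boolean mayInteract(int generation, double bestCoverage) {
        if (interactionsDone >= maxTimesSort) {
            return false; // the tester's interaction budget is exhausted
        }
        if (bestCoverage < reviseAfterCoverage) {
            return false; // too early: the best candidate does not cover enough goals yet
        }
        // Assumed reading of Revise_frequency: enough generations since the last interaction.
        return lastInteractionGeneration < 0
                || generation - lastInteractionGeneration >= reviseFrequency;
    }

    // Call once the tester has actually been consulted.
    void recordInteraction(int generation) {
        interactionsDone++;
        lastInteractionGeneration = generation;
    }
}
```

With the values 3, 0.88 and 10, this sketch refuses an interaction while the best coverage is below 88%, allows the first one as soon as the threshold is reached, and then waits at least 10 generations before allowing the next.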
Step-by-step illustrative execution

Here, we describe the execution of the interactive search step by step. For this illustrative example, we select the class ATM from the tutorial package (https://www.evosuite.org/documentation/tutorial-part-3/). EvoSuite is configured to evolve a population of 30 test suites for 50 generations; the rest of its parameters keep their default values. The interactive properties are configured as follows:

Property                              Value
When_to_revise                        Sorting
Max_times_sort                        3
Percentage_to_revise                  10%
Revise_after_percentage_of_coverage   88%
Revise_frequency                      10
Revisit_candidates                    False

The steps of the search that are influenced by the interactivity are described next, showing the internal state of the objects and the solutions generated at each point:

1. Start of the search until secondary objective activation

The search starts with the new secondary objective disabled. The fitness of the candidates (i.e., their coverage of goals) is expected to increase overall with each new generation. When the coverage of the best candidate is greater than or equal to Revise_after_percentage_of_coverage, the secondary objective is enabled. In our example, this happens at the 15th generation:

Evosuite-step1.PNG


2. Selection of candidates

In our example, 26 out of 30 candidates share the same best fitness. However, according to Percentage_to_revise (10% of a population of 30), only 3 of them can be selected for revision. These three candidates are chosen at random:

Evosuite-step2.PNG
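
The selection step can be sketched as follows. The class CandidateSelectionSketch and its method names are hypothetical and do not come from EvoSuite; rounding the limit down through integer division is also an assumption. The sketch computes the review limit from the population size and the percentage, and then draws that many tied candidates at random.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

final class CandidateSelectionSketch {
    // Upper bound on candidates per interaction, e.g. 10% of 30 gives 3.
    // Integer division rounds down; that rounding rule is an assumption of this sketch.
    static int reviewLimit(int populationSize, int percentageToRevise) {
        return populationSize * percentageToRevise / 100;
    }

    // Keeps all tied candidates if they fit within the limit; otherwise picks a random subset.
    static <T> List<T> pickForReview(List<T> tied, int limit, Random rng) {
        if (tied.size() <= limit) {
            return new ArrayList<>(tied);
        }
        List<T> shuffled = new ArrayList<>(tied); // copy so the caller's list is not reordered
        Collections.shuffle(shuffled, rng);
        return new ArrayList<>(shuffled.subList(0, limit));
    }
}
```

For the configuration of this example, reviewLimit(30, 10) yields 3, so 3 of the 26 tied candidates would be drawn.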


3. Minimization

The selected candidates are minimized. In our example, two of the three candidates have exactly the same minimized form. Therefore, they are grouped together and only two test suites are shown to the tester:

Evosuite-step3.PNG
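
A minimal sketch of this grouping, assuming that two minimized suites are considered identical when their source text is equal (the actual equality criterion used in the implementation is not stated here). The class MinimizationGroupingSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class MinimizationGroupingSketch {
    // Groups candidate ids by the text of their minimized suite, keeping first-seen order.
    // Each resulting key is one suite to show to the tester.
    static Map<String, List<Integer>> groupByMinimizedForm(Map<Integer, String> minimizedById) {
        Map<String, List<Integer>> groups = new LinkedHashMap<>();
        for (Map.Entry<Integer, String> entry : minimizedById.entrySet()) {
            groups.computeIfAbsent(entry.getValue(), key -> new ArrayList<>())
                    .add(entry.getKey());
        }
        return groups;
    }
}
```

With three candidates of which two share a minimized form, the returned map has two keys, so the tester reviews two suites instead of three.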


4. Interaction with the tester for readability assignment

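At this point, EvoSuite is prepared for an interaction. The two minimized versions are:

  • First candidate test suite: text file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction1/interaction1-testsuite0.txt), Java file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction1/interaction1-testsuite0.java)
  • Second candidate test suite: text file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction1/interaction1-testsuite1.txt), Java file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction1/interaction1-testsuite1.java)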

The execution is paused to wait for the tester’s feedback. In our example, the second minimized test suite is given a better score (8) than the first one (6):

Evosuite-step4.PNG
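
The role of the archive behind the Revisit_candidates option can be sketched as a cache in front of the tester. ReadabilityArchiveSketch is a hypothetical class, the tester is modelled as a plain function from the minimized suite text to a score, and showing the previous score to the tester when revisiting is left out for brevity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToIntFunction;

final class ReadabilityArchiveSketch {
    // Scores already given by the tester, keyed by the minimized suite text (assumed key).
    private final Map<String, Integer> scoresByMinimizedSuite = new HashMap<>();
    private final boolean revisitCandidates; // Revisit_candidates

    ReadabilityArchiveSketch(boolean revisitCandidates) {
        this.revisitCandidates = revisitCandidates;
    }

    // Returns the readability score for a minimized suite, asking the tester only when needed.
    int scoreFor(String minimizedSuite, ToIntFunction<String> askTester) {
        Integer previous = scoresByMinimizedSuite.get(minimizedSuite);
        if (previous != null && !revisitCandidates) {
            return previous; // reuse the former value without a new interaction
        }
        int score = askTester.applyAsInt(minimizedSuite); // execution would pause here
        scoresByMinimizedSuite.put(minimizedSuite, score);
        return score;
    }
}
```

With Revisit_candidates set to False, as in this example, a suite that reappears later in the search is scored from the archive and the tester is not interrupted again.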


5. Information integration and sorting

The readability scores are transferred to their respective candidate test suites, including those in the same minimization group. These candidates are then reordered in the population from highest to lowest readability score. In our example, the test suite with id=119 has the best score (readabilityValue=8) and is therefore placed in the first position:

Evosuite-step5.PNG
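
The integration step can be sketched in two parts: copying each score to every candidate of the corresponding minimization group, and reordering the scored candidates. ScoreIntegrationSketch is a hypothetical class; in the usage note below, only id=119 and the scores 8 and 6 come from the example, while the other ids and the group membership are invented for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ScoreIntegrationSketch {
    // Copies the score given to each minimized form to every candidate in its group.
    static Map<Integer, Integer> propagate(Map<String, List<Integer>> groups,
                                           Map<String, Integer> scoreByMinimizedForm) {
        Map<Integer, Integer> readabilityById = new LinkedHashMap<>();
        for (Map.Entry<String, List<Integer>> group : groups.entrySet()) {
            Integer score = scoreByMinimizedForm.get(group.getKey());
            if (score == null) {
                continue; // this group was not scored by the tester
            }
            for (Integer id : group.getValue()) {
                readabilityById.put(id, score);
            }
        }
        return readabilityById;
    }

    // Orders candidate ids from highest to lowest readability (stable for equal scores).
    static List<Integer> orderByReadability(Map<Integer, Integer> readabilityById) {
        List<Integer> ids = new ArrayList<>(readabilityById.keySet());
        ids.sort((x, y) -> Integer.compare(readabilityById.get(y), readabilityById.get(x)));
        return ids;
    }
}
```

For instance, if a group containing ids 119 and 42 receives a score of 8 and a group containing id 7 receives 6, the resulting order is 119, 42, 7.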


6. Evolution continues until next interaction

The search is resumed, checking at the configured Revise_frequency (10 generations) whether new interactions are needed to break ties. In our example, a new tie to break arises at the last iteration. Since the maximum number of interactions (Max_times_sort) has not been reached, a new interaction is prepared. As in step 3, the three selected candidate solutions are minimized. In this case, each test suite has a different minimized version:

Evosuite-step6.PNG


7. Second interaction with the tester for readability assignment

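The three minimized test suites are shown to the tester, who assigns new readability scores:

  • First candidate test suite: text file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite0.txt), Java file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite0.java)
  • Second candidate test suite: text file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite1.txt), Java file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite1.java)
  • Third candidate test suite: text file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite2.txt), Java file (https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/interaction2/interaction2-testsuite2.java)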

Evosuite-step7.PNG


8. Final test suite returned after the search

No additional interactions are scheduled, so the evolutionary algorithm ends the search. The returned test suite (text file: https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/final/test-suite-final.txt, Java file: https://www.uco.es/investigacion/proyectos/SEBASENet/sebasenet-docs/cec2021/final/test-suite-final.java) corresponds to the candidate solution with the highest coverage and the best readability score (the 3rd candidate test suite in the last interaction). Notice that EvoSuite adds some test cases to cover additional goals at the end of the search, using an archive of pairs (test case, test goal) taken from any individual within the population. In this execution, the final test suite includes three additional test cases to cover goals not reached by the best test suite in the final generation (the 3rd candidate previously shown). Final statistics are also computed, showing that including the readability assessment does not impede coverage improvement.

Evosuite-results.PNG
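
The way the final suite is completed from the archive can be pictured with the sketch below. It is only an approximation of the idea described above: FinalSuiteSketch is a hypothetical class, test cases and goals are represented by plain strings, and the archive is assumed to hold one test case per goal.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class FinalSuiteSketch {
    // Adds archived test cases for goals that the best suite does not cover.
    static List<String> completeWithArchive(List<String> bestSuiteTests,
                                            Set<String> goalsCoveredByBestSuite,
                                            Map<String, String> archivedTestByGoal) {
        List<String> finalSuite = new ArrayList<>(bestSuiteTests);
        for (Map.Entry<String, String> pair : archivedTestByGoal.entrySet()) {
            boolean goalMissing = !goalsCoveredByBestSuite.contains(pair.getKey());
            // Skip tests already present: one archived test may cover several goals.
            if (goalMissing && !finalSuite.contains(pair.getValue())) {
                finalSuite.add(pair.getValue());
            }
        }
        return finalSuite;
    }
}
```

In this sketch the tests of the best suite stay at the front of the result, and each archived test is added at most once even when it covers several missing goals.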