
Intl. Summer School on Search- and Machine Learning-based Software Engineering
provide the model fragment from a given model that realizes a specific requirement. The approach receives as input the model that implements a specific requirement. An evolutionary algorithm then iterates over a population of model fragments, evolving them through genetic operations. Finally, the score of each model fragment, and hence its position in the ranking, is calculated through a fitness function that uses LtoR as its objective. As output, the approach provides a model fragment ranking in which each fragment is ranked according to how well it implements the input requirement. ELtoR has three steps:
1) Initialization: generation of a population of model fragments from the model, which serves as input for the evolutionary algorithm.
2) Genetic operations: genetic operations generate candidate model fragments for the target requirement.
3) Fitness function: the new model fragment population is evaluated through the fitness function.
The last two steps of the approach are repeated until a stop condition is met. At that point, the evolutionary algorithm provides a model fragment list, ranked according to the objectives for the requirement.
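The loop formed by these three steps can be sketched as follows. The model, the fragment representation, the genetic operators, and the stand-in fitness function are all illustrative assumptions; the real approach scores fragments with an LtoR-based fitness rather than the term overlap used here.

```python
import random

# Illustrative model: a fragment is a non-empty subset of element identifiers.
MODEL = ["task_pay", "task_ship", "gw_check", "event_done", "task_bill"]

def random_fragment():
    # Initialization: draw a random non-empty subset of the model.
    size = random.randint(1, len(MODEL))
    return frozenset(random.sample(MODEL, size))

def crossover(a, b):
    # Genetic operation: recombine elements drawn from two parent fragments.
    pool = list(a | b)
    size = random.randint(1, len(pool))
    return frozenset(random.sample(pool, size))

def mutate(frag):
    # Genetic operation: toggle one randomly chosen model element.
    elem = random.choice(MODEL)
    out = frozenset(frag ^ {elem})
    return out or frozenset({elem})  # keep fragments non-empty

def fitness(frag, requirement_terms):
    # Stand-in objective: term overlap, penalized by a size mismatch.
    hits = sum(any(t in e for e in frag) for t in requirement_terms)
    return hits / (1 + abs(len(frag) - len(requirement_terms)))

def eltor(requirement_terms, pop_size=20, generations=30, seed=0):
    random.seed(seed)
    population = [random_fragment() for _ in range(pop_size)]
    for _ in range(generations):  # stop condition: a fixed generation budget
        offspring = [mutate(crossover(*random.sample(population, 2)))
                     for _ in range(pop_size)]
        merged = population + offspring
        merged.sort(key=lambda f: fitness(f, requirement_terms), reverse=True)
        population = merged[:pop_size]  # keep the best-ranked fragments
    return population  # ranked model-fragment list, best first

ranking = eltor(["pay", "bill"])  # ranking[0] is the best-scored fragment
```

The stop condition here is a fixed generation budget; convergence-based criteria would slot into the same loop.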
IV. PRELIMINARY RESULTS
The fitness function of the TLR-ELtoR approach is based on LtoR algorithms, machine learning algorithms that automatically address ranking tasks. Specifically, LtoR algorithms build a classifier that contains a set of rules to rank objects. In our approach, the classifier is used to determine how well each model fragment realizes a specific requirement. If the classifier is not properly trained, the approach cannot determine which model fragment is the best solution for a specific requirement. The classifier is therefore a critical element in our approach. However, training the classifier is not a simple task.
First, a classifier is trained using a knowledge base. This knowledge base contains a set of examples used for learning, that is, to train the classifier. In our case, it must contain a set of links between requirements and model fragments. The particularities of the model fragments can have an impact on the training, so using a representative knowledge base is very important. In BPMN models, many model elements have little or no text, but there are several language patterns that can be used to link the requirements with the models [9]. Therefore, the examples (model fragments) must contain different amounts of text and also sample all of the identified patterns.
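A representative knowledge base, as argued above, should sample every combination of text amount and language pattern. A minimal sketch of such a coverage check, assuming each example is tagged with a text-amount bucket and a pattern name (both tags and their values are hypothetical, not from [9]):

```python
from itertools import product

def coverage_gaps(examples, buckets, patterns):
    """Return the (bucket, pattern) combinations missing from the examples."""
    seen = {(e["text_bucket"], e["pattern"]) for e in examples}
    return [combo for combo in product(buckets, patterns) if combo not in seen]

# Two examples covering only the hypothetical "verb_object" pattern.
examples = [
    {"text_bucket": "no_text", "pattern": "verb_object"},
    {"text_bucket": "rich_text", "pattern": "verb_object"},
]
gaps = coverage_gaps(examples,
                     ["no_text", "rich_text"],
                     ["verb_object", "condition"])
# → [('no_text', 'condition'), ('rich_text', 'condition')]
```

Any non-empty result signals that the knowledge base under-samples part of the space and more examples should be collected before training.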
Second, the examples in the knowledge base must be encoded as feature vectors to apply LtoR algorithms. In [13], we described three different encodings for MDD models, but these encodings are based on texts, that is, on counting the occurrences of a specific term. We can adapt the encodings to count the occurrences of an element type or pattern for the BPMN models.
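The adapted encoding can be sketched as a simple count vector over element types. The type names and the fragment representation below are illustrative assumptions, not the encodings from [13]:

```python
from collections import Counter

# Hypothetical fixed vocabulary of BPMN element types defining the vector axes.
BPMN_TYPES = ["task", "gateway", "event", "sequenceFlow"]

def encode(fragment_elements):
    """Encode a model fragment as counts of each element type."""
    counts = Counter(e["type"] for e in fragment_elements)
    return [counts.get(t, 0) for t in BPMN_TYPES]

vec = encode([{"type": "task"}, {"type": "task"}, {"type": "gateway"}])
# → [2, 1, 0, 0]
```

Counting pattern occurrences instead of element types follows the same scheme, with the vocabulary replaced by the identified language patterns.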
Third, machine learning algorithms have different capabilities. For example, in [8], we used RankBoost because this LtoR algorithm can benefit from a small knowledge base together with a small number of features in the encoding to reduce the overfitting problem [14]. However, due to the lack of text in BPMN models, we expect to need a larger knowledge base and other machine learning techniques, such as neural networks.
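RankBoost itself is not reproduced here; as a stand-in, the sketch below trains a minimal pairwise ranking perceptron from preference pairs (a correct fragment encoding that should be ranked above an incorrect one), which conveys the kind of training signal an LtoR classifier learns from. The data and dimensions are toy values.

```python
def train_pairwise(pairs, dim, epochs=50, lr=0.1):
    """Learn weights w such that w·better > w·worse for each preference pair."""
    w = [0.0] * dim
    for _ in range(epochs):
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin <= 0:  # pair still ranked wrongly: nudge the weights
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w

def score(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

# Toy preference: a task-heavy fragment should outrank an event-heavy one.
pairs = [([2, 1, 0, 0], [0, 0, 1, 2])]
w = train_pairwise(pairs, dim=4)
```

After training, `score(w, better)` exceeds `score(w, worse)` for the pairs seen; RankBoost pursues the same pairwise objective with an ensemble of weak rankers instead of a single linear model.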
V. CONCLUSIONS
Traceability Links Recovery (TLR) is key to success in industrial software. BPMN models pose an additional challenge for TLR due to their lack of text. Our hypothesis is that our Evolutionary Learning to Rank approach (ELtoR), which is less dependent on the linguistic clues of the artifacts than other TLR approaches, can improve the state of the art in TLR between requirements and BPMN models. So far, we have managed to adapt the encoding to BPMN models. Our current challenge lies in adapting the training process and the knowledge base.
REFERENCES
[1] R. Oliveto, M. Gethers, D. Poshyvanyk, and A. De Lucia, “On the Equivalence of Information Retrieval Methods for Automated Traceability Link Recovery,” in 2010 IEEE 18th International Conference on Program Comprehension. IEEE, 2010, pp. 68–71.
[2] R. Watkins and M. Neal, “Why and How of Requirements Tracing,” IEEE Software, vol. 11, no. 4, pp. 104–106, 1994.
[3] A. Ghazarian, “A Research Agenda for Software Reliability,” IEEE Reliability Society 2009 Annual Technology Report, 2010.
[4] P. Rempel and P. Mäder, “Preventing Defects: the Impact of Requirements Traceability Completeness on Software Quality,” IEEE Transactions on Software Engineering, vol. 43, no. 8, pp. 777–797, 2017.
[5] R. M. Parizi, S. P. Lee, and M. Dabbagh, “Achievements and Challenges in State-of-the-Art Software Traceability between Test and Code Artifacts,” IEEE Transactions on Reliability, vol. 63, no. 4, pp. 913–926, 2014.
[6] M. Chinosi and A. Trombetta, “BPMN: An Introduction to the Standard,” Computer Standards & Interfaces, vol. 34, no. 1, pp. 124–134, 2012.
[7] J. Rubin and M. Chechik, “A Survey of Feature Location Techniques,” in Domain Engineering. Springer, 2013, pp. 29–58.
[8] A. C. Marcén, R. Lapeña, Ó. Pastor, and C. Cetina, “Traceability link recovery between requirements and models using an evolutionary algorithm guided by a learning to rank algorithm: Train control and management case,” Journal of Systems and Software, vol. 163, p. 110519, 2020.
[9] R. Lapeña, F. Pérez, C. Cetina, and O. Pastor, “Leveraging BPMN particularities to improve traceability links recovery among requirements and BPMN models,” Requirements Engineering, pp. 1–26, 2021.
[10] M. Eaddy, A. V. Aho, G. Antoniol, and Y.-G. Guéhéneuc, “Cerberus: Tracing Requirements to Source Code Using Information Retrieval, Dynamic Analysis, and Program Analysis,” in ICPC 2008 conference. IEEE, 2008, pp. 53–62.
[11] M. Eaddy, A. Aho, and G. C. Murphy, “Identifying, Assigning, and Quantifying Crosscutting Concerns,” in Proceedings of the First International Workshop on Assessment of Contemporary Modularization Techniques, 2007, p. 2.
[12] A. de Lucia et al., “Enhancing an Artefact Management System with Traceability Recovery Features,” in Proceedings of the 20th IEEE International Conference on Software Maintenance. IEEE, 2004, pp. 306–315.
[13] A. C. Marcén, F. Pérez, O. Pastor, and C. Cetina, “Enhancing software model encoding for feature location approaches based on machine learning techniques,” Software and Systems Modeling, vol. 21, no. 1, pp. 399–433, 2022.
[14] Z.-H. Zhou and J. Feng, “Deep Forest: Towards an Alternative to Deep Neural Networks,” arXiv preprint arXiv:1702.08835, 2017.





































































