Page 24 - SMILESENG

Intl. Summer School on Search- and Machine Learning-based Software Engineering
 II. PROPOSAL & CHALLENGES
Transforming OSS engineering into a successful and sustainable process requires enrolling as much additional help as possible to manage the software and its surrounding community. We propose to leverage a swarm of smart software bots to tackle the diverse tasks required to address the sustainability challenges of OSS.
Figure 1 shows the process we devised to create the bots. It comprises four phases: (1) data collection, where project assets are gathered and curated (code, but also other collaboration assets such as issues, discussions, or comments, among others); (2) graph generation and metric calculation, where collaboration graphs will be generated and used to analyze OSS projects; (3) bot configuration and training, where graphs and metrics will be employed to train bots to perform specific tasks; and (4) bot deployment, where bots will be deployed in the OSS project and used to perform those tasks.
We have identified several challenges in developing our proposal, which we present below.
Mining collaboration graphs. Our proposal relies on collaboration graphs to represent OSS projects, which enables the definition of metrics to evaluate relationships between project assets. A collaboration graph is a directed graph where nodes represent project assets (e.g., commits, contributors, or issues) and edges represent relationships between them (e.g., the author of a comment, the contributor of a commit, or the issue assigned to a contributor). Figure 2 shows an example of a collaboration graph. However, creating collaboration graphs is not an easy task, as extracting and curating the data retrieved from repositories on online code hosting platforms usually requires an intricate process. Existing solutions such as Cauldron² or Augur³ are able to extract tabulated data from repositories, which could be used to generate our graph. To the best of our knowledge, only SourceCred⁴ is able to generate graphs from repositories, but the extracted data is very limited.
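The definition above can be illustrated with a minimal sketch of a collaboration graph as a typed, directed graph. This is an assumption-laden illustration, not the data model of our framework: the class name `CollaborationGraph`, the relation labels, and the toy identifiers are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class CollaborationGraph:
    """Directed graph whose nodes are typed project assets (hypothetical sketch)."""

    # node id -> asset type (e.g. "USER", "COMMIT", "ISSUE")
    node_types: dict = field(default_factory=dict)
    # source node id -> list of (relation label, target node id)
    out_edges: dict = field(default_factory=dict)

    def add_node(self, node_id, asset_type):
        self.node_types[node_id] = asset_type

    def add_edge(self, source, relation, target):
        # Both endpoints must already be registered as assets.
        if source not in self.node_types or target not in self.node_types:
            raise KeyError("both endpoints must be added as nodes first")
        self.out_edges.setdefault(source, []).append((relation, target))

    def neighbors(self, source, relation=None):
        """Targets reachable from `source`, optionally filtered by relation."""
        return [
            target
            for label, target in self.out_edges.get(source, [])
            if relation is None or label == relation
        ]


# Toy data: identifiers and relation labels are illustrative assumptions.
graph = CollaborationGraph()
graph.add_node("alice", "USER")
graph.add_node("c1", "COMMIT")
graph.add_node("i7", "ISSUE")
graph.add_node("cm3", "COMMENT")
graph.add_edge("alice", "contributed", "c1")
graph.add_edge("alice", "authored", "cm3")
graph.add_edge("i7", "assigned_to", "alice")
graph.add_edge("cm3", "comments_on", "i7")

print(graph.neighbors("alice"))
print(graph.neighbors("alice", relation="authored"))
```

Keeping the asset type on the node and the relationship label on the edge mirrors the node/edge roles described in the text; the edge direction chosen for each relation is a design decision that a real extraction pipeline would need to fix explicitly.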
Graph-specific metrics. Once the data is extracted in the form of a collaboration graph, the next challenge is to derive meaningful metrics that describe behaviors and patterns or identify roles. The definition and calculation of graph-specific metrics have usually been covered in the field of Social Network Analysis, which investigates social structures through the use of networks and graph theory (e.g., [1], [4]). In our framework, we propose to explore the application of graph-specific metrics to evaluate collaboration in OSS projects.
Graph ML Methods. Recently, Graph Neural Networks (GNNs) have received a lot of attention due to their ability to analyze graph-structured data, which is difficult to analyze because it does not exist in a Euclidean space, does not have a fixed form, and is usually hard to visualize for human interpretation.
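As one example of the kind of graph-specific metric used in Social Network Analysis, the sketch below computes in-degree and out-degree centrality on a small directed graph. The choice of metric, the function name `degree_centrality`, and the toy edges are assumptions made for illustration; the text does not commit to any particular metric. The sketch also assumes a simple graph with at most one edge per ordered node pair.

```python
def degree_centrality(nodes, edges):
    """Return (in-degree, out-degree) centrality for a directed graph.

    Each score is the node's degree divided by (n - 1), the number of
    other nodes it could be connected to in a simple graph.
    """
    n = len(nodes)
    scale = 1.0 / (n - 1) if n > 1 else 0.0
    in_degree = {v: 0 for v in nodes}
    out_degree = {v: 0 for v in nodes}
    for source, target in edges:
        out_degree[source] += 1
        in_degree[target] += 1
    in_centrality = {v: in_degree[v] * scale for v in nodes}
    out_centrality = {v: out_degree[v] * scale for v in nodes}
    return in_centrality, out_centrality


# Toy collaboration graph: two users, two commits, one issue (all hypothetical).
nodes = ["alice", "bob", "c1", "c2", "i7"]
edges = [
    ("alice", "c1"),  # alice contributed commit c1
    ("alice", "c2"),  # alice contributed commit c2
    ("bob", "c2"),    # bob reviewed commit c2
    ("bob", "i7"),    # bob opened issue i7
    ("i7", "alice"),  # issue i7 was assigned to alice
]

in_c, out_c = degree_centrality(nodes, edges)
for node in nodes:
    print(node, in_c[node], out_c[node])
```

In this toy graph a contributor's out-degree reflects how many assets they touched, which hints at how such metrics could help identify roles such as core developers.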
2 https://cauldron.io/
3 https://github.com/chaoss/augur
4 https://sourcecred.io/
[Figure 1: pipeline fed by an online code hosting platform, with four phases: (1) Data Collection, (2) Graph Generation & Metric Calculation, (3) Bot Configuration & Training, (4) Bot Deployment.]
Fig. 1. Process to create the bots.
[Figure 2: graph with node types COMMENT, COMMIT, REVIEW, USER, ISSUE, PULL, and REPO.]
Fig. 2. Example of a collaboration graph of a GitHub repository.
A GNN is a neural network that can be applied directly to graphs and provides a convenient way to perform node-level, edge-level, and graph-level prediction tasks. We propose to use GNNs to analyze collaboration in our graphs and to configure the behavior of bots for OSS projects, as we describe below.
Bot definition and configuration. Our approach aims to provide bots specifically tailored to OSS projects. We propose to define languages for building a smart bot infrastructure able to monitor OSS projects, define and enforce rules, and communicate with users. Furthermore, to support the deployment of a swarm of bots, we would need to define protocols that enable effective collaboration among bots.
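To make the idea of applying a neural network directly to a graph more concrete, the following sketch implements a single message-passing step with mean aggregation, a common building block of GNNs. It is only an illustration under stated assumptions: the function name `message_passing_layer`, the feature vectors, and the fixed identity weight matrix are hypothetical, nothing is trained here, and the sketch is not the model we propose to use.

```python
def message_passing_layer(features, edges, weight):
    """One mean-aggregation step: h_v' = ReLU(W * mean(h_u for u in N(v) + {v})).

    N(v) is the set of in-neighbors of v; each node also aggregates itself.
    """
    # Start every neighborhood with the node itself (self-loop).
    neighborhood = {v: [v] for v in features}
    for source, target in edges:
        neighborhood[target].append(source)

    updated = {}
    for v, members in neighborhood.items():
        dim = len(features[v])
        # Component-wise mean of the neighborhood's feature vectors.
        mean = [
            sum(features[u][k] for u in members) / len(members)
            for k in range(dim)
        ]
        # Linear transform (one output per weight row) followed by ReLU.
        updated[v] = [
            max(0.0, sum(w * x for w, x in zip(row, mean)))
            for row in weight
        ]
    return updated


# Toy input: 2-dimensional node features (hypothetical encoding of asset type).
features = {"alice": [1.0, 0.0], "c1": [0.0, 1.0], "i7": [0.0, 0.0]}
edges = [("alice", "c1"), ("i7", "alice")]
weight = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix, fixed rather than learned

hidden = message_passing_layer(features, edges, weight)
print(hidden)
```

After one step, each node's vector mixes in information from its direct in-neighbors. Stacking several such steps and learning the weights would yield node embeddings on which node-level, edge-level, or graph-level predictors could be built.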
III. CONCLUSION
In this paper, we have described our proposal for the development of a bot-driven framework to tackle the diverse tasks required to address the sustainability challenges of OSS. The framework relies on mining software repository techniques to extract collaboration graphs from the repositories of OSS projects, which are then used to configure and train bots for those projects. As next steps, we plan to explore the challenges discussed in Section II.
ACKNOWLEDGMENT
This work has been partially funded by the Spanish government (LOCOSS project - PID2020-114615RB-I00).
REFERENCES
[1] Mohammad Y. Allaho and Wang-Chien Lee. Analyzing the Social Ties and Structure of Contributors in Open Source Software Community. pages 56–60, 2013.
[2] Georgios Gousios, Martin Pinzger, and Arie van Deursen. An Exploratory Study of the Pull-based Software Development Model. In Int. Conf. on Software Engineering, pages 345–355, 2014.
[3] Eric S. Raymond and Tim O’Reilly. The Cathedral and the Bazaar. 1999.
[4] Wen Zhang, Ye Yang, and Qing Wang. An Empirical Study on Identifying Core Developers Using Network Analysis. In Int. Workshop on Evidential Assessment of Software Technologies, pages 43–48, 12.