
Active Learning-driven Testing of Web APIs
A. Giuliano Mirabella
SCORE Lab, I3US Institute, Universidad de Sevilla, Seville, Spain
amirabella@us.es
Abstract—Automated test case generation for web APIs is a thriving research topic. Most techniques in this domain follow a black-box approach, where test cases are randomly derived from the API specification. These techniques show promising results, but they neglect constraints among input parameters (so-called inter-parameter dependencies), as these cannot be formally described in current API specification languages. As a result, when testing real-world services, most randomly generated API requests (i.e., test cases) are invalid since they violate some of the inter-parameter dependencies of the service, making human intervention indispensable. In this paper, an active learning-based method is proposed to efficiently train a classifier to predict the validity of requests before invoking the API, so that invalid requests can be discarded in advance. This strategy minimises the data required to learn during testing, making the approach affordable in practice. Our technique learns as it generates test cases, so the percentage of valid calls progressively increases, reaching up to 90% in commercial APIs such as GitHub and Stripe. More importantly, the number of detected failures is three times greater than that of a fuzzing baseline. These results show the potential of artificial intelligence to improve current test case generation techniques, achieving an unprecedented level of automation.
I. INTRODUCTION
RESTful Web APIs (also called REST APIs) [1] are the de facto standard for Web integration. These APIs expose a uniform interface through which data and services can be accessed via HTTP interactions. A common phenomenon in REST APIs is that they exhibit inter-parameter dependencies (or simply "dependencies"), i.e., constraints between two or more input parameters that must be met to form valid service calls. For example, in the Google Maps API, when searching for locations, if the location parameter is used, the radius parameter must also be used, otherwise an error is returned (status code 400, "Bad Request"). Likewise, when querying the GitHub API [2] to retrieve the authenticated user's repositories, the optional parameters type and visibility must not be used together in the same API request, otherwise an error will be returned. A recent study [3] revealed that these dependencies are very common: they appear in 4 out of 5 APIs, across all application domains and types of operations. Unfortunately, current API specification languages, such as the OpenAPI Specification (OAS) [4], do not support the formal description of such dependencies.
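For illustration, the GitHub dependency above can be reproduced with a few lines of Python (a sketch, not part of the paper's tooling; the requests library and a placeholder token are assumed):

import requests

# Placeholder token; any authenticated GitHub token works here.
headers = {"Authorization": "Bearer GITHUB_TOKEN"}

# Valid request: only one of the two mutually exclusive parameters.
ok = requests.get("https://api.github.com/user/repos",
                  headers=headers, params={"type": "owner"})

# Invalid request: type and visibility must not be used together,
# so the API answers with a 4XX client error instead of 2XX.
bad = requests.get("https://api.github.com/user/repos",
                   headers=headers,
                   params={"type": "owner", "visibility": "private"})

print(ok.status_code, bad.status_code)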
In a previous study [5], we showed that, using machine learning techniques, a classification algorithm can be trained to predict the validity of an API call, i.e., whether it satisfies all API dependencies or not, thus avoiding unnecessary API requests. This approach is efficient (it is fully automatic) and effective (it generates a high number of valid requests), but it requires a sufficiently varied and balanced training set (dataset). Such a dataset is costly to obtain with current testing techniques, as it involves generating many requests and expending API resources (the number of allowed API calls is often a limiting factor). The aim of this work is to propose a technique for the efficient collection of a dataset of requests and responses, maximising the learning of the request classifier and therefore incurring the minimum possible number of API calls.
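As a minimal sketch of this idea (the feature encoding and model choice here are illustrative assumptions, not the exact setup of [5]), a classifier can be trained on the parameters of each request, labeled from the API's response:

from sklearn.feature_extraction import DictVectorizer
from sklearn.ensemble import RandomForestClassifier

# Each observation records the parameters used in a request; the label
# comes from the API response (2XX -> valid, 4XX -> invalid).
observations = [
    {"type": "owner"},                           # 2XX -> valid
    {"visibility": "private"},                   # 2XX -> valid
    {"type": "owner", "visibility": "private"},  # 4XX -> invalid
    {"type": "all", "visibility": "all"},        # 4XX -> invalid
]
labels = [1, 1, 0, 0]

vec = DictVectorizer()                 # one-hot encodes parameter values
X = vec.fit_transform(observations)
clf = RandomForestClassifier(random_state=0).fit(X, labels)

# Before invoking the API, discard requests predicted to be invalid.
candidate = {"type": "all", "visibility": "private"}
print(clf.predict(vec.transform([candidate])))  # 0 means "predicted invalid": skip the call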
II. ACTIVE LEARNING
A supervised learning system must be trained with hundreds or thousands of labeled observations. In some cases these observations have a minimal cost (e.g., the ratings that users give to movies on a website), but in others obtaining the labels may involve a non-negligible cost in terms of time, money, or resources. This is the case here, where labeling training calls as valid or invalid requires invoking APIs hundreds or thousands of times.
Active learning (AL) is a branch of machine learning whose key idea is that the learning algorithm can actively choose the observations it learns from [6]. Higher accuracy can be achieved with fewer training labels if the algorithm is allowed to interactively query a source of information, called an oracle, to label new observations with the correct outputs [6]. For example, transcribing audio to text can take up to ten times longer than the original recording and requires trained linguists. In the AL paradigm, it is the algorithm that actively asks the linguist to transcribe certain recordings, and then trains on them [7].
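The following sketch illustrates one common AL query strategy, uncertainty sampling, on synthetic data; the stand-in oracle function plays the role of the human or API that labels new observations (all names and data are illustrative):

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
pool = rng.random((500, 4))           # unlabeled candidates (encoded requests)

def oracle(x):                        # stand-in for invoking the real API
    return int(x[0] + x[1] > 1.0)

# Seed with a few labeled observations, then let the learner choose.
X = pool[:25]
y = np.array([oracle(x) for x in X])
pool = pool[25:]
clf = LogisticRegression().fit(X, y)

for _ in range(20):                   # query budget: 20 oracle calls
    proba = clf.predict_proba(pool)[:, 1]
    i = int(np.argmin(np.abs(proba - 0.5)))  # most uncertain candidate
    X = np.vstack([X, pool[i]])
    y = np.append(y, oracle(pool[i]))        # label it by asking the oracle
    pool = np.delete(pool, i, axis=0)
    clf.fit(X, y)                            # retrain on the enlarged set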
III. APPROACH
The goal of this work is to maximise the percentage of valid calls sent to the API, fully automatically. For this purpose, we propose an AL-based technique that allows a training set to be collected in an optimal way. The technique consists of two phases: start and learning (Figure 1).
A. Start
The process starts with the generation of n random requests, which are executed by invoking the API, and whose responses are collected. Requests that received a 2XX status code are labeled as valid, and those that received a 4XX are labeled as invalid (this happens only when a dependency is violated).
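A sketch of this labeling step is shown below; random_request and BASE_URL are hypothetical placeholders for a specification-driven request generator and the API under test:

import requests

def start_phase(n):
    """Generate n random requests, invoke the API, and label each
    observation from its HTTP status code."""
    dataset = []
    for _ in range(n):
        params = random_request()        # hypothetical spec-driven generator
        status = requests.get(BASE_URL, params=params).status_code
        if 200 <= status < 300:
            dataset.append((params, 1))  # 2XX -> valid
        elif 400 <= status < 500:
            dataset.append((params, 0))  # 4XX -> invalid
        # Other codes (e.g., 5XX) are skipped here: an assumption, since
        # the paper only describes labels for 2XX and 4XX responses.
    return dataset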