
Intl. Summer School on Search- and Machine Learning-based Software Engineering
Green IN Artificial Intelligence: Energy Impact of Machine Learning Models

María Gutiérrez
Institute of Technology and Information Systems
University of Castilla-La Mancha
Ciudad Real, Spain
Email: maria.ggutierrez@uclm.es

Félix García
Institute of Technology and Information Systems
University of Castilla-La Mancha
Ciudad Real, Spain
Email: felix.garcia@uclm.es
Abstract—Nowadays, artificial intelligence (AI) algorithms are used in a wide range of applications, and as their use becomes more common, considering their environmental impact becomes an increasingly urgent topic. We present some empirical cases to illustrate different approaches to energy-efficient AI models: the choice of model and the choice of “real” or synthetic data for the training dataset. These cases serve as examples of how a green-driven approach can contribute to the lifecycle of AI models, helping developers take energy efficiency requirements into consideration when developing their programs and negotiate a suitable trade-off between energy efficiency and model reliability and performance.
I. INTRODUCTION
The use of artificial intelligence has been increasing over the past years, which has also generated greater interest in researching the energy consumption of AI systems, especially of those that include some form of machine learning or neural network. However, attempts at improving the energy efficiency of a model have traditionally focused on designing computer architectures that are particularly suited to running machine learning (ML) models, while the study of the models themselves has been overlooked.
Yet there are reasons to research the energy consumption of the models themselves. Some studies already point out how the computing cost of ML models has increased exponentially over the last few years [5], and the lack of standardized methods for measuring the energy consumption of these kinds of programs [2] hinders our ability to adequately measure and understand their energy behavior. In turn, this makes it all the harder to design AI systems that take energy efficiency into consideration, since it is not clear which design choices could contribute to a system's energy efficiency without compromising its performance.
In this proposal, we present two empirical cases that illustrate possible design choices for developing and training ML models with an eye on their energy consumption: the choice of using “real” training data versus synthetic training data, and the choice of the model itself.
II. PROPOSED CASES
A. Synthetic training data or real training data
The most energy-intensive part of developing a machine learning model often turns out to be its training: not only is it a task with high computational demands, but finding the optimal combination of hyperparameters is also a difficult process that requires a lot of experimentation, with each trial implying a full training run.
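As a back-of-the-envelope illustration of why hyperparameter tuning multiplies the training cost, the sketch below (with hypothetical grid sizes, not taken from this study) counts the full training runs implied by an exhaustive grid search; each combination means one complete, energy-consuming training of the model.

```java
// Hypothetical illustration: an exhaustive grid search trains the model once
// per combination of hyperparameter values, so the energy cost of a single
// training run is multiplied by the size of the grid.
public class GridSearchCost {
    public static void main(String[] args) {
        double[] learningRates = {0.001, 0.01, 0.1, 1.0};      // 4 example values
        int[]    batchSizes    = {32, 64, 128, 256};            // 4 example values
        double[] regStrengths  = {0.0, 0.0001, 0.001, 0.01};    // 4 example values

        int trainingRuns = learningRates.length * batchSizes.length * regStrengths.length;
        System.out.println("Full training runs required: " + trainingRuns); // 64
        // If one training run consumes E joules, the whole search consumes roughly 64 * E.
    }
}
```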
For this case, we studied whether the choice of training data could affect the overall energy consumption of an ML model during its training. Many systems are trained using “real” datasets, constructed from data extracted from the real world (such as hospital records, grocery store tickets, network traffic, etc.). However, training models on synthetic datasets is also a widespread practice, since real data is not always available, or comes in an unsuitable format. These datasets are created by synthetic data generators, which produce data according to a particular distribution and are able to generate any amount of data on demand, in a format that requires little to no preprocessing.
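As an illustration of this practice, the sketch below generates a labelled dataset programmatically with one of WEKA's bundled data generators and saves it as an ARFF file. The choice of the RDG1 generator, the parameter values, and the output file name are assumptions made for illustration; the paper does not state which generator or configuration was actually used.

```java
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.datagenerators.classifiers.classification.RDG1;

import java.io.File;

// Minimal sketch of generating a synthetic classification dataset with one of
// WEKA's data generators. RDG1 and the parameters below are illustrative
// assumptions, not the configuration used in the study.
public class SyntheticDataSketch {
    public static void main(String[] args) throws Exception {
        RDG1 generator = new RDG1();
        generator.setNumExamples(10000);   // any amount of data can be produced on demand
        generator.setDatasetFormat(generator.defineDataFormat());

        Instances synthetic = generator.generateExamples();

        // Save as ARFF so the same file can be fed to the WEKA/MOA training tasks.
        ArffSaver saver = new ArffSaver();
        saver.setInstances(synthetic);
        saver.setFile(new File("synthetic.arff"));
        saver.writeBatch();
    }
}
```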
To properly compare the energy consumption of training with a real dataset versus a synthetic one, we prepared a set of algorithms to perform a binary classification task. We set the task to be run in two different environments: first on MOA [1] and then on WEKA [4]. We chose four algorithms that are available on both platforms and are typical for this kind of task: naïve Bayes, Hoeffding tree, support vector machine (SVM) and logistic regression. For the datasets, we used a real credit card fraud detection dataset, and then created a synthetic version of it with one of WEKA's synthetic data generators, trying to “recreate” the original dataset as closely as possible.
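A minimal sketch of the WEKA side of this setup is shown below: the same four classifiers are trained on whichever ARFF file (real or synthetic) is supplied. The file names are placeholders, and the MOA runs, hyperparameter settings, and evaluation protocol of the actual study are not reproduced here.

```java
import weka.classifiers.Classifier;
import weka.classifiers.bayes.NaiveBayes;
import weka.classifiers.functions.Logistic;
import weka.classifiers.functions.SMO;
import weka.classifiers.trees.HoeffdingTree;
import weka.core.Instances;
import weka.core.converters.ConverterUtils.DataSource;

// Sketch of the WEKA side of the comparison: train the same four classifiers
// on a given ARFF file (real or synthetic). Dataset paths are placeholders.
public class TrainingComparisonSketch {
    static void trainAll(String arffPath) throws Exception {
        Instances data = new DataSource(arffPath).getDataSet();
        data.setClassIndex(data.numAttributes() - 1); // assume the class is the last attribute

        Classifier[] models = {
            new NaiveBayes(),     // naive Bayes
            new HoeffdingTree(),  // Hoeffding tree
            new SMO(),            // WEKA's SVM implementation
            new Logistic()        // logistic regression
        };
        for (Classifier model : models) {
            long start = System.nanoTime();
            model.buildClassifier(data); // the training step whose energy consumption is compared
            System.out.printf("%s trained on %s in %.2f s%n",
                    model.getClass().getSimpleName(), arffPath,
                    (System.nanoTime() - start) / 1e9);
        }
    }

    public static void main(String[] args) throws Exception {
        trainAll("creditcard_real.arff");  // placeholder for the real dataset
        trainAll("synthetic.arff");        // placeholder for the synthetic version
    }
}
```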
Once the tasks were set up, they were run on a computer without any special capabilities, and the energy consumption of each algorithm while training on each dataset was measured using GSMP, FEETINGS' energy measurement methodology [3]. The analysis of the measurements showed that the models trained on the synthetic dataset consistently consumed less energy than the models trained on real data, both when they were run on MOA and on WEKA. The percentage of