Thesis JMMoyano – Knowledge Discovery and Intelligent Systems – KDIS

MULTI-LABEL CLASSIFICATION MODELS FOR HETEROGENEOUS DATA: AN ENSEMBLE-BASED APPROACH.

BASIC INFORMATION

Ph.D. Student: Jose María Moyano
Advisors: Eva Lucrecia Gibaja, Krzysztof J. Cios, Sebastián Ventura
Defended on: September 2020
Keywords: multi-label learning, classification, ensemble learning
Digital version: PDF

DESCRIPTION

In recent years, multi-label classification has gained the attention of the scientific community because it can solve real-world problems where each instance of the dataset may be associated with several class labels simultaneously. For example, in medical problems each patient may have several diseases at the same time, and in multimedia categorization each item might be related to different tags or topics. Treating such problems as traditional classification, where just one class label is assigned to each instance, would therefore lead to a loss of information. Having more than one label associated with each instance also raises new classification challenges that should be addressed, such as modeling compound dependencies among the labels, imbalance of the label space, and high dimensionality of the output space.
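As a rough illustration of the setting described above, the following minimal sketch represents a multi-label dataset as a binary indicator matrix and computes two simple descriptors: the average number of labels per instance and the number of instances per label, whose spread hints at label imbalance. The label names, the toy data, and the function names are hypothetical and are not taken from the thesis.

```python
# Hypothetical toy data: each row is one instance (e.g. a patient),
# each column is one label; 1 means the label applies to the instance.
LABELS = ["diabetes", "hypertension", "asthma"]
Y = [
    [1, 1, 0],
    [0, 1, 0],
    [1, 1, 1],
    [0, 1, 0],
]


def label_cardinality(Y):
    """Average number of labels associated with each instance."""
    return sum(sum(row) for row in Y) / len(Y)


def label_frequencies(Y):
    """Number of instances associated with each label (one count per column)."""
    return [sum(column) for column in zip(*Y)]


cardinality = label_cardinality(Y)
frequencies = label_frequencies(Y)

print("Label cardinality:", cardinality)
for name, count in zip(LABELS, frequencies):
    print(f"{name}: {count} instance(s)")
```

In this toy example one label appears in every instance while another appears only once, which is the kind of imbalance in the label space mentioned above.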

A large number of methods for multi-label classification have been proposed in the literature, including several ensemble-based methods. Ensemble learning is a technique that combines the outputs of many diverse base models in order to outperform each separate model. In multi-label classification, ensemble methods are those that combine the predictions of several multi-label classifiers, and they have been shown to outperform simpler multi-label classifiers. Therefore, given their performance, we focus our research on the study of ensemble-based methods for multi-label classification.
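One common way to combine the predictions of several multi-label classifiers is per-label majority voting. The sketch below assumes each ensemble member has already produced a binary label vector for a single instance; the function name, the threshold parameter, and the example predictions are illustrative assumptions, not the combination scheme used in the thesis.

```python
def majority_vote(predictions, threshold=0.5):
    """Combine binary label vectors from several ensemble members.

    predictions: list of label vectors, one per member, for a single instance.
    A label is predicted as relevant when the fraction of members voting
    for it is at least `threshold`.
    """
    n_members = len(predictions)
    return [
        1 if sum(votes) / n_members >= threshold else 0
        for votes in zip(*predictions)  # iterate label by label
    ]


# Hypothetical predictions of three members for one instance with three labels.
member_predictions = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
]

combined = majority_vote(member_predictions)
print(combined)
```

Here the first and third labels each receive two of three votes and are kept, while the second label receives only one vote and is discarded.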

Since most of the ensemble methods for multi-label classification are based on creating diverse members by randomly selecting instances, input features, or labels, our main objective is to propose novel ensemble methods for multi-label classification where the characteristics of the data are taken into account.
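For context, the random strategy referred to above can be sketched as follows: each ensemble member is assigned a randomly drawn subset of the labels, without looking at the data. This is only a minimal illustration of random label selection under assumed parameter names (`n_labels`, `k`, `n_members`, `seed`); it is the baseline the thesis aims to improve on, not one of the proposed methods.

```python
import random


def random_label_subsets(n_labels, k, n_members, seed=0):
    """Draw one random subset of k label indices for each ensemble member.

    The selection ignores the characteristics of the data: every label is
    equally likely to be chosen, regardless of its frequency or its
    relationship with other labels.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [sorted(rng.sample(range(n_labels), k)) for _ in range(n_members)]


# Hypothetical setting: 6 labels, 4 members, 3 labels per member.
subsets = random_label_subsets(n_labels=6, k=3, n_members=4)
for member, subset in enumerate(subsets):
    print(f"member {member}: labels {subset}")
```

Because the subsets are drawn blindly, some labels may be covered by many members and others by few or none, which is one motivation for selecting them according to the data instead.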

The main contributions of this Ph.D. thesis are summarized in the following points:

A thorough review of the state-of-the-art ensembles of multi-label classifiers (EMLCs) has been performed. This study analyzes the EMLCs in the literature and proposes a novel taxonomy to categorize them. It also includes an experimental study to identify which method or family of methods performs better depending on the characteristics of the data, and provides guidelines for selecting the best method for a given problem.
Two evolutionary approaches to build EMLCs while considering the characteristics of the data are proposed. The first one (EME) evolves the entire ensemble as an individual, while the second (EAGLET) evolves separate members of the ensemble as individuals and then creates the ensemble by combining accurate and diverse individuals from the population. Both methods have been applied to real-world benchmarks and have been shown to perform significantly better than state-of-the-art EMLCs, with EAGLET additionally offering lower computational complexity.
A tool for analyzing and preprocessing multi-label datasets has been developed, including both a graphical interface and a Java API.
Two preliminary approaches to build EMLCs have also been proposed, one based on cooperative coevolutionary algorithms and the other based on grammar-guided genetic programming. These approaches serve as a basis for establishing some lines of future work.
The code of all the developed tools and algorithms is publicly available in GitHub repositories under the GNU General Public License (GPLv3).

FUNDS

The development of this thesis has been supported by:

Spanish Ministry of Science and Competitiveness, project TIN-2014-55252-P.
Spanish Ministry of Science and Competitiveness, project TIN-2017-83445-P.
Spanish Ministry of Education, Culture and Sports under the FPU program (FPU15/02948).

PUBLICATIONS ASSOCIATED WITH THIS THESIS

INTERNATIONAL JOURNALS

J.M. Moyano, E. Gibaja and S. Ventura. MLDA: A tool for analyzing multi-label datasets. Knowledge-Based Systems, vol. 121, pp. 1-3. 2017. DOI: 10.1016/j.knosys.2017.01.018.
J.M. Moyano, E. Gibaja, K.J. Cios and S. Ventura. Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion, vol. 44, pp. 33-45. 2018. DOI: 10.1016/j.inffus.2017.12.001.
J.M. Moyano, E. Gibaja, K.J. Cios and S. Ventura. An evolutionary approach to build ensembles of multi-label classifiers. Information Fusion, vol. 50, pp. 168-180. 2019. DOI: 10.1016/j.inffus.2018.11.013.
J.M. Moyano, E. Gibaja, K.J. Cios and S. Ventura. Combining multi-label classifiers based on projections of the output space using Evolutionary algorithms. Knowledge-Based Systems, vol. 196, pp. 105770. 2020. DOI: 10.1016/j.knosys.2020.105770.

INTERNATIONAL CONFERENCES

J.M. Moyano, E. Gibaja, K.J. Cios and S. Ventura. Generating ensembles of multi-label classifiers using cooperative coevolutionary algorithms. In European Conference on Artificial Intelligence (ECAI 2020), pp. 1379-1386. 2020. DOI: 10.3233/FAIA200242.
J.M. Moyano, E. Gibaja, K.J. Cios and S. Ventura. Tree-shaped ensemble of multi-label classifiers using grammar-guided genetic programming. In IEEE Congress on Evolutionary Computation (IEEE-CEC 2020), pp. 1-8. 2020. DOI: 10.1109/CEC48606.2020.9185661.

NATIONAL CONFERENCES

J.M. Moyano, E. Gibaja, A. Cano, J.M. Luna and S. Ventura. Algoritmo evolutivo para optimizar ensembles de clasificadores multi-etiqueta. In X Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB 2015), pp. 219-225. 2015.
J.M. Moyano, E. Gibaja, A. Cano, J.M. Luna and S. Ventura. Diseño automático de multi-clasificadores basados en proyecciones de etiquetas. In XVI Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA 2015), pp. 355-365. 2015.
J.M. Moyano, E. Gibaja and S. Ventura. Una herramienta para analizar conjuntos de datos multi-etiqueta. In XVII Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA 2016), pp. 857-866. 2016.