Auto-adaptive Grammar-Guided Genetic Programming Algorithm to build Ensembles of Multi-Label Classifiers – Knowledge Discovery and Intelligent Systems – KDIS

This website contains additional material to the paper titled Auto-adaptive Grammar-Guided Genetic Programming Algorithm to build Ensembles of Multi-Label Classifiers.

Abstract

Multi-label classification has been used to solve a wide range of problems where each example in the dataset may be related either to one class (as in traditional classification problems) or to several class labels at the same time. Many ensemble-based approaches have been proposed in the literature, aiming to improve the performance of traditional multi-label classification algorithms. However, most of them do not consider the data characteristics to build the ensemble, and those that consider them need to tune many parameters to maximize their performance.

In this paper, we propose an Auto-adaptive algorithm based on Grammar-Guided Genetic Programming to generate Ensembles of Multi-Label Classifiers based on projections of k labels (AG3P-kEMLC). It creates a tree-shaped ensemble, where each leaf is a multi-label classifier focused on a subset of k labels. Unlike other methods in the literature, our proposal can deal with different values of k in the same ensemble, instead of fixing one specific value. It also includes an auto-adaptive process to reduce the number of hyper-parameters to tune, prevent overfitting and reduce the runtime required to execute it. Three versions of the algorithm are proposed. The first, fixed, uses the same value of k for all multi-label classifiers in the ensemble. The remaining two deal with different k values in the ensemble: uniform gives the same probability to choose each available value of k, and gaussian favors the selection of smaller values of k.

The experimental study carried out considering twenty reference datasets and five evaluation metrics, compared with eleven ensemble methods demonstrates that our proposal performs significantly better than the state-of-the-art methods.

Experimental study

In this section, we extend the experimental results included in the paper. Specifically, we include figures that for length restrictions were not included in the paper.

Convergence of the best individual (single runs)

For studying the convergence of AG3P-kEMLC, the evolution of the best individual fitness in single executions and in average for all executions is studied. In this website, we extend the experimental results of single executions for all versions of AG3P-kEMLC and for the three datasets.
In all cases, the convergence of the best individual is similar. It varies between datasets (smaller datasets as Emotions need fewer generations to converge than bigger datasets) and between variants (AG3P-k3 usually needs more generations to converge than the rest, given the wider search space).

Auto-adaptability of genetic operators

One of the main advantages of AG3P-kEMLC is the ability to auto-adapt some of its hyper-parameters during the execution, such as the probabilities for crossover and mutation operators.
In the following figure, we plot the variation of such probabilities, on average for all 50 executions. In all of them, their variation is similar, varying mostly in the number of generations needed. It is noteworthy that in AG3P-k3 and Medical, at the beginning the mutation probability is increased, instead of the crossover as in the rest of the cases. Given the number of incomplete trees that AG3P-k3 may create at the beginning, a bigger mutation probability would be needed to convert these incomplete trees into complete; then, when the complete trees are the majority in the population, the average fitness improves, so the crossover probability does so (at least, for several generations).

Then, we also present some examples of the variation in the probabilities of crossover and mutation for single runs. Although plots of single runs do not provide very useful information of the operation of the method on average, they demonstrate that the probabilities greatly varies between executions and therefore the fact of having the auto-adaptive mechanism should make our method more robust than using pre-fixed probabilities. In the following figures, 3 single executions for each method and dataset are presented, where the solid line indicates the crossover probability, and the dotted line the mutation one.

Comparison versus standard (non-ensemble) classifiers

In the paper, we include an experimental study comparing AG3P-kEMLC versus other EMLCs in the literature. Although EMLCs have demonstrated to outperform non-ensemble multi-label classifiers, in this supplementary material we include a comparison between our proposal and other standard MLC methods such as BR [Tso10], LP [Bou04], CC [Rea11], GACC [Gon13], PS [Rea08], LPBR [Ten10], and LIFT [Zha14]. The results of this study can be downloaded in the table below. AG3P-kEMLC is the best method in almost all evaluation metrics, also being the only one that do not perform statistically worse than the control algorithm in any of the metrics, thus demonstrating its superiority versus standard multi-label classification methods.

Results vs standard MLC

Results using other base single-label classifier

The purpose of this paper is to propose a method to optimize the ensemble structure, given a base multi-label classifier (LP is recommended to be used). Besides, this multi-label classifier uses a single-label base classifier to obtain the predictions. Given its wide use in MLC, in the paper the results were reported using C4.5 classifier not only in AG3P-kEMLC but in most EMLCs. However, in order to 1) use a different and more complex single-label classifier, and 2) use an ensemble method as single-label classifier to support the idea that ensemble methods are preferred versus non-ensembles, we have carried out the experimental study using Random Forest [Bre01] as single-label classifier for AG3P-kEMLC and the rest of EMLCs that need a single-label classifier. The results of this study can be downloaded in the table below. Similar to the results reported using C4.5 as single-label classifier, AG3P-kEMLC is the best method in 3 out of 5 evaluation metrics, and the only one that does not perform statistically worse than the control algorithm in any of the metrics. These results show that our proposal achieves good performance independently of the single-label classifier used, demonstrating that it effectively obtains an optimized ensemble structure.

Results using Random Forest

Results using other aggregation criteria

AG3P-kEMLC builds a tree-shaped ensemble structure, where at each non-leaf node, the predictions of children nodes are combined by majority voting. However, in this section we report results for AG3P-ku using an aggregation criteria based on combining the confidences (and not bipartitions) reported by children nodes; only at the root node a bipartition is made to give the final prediction. These results suggest that the combination of bipartitions is a better option than the combination of confidences. Nevertheless, the combination of confidences will be considered for further study in the future.

Results with other aggregation criteria

Source code

G3P-kEMLC has been implemented using the JCLEC [Ven08], Mulan [Tso11], and Weka [Hal09] libraries. The source code of AG3P-kEMLC is publicy available under the GPLv3 license in the following GitHub repository.

References

[Bou04] Boutell, M. R., Luo, J., Shen, X., & Brown, C. M. (2004). “Learning multi-label scene classification”. Pattern recognition, 37(9), 1757-1771.
[Bre01] Breiman, L. (2001). Random forests. Machine learning, 45(1), 5-32.
[Gon13] Gonçalves, E. C., Plastino, A., & Freitas, A. A. (2013, November). “A genetic algorithm for optimizing the label ordering in multi-label classifier chains”. In 2013 IEEE 25th International Conference on Tools with Artificial Intelligence (pp. 469-476). IEEE.
[Hal09] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten (2009) “The WEKA data mining software: an update”, ACM SIGKDD explorations newsletter, 11(1), 10-18.
[Rea08] Read, J. (2008, April). “A pruned problem transformation method for multi-label classification”. In Proc. 2008 New Zealand Computer Science Research Student Conference (NZCSRS 2008) (Vol. 143150, p. 41).
[Rea11] Read, J., Pfahringer, B., Holmes, G., & Frank, E. (2011). “Classifier chains for multi-label classification”. Machine learning, 85(3), 333.
[Ten10] Tenenboim-Chekina, L., Rokach, L., & Shapira, B. (2010, June). “Identification of label dependencies for multi-label classification”. In Working Notes of the Second International Workshop on Learning from Multi-Label Data (pp. 53-60).
[Tso10] Tsoumakas, G., Katakis, I., & Vlahavas, I. (2010). “Data mining and knowledge discovery handbook. Mining multi-label data”, 667-685.
[Tso11] G. Tsoumakas, E. Spyromitros-Xioufis, J. Vilcek, I. Vlahavas (2011) “Mulan: A Java Library for Multi-Label Learning”, Journal of Machine Learning Research, 12, pp. 2411-2414.
[Ven08] Ventura, C. Romero, A. Zafra, J. A. Delgado, and C. Hervás (2008) “JCLEC: a Java framework for evolutionary computation”, Soft Computing, 12(4), 381-392.
[Zha14] Zhang, M. L., & Wu, L. (2014). “Lift: Multi-label learning with label-specific features”. IEEE transactions on pattern analysis and machine intelligence, 37(1), 107-120.