Representing ordinal input variables in the context of ordinal classification

Research areas:

Methodology - Machine learning

Year:

2016

Type of Publication:

In Proceedings

Authors:

Pedro Antonio Gutiérrez, María Pérez-Ortiz, Javier Sánchez-Monedero, César Hervás-Martínez

Book title:

IEEE International Joint Conference on Neural Networks (IJCNN2016)

Pages:

2174-2181

Organization:

Vancouver, BC, Canada

Month:

24th-29th July

ISBN:

978-1-5090-0621-2

BibTex:

@conference{22016,
author = "Pedro Antonio Guti{\'e}rrez and Mar{\'i}a P{\'e}rez-Ortiz and Javier S{\'a}nchez-Monedero and C{\'e}sar Herv{\'a}s-Mart{\'i}nez",
abstract = "Ordinal input variables are common in many supervised and unsupervised machine learning problems. We focus on ordinal classification problems, where the target variable is also categorical and ordinal. In order to represent these variables for measuring distances or applying continuous mapping functions, they have to be transformed to numeric values. This paper evaluates five different methods to do so. Two of them are commonly applied by practitioners, the first one based on binarising the ordinal input variable using standard indicator variables (NomBin), and the second one based on directly mapping each category to a consecutive natural number (Num). Furthermore, three novel proposals are evaluated in this paper: 1) an ordinal binarisation based on the order of the input variable (OrdBin), 2) the analysis of pairwise distances between input patterns to recover the latent variable generating the ordinal one (NumLVR), and 3) the refinement of the standard numeric transformation by recovering the distance between sets of patterns of consecutive categories (NumCDR). A thorough empirical evaluation is done, considering 12 datasets, 5 performance metrics and 4 classifiers (2 of them of nominal nature and 2 of ordinal nature). The results show that the NomBin representation method leads to the worst results, and that both Num and NumCDR methods obtain very good performance, although NumCDR results are consistently better for almost all performance metrics and classifiers considered.",
booktitle = "IEEE International Joint Conference on Neural Networks (IJCNN2016)",
doi = "10.1109/IJCNN.2016.7727468",
isbn = "978-1-5090-0621-2",
month = "24th-29th July",
organization = "Vancouver, BC, Canada",
pages = "2174-2181",
publisher = "IEEE Press",
title = "{R}epresenting ordinal input variables in the context of ordinal classification",
url = "doi.org/10.1109/IJCNN.2016.7727468",
year = "2016",
}

Abstract:

Ordinal input variables are common in many supervised and unsupervised machine learning problems. We focus on ordinal classification problems, where the target variable is also categorical and ordinal. In order to represent these variables for measuring distances or applying continuous mapping functions, they have to be transformed to numeric values. This paper evaluates five different methods to do so. Two of them are commonly applied by practitioners, the first one based on binarising the ordinal input variable using standard indicator variables (NomBin), and the second one based on directly mapping each category to a consecutive natural number (Num). Furthermore, three novel proposals are evaluated in this paper: 1) an ordinal binarisation based on the order of the input variable (OrdBin), 2) the analysis of pairwise distances between input patterns to recover the latent variable generating the ordinal one (NumLVR), and 3) the refinement of the standard numeric transformation by recovering the distance between sets of patterns of consecutive categories (NumCDR). A thorough empirical evaluation is done, considering 12 datasets, 5 performance metrics and 4 classifiers (2 of them of nominal nature and 2 of ordinal nature). The results show that the NomBin representation method leads to the worst results, and that both Num and NumCDR methods obtain very good performance, although NumCDR results are consistently better for almost all performance metrics and classifiers considered.
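
To make the difference between the simpler representations concrete, here is a minimal Python sketch of Num, NomBin and one plausible form of OrdBin. The abstract does not give the exact OrdBin definition, so the cumulative ("thermometer") coding below is an assumption, as are the function names and the toy categories. NumLVR and NumCDR are not sketched, because the abstract does not describe them in enough detail.

```python
def encode_num(values, categories):
    """Num: map each category to a consecutive natural number (1-based rank)."""
    rank = {c: i + 1 for i, c in enumerate(categories)}
    return [rank[v] for v in values]


def encode_nombin(values, categories):
    """NomBin: one standard indicator variable per category; order is ignored."""
    return [[1 if c == v else 0 for c in categories] for v in values]


def encode_ordbin(values, categories):
    """Assumed OrdBin: cumulative coding with Q-1 binary variables.

    Variable k is 1 when the value's 0-based rank is greater than k.
    This is an illustrative guess, not necessarily the paper's definition.
    """
    rank = {c: i for i, c in enumerate(categories)}
    n_thresholds = len(categories) - 1
    return [[1 if rank[v] > k else 0 for k in range(n_thresholds)] for v in values]


# Hypothetical ordinal variable, listed from lowest to highest category.
categories = ["low", "medium", "high"]
values = ["low", "high", "medium"]

num = encode_num(values, categories)
nombin = encode_nombin(values, categories)
ordbin = encode_ordbin(values, categories)
```

Under NomBin, any two distinct categories differ in exactly two positions, so "low" is as far from "medium" as it is from "high". Under the cumulative coding assumed here, the number of differing positions grows with the gap in rank, which preserves the order information when measuring distances.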
