Oversampling the minority class in the feature space

Year:
2016
Type of Publication:
Article
Keywords:
Over-sampling, imbalanced classification, kernel methods, empirical feature space, support vector machines
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Volume:
27
Number:
9
Pages:
1947-1961
Month:
September
ISSN:
2162-237X
Note:
JCR(2016): 6.108 Position: 3/104 (Q1) Category: COMPUTER SCIENCE, THEORY & METHODS
Abstract:
The imbalanced nature of some real-world data is one of the current challenges for machine learning researchers. One common approach over-samples the minority class through convex combinations of its patterns. We explore the general idea of synthetic over-sampling in the feature space induced by a kernel function (as opposed to the input space). If the kernel function matches the underlying problem, the classes will be linearly separable and the synthetically generated patterns will lie in the minority-class region. Since the feature space is not directly accessible, we use the empirical feature space (a Euclidean space isomorphic to the feature space) for over-sampling purposes. The proposed method is framed in the context of support vector machines, where imbalanced datasets can pose a serious hindrance. The idea is investigated in three scenarios: 1) over-sampling in the full and reduced-rank empirical feature spaces; 2) a kernel learning technique maximising the data class separation, to study the influence of the feature space structure (implicitly defined by the kernel function); and 3) a unified framework for preferential over-sampling that subsumes some of the previous approaches in the literature. We support our investigation with extensive experiments over 50 imbalanced datasets.
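The two ingredients described above can be sketched in code. The following is a minimal illustration, not the authors' implementation: `empirical_feature_map` embeds the training points into the empirical feature space via an eigendecomposition of the kernel (Gram) matrix, with an optional reduced-rank truncation, and `oversample_convex` synthesises minority patterns as convex combinations of random pairs in that space. All function and variable names are hypothetical.

```python
import numpy as np

def empirical_feature_map(K, rank=None, tol=1e-10):
    """Embed the n training points into the empirical feature space.

    K: (n, n) symmetric PSD kernel matrix. Returns an (n, r) matrix Z
    whose rows reproduce the kernel inner products: Z @ Z.T ~= K.
    """
    eigval, eigvec = np.linalg.eigh(K)
    keep = eigval > tol                     # drop numerically zero directions
    eigval, eigvec = eigval[keep], eigvec[:, keep]
    if rank is not None:                    # optional reduced-rank embedding
        order = np.argsort(eigval)[::-1][:rank]
        eigval, eigvec = eigval[order], eigvec[:, order]
    return eigvec * np.sqrt(eigval)         # rows are the embedded patterns

def oversample_convex(Z_min, n_new, rng=None):
    """Synthesise n_new minority patterns as convex combinations of
    random pairs of minority points in the embedded space."""
    rng = np.random.default_rng(rng)
    i = rng.integers(0, len(Z_min), n_new)
    j = rng.integers(0, len(Z_min), n_new)
    lam = rng.random((n_new, 1))            # mixing coefficients in [0, 1)
    return Z_min[i] + lam * (Z_min[j] - Z_min[i])
```

The synthetic points live in the empirical feature space, so they can be fed to any learner that operates on explicit feature vectors, e.g. a linear SVM trained on the embedded data.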