Thesis CMarquez – Knowledge Discovery and Intelligent Systems – KDIS

PREDICTING STUDENT FAILURE AND DROPOUT AT SCHOOL USING DATA MINING TECHNIQUES

BASIC INFORMATION

Ph.D. Student: Carlos Márquez
Advisors: Sebastián Ventura, Cristóbal Romero
Defended on: June 2015
Keywords: predicting student performance, classification, educational data mining, school failure, grammar-based genetic programming.

DESCRIPTION

This study proposes predicting school failure in secondary education using data mining (DM). Specifically, we aim to detect the factors that most influence school failure among young students by means of classification techniques. Classification is one of the tasks most studied by DM and Machine Learning (ML) researchers. It consists of predicting the value of a categorical attribute (the class) from the values of other attributes (the predicting attributes). In both fields, classification is usually approached as a supervised learning task: a search algorithm induces a classifier from a set of correctly classified data instances, called the training set, and a second set of correctly classified instances, the test set, is used to measure the quality of the classifier obtained.
Classification algorithms can be grouped into black box and white box models. White box models can be used directly for decision making and provide an explanation for each classification that an expert can review and validate. This study uses white box algorithms because the resulting models take the form of IF-THEN classification rules, which show the reasons for each classification and reveal important dependence relations in the data. Each rule has two parts: the antecedent (the IF part) contains a conjunction of m conditions on the values of predictor attributes, and the consequent (the THEN part) contains a prediction of the value of the goal attribute, or class.
Our final objective is to identify the factors that have the greatest influence on school failure and to detect the students who exhibit them, so that they can be offered appropriate, personalized help in an attempt to reduce school failure.
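The IF-THEN rule structure described above can be illustrated with a minimal Python sketch. It is not the algorithm proposed in the thesis: the rule is written by hand rather than induced from a training set, and the attribute names (`absences`, `math_score`), thresholds, and student records are hypothetical values chosen only for illustration. The sketch shows an antecedent as a conjunction of conditions, a consequent as a class label, and accuracy measured on a small labelled test set.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, float]


@dataclass
class Rule:
    """IF every condition in `antecedent` holds THEN predict `consequent`."""
    antecedent: List[Tuple[str, Callable[[float], bool]]]  # (attribute, condition)
    consequent: str  # predicted class label

    def covers(self, record: Record) -> bool:
        # The antecedent is a conjunction: all conditions must be true.
        return all(test(record[attr]) for attr, test in self.antecedent)


def classify(rules: List[Rule], record: Record, default: str) -> str:
    """Return the consequent of the first rule that covers the record."""
    for rule in rules:
        if rule.covers(record):
            return rule.consequent
    return default  # no rule fired


def accuracy(rules: List[Rule], test_set: List[Tuple[Record, str]], default: str) -> float:
    """Fraction of test instances whose predicted class matches the true class."""
    hits = sum(classify(rules, rec, default) == label for rec, label in test_set)
    return hits / len(test_set)


# Hypothetical rule: IF absences > 10 AND math_score < 5.0 THEN FAIL
rules = [
    Rule(
        antecedent=[
            ("absences", lambda v: v > 10),
            ("math_score", lambda v: v < 5.0),
        ],
        consequent="FAIL",
    ),
]

# Hypothetical, hand-made test set of (record, true class) pairs.
test_set = [
    ({"absences": 12, "math_score": 4.0}, "FAIL"),
    ({"absences": 2, "math_score": 8.5}, "PASS"),
    ({"absences": 15, "math_score": 7.0}, "PASS"),
    ({"absences": 3, "math_score": 3.0}, "FAIL"),
]

print(accuracy(rules, test_set, default="PASS"))  # 0.75
```

Because the rule is readable as written, an expert can inspect which conditions triggered a FAIL prediction, which is the property that motivates white box models here.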
This study proposes a new algorithm that treats the prediction of school failure as a classification task. Several experiments have been performed to compare the proposed algorithm with other classification algorithms. Different DM approaches have also been applied to try to increase the accuracy of the classification model and to address the problems of high dimensionality and imbalanced data.
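To see why imbalanced data is a problem for classification, consider the following small Python sketch. It does not reproduce any experiment from the thesis: the class counts (90 PASS, 10 FAIL) are made up, and the `evaluate` helper is a hypothetical function written for this illustration. It shows that a classifier which always predicts the majority class can reach high overall accuracy while detecting none of the failing students.

```python
from typing import List, Tuple


def evaluate(y_true: List[str], y_pred: List[str], positive: str) -> Tuple[float, float, float]:
    """Return (accuracy, recall on the positive class, recall on the other class)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)  # positives found
    tn = sum(t != positive and p != positive for t, p in pairs)  # negatives found
    n_pos = sum(t == positive for t in y_true)
    n_neg = len(y_true) - n_pos
    acc = (tp + tn) / len(y_true)
    pos_recall = tp / n_pos if n_pos else 0.0
    neg_recall = tn / n_neg if n_neg else 0.0
    return acc, pos_recall, neg_recall


# Made-up imbalanced labels: 90 students pass, 10 fail.
y_true = ["PASS"] * 90 + ["FAIL"] * 10
# A trivial classifier that always predicts the majority class.
y_pred = ["PASS"] * 100

acc, fail_recall, pass_recall = evaluate(y_true, y_pred, positive="FAIL")
print(acc, fail_recall, pass_recall)  # 0.9 0.0 1.0
```

Overall accuracy is 0.9 even though no failing student is identified, so per-class measures are more informative than accuracy alone when the class of interest is the minority.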

PUBLICATIONS ASSOCIATED WITH THIS THESIS

INTERNATIONAL JOURNALS

C. Márquez Vera, A. Cano, C. Romero and S. Ventura. Predicting student failure at school using genetic programming and different data mining approaches with high dimensional and imbalanced data. Appl. Intell., vol. 38(3), pp. 315-330. 2013.
C. Márquez Vera, A. Cano, C. Romero, A.Y. Noaman, H.M. Fardoun and S. Ventura. Early dropout prediction using data mining: a case study with high school students. Expert Systems, vol. 33(1), pp. 107-124. 2016.

INTERNATIONAL CONFERENCES

C. Márquez Vera, C. Romero and S. Ventura. Predicting School Failure Using Data Mining. EDM 2011, pp. 271-276. 2011.