New Challenges in Association Rule Mining: an Approach Based on Genetic Programming

Basic information

Ph.D. Student: José María Luna
Advisors: Sebastián Ventura, José Raúl Romero, Cristóbal Romero
Defended on: January 2014
Keywords: association rule mining, genetic programming
Digital version

Description

This Ph.D. thesis involves a series of approaches for mining association rules by means of a grammar-guided genetic programming based methodology. The ultimate goal is to provide new algorithms that mine association rules in only one step and in a highly efficient way. The use of grammars enables the flexibility of the extracted knowledge to be increased. Grammars also enable obtaining association rules that comprise categorical, quantitative, positive and negative attributes to be mined.
Firstly, as for the mining of frequent association rules, a novel grammar-based algorithm, called G3PARM, has been proposed. It is able to discover rules having positive, negative, categorical and quantitative attributes. The evolutionary model is able to perform the mining process in one single step. This PhD Thesis also includes a model for mining rare or infrequent association rules, as well as two multi-objective approaches that optimize two different quality measures at time. Additionally, two novel algorithms that self-adapt their parameters are considered. In this sense, a previous tuning of the parameters would not be required, as they are adjusted depending on the data under study. Finally, the developed methodologies have been applied to the educational field to discover interesting information that could be used to improve the courses.
All the algorithms proposed in this Doctoral Thesis have been evaluated in a proper experimental framework, using different types of datasets and comparing their performance against other published methods of proved quality. Results have been verified by applying non-parametric statistical tests, demonstrating the many benefits of using a grammar-based methodology to address the association rule mining problem.

The following points summarize the most relevant results presented in the different chapters of this Ph.D. thesis and the conclusions obtained.

  • A thorough bibliographical revision covering the topics treated in this Thesis has been discussed. As main topics it includes a study of the proposals by using both exhaustive search and evolutionary methodologies that exist in literature. Notice that G3P has not been previously used in ARM.
  • Computational and memory requirements of ARM have been overcome. All the proposed algorithms follow an evolutionary methodology and do not require the same two steps for mining ARs as exhaustive search algorithms do. Hence, the proposed G3P algorithms are able to discover frequent and also infrequent ARs without requiring the previous discovery of frequent or infrequent patterns.
  • All the proposed algorithms are based on a CFG that increases the flexibility and interpretability of the extracted knowledge. This grammar also enables ARs that comprise positive, quantitative and negative conditions to be mined, and the logical operators used to mine this kind of attributes could be predefined or modified depending on the domain under study.
  • The mining of reliable and infrequent ARs is an easy task, requiring a higher computational time because of the high number of rules that could appear from data. We present the Rare-G3PARM algorithm, which was specifically designed to mine this sort of infrequent ARs, and it has provided pretty promising results.
  • The ARM problem could be considered as a multi-objective problem when optimizing more than one measure at time is required. For this purpose, we have presented two multi-objective approaches for mining ARs by means of G3P. The results have revealed that both algorithms behave very well for optimizing both support--confidence and support--lift at time.
  • An important drawback in evolutionary computation is the high number of parameters required to execute algorithms. Besides, they are often required to be previously optimized to obtain the best results. Sometimes, the number of parameters is too high to be accordingly optimized in a reasonable time. To address this issue, we have proposed two new G3P models that do not require as many parameters as existing EAs do, self-adapting their parameters to the data under study as the evolutionary process evolves.
  • Finally, as a real case study, some of the proposed algorithms in PhD Thesis has been applied to the educational field, discovering interesting relations that are highly interesting for instructors in order to improve their teaching work.

Funds

The development of this thesis was supported by:

  • Spanish Ministry of Science and Technology, project TIN2011-22408.
  • Regional Government of Andalusia, project P08-TIC-3720.
  • Spanish Ministry of Education under the FPU program (AP-2010-0041)

Publications associated with this thesis

International Journals

  1. J.M. Luna, J.R. Romero and S. Ventura. Design and Behaviour Study of a Grammar Guided Genetic Programming Algorithm for Mining Association Rules. Knowledge and Information Systems, vol. 32(1), pp. 53-76. 2012.
  2. C. Romero, A. Zafra, J.M. Luna and S. Ventura. Association rule mining using genetic programming to provide feedback to instructors from multiple-choice quiz data. Expert Systems, vol. 30(2), pp. 162-172. 2013.
  3. J.M. Luna, J.R. Romero and S. Ventura. Grammar-Based Multi-Objective Algorithms for Mining Association Rules. Data & Knowledge Engineering, vol. 86, pp. 19-37. 2013.
  4. J.M. Luna, J.R. Romero and S. Ventura. On the Adaptability of G3PARM to the Extraction of Rare Association Rules. Knowledge and Information Systems, vol. 38(2), pp. 391-418. 2014.

International Conferences

  1. J.M. Luna, J.R. Romero and S. Ventura. G3PARM: A Grammar Guided Genetic Programming Algorithm for Mining Association Rules. Proceedings of the IEEE Congress on Evolutionary Computation (CEC'10), pp. 2586-2593. 2010
  2. C. Romero, J.M. Luna, J.R. Romero and S. Ventura. Mining Rare Association Rules from e-Learning Data. Proceedings of the 3rd International Conference on Educational Data Mining (EDM'10), pp. 171-180. 2010.
  3. J.M. Luna, J.R. Romero and S. Ventura. Analysis for the Effectiveness of G3PARM Algorithm. Proceedings of the 5th International Conference on Hybrid Artificial Intelligence Systems (HAIS'10), pp. 27-34. 2010.
  4. J.M. Luna, J.R. Romero and S.Ventura. Mining and Representing Rare Association Rules through the Use of Genetic Programming. Proceedings of the 3rd World Congress on Nature and Biologically Inspired Computing (NaBIC'11), pp. 86-91. 2011.
  5. J.M. Luna, J.R. Romero, C. Romero and S. Ventura. A Genetic Programming Free-Parameter Algorithm for Mining Association Rules. Proceedings of the 12th International Conference on Intelligent Systems Design and Applications (ISDA'12), pp. 64-69. 2012.