New Challenges in Association Rule Mining: an Approach Based on Genetic Programming
Basic information
Ph.D. Student: José María Luna
Advisors: Sebastián Ventura, José Raúl Romero, Cristóbal Romero
Defended on: January 2014
Keywords: association rule mining, genetic programming
Digital version
Description
This Ph.D. thesis involves a series of approaches for mining association rules by means of a grammar-guided
genetic programming based methodology. The ultimate goal is to provide new algorithms that mine association rules in only
one step and in a highly efficient way. The use of grammars enables the flexibility of the extracted knowledge to be increased.
Grammars also enable obtaining association rules that comprise categorical, quantitative, positive and negative attributes
to be mined.
Firstly, as for the mining of frequent association rules, a novel grammar-based algorithm, called G3PARM, has been proposed.
It is able to discover rules having positive, negative, categorical and quantitative attributes. The evolutionary model is able to
perform the mining process in one single step. This PhD Thesis also includes a model for mining rare or infrequent association rules,
as well as two multi-objective approaches that optimize two different quality measures at time. Additionally, two novel algorithms that
self-adapt their parameters are considered. In this sense, a previous tuning of the parameters would not be required, as they are adjusted
depending on the data under study. Finally, the developed methodologies have been applied to the educational field to discover interesting
information that could be used to improve the courses.
All the algorithms proposed in this Doctoral Thesis have been evaluated in a proper experimental framework, using different types of datasets
and comparing their performance against other published methods of proved quality. Results have been verified by applying non-parametric statistical
tests, demonstrating the many benefits of using a grammar-based methodology to address the association rule mining problem.
The following points summarize the most relevant results presented in the different chapters of this Ph.D. thesis and the conclusions obtained.
- A thorough bibliographical revision covering the topics treated in this Thesis has been discussed. As main topics it includes a study of
the proposals by using both exhaustive search and evolutionary methodologies that exist in literature. Notice that G3P has not been previously
used in ARM.
- Computational and memory requirements of ARM have been overcome. All the proposed algorithms follow an evolutionary methodology and do not
require the same two steps for mining ARs as exhaustive search algorithms do. Hence, the proposed G3P algorithms are able to discover frequent
and also infrequent ARs without requiring the previous discovery of frequent or infrequent patterns.
- All the proposed algorithms are based on a CFG that increases the flexibility and interpretability of the extracted knowledge. This grammar
also enables ARs that comprise positive, quantitative and negative conditions to be mined, and the logical operators used to mine this kind of
attributes could be predefined or modified depending on the domain under study.
- The mining of reliable and infrequent ARs is an easy task, requiring a higher computational time because of the high number of rules that
could appear from data. We present the Rare-G3PARM algorithm, which was specifically designed to mine this sort of infrequent ARs, and it has
provided pretty promising results.
- The ARM problem could be considered as a multi-objective problem when optimizing more than one measure at time is required. For this purpose,
we have presented two multi-objective approaches for mining ARs by means of G3P. The results have revealed that both algorithms behave very well
for optimizing both support--confidence and support--lift at time.
- An important drawback in evolutionary computation is the high number of parameters required to execute algorithms. Besides, they are often
required to be previously optimized to obtain the best results. Sometimes, the number of parameters is too high to be accordingly optimized in
a reasonable time. To address this issue, we have proposed two new G3P models that do not require as many parameters as existing EAs do,
self-adapting their parameters to the data under study as the evolutionary process evolves.
- Finally, as a real case study, some of the proposed algorithms in PhD Thesis has been applied to the educational field, discovering
interesting relations that are highly interesting for instructors in order to improve their teaching work.
Funds
The development of this thesis was supported by:
- Spanish Ministry of Science and Technology, project TIN2011-22408.
- Regional Government of Andalusia, project P08-TIC-3720.
- Spanish Ministry of Education under the FPU program (AP-2010-0041)
Publications associated with this thesis
International Journals
- J.M. Luna, J.R. Romero and S. Ventura.
Design and Behaviour Study of a Grammar Guided Genetic Programming Algorithm for Mining Association Rules.
Knowledge and Information Systems, vol. 32(1), pp. 53-76. 2012.
- C. Romero, A. Zafra, J.M. Luna and S. Ventura.
Association rule mining using genetic programming to provide feedback to instructors from multiple-choice quiz data.
Expert Systems, vol. 30(2), pp. 162-172. 2013.
- J.M. Luna, J.R. Romero and S. Ventura.
Grammar-Based Multi-Objective Algorithms for Mining Association Rules.
Data & Knowledge Engineering, vol. 86, pp. 19-37. 2013.
- J.M. Luna, J.R. Romero and S. Ventura.
On the Adaptability of G3PARM to the Extraction of Rare Association Rules.
Knowledge and Information Systems, vol. 38(2), pp. 391-418. 2014.
International Conferences
- J.M. Luna, J.R. Romero and S. Ventura.
G3PARM: A Grammar Guided Genetic Programming Algorithm for Mining Association Rules.
Proceedings of the IEEE Congress on Evolutionary Computation (CEC'10), pp. 2586-2593. 2010
- C. Romero, J.M. Luna, J.R. Romero and S. Ventura.
Mining Rare Association Rules from e-Learning Data.
Proceedings of the 3rd International Conference on Educational Data Mining (EDM'10), pp. 171-180. 2010.
- J.M. Luna, J.R. Romero and S. Ventura.
Analysis for the Effectiveness of G3PARM Algorithm.
Proceedings of the 5th International Conference on Hybrid Artificial Intelligence Systems (HAIS'10), pp. 27-34. 2010.
- J.M. Luna, J.R. Romero and S.Ventura.
Mining and Representing Rare Association Rules through the Use of Genetic Programming.
Proceedings of the 3rd World Congress on Nature and Biologically Inspired Computing (NaBIC'11), pp. 86-91. 2011.
- J.M. Luna, J.R. Romero, C. Romero and S. Ventura.
A Genetic Programming Free-Parameter Algorithm for Mining Association Rules.
Proceedings of the 12th International Conference on Intelligent Systems Design and Applications (ISDA'12), pp. 64-69. 2012.