MINING DATA WITH MORE FLEXIBLE REPRESENTATIONS
Alberto Cano
Krzysztof Cios
Carlos García
Eva L. Gibaja
Alain Guerrero
José María Luna
Carmen Luque
María Luque
José María Moyano
Francisco Padillo
Mykola Pechenizkiy
Aurora Ramírez
Oscar G. Reyes
Hermes Robles
Cristóbal Romero
José Raúl Romero
Amelia Zafra
SUMMARY
Project MARFIL (Mining data with more flexible representations) aims to develop novel approaches for knowledge extraction in contexts that demand additional flexibility in data representation:
- Multi-instance and relational learning models, which enable a more flexible representation of the input space.
- Learning models with multiple outputs, especially multi-label learning, which allow the output space to be represented more flexibly.
- Multi-source and multi-view learning models, which make it possible to combine several data sets describing the same problem, using a model chosen individually for each data source.
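To make the three representations above concrete, the following minimal Python sketch models each one as a data structure. All class and function names (`Bag`, `MultiLabelExample`, `MultiViewExample`, `predict_bag`, `binary_relevance`, `majority_vote_views`) are hypothetical illustrations, not part of any MARFIL software. The bag rule uses the standard multi-instance assumption, binary relevance is only the simplest multi-label transformation, and majority voting is just one possible way to combine per-view models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Tuple

Vector = List[float]
Classifier = Callable[[Vector], int]  # hypothetical: maps a feature vector to 0/1

@dataclass
class Bag:
    """Multi-instance example: a variable-size set of feature vectors sharing one label."""
    instances: List[Vector]
    label: int  # 1 = positive bag, 0 = negative bag

@dataclass
class MultiLabelExample:
    """Multi-label example: one feature vector plus a binary relevance vector over q labels."""
    features: Vector
    labels: List[int]  # labels[j] == 1 if label j applies to this example

@dataclass
class MultiViewExample:
    """Multi-view example: the same object described by several named feature sets."""
    views: Dict[str, Vector]
    label: int

def predict_bag(bag: Bag, instance_clf: Classifier) -> int:
    # Standard multi-instance assumption: a bag is positive
    # if and only if at least one of its instances is positive.
    return int(any(instance_clf(x) == 1 for x in bag.instances))

def binary_relevance(data: Sequence[MultiLabelExample], j: int) -> List[Tuple[Vector, int]]:
    # Binary relevance: build one ordinary single-label data set for label j.
    return [(ex.features, ex.labels[j]) for ex in data]

def majority_vote_views(ex: MultiViewExample, view_clfs: Dict[str, Classifier]) -> int:
    # Each view is handled by its own model; predictions are combined by
    # strict majority vote (ties resolve to 0 in this sketch).
    votes = [clf(ex.views[name]) for name, clf in view_clfs.items()]
    return int(2 * sum(votes) > len(votes))

if __name__ == "__main__":
    threshold = lambda x: int(x[0] > 0.5)  # toy instance-level classifier
    print(predict_bag(Bag([[0.1, 0.2], [0.9, 0.8]], 1), threshold))
    print(binary_relevance([MultiLabelExample([1.0], [1, 0, 1])], 2))
    mv = MultiViewExample({"text": [0.7], "image": [0.2], "meta": [0.9]}, 1)
    print(majority_vote_views(mv, {v: threshold for v in mv.views}))
```

The point of the sketch is the shape of the data: the input of a multi-instance example is a set rather than a single vector, the output of a multi-label example is a vector rather than a single class, and a multi-view example carries several independent feature spaces that can each be given their own model.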
Building on these approaches, we will develop new models for classification, clustering, association rule mining and subgroup discovery. We will also provide mechanisms to adapt these models to problems with special characteristics, such as a large number of variables or very large data sets, as circumstances dictate. Some of these problems fall under the term big data, so our proposals will be adapted to this new landscape by supplying scalable implementations able to provide innovative, appropriate solutions in these contexts.
In addition to the theoretical dimension introduced above, this project has an applied orientation, since we expect to solve several real-life problems using the developed models. More specifically, we will address problems in educational data mining (predicting students’ academic performance, modelling self-assessment and peer-assessment plans, and developing models that recommend resources and activities to students) and in biomedicine (early diagnosis by studying electronic health records, and predicting the risk of insulin metabolism diseases and related pathologies). Both application fields currently attract considerable interest in our society, and even a small step forward would have a significant impact on the health and educational communities. In fact, in addition to our close cooperation with the universities involved in this project and the Maimónides health research institute, several companies in both sectors have already shown interest in the results of this proposal. In a first stage, we will analyse whether these representation models constitute an important step forward in solving these problems with respect to traditional approaches. In a second stage, we will compare our proposals with existing state-of-the-art methods, where we expect to achieve significantly improved outcomes.
Last but not least, to promote the research conducted, we plan to build test data repositories for each of the resulting models, so that the scientific community can replicate our experiments and thoroughly compare results. Furthermore, we will integrate the developed models into today’s most relevant software platforms to facilitate their dissemination.
RESEARCH RESULTS
Software
Dataset repositories
Books
- S. Ventura, J. M. Luna. Pattern Mining with Evolutionary Algorithms. Springer, 2016.
- F. Herrera, S. Ventura, R. Bello, C. Cornelis, A. Zafra, D. Sanchez-Tarragó, S. Vluymans. Multiple Instance Learning. Foundations and Algorithms. Springer, 2016.
- S. Ventura, J. M. Luna. Supervised Pattern Mining. Springer (forthcoming).
Journal articles
- J. M. Luna, C. Romero, J. R. Romero, S. Ventura. An Evolutionary Algorithm for the Discovery of Rare Class Association Rules in Learning Management Systems. Applied Intelligence, 42(3), 501-513, 2015. DOI: 10.1007/s10489-014-0603-4.
- A. Cano, A. Zafra, S. Ventura. Speeding up Multiple Instance Learning Classification Rules on GPUs. Knowledge and Information Systems, 44(1), 127-145, 2015. DOI: 10.1007/s10115-014-0752-0.
- O. Reyes, C. Morell, S. Ventura. Scalable extensions of the ReliefF algorithm for weighting and selecting features on the multi-label learning context. Neurocomputing, 161, 168-182, 2015. DOI: 10.1016/j.neucom.2015.02.045.
- A. Cano, J. M. Luna, A. Zafra, S. Ventura. A classification module for Genetic Programming Algorithms in JCLEC. Journal of Machine Learning Research, 16, 491-494, 2015.
- E. Gibaja, S. Ventura. A Tutorial on Multi-Label Learning. ACM Computing Surveys, 47(3), 1-38, 2015. DOI: 10.1145/2716262.
- J. L. Olmo, C. Romero, E. Gibaja, S. Ventura. Improving Meta-learning for Algorithm Selection by Using Multi-label Classification: A Case of Study with Educational Data Sets. International Journal of Computational Intelligence Systems. 8(6): 1144-1164. 2015. DOI: 10.1080/18756891.2015.1113748.
- A. Cano, J. M. Luna, E. L. Gibaja, S. Ventura. LAIM discretization for multi-label data. Information Sciences, 330, 370-384, 2016. DOI: 10.1016/j.ins.2015.10.032.
- C. Márquez-Vera, A. Cano, C. Romero, A. Y. M. Noaman, H. M. Fardoun, S. Ventura. Early Dropout Prediction using Data Mining: A Case Study with High School Students. Expert Systems, 33(1), 107-124, 2016. DOI: 10.1111/exsy.12135.
- J. M. Luna, A. Cano, V. Sakalauskas, S. Ventura. Discovering Useful Patterns from Multiple Instance Data. Information Sciences, 357, pp. 23–38, 2016. DOI: 10.1016/j.ins.2016.04.007.
- J. M. Luna, M. Pechenizkiy, S. Ventura. Mining Exceptional Relationships with Grammar-Guided Genetic Programming. Knowledge and Information Systems, 47(3), pp. 571–594, 2016. DOI: 10.1007/s10115-015-0859-y.
- J. M. Luna, A. Cano, M. Pechenizkiy, S. Ventura. Speeding-up Association Rule Mining with Inverted Index Compression. IEEE Transactions on Cybernetics, 46(12), pp. 3059–3072, 2016. ISSN 2168-2267. DOI: 10.1109/TCYB.2015.2496175.
- A. Y. Noaman, J. M. Luna, A. H. M. Ragab, S. Ventura. Recommending degree studies according to students’ attitudes in high school by means of subgroup discovery. International Journal of Computational Intelligence Systems. 9(6), pp. 1101 – 1117, 2016. DOI: 10.1080/18756891.2016.1256573.
- O. Reyes, C. Morell, S. Ventura. Effective lazy learning algorithm based on data gravitation model for multi-label learning. Information Sciences, 340–341, pp. 159–174, 2016. DOI: 10.1016/j.ins.2016.01.006.
- O. Reyes, E. Pérez-Perdomo, M.C. Rodríguez-Hernández, H. Fardoun, S. Ventura. JCLAL: A Java Framework for Active Learning. Journal of Machine Learning Research, 17, 2016. Available at: http://jmlr.org/papers/v17/15-347.html.
- J. M. Moyano, E. L. Gibaja, S. Ventura. MLDA: A tool for analyzing multi-label datasets. Knowledge-Based Systems. 121, pp. 1 – 3, 2017. DOI: 10.1016/j.knosys.2017.01.018.
- A. H. Altalhi, J.M. Luna, M.A. Vallejo, S. Ventura. Evaluation and comparison of open source software suites for data mining and knowledge discovery. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery. 7(3), 1 – 18. 2017. DOI: 10.1002/widm.1204.
- J. M. Luna, C. Castro, C. Romero. MDM tool: A data mining framework integrated into Moodle. Computer Applications in Engineering Education. 25(1), 90-102, 2017. DOI: 10.1002/cae.21782.
- C. Romero, S. Ventura. Educational Data Science in Massive Open Online Courses. WIREs Data Mining and Knowledge Discovery, 7(1): e1187. 2017. DOI: 10.1002/widm.1187.
- A. Cano, C. García-Martínez, S. Ventura. Extremely high-dimensional optimization with Map-Reduce: Scaling functions and algorithm. Information Sciences, 415-416, 110-127, 2017. DOI: 10.1016/j.ins.2017.06.024.
- A. Cano. An ensemble approach to multi-view multi-instance learning. Knowledge-Based Systems, vol. 136, 46-57, 2017. DOI: 10.1016/j.knosys.2017.08.022.
- G. Melki, A. Cano, V. Kecman, S. Ventura. Multi-Target Support Vector Regression Via Correlation Regressor Chains. Information Sciences, vol. 415-416, 53-69, 2017. DOI: 10.1016/j.ins.2017.06.017.
- A. Cano. A survey on graphic processing unit computing for large-scale data mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(1), e1232, 2018. DOI: 10.1002/widm.1232.
- J.M. Moyano, E.L. Gibaja, K.J. Cios, S. Ventura. Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion. 44, 33 – 45, 2018. DOI: 10.1016/j.inffus.2017.12.001.
- O. Reyes, S. Ventura. Evolutionary Strategy to perform Batch-Mode Active Learning on Multi-Label Data. ACM Transactions on Intelligent Systems and Technology, vol. 9 (4), 46:1-46:26, 2018. DOI: 10.1145/3161606.
- O. Reyes, A. Cano, H.M. Fardoun, S. Ventura. A locally weighted learning method based on a data gravitation model for multi-target regression. Int. J. of Computational Intelligence Systems, vol. 11(1), pp. 282-295, 2018. DOI: 10.2991/ijcis.11.1.22.
- O. Reyes, C. Morell, S. Ventura. Effective active learning strategy for multi-label learning. Neurocomputing, vol. 273, pp. 494-508, 2018. DOI: 10.1016/j.neucom.2017.08.001.
- O. Reyes, A. H. Altalhi, S. Ventura. Statistical comparisons of active learning strategies over multiple datasets. Knowledge-Based Systems, vol. 145, pp. 274-288, 2018. DOI: 10.1016/j.knosys.2018.01.033.
- F. Padillo, J. M. Luna, F. Herrera, S. Ventura. Mining association rules on Big Data through MapReduce genetic programming. Integrated Computer-Aided Engineering 25(1): 31-48. 2018. DOI: 10.3233/ICA-170555.
- J. M. Luna, F. Padillo, M. Pechenizkiy, S. Ventura. Apriori versions based on MapReduce for Mining Frequent Patterns on Big Data. IEEE Transactions on Cybernetics. 2018. DOI: 10.1109/TCYB.2017.2751081.
- J. M. Luna, M. Pechenizkiy, M. J. del Jesus, S. Ventura. Mining Context-Aware Association Rules Using Grammar-Based Genetic Programming. IEEE Transactions on Cybernetics, 2018. DOI: 10.1109/TCYB.2017.2750919.
- A. Guerrero, C. Morell, S. Ventura. A gene expression programming algorithm for discovering classification rules in the multi-objective space. International Journal of Computational Intelligence Systems, 2018. DOI: 10.2991/ijcis.11.1.40.
- G. Melki, A. Cano, S. Ventura. Multi-Instance Support Vector Machine with Bag Representatives. Pattern Recognition, 2018. DOI: 10.1016/j.patcog.2018.02.007.
- G. Melki, V. Kecman, S. Ventura, A. Cano. OLLAWV: OnLine Learning Algorithm using Worst-Violators. Applied Soft Computing, 2018. DOI: 10.1016/j.asoc.2018.02.040.
- A. Bogarin, R. Cerezo, C. Romero. A survey on educational process mining. WIREs Data Mining and Knowledge Discovery, 8(1), 2018. DOI: 10.1002/widm.1230.
- C. García, S. Ventura. Multi-view learning and genetic programming to obtain interpretable rule-based classifiers for semi-supervised contexts. Lessons learnt. International Journal of Pattern Recognition, 2017.
- C. Luque, J.M. Luna, M. Luque, S. Ventura. Transforming Clinical Information in Knowledge: A Review of Text Mining Contributions to Medicine. WIREs DMKD, 2017.
- J.M. Luna, M. Ondra, H. Fardoun, S. Ventura. Analysis of quality measures in association rule mining. International Journal of Computational Intelligence Systems, 2017.
- J. López, S. Ventura, A. Cano. Distributed Nearest Neighbor Classification for Large-Scale Multilabel Data on Spark. Future Generation Computer Systems, 2018.
- A. Arenas-Larriva, F. Gomez-Delgado, D. Torres-Peña, A. Garcia-Rios, J. Lopez-Moreno, A. I. Perez-Caballero, O. A. Rangel-Zuñiga, E. Yubero-Serrano, A. Camargo, R. Otero-Aragon, J.M. Luna, G. Quintana-Navarro, E. Herruzo-Gomez, S. Ventura, J. Ordovas, E.J.P. de Koning, P. Perez-Martínez, J. López-Miranda, J. Delgado-Lista. GLUCOGENE: Diabetes risk prediction at 2 years for coronary patients on dietary advice (from the CORDIOPREV Study). Scientific Reports, 2018.
- H. Robles, A. Zafra, H. Fardoun, S. Ventura. An efficient library for clustering with evolutionary algorithms. Knowledge-Based Systems, 2018.
- L. A. Quintero-Domínguez, C. Morell, S. Ventura. WordificationMI: Multi-Relational Data Mining through Multiple-Instance Propositionalization. The Computer Journal, 2018.
- E.L. Gibaja, J.M. Moyano, S. Ventura. An ensemble-based approach for multi-view multi-label classification. Progress in Artificial Intelligence, 5(4), 251-259, 2016.
- J.M. Luna. Pattern mining: current status and emerging topics. Progress in Artificial Intelligence, 5(3), pp 165–170, 2016.
- F. Padillo, J.M. Luna, S. Ventura. Exhaustive search algorithms to mine subgroups on Big Data using Apache Spark. Progress in Artificial Intelligence. 6(2), pp. 145-158, 2017.
- A. Bogarín, C. Romero, R. Cerezo, M. Sánchez-Santillán. Educational Process Mining: A tutorial and case study using Moodle data sets. S. ElAtia, D. Ipperciel, & O. Zaïane, eds (2016). Data Mining and Learning Analytics: Applications in Educational Research. John Wiley & Sons. 3-23.
Patents
- F. J. Delgado Lista, J. F. Alcalá Díaz, F. J. Gomez Delgado, J. D. Torres Peña, A. Garcia Rios, A. Isabel Perez Caballero, J. Lopez Miranda, P. Perez Martinez, E. Herruzo Gomez, S. Ventura, A. Camargo Garcia, O. A. Rangel Zuñiga, R. Otero Aragon, J. M. Luna. GLUCOGENE. Request number 201799901595517, registered on 10/05/2017.
International conferences
- A. Bogarin, C. Romero, R. Cerezo. Discovering student’s navigation path in moodle. International Conference on Educational Data Mining, Madrid, Spain, pp 556-557. 2015.
- M.A. Jiménez-Gómez, J. M. Luna, C. Romero, S. Ventura. Discovering Clues to Avoid Middle School Failure as Early as Possible. Learning Analytics and Knowledge (LAK), NY, USA. pp 300-305. 2015.
- A. Ramírez, J.R. Romero, S. Ventura. An Extensible JCLEC-based Solution for the Implementation of Multi-Objective Evolutionary Algorithms. 17th Annual Conference on Genetic and Evolutionary Computation, GECCO’15, pp. 1085-1092. 2015.
- C. Romero, R. Cerezo, J.A. Espino, M. Bermudez. Using Android Wear for Avoiding Procrastination Behaviours in MOOCs. Learning at Scale (L@S), Edinburgh, Scotland, UK, pp 193-196. 2016.
- A. Zapata, V. H. Menéndez, C. Romero, M.E. Prieto. Meta-learning for predicting the best vote aggregation method: Case study in collaborative searching of LOs. Proceedings of the 9th International Conference on Educational Data Mining, EDM 2016, Raleigh, North Carolina, USA, pp 656-657, 2016.
- F. Padillo, J. M. Luna, A. Cano, S. Ventura. A data structure to speed-up machine learning algorithms on massive datasets. Proceedings of the 11th International Conference, HAIS 2016, Seville, Spain, April 18-20, pp 365-376, 2016.
- F. Padillo, J. M. Luna, S. Ventura. Subgroup discovery on Big Data: exhaustive methodologies using Map-Reduce. IEEE Big Data Science and Engineering, 2016.
- F. Padillo, J. M. Luna, S. Ventura. Mining Perfectly Rare Itemsets on Big Data: An Approach Based on Apriori-Inverse and MapReduce. ISDA 2016. 2016.
- C. García-Martínez, S. Ventura. Multi-view semi-supervised learning using genetic programming interpretable classification rules. IEEE Congress on Evolutionary Computation. 2017.
- J.M. Moyano, E. Gibaja, S. Ventura. An evolutionary algorithm for optimizing the target ordering in Ensemble of Regressor Chains. IEEE Congress on Evolutionary Computation, pp. 2015-2021. 2017.
- O. Sánchez, J. M. Moyano, L. Sánchez, J. Alcalá-Fdez. Mining association rules in R using the package RKEEL. IEEE International Conference on Fuzzy Systems, pp. 1-6. 2017.
- F. Padillo, J. M. Luna, S. Ventura. An evolutionary algorithm for mining rare association rules: A Big Data approach. IEEE Congress on Evolutionary Computation, pp. 2007–2014. 2017.
- J. Gonzalez-Lopez, A. Cano, S. Ventura. Large-scale multi-label ensemble learning on Spark. IEEE Trustcom / BigDataSE / ICESS, pp. 893-900. 2017.
- P. González, E. Gibaja, A. Zapata, V. H. Menéndez, C. Romero. Classification of Learning Objects: Reducing the Number of Used Features. EDM 2017. 2017.
National conferences
- F. Ibáñez, A. Cano, S. Ventura. Evaluación distribuida transparente para algoritmos evolutivos en JCLEC [Transparent distributed evaluation for evolutionary algorithms in JCLEC]. II Jornadas de Algoritmos Evolutivos y Metaheurísticas (XVI CAEPIA), pp 231-240, 2015.
- J.M. Moyano, E.L. Gibaja, A. Cano, J.M. Luna, S. Ventura. Diseño Automático de Multi-Clasificadores Basados en Proyecciones de Etiquetas [Automatic design of multi-classifiers based on label projections]. II Jornadas de Fusión de la Información y ensembles (XVI CAEPIA), pp 355-366, 2015.
- J.M. Moyano, E.L. Gibaja, A. Cano, J.M. Luna, S. Ventura. Algoritmo evolutivo para optimizar ensembles de clasificadores multi-etiqueta [An evolutionary algorithm for optimizing ensembles of multi-label classifiers]. X Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 219-225, 2015.
- J. Fuentes-Alventosa, C. Romero, C. García-Martínez. Predicción de la aceptación o rechazo de las calificaciones propuestas por el alumnado usando técnicas de minería de datos [Predicting the acceptance or rejection of grades proposed by students using data mining techniques]. JENUI. Almería. pp 203-210. 2016.
- J. M. Luna, F. Padillo, S. Ventura. Minería de reglas de asociación excepcionales extraídas con algoritmos evolutivos [Mining exceptional association rules extracted with evolutionary algorithms]. XI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 127-136, 2016.
- F. Padillo, J. M. Luna, S. Ventura, F. Herrera. Algoritmo de programación genética gramatical para la extracción de reglas de asociación en Big Data usando el paradigma MapReduce [A grammatical genetic programming algorithm for extracting association rules on Big Data using the MapReduce paradigm]. XI Congreso Español sobre Metaheurísticas, Algoritmos Evolutivos y Bioinspirados (MAEB), pp 137-148, 2016.
- F. Padillo, J. M. Luna, S. Ventura. Minería de patrones en BigData [Pattern mining on Big Data]. XVII Conferencia de la Asociación Española para la Inteligencia Artificial, pp 769-778, 2016.
- O. G. Reyes, S. Ventura. Estrategia efectiva para aprendizaje activo multi-etiqueta [An effective strategy for multi-label active learning]. XVII Conferencia de la Asociación Española para la Inteligencia Artificial, pp 835-844, 2016.
- C. García-Martínez, S. Ventura. Un algoritmo de programación genética multivista para inducir clasificadores basados en reglas en contextos semi-supervisados [A multi-view genetic programming algorithm for inducing rule-based classifiers in semi-supervised contexts]. MAEB 2017. 2017.