EMERGING TRENDS IN DATA ANALYSIS
Rafael Barbudo
Alberto Cano
Krzysztof Cios
Carlos García
Eva L. Gibaja
Jorge González
José María Luna
María Luque
Gabriella Melki
José María Moyano
Mykola Pechenizkiy
Alexandro Provetti
Oscar G. Reyes
Cristóbal Romero
José Raúl Romero
Marie Sacksick
Amelia Zafra
Eduardo Pérez
SUMMARY
The main objective of Project EMERalD (EMERging trends in Data analysis) is to develop data analysis methodologies and proposals for solving complex problems in biomedicine and education. Solutions will be reached by designing new algorithms or adapting existing ones to the characteristics of each problem, following a systematic approach to its resolution and validating the results on real datasets. Last but not least, new intuitive tools will be developed to work with the proposed methodologies, so that specialists in the application domain with no background in either programming or existing tools (Weka, R, …) can easily use them.
Regarding the Data Mining paradigms involved in this project, and based on the nature of the problems to be tackled, both predictive models (classification and regression) and descriptive models (pattern mining and extraction of subsets of interest) will be developed. In both cases, either conventional or flexible data representations (multi-instance and/or multi-target) will be considered, depending on the problem at hand. In the specific case of predictive models, interpretability is a challenge, so special emphasis will be placed on white-box models (either obtained directly or derived from accurate black-box models).
As a natural evolution of the research group's previous work on scalability and parallel models on multi-core platforms and GPUs, the project will focus on MapReduce-based models using the Spark framework, as well as on traditional programming languages (Java) and some of those that have become more popular for data analysis (Python, Julia and R). The project will also focus on the development of Deep Learning models (which have shown exceptional behavior in a multitude of problems), combined with more flexible data representations when applicable.
The practical component of this project is substantial, as demonstrated by the importance given to developing solutions to the problems raised. In the educational field, early prediction models will be developed and applied to different sets of students. Additionally, different models for self-assessment and peer review will be developed, as well as models to recommend didactic materials for students with similar characteristics. In the biomedical field, models for the early diagnosis of melanoma will be developed, and different temporal patterns of hypertension will be analyzed and related to the different pathologies that might cause them. Furthermore, we will analyze patterns related to complications that occur after the removal of a colorectal cancer and, finally, we will analyze which gene expression factors are responsible for the appearance of different tumors.
RELATED RESEARCH GROUPS
Several national research groups work on subjects related to this project. These groups meet periodically at the Taller de Minería de Datos y Aprendizaje (TAMIDA), as well as at the Conferencia de la Asociación Española para la Inteligencia Artificial (AEPIA). The list of research groups that work in data mining is available on the REDMIDAS website. Our team, as well as several of the research groups detailed below, belongs to the Red de Excelencia en Big Data y Análisis de Datos Escalable (BigDADE). The following research groups have published works on topics related to this project:
- SCI2S research group (University of Granada), headed by Francisco Herrera, which works on multi-instance learning and association rule mining, and has more recently published several works in the big data area.
- SIMIDAT research group (University of Jaén), headed by María José del Jesus, which has worked on association rule mining, subgroup discovery, and more recently on multi-label learning.
- CIG research group (Universidad Politécnica de Madrid), headed by Concepción Bielza and Pedro Larrañaga, which is mainly focused on multi-label learning, among other topics.
- ML research group (University of Oviedo, in Gijón), headed by Antonio Bahamonde, which has extensive experience in multi-label learning.
- MINERVA research group (University of Sevilla), headed by José C. Riquelme, which has published several works on association rule mining.
- IDBIS research group (University of Granada), headed by Juan Carlos Cubero, which has worked on association rule mining.
- MIDAS research group (Universidad Politécnica de Madrid), headed by Ernestina Menasalvas and Alejandro Rodríguez, which is focused on big data analytics.
- LIDIA research group (University of A Coruña), headed by Amparo Alonso, which has worked on artificial neural networks, and more recently in the big data area.
Many of the research groups listed above have also worked on problems similar to the ones proposed in this project. Thus, the ML, LIDIA and MINERVA groups have worked on educational problems. The SIMIDAT, LIDIA and MIDAS groups have also worked on clinical and/or biomedical data mining, although these are not the only fields of application in which they have worked. Our team has already collaborated with many of these groups on several occasions, and we hope to establish new collaborations with them and with other groups (especially international ones) during the development of this project, in order to form competitive consortia to apply for projects in international research calls (such as H2020).
Books
- S. Ventura, J.M. Luna. “Supervised Descriptive Pattern Mining”. Springer 2018, ISBN 978-3-319-98139-0, pp. 1-185.
- R. Salado-Cid, A. Ramírez, J.R. Romero. “On the Need of Opening the Big Data Landscape to Everyone: Challenges and New Trends”. Digital Marketplaces Unleashed, pp. 675-685. Editors: C. Linnhoff-Popien, R. Schneider, M. Zaddach. Springer-Verlag. 2018.
Journal articles
- O. G. Reyes, C. Morell & S. Ventura. Effective active learning methods for multi-label learning. Neurocomputing, 273, 494-508, 2018.
- O. Reyes, A. Cano, H. Fardoun, S. Ventura. A locally weighted learning method based on a data gravitation model for multi-target regression. International Journal of Computational Intelligence Systems, 11(1), 282-295, 2018.
- F. Padillo, J.M. Luna, S. Ventura & F. Herrera. Mining association rules on big data through mapreduce genetic programming. Integrated Computer-Aided Engineering, 25(1), 31-48, 2018.
- O. G. Reyes, S. Ventura. Evolutionary Strategy to perform Batch-Mode Active Learning on the Multi-Label Context. ACM Trans. on Intelligent Systems and Technology, 9(4), art. 48, 2018.
- J. M. Moyano, E. Gibaja, K. Cios & S. Ventura. Review of ensembles of multi-label classifiers: Models, experimental study and prospects. Information Fusion, 44, November, pp. 33-45, 2018.
- O. G. Reyes, A. Altalhi & S. Ventura. Statistical comparisons of active learning strategies over multiple datasets. Knowledge Based Systems, 145, April, 274-288, 2018.
- A. Guerrero, C. Morell, S. Ventura. A gene expression programming algorithm for discovering classification rules in the multi-objective space. International Journal of Computational Intelligence Systems, 11(1), 540-559, 2018.
- G. Melki, A. Cano, S. Ventura. Multi-Instance Support Vector Machine with Bag Representatives. Pattern Recognition, 79, July, 228-241, 2018.
- G. Melki, V. Kecman, S. Ventura, A. Cano. OLLAWV: OnLine Learning Algorithm using Worst-Violators. Applied Soft Computing, 86, May, 384-393, 2018.
- J. González-López, S. Ventura & A. Cano. Distributed Nearest Neighbor Classification for Large-Scale Multi-label Data on Spark. Future Generation Computing Systems, 87, pp 66-82, 2018.
- O. G. Reyes, H. Fardoun, S. Ventura. An effective ensemble method for selecting instances in multi-target regression problems. Integrated Computer-Aided Engineering, 25(4), 305-320, 2018.
- J. M. Luna, M. Ondra, H. Fardoun, S. Ventura. Optimization of quality measures in association rule mining: an empirical study. International Journal of Computational Intelligence Systems, 12(1):59-78, 2018.
- M. D. Gahete, M. del Río, A. Camargo, J.F. Alcalá, E. Alors-Pérez, J. Delgado-Lista, O. Reyes, S. Ventura, O. Pérez-Martínez, J. P. Castaño, J. López Miranda & R. Luque. Changes in splicing machinery components influence, precede, and early predict the development of type 2 diabetes: from the CORDIOPREV Study. EBioMedicine, 37:356-365, 2018.
- F. Padillo, JM Luna & S. Ventura. Evaluating Associative Classification Algorithms for Big Data. Big Data Analytics, 4:2, 2019.
- C. Luque, J.M. Luna, M. Luque, S. Ventura. An Advanced Review on Text Mining in Medicine. WIREs DMKD, 9(3), 2019.
- F. Padillo, JM Luna & S. Ventura. A grammar-guided genetic programming algorithm for associative classification in Big Data. Cognitive Computation, 2018.
- J.M. Moyano, E. Gibaja, K. Cios & S. Ventura. An evolutionary approach to build MLC-based ensembles. Information Fusion, 2018.
- O. G. Reyes & S. Ventura. Performing Multi-Target Regression via a Parameter Sharing-based Deep Network. Int. Journal of Neural Systems, 2019.
- H. Abouzid, O. Chakkor, O. G. Reyes, S. Ventura. Signal Speech Reconstruction and Noise removal using Convolutional Denoising Audioencoders with Neural Deep Learning. Analog Integrated Circuits & Signal Processing, 2019.
- M. del Río-Moreno, E. Alors-Pérez, S. González-Rubio, G. Ferrín, O. Reyes, M. Rodríguez-Perálvarez, M. E. Sánchez-Frías, R. Sánchez-Sánchez, S. Ventura, J. López-Miranda, R. D. Kineman, M. de la Mata, J. P. Castaño, M. D. Gahete, R. Luque. Dysregulation of the splicing machinery is associated to the development of non-alcoholic fatty liver disease. The Journal of Clinical Endocrinology & Metabolism, 2019.
- S. U. l. Hassan, H. Waheed, N. R. Aljohani, M. Ali, S. Ventura & F. Herrera. Virtual Learning Environment to Predict Withdrawal by Leveraging Deep Learning. Int. J. of Intell. Sys., 2019.
- L. Quintero, C. Morell & S. Ventura. WordificationMI: Multi-Relational Data Mining through Multiple Instance Propositionalization. Progress in Artificial Intelligence, 2019.
- H. Robles, A. Zafra, H. Fardoun, S. Ventura. A C++ library for Clustering based on Evolutionary Algorithms. Knowledge Based Systems, 2019.
- J. M. Luna, P. Fournier-Viger, S. Ventura. Frequent Itemset Mining: a 25 Years Review. WIREs DMKD, 2019.
- J.M. Luna, H. Fardoun, F. Padillo, C. Romero & S. Ventura. Subgroup Discovery in MOOCs: A Big Data application for describing different types of learners. Interactive Learning Environments, 2019.
- J.M. Luna, C.J. Carmona, A. M. García-Vico, M. J. del Jesus, S. Ventura. Multiple Instance Subgroup Discovery. International Journal of Computational Intelligence Systems, 2019.
- C. García-Martínez, R. Cerezo, M. Bermúdez & C. Romero. Improving essay peer grading accuracy in massive open online courses using personalized weights from student’s engagement and performance. Journal of Computer Assisted Learning, 2019.
- P. G. Espejo, E. Gibaja, V.H. Menéndez, A. Zapata, C. Romero. “Improving Multi-Label Classification for Learning Objects Categorization by Taking into Consideration Usage Information”. JUCS, vol. 25 (13), pp. 1687-1716. 2019.
- C. Romero, S. Ventura. “Guest Editorial: Special Issue on Early Prediction and Supporting of Learning Performance”. IEEE-TLT, vol. 12(2). 2019.
- J. López-Zambrano, J.A. Lara, C. Romero. “Towards Portability of Models for Predicting Students’ Final Performance in University Courses Starting from Moodle Logs”. Appl. Sci. vol. 10, pp. 354. 2020.
- J. A. Delgado-Osuna, C. García-Martínez, J. Gómez-Barbadillo, S. Ventura. “Heuristics for interesting class association rule mining a colorectal cancer database”. Information Processing & Management, vol. 57, 102207. 2020.
- C. García-Martínez, S. Ventura. “Multi-view Genetic Programming Learning to Obtain Interpretable Rule-Based Classifiers for Semi-supervised Contexts. Lessons Learnt”. International Journal of Computational Intelligence Systems, vol. 13, pp. 576-590. 2020.
- J.M. Luna, M. Pechenizkiy, W. Duivesteijn, S. Ventura. “Exceptional in so Many Ways – Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios”. IEEE Access, vol. 8, pp. 200982-200994. 2020.
- J.M. Luna, P. Fournier-Viger, S. Ventura. “Extracting User-Centric Knowledge on Two Different Spaces: Concepts and Records”. IEEE Access, vol. 8, pp. 134782-134799. 2020.
- F. Padillo, J.M. Luna, S. Ventura. “LAC: Library for associative classification”. Knowledge Based Systems, vol. 193, 105432. 2020.
- C. Luque, J.M. Luna, S. Ventura. “A semantically enriched text mining system for clinical decision support”. Computational Intelligence, pp. 1-26. 2020.
- J.M. Moyano, E. Gibaja, K.J. Cios, S. Ventura. “Combining multi-label classifiers based on projections of the output space using Evolutionary algorithms”. Knowledge-Based Systems, vol. 196, 105770. 2020.
- O. Reyes, E. Pérez, R.M. Luque, J. Castaño, S. Ventura. “A supervised machine learning-based methodology for analyzing dysregulation in splicing machinery: An application in cancer diagnosis”. Artificial Intelligence in Medicine, vol. 108, 101950. 2020.
- A. Esteban, A. Zafra, C. Romero. “Helping university students to choose elective courses by using a hybrid multi-criteria recommendation system with genetic optimization”. Knowledge-Based Systems, vol. 194, pp. 1-14. 2020.
- P. Fournier-Viger, P. Yang, R.U. Kiran, S. Ventura, J.M. Luna. “Mining local periodic patterns in a discrete sequence”. Information Sciences, vol. 544, pp. 519-548. 2021.
- J.M. Moyano, O. Reyes, H.M. Fardoun, S. Ventura. “Performing multi-target regression via gene expression programming-based ensemble models”. Neurocomputing, vol. 432, pp. 275-287. 2021.
- M. Frias, J.M. Moyano, A. Rivero-Juarez, J.M. Luna, A. Camacho, H.M. Fardoun, I. Machuca, M. Al-Twijri, A. Rivero, S. Ventura. “Data mining approach improves classification accuracy of HCV infection outcome”. Journal of Medical Internet Research, In press. 2021.
- E. Pérez, O. Reyes, S. Ventura. “Convolutional neural networks for the automatic diagnosis of melanoma: An extensive experimental study”. Medical Image Analysis, 67. 2021.
- R. Barbudo, A. Ramírez, F. Servant, J.R. Romero. “GEML: A grammar-based evolutionary machine learning approach for design-pattern detection”. Journal of Systems and Software, vol. 175. 2021.
- J.M. Jiménez-Vacas, V. Herrero-Aguayo, A.J. Montero-Hidalgo, E. Gómez-Gómez, A.C. Fuentes-Fayos, A.J. León-González, P. Sáez-Martínez, E. Alors-Pérez, S. Pedraza-Arévalo, T. González-Serrano, O. Reyes, A. Martínez-López, R. Sánchez-Sánchez, S. Ventura, E. M. Yubero-Serrano, M.J. Requena-Tapia, J.P. Castaño, M.D. Gahete, R.M. Luque. Dysregulation of the splicing machinery is directly associated to aggressiveness of prostate cancer. EBioMedicine, 51, 10547, 2020.
International conferences
- O. Reyes, J.M. Moyano, J.M. Luna, S. Ventura. A gene expression programming method for multi-target regression. International Conference on Learning and Optimization Algorithms: Theory and Applications, LOPAL, 2018.
- A. Esteban, A. Zafra, C. Romero. A hybrid multi-criteria approach using a genetic algorithm for recommending courses to university students. Eleventh International Conference on Educational Data Mining, EDM, 2018.
- W. Chango, R. Cerezo, C. Romero. Predicting academic performance of university students from multi-sources data in blended learning. Proceedings of the Second International Conference on Data Science, E-Learning and Information Systems, DATA ’19, 2019.
- J.A. Delgado-Osuna, C. García-Martínez, S. Ventura, J. Gómez-Barbadillo. Obtaining Tractable and Interpretable Descriptions for Cases with Complications from a Colorectal Cancer Database. International Symposium on Computer-Based Medical Systems, 2019.
- J.M. Luna, F. Padillo, S. Ventura. Associative Classification in Big Data through a G3P Approach. 4th International Conference on Internet of Things, Big Data and Security, IoTBDS, 2019.
- C. Luque, J.M. Luna, S. Ventura. MiNerDoc: a Semantically Enriched Text Mining System to Transform Clinical Text into Knowledge. International Symposium on Computer-Based Medical Systems, 2019.
- J.M. Moyano, E. Gibaja, S. Ventura, A. Cano. Speeding Up Classifier Chains in Multi-Label Classification. 4th International Conference on Internet of Things, Big Data and Security, IoTBDS, 2019.
- O. Reyes, E. Pérez, S. Ventura. Performing melanoma diagnosis by an effective convolutional architecture. Skin Lesion Analysis Towards Melanoma Detection, ISIC, 2019.
- O. Reyes, R.M. Luque, J. Castaño, S. Ventura. A Supervised Methodology for Analyzing Dysregulation in Splicing Machinery: An Application in Cancer Diagnosis. International Symposium on Computer-Based Medical Systems, 2019.
- W. Chango, M. Sanchez-Santillan, R. Cerezo, C. Romero. Predicting students’ performance using emotion detection from face-recording video when interacting with an ITS. EDM20, 2019.
- P. Fournier-Viger, Y. Wang, J. Chun-Wei Lin, J.M. Luna, S. Ventura. Mining Cross-Level High Utility Itemsets. 33rd International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, IEA/AIE, 2020.
- J.M. Moyano, E. Gibaja, K.J. Cios, S. Ventura. Generating Ensembles of Multi-Label Classifiers Using Cooperative Coevolutionary Algorithms. 24th European Conference on Artificial Intelligence, ECAI 2020.
- J.M. Moyano, E. Gibaja, K.J. Cios, S. Ventura. Tree-shaped ensemble of multi-label classifiers using grammar-guided genetic programming. IEEE Congress on Evolutionary Computation, CEC, 2020.
- A. Esteban, A. Zafra, S. Ventura. A Preliminary Study on Evolutionary Clustering for Multiple Instance Learning. IEEE Congress on Evolutionary Computation, CEC, 2020.
- M. Pinargote, L. Bowen, J. Meza, S. Ventura. Accuracy’s Measures of Sentiment Analysis Algorithms for Spanish Corpus generated in Peer Assessment. The International Conference of Engineering & MIS, ICEMIS, 2020.
- L. Bowen, M. Pinargote, J. Meza, S. Ventura. Trends the use of Artificial Intelligence Techniques for Peer Assessment. The International Conference of Engineering & MIS, ICEMIS, 2020.
National conferences
- R. Salado-Cid, J.R. Romero. Interoperabilidad de flujos de trabajo intensivos en datos en Industria 4.0: caso de estudio. I Workshop en Aplicaciones de la Inteligencia Artificial para la Industria 4.0, IndustrIA’18, 2018.
- R. Barbudo, J.R. Romero, S. Ventura. Aprendizaje automático con programación genética gramatical para la detección de patrones de diseño. XVIII Conferencia de la Asociación Española para la Inteligencia Artificial, CAEPIA, 2018.
- O. Reyes, J.M. Luna, J.M. Moyano, E. Pérez, S. Ventura. Resolución de Problemas Biomédicos mediante Técnicas de Extracción de Conocimiento. XVIII Conferencia de La Asociación Española Para La Inteligencia Artificial, CAEPIA, 2018.
- E. Pérez, L.D. González, L.M. Sánchez, O. Reyes, S. Ventura. JCLAL 2.0: mejoras y nuevas funcionalidades en la herramienta Java de código abierto para el aprendizaje activo. XVIII Conferencia de La Asociación Española Para La Inteligencia Artificial, CAEPIA, 2018.
- A. Esteban, A. Zafra, C. Romero. Un Sistema de Recomendación de Asignaturas Multi-Criterio con Optimización Genética. XVIII Conferencia de La Asociación Española Para La Inteligencia Artificial, CAEPIA, 2018.