CAKEBox – Knowledge Discovery and Intelligent Systems – KDIS

COMPUTATIONAL APPROACHES FOR KNOWLEDGE EXTRACTION ON BIG DATA

Status:

Finished

PI:

Sebastián Ventura

Reference:

UCO-FEDER 18 REF.1263116 MOD.A 1262678-F

Members:

Eva L. Gibaja
Jose María Luna
María Luque
José Raúl Romero
Amelia Zafra
Jose María Moyano
Rafael Barbudo
Alberto Cano
Krzysztof Cios
Mykola Pechenizkiy

Duration:

2 years

Budget:

24,500 €

Project:

SUMMARY

The main objective of the CAKEBox (Computational Approaches for Knowledge Extraction on Big Data) project is to develop new models for knowledge extraction on Big Data. The ultimate aim is to use these models to overcome the challenges that this type of problem currently poses. The main objective breaks down into the following sub-objectives:

First, new algorithms for both predictive and descriptive tasks will be developed, improving on the proposals currently considered state of the art. The proposed models will be easily scalable and able to deal with complex data representations. Special attention will be paid to rule learning algorithms, because of their interpretability, and to ensemble methods, which have shown high accuracy and applicability to a variety of problems. All the designed algorithms will be validated on a series of datasets and implemented on the most popular Big Data platforms (Hadoop, Spark and Flink).

Second, the developed models will be applied to two real-world problems in biomedicine. More specifically, they will be used to study the impact of health habits on cancer prevention and to analyze biochemical parameters in hemodialysis patients. In both cases, the aim is to obtain predictive models that support the development of early diagnosis systems, as well as descriptive models that improve the understanding of the most relevant factors in each problem.

Finally, to share the developed proposals with the scientific community, all source code and all the datasets used in the experimental stage will be made available. In addition, a repository of Docker images will be created, including everything necessary to replicate any of the experiments carried out during the project.

Journal articles

A. M. Trasierras, J. M. Luna, & S. Ventura. (2022). Improving the understanding of cancer in a descriptive way: An emerging pattern mining-based approach. Int. J. Intell. Syst. 37(4): 2822-2848. Impact Factor 7.98. https://onlinelibrary.wiley.com/doi/10.1002/int.22503
J. M. Moyano, & S. Ventura. (2022). Auto-adaptive Grammar-Guided Genetic Programming algorithm to build Ensembles of Multi-Label Classifiers. Information Fusion, 78, 1-19. Impact Factor 12.97. https://www.sciencedirect.com/science/article/pii/S1566253521001469
J. M. Moyano, O. Reyes, H. M. Fardoun, & S. Ventura. (2021). Performing multi-target regression via gene expression programming-based ensemble models. Neurocomputing, 432, 275-287. Impact Factor 6.1. https://www.sciencedirect.com/science/article/pii/S0925231220319603
M. Frias, J.M. Moyano, A. Rivero-Juarez, J.M. Luna, A. Camacho, H.M. Fardoun, & S. Ventura. (2021). Classification Accuracy of Hepatitis C Virus Infection Outcome: Data Mining Approach. Journal of Medical Internet Research, 23(2), e18766. Impact Factor 5.43. https://www.jmir.org/2021/2/e18766/
E. Pérez, & S. Ventura. (2021). An ensemble-based convolutional neural network model powered by a genetic algorithm for melanoma diagnosis. Neural Computing and Applications. Impact Factor 5.6. https://doi.org/10.1007/s00521-021-06655-7
J.M. Luna, P. Fournier-Viger, & S. Ventura. (2020). Extracting User-Centric Knowledge on Two Different Spaces: Concepts and Records. IEEE Access 8: 134782-134799. Impact Factor 3.367. https://ieeexplore.ieee.org/document/9145755
J.M. Luna, M. Pechenizkiy, W. Duivesteijn, & S. Ventura. (2020). Exceptional in so Many Ways – Discovering Descriptors That Display Exceptional Behavior on Contrasting Scenarios. IEEE Access 8: 200982-200994. Impact Factor 3.367. https://ieeexplore.ieee.org/document/9245545

National conferences

A. Ramirez, S. Ventura, J.R. Romero. XIX Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA). Método XAI basado en agrupamiento para la explicación de errores en clasificación [A clustering-based XAI method for explaining classification errors]. 2021.
R. Barbudo, J.R. Romero, S. Ventura. XIX Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA). AutoML para la composición de workflows diversos mediante programación genética gramatical [AutoML for composing diverse workflows through grammar-guided genetic programming]. 2021.
A. Trasierras, J.M. Luna, S. Ventura. XIX Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA). Análisis descriptivo de cáncer de mama usando minería de datos [Descriptive analysis of breast cancer using data mining]. 2021.
A. Esteban, A. Zafra, S. Ventura. XIX Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA). Estudio comparativo de medidas de disimilitud para Clustering Multi-Instancia [Comparative study of dissimilarity measures for multi-instance clustering]. 2021.
E. Gibaja, A. Zafra. XIX Conferencia de la Asociación Española para la Inteligencia Artificial (CAEPIA). Estudio de estrategias basadas en vecinos para clasificación multi-instancia multi-etiqueta [A study of neighbor-based strategies for multi-instance multi-label classification]. 2021.

International conferences

R. Barbudo, S. Ventura, J.R. Romero. Proceedings of the 13th International Conference on Soft Computing and Pattern Recognition (SoCPaR 2021). Grammar-Based Evolutionary Approach for Automatic Workflow Composition with Open Preprocessing Sequence. 2021.