DEVELOPMENT AND APPLICATION OF NEW MACHINE LEARNING MODELS FOR THE COLORECTAL CANCER STUDY.
BASIC INFORMATION
Ph.D. Student: Jose Antonio Delgado
Advisors: Carlos García, Sebastián Ventura
Defended on: March 2023
Keywords: colorectal cancer, data learning, classification, association rules
Digital version
DESCRIPTION
Colorectal cancer is a highly relevant pathology because of its high morbidity and mortality. Most works that apply Artificial Intelligence techniques to this disease use medical images (colonoscopy images, CT scans, histological images or ultrasound images) or gene expression data from microarrays; tabular data is used less frequently.
This doctoral thesis presents new machine learning models to identify factors that may influence the occurrence of complications and recurrences in patients with colorectal cancer. Moreover, because this is knowledge extraction in the healthcare field, we aim to obtain models that are both interpretable and accurate, so that they are well accepted by healthcare professionals. Hence, interpretability is one of the premises of this doctoral thesis.
First, a literature review on the use of machine learning applied to colorectal cancer was performed, from which a taxonomy of existing works in the literature was obtained. Then, a class association rule extraction model was developed with the intention of better understanding why some patients might suffer complications after surgery or recurrences of their cancer. Next, a feature and instance selection model was developed, based on evolutionary algorithms, in order to induce better classification trees. Finally, a grammatical evolution algorithm was developed to induce a large variety of classification trees as accurate as those obtained by the well-known C4.5 and CART methods, as a basis for the extraction of new interesting and tractable descriptions of the cases of interest.
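To make the idea of class association rules concrete, the following is a minimal sketch, not the model developed in the thesis. It enumerates rules of the form "antecedent -> class" over a tiny, entirely hypothetical set of records and keeps those that reach assumed support and confidence thresholds. The attribute names, the records, the thresholds and the function name mine_class_rules are all illustrative assumptions.

```python
from itertools import combinations

# Hypothetical toy records (NOT real patient data):
# each record is (set of attribute=value items, class label).
RECORDS = [
    ({"age>70", "stage=III", "open_surgery"}, "complication"),
    ({"age>70", "stage=II", "open_surgery"}, "complication"),
    ({"age<=70", "stage=II", "laparoscopy"}, "no_complication"),
    ({"age<=70", "stage=III", "laparoscopy"}, "no_complication"),
    ({"age>70", "stage=III", "laparoscopy"}, "complication"),
    ({"age<=70", "stage=II", "open_surgery"}, "no_complication"),
]


def mine_class_rules(records, target, min_support=0.3,
                     min_confidence=0.8, max_len=2):
    """Exhaustively enumerate rules 'antecedent -> target'.

    support    = fraction of all records matching antecedent AND target
    confidence = fraction of antecedent-matching records with class target
    """
    n = len(records)
    items = sorted(set().union(*(feats for feats, _ in records)))
    rules = []
    for size in range(1, max_len + 1):
        for antecedent in combinations(items, size):
            wanted = set(antecedent)
            covered = [label for feats, label in records if wanted <= feats]
            if not covered:
                continue
            hits = sum(1 for label in covered if label == target)
            support = hits / n
            confidence = hits / len(covered)
            if support >= min_support and confidence >= min_confidence:
                rules.append((antecedent, support, confidence))
    # Highest confidence first, then highest support, then alphabetical.
    rules.sort(key=lambda r: (-r[2], -r[1], r[0]))
    return rules


if __name__ == "__main__":
    for antecedent, support, confidence in mine_class_rules(RECORDS, "complication"):
        print(f"{' AND '.join(antecedent)} -> complication "
              f"(support={support:.2f}, confidence={confidence:.2f})")
```

Exhaustive enumeration is only workable for very small item sets; heuristic search would be needed on a real database, and this sketch does not attempt that.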
The following aspects summarise the most relevant results of the different chapters of the thesis, and the conclusions obtained:
- A taxonomy of the existing works in the literature on the application of machine learning techniques to colorectal cancer data.
- A methodology for obtaining interpretable and tractable descriptions of colorectal cancer cases that resulted in complications and/or recurrences.
- An evolutionary algorithm model for feature and instance selection for the induction of high quality classification trees.
- A grammar-driven genetic programming algorithm for the induction of a large variety of classification trees as accurate as those obtained by the well-known C4.5 and CART methods, as a basis for the extraction of new interesting and tractable descriptions of the cases of interest.
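As an illustration of how a grammar can drive the construction of classification trees, the following is a minimal sketch of the generic genotype-to-phenotype mapping used in grammatical evolution. It is not the algorithm of the thesis and does not use the PonyGE2 API. The grammar, the condition names, the map_genome function and the wrapping limit are hypothetical, and, as a simplification, one codon is consumed for every non-terminal expansion.

```python
# Hypothetical toy grammar: a tree is either a leaf (class label)
# or an if/else split on one of three made-up conditions.
GRAMMAR = {
    "<tree>": [
        ["<leaf>"],
        ["if ", "<cond>", " then (", "<tree>", ") else (", "<tree>", ")"],
    ],
    "<cond>": [["age > 70"], ["stage == 3"], ["open_surgery"]],
    "<leaf>": [["complication"], ["no_complication"]],
}


def map_genome(genome, grammar, start="<tree>", max_wraps=2):
    """Map a list of integer codons to a tree expression.

    The leftmost non-terminal is expanded first; the production is chosen
    as codon % number_of_productions. The genome is reused ("wrapped") up
    to max_wraps times; if it is still exhausted, None is returned to mark
    the individual as invalid.
    """
    if not genome:
        return None
    pending = [start]          # symbols still to process, leftmost first
    output = []                # terminals emitted so far
    used = 0                   # number of codons consumed
    limit = len(genome) * (max_wraps + 1)
    while pending:
        symbol = pending.pop(0)
        if symbol not in grammar:
            output.append(symbol)      # terminal: copy to the output
            continue
        if used >= limit:
            return None                # ran out of codons: invalid mapping
        codon = genome[used % len(genome)]
        used += 1
        productions = grammar[symbol]
        chosen = productions[codon % len(productions)]
        pending = list(chosen) + pending
    return "".join(output)


if __name__ == "__main__":
    print(map_genome([1, 0, 0, 0, 0, 1], GRAMMAR))
    # A genome that always picks the recursive production never terminates
    # within the codon budget, so it maps to None.
    print(map_genome([1], GRAMMAR))
```

In an evolutionary loop, each mapped expression would be evaluated as a classifier on the training data and the genomes varied by crossover and mutation; those steps are omitted here.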
FUNDS
The development of this thesis was supported by:
- Spanish Ministry of Science and Technology TIN2017-83445-P and FEDER funds.
- Spanish Ministry of Science and Technology PID2020-115832GB-I00.
PUBLICATIONS ASSOCIATED WITH THIS THESIS
INTERNATIONAL JOURNALS
- J.A. Delgado‐Osuna, C. García‐Martínez, J. Gómez‐Barbadillo, and S. Ventura. Heuristics for interesting class association rule mining a colorectal cancer database. Information Processing and Management, 57(3), 2020.
INTERNATIONAL CONFERENCES
- J.A. Delgado-Osuna, C. García-Martínez, S. Ventura, and J. Gómez-Barbadillo. Obtaining tractable and interpretable descriptions for cases with complications from a colorectal cancer database. In Proceedings of the 32nd IEEE International Symposium on Computer-Based Medical Systems, pp. 459-464, 2019.
- J.A. Delgado-Osuna, C. García-Martínez, and S. Ventura. Smart Operators for Inducing Colorectal Cancer Classification Trees with PonyGE2 Grammatical Evolution Python Package. In 2022 IEEE Congress on Evolutionary Computation (CEC), pp. 1-9, 2022.