PROPOSITIONALIZATION METHODS IN THE SOLUTION OF MULTI-RELATIONAL DATA MINING PROBLEMS
BASIC INFORMATION
Ph.D. Student: Luis A. Quintero-Domínguez
Advisors: Carlos Morell, Sebastián Ventura
Defended on: September 2021
Keywords: propositionalization, multi-relational data mining, multiple-instance learning, grammar-guided genetic programming
THESIS PROPOSAL
Much of the data collected today is stored in relational databases composed of several tables. This has led to the emergence of multi-relational data mining (MRDM), which focuses on discovering knowledge from relational data. There are two approaches to MRDM problems. The first is to learn directly from the relational data, which is computationally expensive because of the large search space that the learning algorithms need to explore. The second approach, known as propositionalization, transforms the relational data into a simpler representation, thus reducing the search space. This transformation has two main advantages: it reduces the complexity of the data, which speeds up the learning process, and it makes it possible to use a wide range of learning algorithms that have proven effective on many problems. For these reasons, propositionalization has become one of the mainstream approaches within MRDM. However, propositionalization normally trades accuracy for efficiency: information can be lost while transforming relational data into a simpler representation, which affects the accuracy of the learning methods. This is the main drawback of the approach.
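As an illustration of the transformation described above, the following is a minimal sketch of one common family of propositionalization, aggregation-based flattening. It is not the method developed in the thesis. The tables, column names, and the propositionalize function are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: a target table and a related one-to-many table.
customers = [
    {"customer_id": 1, "label": "good"},
    {"customer_id": 2, "label": "bad"},
]
orders = [
    {"customer_id": 1, "amount": 10.0},
    {"customer_id": 1, "amount": 30.0},
    {"customer_id": 2, "amount": 20.0},
    {"customer_id": 2, "amount": 20.0},
]


def propositionalize(targets, related, key, column):
    """Flatten a one-to-many relation into one attribute-value row per target."""
    # Group the related values by the foreign key.
    groups = defaultdict(list)
    for row in related:
        groups[row[key]].append(row[column])

    # Summarize each group with a few aggregates (an illustrative choice).
    flat = []
    for target in targets:
        values = groups.get(target[key], [])
        flat.append({
            **target,
            f"{column}_count": len(values),
            f"{column}_mean": mean(values) if values else 0.0,
            f"{column}_max": max(values) if values else 0.0,
        })
    return flat


flat_table = propositionalize(customers, orders, "customer_id", "amount")
for row in flat_table:
    print(row)
```

In this toy data the two customers have different orders but the same count and mean, so a flat table built from those two aggregates alone could not tell them apart. This is the kind of information loss that the paragraph above refers to.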
The main goal of this thesis is to develop new propositionalization methods that reduce information loss and thereby improve the accuracy of learning algorithms.
The partial objectives are as follows:
- Design and develop a propositionalization method that takes advantage of the multi-instance representation to reduce information loss and thus improve the accuracy of learning algorithms.
- Design and develop a propositionalization method based on grammar-guided genetic programming that reduces the dimensionality of the resulting datasets and improves the accuracy of learning algorithms.
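The multi-instance representation mentioned in the first objective can be pictured with a small sketch: instead of collapsing the related records of a target object into a single row, each target is kept as a labeled bag of instances. This is only an illustration of the representation, not the method proposed in the thesis. The tables, column names, and the to_bags function are hypothetical.

```python
from collections import defaultdict

# Hypothetical toy data: molecules (targets) and their atoms (related records).
molecules = [
    {"mol_id": "m1", "label": 1},
    {"mol_id": "m2", "label": 0},
]
atoms = [
    {"mol_id": "m1", "charge": -0.1, "mass": 12.0},
    {"mol_id": "m1", "charge": 0.2, "mass": 1.0},
    {"mol_id": "m2", "charge": 0.0, "mass": 16.0},
]


def to_bags(targets, related, key, features):
    """Represent each target as a labeled bag of feature vectors."""
    # Collect one feature vector (instance) per related record.
    instances = defaultdict(list)
    for row in related:
        instances[row[key]].append(tuple(row[f] for f in features))

    # Each target keeps all of its instances instead of a single summary row.
    return [
        {"id": t[key], "label": t["label"], "bag": instances.get(t[key], [])}
        for t in targets
    ]


bags = to_bags(molecules, atoms, "mol_id", ["charge", "mass"])
for bag in bags:
    print(bag)
```

Because every related record survives as its own instance, this representation keeps detail that a single aggregated row would discard, at the cost of requiring a multiple-instance learning algorithm downstream.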
FUNDS
The development of this thesis was supported by:
- Spanish Ministry of Economy and the European Regional Development Fund, project TIN2017-83445-P.
- AUIP and the Council of Economy and Knowledge of the Andalusia Board, as sponsors of the Academic Mobility Scholarship Program of the AUIP.