Thesis MJalil – Knowledge Discovery and Intelligent Systems – KDIS

DEEP LEARNING MODELS WITH MULTIPLE INSTANCE DATA.

BASIC INFORMATION

Ph.D. Student: Mustafa A. Jalil Shafeea
Advisor: Sebastián Ventura
Started on: February 2021
Keywords: Deep learning, Multiple instance learning

THESIS PROPOSAL

Multiple-instance learning is becoming increasingly important in machine learning. In the standard supervised learning task, we learn a classifier from a training set of feature vectors, each with an associated class label. In the Multiple Instance Learning (MIL) task, we learn a classifier from a training set of bags, where each bag contains multiple feature vectors (called instances in MIL terminology). Each bag has an associated label, but the labels of the individual instances that make up the bag are unknown. Furthermore, not all instances are necessarily relevant: a bag may contain instances that convey no information about its class, or that are more closely related to other classes of bags and therefore provide confusing information. The goal of the learner is to generate a hypothesis that accurately predicts the labels of previously unseen bags.
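The bag-level setting described above can be illustrated with a minimal sketch. The names Bag, instance_score and predict_bag, the linear scorer, and its weights are all hypothetical. The max-pooling rule (a bag is positive if its highest-scoring instance is positive) is one common choice in the MIL literature and is an assumption here, not something this proposal prescribes.

```python
from dataclasses import dataclass
from typing import List, Sequence

Instance = Sequence[float]  # one feature vector


@dataclass
class Bag:
    instances: List[Instance]  # a bag may hold any number of instances
    label: int                 # the label is known for the whole bag only


def instance_score(x: Instance, w: Sequence[float], b: float) -> float:
    """Hypothetical linear instance scorer; w and b are illustrative."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def predict_bag(bag: Bag, w: Sequence[float], b: float) -> int:
    """Max-pooling (an assumption): predict 1 if any instance scores above 0."""
    best = max(instance_score(x, w, b) for x in bag.instances)
    return int(best > 0)


# Two toy bags of different sizes; instance labels are never provided.
bag_a = Bag(instances=[[0.1, 0.9], [0.8, 0.2]], label=1)
bag_b = Bag(instances=[[0.1, 0.3], [0.2, 0.7], [0.4, 0.1]], label=0)

w, b = [1.0, 0.0], -0.5  # illustrative parameters, not learned
predictions = [predict_bag(bag, w, b) for bag in (bag_a, bag_b)]
```

In this toy case only one instance of bag_a scores above zero, which is enough to make the whole bag positive, while the other instance plays the role of an irrelevant one.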

In recent years, this problem formulation has attracted much attention from the research community. The volume of data involved is growing rapidly, and labelling it at the instance level requires considerable effort. As a weakly supervised learning method, MIL can effectively reduce this burden because weak supervision is usually easier to obtain. Over the past 20 years, MIL has been increasingly used in many applications, such as image and video classification, document classification, and sound classification, where it has proven effective in practice. MIL classification is not limited to assigning a single label to an instance or bag. Assigning multiple labels to a bag is particularly relevant because a bag can contain instances that represent different concepts. Ranking can be performed at the bag level or the instance level. Clustering tasks are also possible, including searching for clusters or structure in a set of unlabeled bags. In some cases, clustering is performed in bag space using standard algorithms and set-based distance measures.
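As an illustration of a set-based distance measure between bags, the sketch below computes the Hausdorff distance, one well-known option. The text does not say which distance is intended, so this choice, the Euclidean instance distance, and the function names are assumptions.

```python
import math
from typing import Sequence

Instance = Sequence[float]


def euclidean(a: Instance, b: Instance) -> float:
    """Euclidean distance between two instances."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def directed_hausdorff(bag_a: Sequence[Instance], bag_b: Sequence[Instance]) -> float:
    """Largest distance from an instance in bag_a to its nearest instance in bag_b."""
    return max(min(euclidean(a, b) for b in bag_b) for a in bag_a)


def hausdorff(bag_a: Sequence[Instance], bag_b: Sequence[Instance]) -> float:
    """Symmetric Hausdorff distance between two bags."""
    return max(directed_hausdorff(bag_a, bag_b), directed_hausdorff(bag_b, bag_a))


bag_a = [[0.0, 0.0], [1.0, 0.0]]
bag_b = [[0.0, 0.0], [4.0, 0.0]]
distance = hausdorff(bag_a, bag_b)  # 3.0: the instance (4, 0) is 3 away from bag_a
```

A pairwise matrix of such distances could then be passed to any standard clustering algorithm that accepts precomputed distances.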

The main goal of this thesis is the development of new deep multiple-instance models and their application to different real-world problems. The working hypothesis is that deep multiple-instance learning is an excellent methodology for reaching this objective. More specifically, the objectives are the following:

Provide a systematic organization or review of deep learning models that use multi-instance (MI) data.
Gain better insight into the suitability of the multi-instance representation compared with the standard single-instance representation.
Develop new deep learning models for multiple-instance learning (MIL) that are competitive with the state of the art.
Apply these models to real-world problems (e.g., medical images).