IMPROVING DATA SCIENCE USER’S EXPERIENCE WITH COMPUTATIONAL INTELLIGENCE
Cristóbal Romero
Eva L. Gibaja
Carlos García
Amelia Zafra
María Luque
Álvaro Espejo
Jose María Moyano
Aurora Ramírez
Jose María Luna
Robert Feldt
Philippe Fournier-Viger
Krzysztof Cios
Aurora Esteban
Antonio R. Moya
Rafael Barbudo
Eduardo Pérez
SUMMARY
The INTENSE (ImproviNg DaTa SciENce USErs Experience) project aims to develop new methods and techniques that improve the experience of Data Science users, meaning both professionals in the field and data consumers who rely on the available tools to solve their knowledge discovery problems. Our working hypothesis is that computational intelligence techniques (in our case, neural network models and evolutionary computation) can meet these users' needs in an elegant, powerful, and flexible way.
The project is organised into three distinct modules. The first develops proposals to improve the experience of data consumers by making tools easier to use and the extracted knowledge easier to understand. To this end, the project will advance the automatic design of workflows through genetic programming, involve the user in the design of the algorithms themselves by incorporating user interaction into the fitness function, and improve the interpretability of the resulting models.
The second module develops proposals to enhance the experience of data science professionals. These proposals focus on improving existing algorithms and adding new capabilities, such as operating under complex conditions or building models that respect the privacy of user data. Accordingly, we will develop proposals to address problems with few labelled data. Two problems that have recently become very popular (continual learning and federated learning) will also be addressed, and new proposals for hyperparameter optimisation in deep learning models will be developed. Last but not least, algorithms will be designed to extract sequences of events that satisfy a set of temporal constraints.
In contrast to the first two modules, which are distinctly theoretical in nature, the third module addresses several real-world problems. Its aim is to verify that the proposals developed in modules 1 and 2 provide competitive solutions to the problems under study. Problems from the fields of health and education, in which the project team has broad experience, will be addressed, as well as new problems such as predictive maintenance models applied to vehicle failure and the analysis of software repositories. The socioeconomic interest that these application fields currently attract is well known, as is the impact that even a small advance may have. In fact, several companies and institutions from different sectors have already shown interest in the results that could be derived from this project.
Finally, to give visibility to the research results, we will publish the necessary information (algorithms and data) to support the replication of our experiments and the comparison of results. Furthermore, we will integrate the developed models into the most relevant current software platforms and tools to facilitate their dissemination to a wider community.
RESEARCH LINES
- Automatic design and configuration of workflows in Data Science
- Putting the human in the loop of the learning process
- Interpretable models in Data Science
- High-utility patterns
- Improving the hyperparameter optimisation of deep learning models
- Lifelong learning / Continual learning
- Few labelled data
- Chronicle mining
- Federated learning
- Anomaly detection
- Predictive maintenance
- Early diagnosis of diseases
- Student performance prediction
- Software analytics