
Intl. Summer School on Search- and Machine Learning-based Software Engineering
Type4Py: Machine Learning-based Type Auto-completion for Python
Amir M. Mir Department of Software Technology Delft University of Technology Delft, The Netherlands Email: s.a.m.mir@tudelft.nl
Sebastian Proksch Department of Software Technology Delft University of Technology Delft, The Netherlands Email: s.proksch@tudelft.nl
Georgios Gousios Department of Software Technology Delft University of Technology Delft, The Netherlands Email: g.gousios@tudelft.nl
Abstract—In this short report, we present Type4Py, a tool that assists Python developers to retrofit type annotations to their codebases. It is powered by a deep learning model that is trained on 5.2K open-source Python projects. Type4Py is available as a Visual Studio Code extension.
I. INTRODUCTION
Python is very popular among software developers because it is easy to use and allows rapid prototyping. IEEE Spectrum ranked Python as the most popular programming language in 2021.¹ Despite its popularity, Python lacks static types, which can lead to type errors and unexpected run-time exceptions. To mitigate these issues, Python 3.5 added support for optional type annotations, so developers can gradually add type annotations to their existing codebases. However, doing this by hand is a daunting and error-prone task.
Researchers have recently proposed machine learning (ML)-based type prediction models for dynamic programming languages [1], [2]. These models perform sophisticated feature extraction and computationally expensive analysis, such as control/data flow analysis or search-based validation, which makes them impractical for developers to use in IDEs. Motivated by this, we recently proposed Type4Py [3], a deep similarity learning (DSL)-based type inference model for Python. Specifically, the Type4Py model is based on a hierarchical neural network that learns to discriminate between similar and dissimilar types in a high-dimensional space, namely, type clusters. Given these clusters, a K-nearest neighbor search is performed to suggest type annotations for a test query.
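The K-nearest neighbor step can be illustrated with a minimal sketch. It assumes that each already-typed code element has been mapped to a vector in the learned space; the 2-D vectors, the function name knn_type_suggestions, and the distance-weighted scoring below are illustrative assumptions, not the Type4Py implementation (see [3] for the actual model).

```python
import math
from collections import defaultdict


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_type_suggestions(query, labeled_points, k=3):
    """Rank candidate types using the k labeled points nearest to `query`.

    Each neighbor votes for its type with weight 1/distance (an assumed
    scoring rule); scores are normalized so that they sum to 1.
    """
    nearest = sorted(labeled_points, key=lambda p: euclidean(query, p[0]))[:k]
    scores = defaultdict(float)
    for vector, type_name in nearest:
        scores[type_name] += 1.0 / (euclidean(query, vector) + 1e-10)
    total = sum(scores.values())
    ranked = [(t, s / total) for t, s in scores.items()]
    return sorted(ranked, key=lambda ts: ts[1], reverse=True)


# Toy "type clusters": hypothetical 2-D embeddings of typed code elements.
TYPE_CLUSTERS = [
    ((0.0, 0.1), "int"),
    ((0.1, 0.0), "int"),
    ((1.0, 1.0), "str"),
    ((0.9, 1.1), "str"),
    ((5.0, 5.0), "List[int]"),
]

# A query that lies inside the "int" cluster.
suggestions = knn_type_suggestions((0.05, 0.05), TYPE_CLUSTERS, k=3)
```

Here the query's three nearest neighbors are two "int" points and one "str" point, so "int" is ranked first. In a real system the vectors would come from the trained network and the search would run over a much larger index.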
Type4Py has two main advantages over the recent state-of-the-art approaches, TypeWriter [1] and Typilus [2]:
1) Its mean reciprocal rank (MRR) score is 77.1%, which is 8.1% and 16.7% higher than Typilus and TypeWriter, respectively. For example, given a list of 10 predictions, a higher MRR score means that the model places a correct type annotation among the first few predictions in the list.
2) It is practical and can be used as a developer tool, i.e., it can be used in an IDE, Visual Studio Code, to assist Python developers in gradually adding type annotations to their existing codebases.
1 https://spectrum.ieee.org/top-programming-languages/
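To make the MRR metric in item 1 concrete: for each query, take the reciprocal of the rank at which the first correct type appears (0 if it does not appear), then average over all queries. The sketch below uses made-up predictions and an illustrative function name; it is not taken from the Type4Py evaluation code.

```python
def mean_reciprocal_rank(ranked_predictions, true_types):
    """Average of 1/rank of the first correct prediction per query.

    A query whose correct type is missing from its list contributes 0.
    """
    total = 0.0
    for predictions, truth in zip(ranked_predictions, true_types):
        for rank, predicted in enumerate(predictions, start=1):
            if predicted == truth:
                total += 1.0 / rank
                break
    return total / len(true_types)


# Made-up example: three queries, each with a ranked list of predictions.
predictions = [
    ["int", "str", "float"],   # correct type at rank 1 -> 1.0
    ["str", "bytes", "int"],   # correct type at rank 2 -> 0.5
    ["list", "dict", "set"],   # correct type missing   -> 0.0
]
truths = ["int", "bytes", "tuple"]

mrr = mean_reciprocal_rank(predictions, truths)  # (1.0 + 0.5 + 0.0) / 3
```

A model that more often places the correct type at rank 1 or 2 therefore scores closer to 1.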
In this short report, we present Type4Py as a developer tool. It was initially released in July 2021, and its Visual Studio Code extension² has over 1,100 installations at the time of writing.
II. TYPE4PY
A. Core Features
Type4Py has the following main features:
• It is powered by a DSL-based model that is trained on the ManyTypes4Py dataset [4] with 5.2K open-source Python projects.
• It provides ML-based type auto-completion functionality in Visual Studio Code. Figure 1 shows one usage example from VS Code.
• It can predict type annotations for variables, function arguments, and return values.
• It has a local model that runs end-to-end on users’ machines. This addresses a common privacy concern among developers, namely having to send their source code to external servers.
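The three kinds of annotation slots mentioned above (variables, function arguments, and return values) can be seen in a small hand-written example. The function and the chosen types are illustrative assumptions, not output produced by Type4Py.

```python
from typing import List, get_type_hints


def mean_length(words):
    """Unannotated version, as found in many existing codebases."""
    total = 0
    for word in words:
        total += len(word)
    return total / len(words)


def mean_length_typed(words: List[str]) -> float:
    """Same logic with all three kinds of annotation filled in."""
    total: int = 0          # variable annotation
    for word in words:      # `words: List[str]` is an argument annotation
        total += len(word)
    return total / len(words)  # `-> float` is the return annotation


# Annotations are optional and do not change run-time behavior.
hints = get_type_hints(mean_length_typed)
```

Because annotations are optional, a codebase can be migrated one function at a time, and the annotated and unannotated versions behave identically at run time.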
B. Design
The design of Type4Py is shown in Figure 2. On the client side, to provide type auto-completion, the VS Code extension sends Python source code file(s) to either the local model or the production server. On the server side, a small REST API with a prediction endpoint queries the pre-trained Type4Py model. For the model to predict types, Python files are first processed to extract features/type hints (see [3] for details). Finally, the predicted type information is returned to the extension as a JSON response.
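The request/response flow described above can be sketched with the Python standard library alone. Everything specific here is an assumption made for illustration: the /predict path, the JSON layout, and the stub_predict function (which only lists the slots a model would fill) are hypothetical and do not reproduce the real Type4Py server or its API.

```python
import ast
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def stub_predict(source):
    """Hypothetical stand-in for the pre-trained model.

    It parses the file and lists, per function, the slots that would
    receive predictions; the prediction lists are left empty.
    """
    functions = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            functions.append({
                "name": node.name,
                "params": {arg.arg: [] for arg in node.args.args},
                "return": [],
            })
    return functions


class PredictHandler(BaseHTTPRequestHandler):
    """Tiny REST-style server with a single (assumed) /predict endpoint."""

    def do_POST(self):
        if self.path != "/predict":
            self.send_error(404)
            return
        length = int(self.headers.get("Content-Length", 0))
        source = self.rfile.read(length).decode("utf-8")
        body = json.dumps({"functions": stub_predict(source)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example quiet


def request_predictions(url, source):
    """Client side: send a source file and decode the JSON response."""
    request = urllib.request.Request(
        url, data=source.encode("utf-8"), method="POST")
    # Bypass any system proxy, since the server runs on localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Run the stub server on a free local port, query it once, then stop it.
server = HTTPServer(("127.0.0.1", 0), PredictHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/predict" % server.server_port
response = request_predictions(url, "def add(x, y):\n    return x + y\n")
server.shutdown()
server.server_close()
```

The same client code could point at a locally running model or at a remote server by changing only the URL, which mirrors the two deployment options in the design.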
C. Implementation
The VS Code extension is written in TypeScript³ and the Type4Py model is implemented in Python 3 using the PyTorch framework.⁴ Specifically, the REST API is implemented
2 https://marketplace.visualstudio.com/items?itemName=saltud.type4py
3 https://github.com/saltudelft/type4py-vscode-ext
4 https://github.com/saltudelft/type4py






































































