Many developers get goosebumps just thinking about the work it takes to convert code from one language to another. What if this task could be done automatically, in a few steps? That is the promise of TransCoder, a code translation model developed by Facebook researchers.
As you can imagine, the tool is based on artificial intelligence. The idea is to convert code written in one high-level language, such as C++, Java or Python, into another with as little human supervision or intervention as possible.
It is not an easy job. Even for an experienced programmer, this kind of work requires patience and, above all, a good knowledge of both the source and the target languages.
Transcompilers (source-to-source compilers) help with this task. Thanks to them, the new code does not need to be rewritten from scratch. But today, their advantages do not go much further: it is still up to the developer to deal with syntax differences, library changes or API adaptations, for example.
In Facebook’s own words, TransCoder is a “neural” transcompiler, that is, a transcompiler that uses machine learning to do all the dirty work.
The process starts with a pre-training step that maps code fragments expressing the same instructions to the same representation, regardless of whether they are written in the source or the target language. “Anchor points” common to many languages, such as the keywords “if” and “while” and mathematical operators, serve as the basis for this work.
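To make the idea of “anchor points” concrete, here is a minimal Python sketch (Python 3.9+) that tokenizes the same function written in three languages and lists the tokens they all share. The snippets, the crude regex tokenizer and the function names are illustrative assumptions; TransCoder’s real tokenization and its neural pre-training are not reproduced here.

```python
import re

# Hypothetical snippets: the same function (greatest common divisor) in three languages.
SNIPPETS = {
    "python": "def gcd(a, b):\n    while b != 0:\n        a, b = b, a % b\n    return a",
    "cpp": "int gcd(int a, int b) {\n    while (b != 0) {\n        int t = b;\n"
           "        b = a % b;\n        a = t;\n    }\n    return a;\n}",
    "java": "static int gcd(int a, int b) {\n    while (b != 0) {\n        int t = b;\n"
            "        b = a % b;\n        a = t;\n    }\n    return a;\n}",
}

# A deliberately crude tokenizer: identifiers/keywords, integers and a few operators.
# Punctuation such as braces, commas and colons is simply ignored.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|!=|==|<=|>=|[%+\-*/=<>]")


def tokens(code: str) -> set[str]:
    """Return the set of distinct tokens found in a code snippet."""
    return set(TOKEN_RE.findall(code))


def anchor_points(snippets: dict[str, str]) -> set[str]:
    """Return the tokens that appear in every language's snippet."""
    token_sets = [tokens(code) for code in snippets.values()]
    return set.intersection(*token_sets)


if __name__ == "__main__":
    print(sorted(anchor_points(SNIPPETS)))
```

Keywords such as “while” and “return”, the operators and the digit 0 survive the intersection, while language-specific tokens such as “def”, “int” and “static” drop out.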
A “back-translation” step, in which translated code is converted back into the original language, lets TransCoder generate parallel data that can be compared with the original code. The differences found in this process reinforce the training.
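A minimal sketch of how back-translation turns monolingual code into training data. It assumes toy, hand-written “models” (the hypothetical toy_py_to_cpp and train_reverse) in place of neural networks: a Python-to-C++ model produces possibly noisy C++, and each (noisy C++, original Python) pair becomes a training example for the reverse, C++-to-Python direction.

```python
from typing import Callable

# A "model" here is just a function from a code string to a code string.
Translator = Callable[[str], str]

# Hypothetical stand-in for an imperfect Python -> C++ model: a few
# statement-level rules; any line it does not know is copied unchanged.
PY_TO_CPP = {
    "x = 0": "int x = 0;",
    "x += 1": "x++;",
    "return x": "return x;",
}


def toy_py_to_cpp(code: str) -> str:
    return "\n".join(PY_TO_CPP.get(line, line) for line in code.splitlines())


def back_translation_pairs(
    monolingual: list[str], forward: Translator
) -> list[tuple[str, str]]:
    """Build synthetic (input, target) pairs from monolingual code.

    The forward model's output may be noisy, but the target side of every
    pair is real code in the original language.
    """
    return [(forward(code), code) for code in monolingual]


def train_reverse(pairs: list[tuple[str, str]]) -> Translator:
    """Toy "training" of the reverse (C++ -> Python) model.

    It memorizes line-by-line alignments instead of running gradient descent,
    which is enough to show where the supervision signal comes from.
    """
    table: dict[str, str] = {}
    for src, tgt in pairs:
        for src_line, tgt_line in zip(src.splitlines(), tgt.splitlines()):
            table[src_line] = tgt_line

    def model(code: str) -> str:
        return "\n".join(table.get(line, line) for line in code.splitlines())

    return model


if __name__ == "__main__":
    python_corpus = ["x = 0\nx += 1\nreturn x", "x = 0\nreturn x"]
    pairs = back_translation_pairs(python_corpus, toy_py_to_cpp)
    cpp_to_py = train_reverse(pairs)
    print(cpp_to_py("int x = 0;\nx++;\nreturn x;"))
```

The design point to notice is that the target side of every synthetic pair is genuine code, so the reverse model learns to produce real Python even when its inputs are imperfect translations.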
The tool was trained on more than 2.8 million open-source repositories available on GitHub. It was then tested on 852 parallel functions in C++, Java and Python taken from GeeksforGeeks, a platform that collects programming problems to be solved (GeeksforGeeks is great for practicing programming logic and related skills).
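The results were encouraging. Measured by computational accuracy, that is, the share of translated functions that return the same outputs as the reference solution for the same inputs, TransCoder scored 74.8% when converting from C++ to Java, 57.8% from Python to C++ and 91.6% from Java to C++.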
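The percentages above measure whether translated functions behave like the reference ones. A minimal Python sketch of such a metric, assuming each case bundles a reference function, a translated candidate and some test inputs, and counting a translation as correct only if it matches the reference on every input. The gcd and factorial functions are hypothetical examples, not items from the GeeksforGeeks test set.

```python
from typing import Any, Callable, Sequence

Func = Callable[..., Any]
Case = tuple[Func, Func, Sequence[tuple]]


def is_correct(reference: Func, candidate: Func, inputs: Sequence[tuple]) -> bool:
    """A translation is correct only if it matches the reference on every input."""
    for args in inputs:
        expected = reference(*args)
        try:
            actual = candidate(*args)
        except Exception:
            return False  # a crash counts as a failed translation
        if actual != expected:
            return False
    return True


def computational_accuracy(cases: Sequence[Case]) -> float:
    """Percentage of (reference, candidate, inputs) cases judged correct."""
    if not cases:
        raise ValueError("need at least one case")
    correct = sum(is_correct(ref, cand, inputs) for ref, cand, inputs in cases)
    return 100.0 * correct / len(cases)


def gcd_ref(a: int, b: int) -> int:
    while b:
        a, b = b, a % b
    return a


def gcd_translated(a: int, b: int) -> int:  # hypothetical correct translation
    return a if b == 0 else gcd_translated(b, a % b)


def fact_ref(n: int) -> int:
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result


def fact_translated(n: int) -> int:  # hypothetical buggy translation (off by one)
    result = 1
    for i in range(2, n):
        result *= i
    return result


CASES: list[Case] = [
    (gcd_ref, gcd_translated, [(12, 18), (7, 3), (0, 5)]),
    (fact_ref, fact_translated, [(0,), (1,), (5,)]),
]

if __name__ == "__main__":
    print(computational_accuracy(CASES))  # 50.0
```

Note that fact_translated agrees with the reference for 0 and 1 and fails only for 5, so the score depends heavily on how well the test inputs exercise each function.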
For now, the tool works with C++, Java and Python, but the Facebook researchers point out that TransCoder can be trained to work with almost any programming language.
The model was developed for academic purposes, but with the necessary improvements, TransCoder may well find practical use. “Our results suggest that many errors made by the model can be easily corrected with the addition of simple restrictions to the decoder to ensure that generated functions are syntactically correct,” the researchers say.
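The decoder restrictions the researchers mention are not detailed here, so the following is only a related, simpler idea: discarding syntactically invalid candidates after decoding rather than constraining the decoder itself. Assuming a hypothetical best-first list of Python translations (for example, the output of a beam search), this sketch keeps the first one that Python’s ast module can parse.

```python
import ast
from typing import Optional


def is_valid_python(source: str) -> bool:
    """True if `source` parses as Python. This checks syntax only, not behavior."""
    try:
        ast.parse(source)
    except (SyntaxError, ValueError):  # older Python versions raise ValueError for null bytes
        return False
    return True


def first_valid(candidates: list[str]) -> Optional[str]:
    """Return the best-ranked candidate that parses, assuming best-first order."""
    for candidate in candidates:
        if is_valid_python(candidate):
            return candidate
    return None


if __name__ == "__main__":
    beam = [
        "def add(a, b)\n    return a + b",   # missing colon: rejected
        "def add(a, b):\n    return a + b",  # valid: selected
        "def add(a, b):\n    return a - b",  # valid syntax, wrong logic
    ]
    print(first_valid(beam))
```

A syntax check like this cannot catch the third candidate, which parses but computes the wrong result; that kind of error only shows up when the function is actually run against test inputs.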
You can learn more about Facebook’s TransCoder in the research paper “Unsupervised Translation of Programming Languages” (PDF).