Research by AI Center reaches Top 1 on the Microsoft CodeXGLUE benchmark

The AI Center research team’s automatic code annotation model has just reached the Top 1 position on Microsoft CodeXGLUE. CodeXGLUE is a well-known benchmark created by Microsoft Research for evaluating artificial intelligence models on code intelligence problems.

Code intelligence is applied to tasks such as automatically writing code from natural language input, translating from one programming language to another, and automatically correcting faulty code. Code intelligence problems are considered the “Third wave of AI” and have recently attracted great interest at major companies such as Microsoft, Google, Salesforce and Meta, because of their potential to change the entire software programming industry (Software 2.0). To that end, research teams need a benchmark to evaluate the performance of their methods. If GLUE is often seen as the benchmark of the NLP field (gluebenchmark.com), then CodeXGLUE is the benchmark of code intelligence. CodeXGLUE is widely used as a benchmark in code intelligence papers published at leading Computer Science conferences such as ICLR, NeurIPS, ICSE, and ACL.

A programmer’s daily job is to write code, and writing comments to explain that code is very time consuming. In practice, a lot of code has poor comments, and even complex code often has no comments at all, which makes it hard for programmers to read and understand. This slows down work on the code and reduces programmers’ productivity. Recognizing this problem, universities and companies such as UC Davis, UCLA, Columbia University, Carnegie Mellon University, Microsoft Research and Salesforce Research have done a lot of research on it. At FSOFT AI Center, the AI4Code research team is also focusing on automatic code summarization, which is one of the tasks in the CodeXGLUE benchmark.

The AI4Code research team proposes to use Knowledge Distillation, a method belonging to the Transfer Learning family. Its main idea comes from the human learning process, where knowledge is transferred from a more knowledgeable instructor to a less knowledgeable learner. The method has been widely used in image processing and natural language processing. To apply it to code intelligence problems, however, it is necessary to convey information that only code has, such as variable names, method names, and the abstract syntax tree. The team proposed using this code-specific information to transfer knowledge from large pretrained models to smaller models. As a result, the small models learn better and give much better results than the baseline models from other research groups.
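To make the instructor and learner idea concrete, here is a minimal sketch of the generic knowledge distillation loss in plain Python. It is not the AI4Code team's implementation, which the article does not describe in detail, and it leaves out the code-specific information (variable names, method names, abstract syntax tree) that their method relies on. The function names, the `temperature` and `alpha` parameters, and the toy three-class logits are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions of the same length."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(student_logits, teacher_logits, true_index,
                      temperature=2.0, alpha=0.5):
    """Hypothetical helper: weighted sum of a soft (teacher) loss and a hard (label) loss."""
    # Soft targets: the student is pushed toward the teacher's softened output.
    soft_teacher = softmax(teacher_logits, temperature)
    soft_student = softmax(student_logits, temperature)
    soft_loss = kl_divergence(soft_teacher, soft_student) * temperature ** 2

    # Hard target: ordinary cross-entropy against the ground-truth label.
    hard_loss = -math.log(softmax(student_logits)[true_index])

    return alpha * soft_loss + (1 - alpha) * hard_loss


# Toy example (assumed values): the teacher is confident in class 0.
teacher = [4.0, 1.0, 0.5]
agreeing_student = [3.5, 1.0, 0.5]
disagreeing_student = [0.5, 1.0, 4.0]

loss_agree = distillation_loss(agreeing_student, teacher, true_index=0)
loss_disagree = distillation_loss(disagreeing_student, teacher, true_index=0)
print(f"agreeing student loss:    {loss_agree:.4f}")
print(f"disagreeing student loss: {loss_disagree:.4f}")
```

In this sketch a student whose predictions resemble the teacher's gets a lower loss than one that disagrees, which is the signal that lets a small model learn from a large one during training.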

Talking about the project development plan, Mr. Bui Duy Quoc Nghi, a representative of the author group, shared: “The research team will continue to improve this technique and publish the research work at leading scientific conferences. In the future, the group also aims to solve other problems such as code translation, code fixing, and code synthesis.”

>> The results can be seen HERE

>> Learn more about CodeXGLUE HERE

Source: AIC
