It’s not hard to see why code generation would be an attractive prospect for machine learning and language processing tools. After all, programming code is just a kind of well-structured language — so much so that code language parsers have a lot in common with natural language processing tools.
Given this, it makes sense that machine learning tools that have already been brought to bear on the generation of human-readable text on arbitrary subjects should be quite capable in the area of code generation.
But what, then, are the expectations and challenges of such a thing? What is it about code that is distinctly open, or resistant, to machine learning? In order to answer these questions, we need to take a closer look at the nature of programming code and how it differs from other kinds of language.
One key difference between code and other kinds of language is its distinct levels of abstraction. The code a programmer writes is usually not what the machine executes; what the machine runs is “machine code.” The language the programmer uses, which is compiled or interpreted into machine code, is designed partly for the programmer’s convenience, since they need some degree of legibility to understand what they’re writing. This higher-level language also lets the programmer express very complex lower-level operations concisely, with features that implement higher-order concepts.
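A small illustration of this gap, sketched in Python: the readable source below is not what the interpreter executes. Python first compiles it to lower-level bytecode instructions, which the `dis` module in the standard library can display.

```python
import dis

# A high-level, human-readable expression: sum the squares of the
# even numbers in a sequence.
def sum_even_squares(nums):
    return sum(n * n for n in nums if n % 2 == 0)

print(sum_even_squares([1, 2, 3, 4]))

# The interpreter never runs the source text directly; it runs the
# compiled bytecode, a lower level of abstraction we can inspect.
dis.dis(sum_even_squares)
```

The bytecode listing is far less legible than the source, which is exactly the point: the higher-level form exists for the programmer, not the machine.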
In contrast, natural languages do not arise with a particular level of abstraction in mind. They grow organically, and while there are ways to make them more or less abstract (through metaphor, for example), there is no real analogue to the way programmers design a language specifically for machines. This gap between human-facing abstraction and machine execution is one reason code is more resistant to machine learning than natural language.
Another key difference between code and other languages is that code is a “closed system.” That is, there are well-defined rules for constructing valid programs in a particular language, and these rules are not open to interpretation the way natural language is. This makes it much harder for machine learning systems to “learn” to generate valid code: they must learn and follow the rules of the language to produce something that is actually executable.
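The “closed system” property can be made concrete. The sketch below, using Python’s standard `ast` module, checks whether a string satisfies the language’s grammar; unlike a slightly mangled English sentence, a slightly mangled program is simply invalid, with no room for interpretation.

```python
import ast

def is_valid_python(source: str) -> bool:
    """Return True if `source` parses as syntactically valid Python."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False

print(is_valid_python("x = 1 + 2"))  # conforms to the grammar
print(is_valid_python("x = 1 +"))    # one missing token: rejected outright
```

A generative model has to clear this hard validity bar before questions of correctness or style even arise.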
So while it is certainly possible for machine learning and language processing tools to generate code, distinct challenges must be overcome first. These challenges arise from the fact that code is more abstract and less open to interpretation than other kinds of language. Nevertheless, we believe that with enough effort these challenges can eventually be met, and machine-generated code will become a reality. Who knows, maybe one day programs will even be able to write themselves!