Large language models based on artificial intelligence (AI), such as GPT-3, can teach themselves new tasks from just a few examples, without any retraining. Researchers at the Swiss Federal Institute of Technology in Zurich (ETH) and Google may now have discovered a key mechanism behind this capability. ETH reported on the findings in an interview with doctoral student Johannes von Oswald, who researches learning algorithms for neural networks. The research paper will be presented at the International Conference on Machine Learning in Hawaii at the end of July.
Neural networks are generally treated as a black box, von Oswald explains: they produce outputs from inputs, but the computation in between is hard to interpret. According to ETH, the inner workings of large language models such as the OpenAI GPT model family and Google Bard remain a mystery even to their developers. The team has nevertheless documented “that transformers can learn on their own to implement algorithms within their architecture,” says von Oswald. “We were able to show that they can implement a classic and powerful machine learning algorithm.”
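The article does not name the algorithm; the underlying paper identifies it as gradient descent. As an illustration only (not the researchers' code), the following minimal NumPy sketch shows the kind of update involved: a single gradient-descent step of a linear model on a handful of in-context (x, y) example pairs already moves the prediction for a new query toward the true value — an update of this form is what a transformer layer can, per the paper, implement internally.

```python
import numpy as np

def gd_step_predict(xs, ys, query, lr=0.1):
    """One gradient-descent step of a linear model (initialized at zero)
    on the in-context pairs, then predict the query with the updated weights."""
    w = np.zeros(xs.shape[1])
    # Gradient of 0.5 * mean squared error with respect to w
    grad = xs.T @ (xs @ w - ys) / len(ys)
    w = w - lr * grad
    return query @ w

# Synthetic in-context examples drawn from a hidden linear rule
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0])
xs = rng.normal(size=(16, 2))
ys = xs @ w_true
query = np.array([1.0, 1.0])   # true answer: query @ w_true = 1.0

pred_before = 0.0              # the untrained zero model predicts 0
pred_after = gd_step_predict(xs, ys, query)
```

After one step the prediction error on the query shrinks; iterating such steps across layers is, roughly, the mechanism the paper describes.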
It is “surprising but true” that the model teaches itself a technique for in-context learning, “driven simply by the pressure to improve on its training objective, namely to predict the immediate future.” The research group hypothesizes “that the transformer architecture has an inductive bias towards learning. This means that its ability to develop these learning mechanisms is implicitly built into its basic design, even before the model is trained.” ce/mm