If the conversational skills of ChatGPT have left you speechless, it is time to dig deeper into how GPT models work. Using the transformers library, we will discover how to use an open-source generative model to create original text starting from any prompt!
Generative Artificial Intelligence (or generative AI) refers to a class of deep learning algorithms specialized in generating original data from a pre-existing dataset. These algorithms can produce a wide variety of outputs, such as audio-visual sequences, text, and images, from input data of different kinds. Generative AI models work by modeling the input data as extremely complex probability distributions, which the models learn to reproduce so faithfully that their samples become hard to distinguish from the original source. The list of generative AI models released in 2022 alone is astonishing: to name a few, DALL-E 2 and Midjourney among text-to-image models (i.e., models that generate images from textual prompts), Make-A-Video among text-to-video models, and ChatGPT among text-to-text models. Unfortunately, the aforementioned models can only be used through credits issued by the vendor companies, so it is not always possible to use them without constraints.
This talk aims to provide the tools needed to understand and use so-called text-to-text models, i.e., a specific class of generative AI models capable of generating text from a textual prompt. The huge interest in text-to-text models is not only due to the fact that they underlie many technologies we use every day, from autocompletion to automated chatbots, but also because they are among the models that have evolved most impressively in recent years. Thanks to platforms such as Hugging Face, we can now download generative models to our laptop, use them with just a few lines of code, and train them to perform tasks even more specific than those for which they were created. A concrete example of a text-to-text model is GPT-2, developed by OpenAI and freely accessible on the Hugging Face platform. Thanks to their transformer-based architecture, GPT models have achieved performance that was unimaginable just a few years ago, making a conversation with a chatbot built on top of them hard to distinguish from one with a human.
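To give a taste of how little code is needed, here is a minimal sketch of text generation with the open-source gpt2 checkpoint through the transformers pipeline API; the prompt and the generation parameters (do_sample, top_k, and so on) are illustrative choices, not the exact settings discussed in the talk.

```python
# Minimal sketch: generate text from a prompt with the open-source GPT-2
# checkpoint from the Hugging Face Hub. Prompt and parameters are examples.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

outputs = generator(
    "Generative AI is",      # any textual prompt
    max_length=50,           # cap the length of the generated sequence
    num_return_sequences=2,  # ask for two alternative continuations
    do_sample=True,          # sample instead of greedy decoding
    top_k=50,                # restrict sampling to the 50 most likely tokens
)
for out in outputs:
    print(out["generated_text"])
```

Sampling-based decoding options such as these are one simple way to improve on the repetitive output that greedy decoding tends to produce by default.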
Starting with a brief historical overview of text-to-text generative models, in the first part of the talk we will delve into how to use the transformers library and the open-source version of the GPT-2 model provided by Hugging Face to generate text in just a few lines of code and to improve the quality of the text over the default generated output. Next, we will see how to train a GPT model for a specific generation task. This process, called finetuning, is extremely common for those who need to improve a model's performance so that it generates specific textual outputs more faithfully. To this end, the starting model can be trained on a suitable dataset using the Trainer class provided by the transformers library. In the second part of the talk, we will discuss how to optimize a finetuning process through the parameters of the Trainer. With an appropriate choice of parameters, we will see how to make more efficient use of the available memory and GPU when training a GPT-2 model. Finally, we will look together at the surprising results that a properly trained GPT-2 model can obtain.
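As an idea of what such a finetuning setup can look like, here is a hedged sketch built around the Trainer and TrainingArguments classes; the dataset file, hyperparameters, and memory-saving options (gradient_accumulation_steps, fp16) are assumptions for illustration, not the exact configuration presented in the talk.

```python
# Sketch of fine-tuning GPT-2 with the Trainer class; dataset file name and
# hyperparameters below are illustrative assumptions.
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no pad token
model = AutoModelForCausalLM.from_pretrained("gpt2")

# Hypothetical plain-text file with one training example per line.
dataset = load_dataset("text", data_files={"train": "train.txt"})
tokenized = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="gpt2-finetuned",
    num_train_epochs=3,
    per_device_train_batch_size=4,   # small batches to fit in GPU memory
    gradient_accumulation_steps=8,   # simulate a larger effective batch size
    fp16=True,                       # mixed precision to reduce memory (GPU only)
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The two highlighted parameters illustrate the kind of trade-off the talk covers: gradient accumulation lets small per-device batches behave like a larger effective batch without extra memory, while mixed precision roughly halves the memory footprint of activations on a compatible GPU.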
After studying experimental physics at the University of Pisa, I obtained a PhD in Data Science at the Scuola Normale Superiore. As part of my training, I spent research periods at Fermilab in Chicago and at CERN in Geneva. I currently work on Natural Language Processing at AIKnowYou, focusing on the analysis of customer-care conversations and the automation of intelligent chatbots.