Generative models for business

In this blog I will explain how generative language models can be used to create new products. My goal is to give you a framework for applying generative models to improve any process, so that by the end you have a clear understanding of how to use them for your own purposes.

GPT for business

The ChatGPT boom has shown that the next-word-prediction approach can closely approximate the human thought process. We can expect to see many ChatGPT-based products this year. (While ChatGPT is used here as an example, there are many other LLMs, some of which have a big advantage for business use: they are open source.)

In practice, you can build entirely new products on top of generative models, so let's show how using an example "sandbox problem": we want to build AnswerGPT.com, a service that can answer any question using Google.com.

The algorithm for the model will be:

  1. Input the user's question

  2. Generate a query to the Google search engine based on the user's question

  3. Summarise the top 10 documents

  4. Return the answer to the user.
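To make the flow concrete, here is a minimal sketch of the pipeline in Python. All three helper names below are hypothetical placeholders: the query generator and the summarisation model are the components discussed in the rest of this post, and the search call stands in for whatever search API you use.

```python
# Minimal sketch of the AnswerGPT pipeline. All three helpers are
# hypothetical placeholders for components discussed later in the post.

def generate_query(question: str) -> str:
    raise NotImplementedError  # covered in the MVP / Practical SOTA sections

def google_search(query: str, top_k: int = 10) -> list[str]:
    raise NotImplementedError  # any search API wrapper will do

def summarise(question: str, documents: list[str]) -> str:
    raise NotImplementedError  # assumed to already exist (see below)

def answer(question: str) -> str:
    query = generate_query(question)             # step 2: question -> search query
    documents = google_search(query, top_k=10)   # step 3: fetch the top 10 documents
    return summarise(question, documents)        # steps 3-4: summarise and return
```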

There are several ways to build this service. This blog will cover the simplest MVP implementation and a practical state-of-the-art implementation. For the sake of brevity, let's assume that we already have a competent summarisation model and focus only on the query generator. However, the same principles apply to building the summarisation model.

One of the prerequisites is a strong base generative LLM. Fortunately, there are many open source models available, and they are improving every day.

MVP

Let's just use any available generative model. Simple as that: a decent LLM prompt for query generation will work.
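As a minimal sketch, assuming an instruction-tuned open-source model served through the Hugging Face transformers library (the checkpoint name below is only an example), query generation can be as simple as a single prompt. This would replace the generate_query placeholder from the pipeline sketch above.

```python
# MVP sketch: prompt an off-the-shelf instruction-tuned LLM to turn the
# user's question into a search query. The model name is only an example;
# any decent open-source instruction-following model will do.
from transformers import pipeline

generator = pipeline("text-generation", model="mistralai/Mistral-7B-Instruct-v0.2")

PROMPT = (
    "Rewrite the user's question as a short Google search query.\n"
    "Question: {question}\n"
    "Search query:"
)

def generate_query(question: str) -> str:
    out = generator(PROMPT.format(question=question), max_new_tokens=32, do_sample=False)
    # keep only the text after the prompt
    return out[0]["generated_text"].split("Search query:")[-1].strip()
```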

Practical SOTA

What if we could improve our search query generator to give us an advantage in the market of SearchTooledGPT services? Reinforcement learning from human feedback (RLHF) offers a solution. By training a reward model on pairs of answers, we can optimise the search query generator to produce better queries. Here's how the pipeline would work:

  1. The generator produces several candidate search queries.

  2. For each query, we call Google search and summarise the results.

  3. We rank the resulting summaries with the reward model.

  4. We optimise the policy against the reward model (finetune or p-tune).
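To make step 4 concrete, here is a rough sketch of one policy-optimisation iteration using the trl library. The API shown follows older trl versions and may differ in newer ones, and compute_reward is a placeholder for steps 2-3 (search, summarise, score with the reward model).

```python
# Rough sketch of one RLHF iteration for step 4, using trl's PPOTrainer
# (API follows older trl versions and may differ in newer ones).
import torch
from transformers import AutoTokenizer
from trl import PPOConfig, PPOTrainer, AutoModelForCausalLMWithValueHead

config = PPOConfig(model_name="gpt2", batch_size=1, mini_batch_size=1)  # toy settings
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ppo_trainer = PPOTrainer(config, model, ref_model=None, tokenizer=tokenizer)

def compute_reward(question: str, query: str) -> float:
    # placeholder for steps 2-3: google search -> summarise -> reward model score
    raise NotImplementedError

for question in ["how do I fix a leaking tap?"]:  # toy batch of user questions
    query_tensor = tokenizer.encode(question, return_tensors="pt")[0]
    full = model.generate(query_tensor.unsqueeze(0), max_new_tokens=32, do_sample=True)
    response_tensor = full[0][query_tensor.shape[0]:]  # keep only the generated query
    query_text = tokenizer.decode(response_tensor, skip_special_tokens=True)
    reward = torch.tensor(compute_reward(question, query_text))
    # one PPO update of the query generator against the reward model's score
    ppo_trainer.step([query_tensor], [response_tensor], [reward])
```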

To simplify things, we can assume that the reward model is a BERT-like language model.
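A minimal sketch of such a reward model, assuming a standard BERT checkpoint with a single-output classification head that scores a (question, summary) pair (the checkpoint name is only an example):

```python
# Sketch of the "BERT-like" reward model: a sequence classifier with a single
# scalar output that scores a (question, summary) pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

reward_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
reward_model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=1  # one scalar reward per input
)

def reward_score(question: str, summary: str) -> float:
    inputs = reward_tokenizer(question, summary, return_tensors="pt", truncation=True)
    with torch.no_grad():
        return reward_model(**inputs).logits.item()
```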

The core part of this process is the reward model training. To build a better product, we need to focus on improving the reward model for the RLHF algorithm, which in turn will improve the search query generator model.

Reward model training

Product design depends on your target audience and on the products already available on the market, so research is crucial. To design ours, we need to define the core principles for its output. Let's start with informativeness, truthfulness, clarity and simplicity. These principles will be used to rank different outputs.

Training pool creation

First, we need to collect a sample of questions that are representative of your service's audience. It would be best to include specific questions related to tax optimisation, legal details of divorce, or other relevant topics.

For each question in the sample, we should generate 10 query variants and summarise the top 10 search results for each variant. This gives us, for every question, 10 different generated service outputs that we can rank.
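A sketch of this pool-building step, reusing the generator and PROMPT from the MVP sketch and the google_search / summarise placeholders from the pipeline sketch (sampling with num_return_sequences gives the 10 variants):

```python
# Sketch of training-pool creation: sample 10 query variants per question,
# then summarise the top search results for each variant.

def build_pool(questions: list[str]) -> list[dict]:
    pool = []
    for question in questions:
        outputs = generator(
            PROMPT.format(question=question),
            max_new_tokens=32,
            do_sample=True,          # sampling gives diverse query variants
            num_return_sequences=10,
        )
        for out in outputs:
            query = out["generated_text"].split("Search query:")[-1].strip()
            summary = summarise(question, google_search(query, top_k=10))
            pool.append({"question": question, "query": query, "summary": summary})
    return pool
```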

Human feedback

In this problem, the ranking can be done by generalist labelers (if the amount of data is not too large, you can even do it yourself) or, preferably, by domain experts.

The accuracy of the labelers is the key factor here. Nowadays, models generalise very well from small datasets, but if there is random noise (or, even worse, non-random noise) in the training pool, your final product may end up no better than your competitors'.

Regardless, you should create a concise and clear ranking instruction, even if you're the one doing the labeling. To get the best quality out of the human labeling process and tips on creating instructions, check out this blog post: https://dev.to/muradmazitov/human-feedback-in-ml-or-how-to-collect-data-for-your-own-chatgpt-1i41.

Getting to the best answers

A common practical solution here is pairwise ranking: a simple and highly intuitive process in which the labeler compares two different answers and picks the better one.
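For training, each pairwise label can be turned into the standard preference loss: the reward model should score the chosen answer higher than the rejected one. A minimal sketch, reusing the reward_model and reward_tokenizer from the earlier sketch:

```python
# Pairwise reward-model training objective: -log(sigmoid(r_chosen - r_rejected)),
# so the model learns to score the labeler-preferred answer higher.
import torch
import torch.nn.functional as F

def pairwise_loss(question: str, chosen: str, rejected: str) -> torch.Tensor:
    def score(summary: str) -> torch.Tensor:
        inputs = reward_tokenizer(question, summary, return_tensors="pt", truncation=True)
        return reward_model(**inputs).logits.squeeze(-1)

    # the labeler preferred `chosen` over `rejected`
    return -F.logsigmoid(score(chosen) - score(rejected)).mean()
```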

In addition to ranking, it would be useful to have another process where the task is to write the best Google query for the question, based on what your generator has already produced. Include the summarisation obtained from this human-written query in the ranking task, but not for the same labeler who wrote it, as this could introduce bias.

So we end up with two labeling tasks: the "rank two answers" task and the "write the best Google query" task.

Launch

Run RLHF with the resulting reward model and launch the product! To keep improving the quality of your product over time, continue running labeling tasks on new user requests.

Good luck with your future launches!