Why Deploy LLM Chatbots with RAG?

In 1966, professor Joseph Weizenbaum at MIT came up with a curious concept: teaching a machine called Eliza to respond like a psychotherapist to user queries.

Eliza was powered by a set of rules for generating plausible text responses. Although this first chatbot was far from perfect, it inspired the idea of creating computer assistants for humans.

Fast-forward to 2010, when IBM unveiled Watson — a state-of-the-art artificial intelligence (AI) system capable of chatting like Bob Dylan thanks to advanced natural language processing (NLP) capabilities.

Heavily advertised and with a successful appearance on the quiz show Jeopardy!, IBM Watson soon became the epitome of artificial intelligence in the public mind.

However, IBM’s technology was also far from perfect. One of the first large-scale field tests of IBM Watson at the MD Anderson Cancer Center in Houston went badly wrong. Although the machine could scan billions of scientific journals, it wasn’t capable of effective document classification, making it hard for users to obtain specific information. The project was shelved several years into the making.

Watson’s machine learning algorithms struggled to effectively process unstructured data — handwritten notes, clinical images, and freehand doctors’ inputs — making it less effective than advertised for the most hyped use case: healthcare. The system was trained on unrepresentative data sets, was impossible to integrate with popular electronic health record (EHR) systems, and couldn’t adapt to local treatment protocols. In the best case, Watson made blatantly obvious recommendations. In the worst case, it proposed downright dangerous treatment options.

Despite all of the company’s expertise, IBM didn’t anticipate the problems that emerged when the model had to interact with the real world.

OpenAI came better prepared with ChatGPT, training its GPT-3.5 large language model (LLM) on an estimated 570 GB of publicly available data. Because ChatGPT “read” a good part of the internet, it was much better “educated,” delighting users with moderately accurate information on everything from popular dinosaurs to a recommended go-to-market strategy for a new SaaS application.

Compared to Watson, ChatGPT can better understand the context of users’ queries. But like Watson, its intelligence is limited by its training data. Like Wikipedia, ChatGPT provides broad-stroke information to satisfy general curiosity. However, the OpenAI model lacks awareness of anything outside the scope of its training data. It doesn’t know what Lydia said in the last meeting or what your corporate policy says about performance bonuses.

But getting an LLM to know those things isn’t an impossible task. Many LLMs are open-source, meaning developers can access and customize pretrained models. Moreover, whether an LLM is open-source or proprietary, retrieval augmented generation (RAG) can provide a way for the LLM to interact with business data and become more context-aware.

How retrieval augmented generation builds on LLMs

Standard chatbots have bounded knowledge, which quickly gets out of date and can result in misleading outputs when the LLM makes guesses (hallucinates) due to missing information.

Back in 2020, researchers from Meta AI came up with a solution called retrieval augmented generation: a recipe for combining a model’s pretrained knowledge with knowledge retrieved from external sources at the time a question is asked.

When prompted, an LLM chatbot with RAG can retrieve extra information from a connected vector database: a data store that holds text, visuals, or audio converted into high-dimensional mathematical vectors (embeddings). Because the data is stored in a vector format, the model can access new knowledge in a matter of seconds.

Vector databases also enable better semantic search, which aims to understand a search query’s intent and context. Such databases find the most semantically similar results, even when the exact keywords aren’t present. Graph databases are also well suited for semantic search tasks, especially when you want to map complex data relationships.
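
To make the idea concrete, here is a minimal sketch of semantic search over vectors. It assumes toy three-dimensional embeddings typed in by hand (real embedding models produce hundreds or thousands of dimensions). The document texts, the vectors, and the `search` helper are all hypothetical. A production system would get the vectors from an embedding model and keep them in a vector database rather than a Python dictionary.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical hand-made embeddings, for illustration only.
DOCUMENTS = {
    "Vacation policy: employees accrue 20 days of paid leave per year.": [0.9, 0.1, 0.0],
    "Expense policy: submit receipts within 30 days.": [0.1, 0.9, 0.1],
    "Security policy: rotate passwords every 90 days.": [0.0, 0.2, 0.9],
}


def search(query_vector, documents, top_k=1):
    """Return the top_k (score, text) pairs, most similar first."""
    scored = [
        (cosine_similarity(query_vector, vector), text)
        for text, vector in documents.items()
    ]
    scored.sort(reverse=True)
    return scored[:top_k]


# Assumed embedding of the query "How much time off do I get?".
# The query shares no keywords with the vacation policy text, yet its
# vector points in nearly the same direction.
query_vector = [0.8, 0.2, 0.1]
best_score, best_text = search(query_vector, DOCUMENTS)[0]
print(best_text)
```

The ranking depends only on the direction of the vectors, which is why a query phrased as "time off" can still land on a document that talks about "paid leave".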

Figure: semantic search. Source: Graft.

Once a retrieval augmented generation chatbot fetches the required data, it uses the underlying LLM to generate a relevant, up-to-date response.
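
The hand-off from retrieval to generation can be sketched as plain prompt assembly. The snippet below is an illustration under assumptions, not a specific vendor's API: the `build_augmented_prompt` helper, the passage format, and the file names are hypothetical, and the actual call to an LLM is left out.

```python
def build_augmented_prompt(question, passages):
    """Combine retrieved passages and the user question into one prompt.

    Each passage is numbered so the model can be asked to cite it.
    """
    context_lines = [
        f"[{i}] ({passage['source']}) {passage['text']}"
        for i, passage in enumerate(passages, start=1)
    ]
    context = "\n".join(context_lines)
    return (
        "Answer the question using only the numbered context below. "
        "Cite the passages you rely on as [n].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical passages, as a retriever might return them.
retrieved = [
    {"source": "hr-handbook.pdf", "text": "Performance bonuses are paid in March."},
    {"source": "finance-faq.md", "text": "Bonus eligibility requires six months of tenure."},
]
prompt = build_augmented_prompt("When are performance bonuses paid?", retrieved)
print(prompt)
```

The resulting string is what would be sent to the LLM. Numbering the passages is one simple way to let the model point back to its sources in the answer.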

Benefits of using RAG for customizing LLM chatbots

With RAG, LLM chatbots get access to a designated treasure trove of information — be it your corporate SharePoint portal, Confluence, or a database with anonymized customer data.

Apart from giving the large language model access to the most accurate, context-specific data from connected sources, RAG also allows you to include cited sources and footnotes in generated responses so users can fact-check any claims.

Thanks to semantic search, an LLM chatbot with retrieval augmented generation also produces less ambiguous results, even with imperfect queries. It delivers a few more benefits as well.

Hyper-relevant results

Retrieval augmented generation provides a pretrained LLM with extra knowledge, enabling faster corporate data processing. Instead of querying an analytics database or perusing data storage services, users can quickly retrieve the information they need via a text query. With retrieval augmented generation, users receive hyper-relevant results thanks to technologies that allow for injecting an external knowledge base and document-level context into queries.

RAG is adaptable to virtually any industry as long as the source data can be packaged into a vector format. For example, Morgan Stanley prompt-tuned OpenAI’s GPT-4 model on a set of 100,000 corporate documents containing investment, business, and process knowledge to supply its workforce with hyper-relevant answers. The company also managed to address some of the hallucination problems GPT-4 models have during long conversations by applying fine-tuning techniques, limiting prompt topics, and embedding accuracy checks.

Generative AI tools enable contextual search and make information retrieval conversational. Employees can use these tools to easily summarize a vast body of data into digestible paragraphs to solve problems faster, uncover new insights, and deliver better business outcomes. In healthcare, the public sector, and education, generative AI could have a $480 billion productivity benefit in the short term.

Figure: generative AI could have a $480 billion productivity benefit. Source: McKinsey, Gen AI’s productivity possibilities.

Cost-efficiency

Training a custom LLM is a resource-intensive task. You need massive computing power, human expertise, and terabytes of training data. Creating a RAG-based chatbot requires a fraction of that effort.

Instead of training the model from scratch, you augment its knowledge with extra insights. Investment research company Morningstar fine-tuned the GPT-3.5 model on vector embeddings from its investment research database to create a generative AI assistant called Mo for its financial advisors. Mo can synthesize investing insights, automate simple tasks, and provide in-software help for company users.

According to Forbes contributor Tom Davenport, Rhodes (Morningstar’s CTO) reports that Mo has answered over 25,000 questions at an average cost of just $0.002 per answer. The total expense on the generative AI assistant, not counting the compensation of its creators, is about $3,000.

Easy implementation

RAG can be used with various LLM architectures and is relatively easy to implement. Researchers associated with Facebook AI Research, University College London, and New York University have proposed a variant of RAG that takes just five lines of code to run and can support simple use cases.

Figure: RAG code example. Source: Hugging Face, RAG Token.

On the downside, RAG requires a mature data management culture. Effectively, you’ll have to transform a large body of knowledge (documents, images, etc.) into vector embeddings. Likewise, you’ll have to ensure low system latency and optimized performance to support a high number of concurrent users.
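
A typical first step in that transformation is splitting long documents into overlapping chunks, each of which is then embedded separately. The sketch below shows one simple way to do it. The `chunk_text` helper, the word-based splitting, and the sizes are assumptions for illustration; real pipelines often split by tokens, sentences, or document sections instead.

```python
def chunk_text(text, chunk_size=50, overlap=10):
    """Split text into word-based chunks.

    Consecutive chunks share `overlap` words so that a sentence cut at a
    chunk boundary still appears whole in at least one chunk.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Synthetic 120-word document, used only to show the mechanics.
document = " ".join(f"word{i}" for i in range(120))
chunks = chunk_text(document, chunk_size=50, overlap=10)
print(len(chunks))
```

Each chunk would then be passed to an embedding model and stored in the vector database together with a pointer back to its source document.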

New open-source and commercial frameworks are emerging every day to facilitate the development of production-ready generative AI applications. Open-source frameworks like LangChain and LlamaIndex help connect external data sources to LLMs. Azure OpenAI Service provides API-based access to some of the best foundation models, plus optimized infrastructure for model hosting and training. Amazon Bedrock also offers API-based access to foundation models from companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon (of course!). Superpowered.ai offers an API for retrieval augmented generation that is optimized for legal, financial, and educational use cases.

Flexible use cases

RAG enables LLM chatbot customization for virtually any domain — from finance to telecom — and for any business function: sales, marketing, HR, procurement, you name it. Effectively, you can use RAG to create custom virtual assistants for any use case involving text summarization, clustering, or classification.

At Intellias, we’re also supercharging a GPT model to handle a wide range of knowledge-intensive tasks, ranging from providing general information on organizational policies to accessing specific sales, training, or HR knowledge. Our sales team can use AI to gain insights into our portfolio and projects to better close the next deal.

Google used RAG to enhance Med-PaLM 2, its model for medical knowledge processing. The company also recently released GenAI Databases Retrieval App, a collection of techniques for infusing extra information from Google Cloud databases into generative AI apps.

Microsoft also relies on RAG for its growing series of Copilots for Sales, Microsoft 365, and Security. Users can also apply RAG in Azure AI Search projects. IBM’s most successful AI service so far, Watsonx, relies on various RAG techniques.

Use cases of an LLM Chatbot with RAG

An enterprise uses 991 applications on average. IT budgets increase year over year, but the lack of interoperability and the mounting number of data silos diminish the productivity gains. Most companies need better knowledge management and dissemination practices. A retrieval augmented generation chatbot can deliver just that for an array of text processing tasks.

Corporate knowledge management

Looking up information in emails, presentations, reports, and meeting notes is a solid use case for an AI chatbot with RAG. Retrieval augmented generation enables efficient document querying through a conversational interface. Instead of opening dozens of SharePoint pages or multiple tabs in a virtual cloud storage app, users can get summarized answers from a virtual assistant.

According to a Gartner survey, almost half (47%) of digital workers struggle to find the information they need to effectively perform their jobs, and 66% wish that their IT departments provided “universally accepted and supported applications or devices” to get work done.

LLMs powered by RAG effectively deliver just that. For example, Microsoft Sales Copilot can quickly provide information about customers or opportunities from a connected CRM system. Dust lets you fetch and process data from Notion, Slack, GitHub, and Google Drive.

Employee onboarding

Getting up to speed in a new job can be tough. Between signing a heap of payroll-related documents and getting acquainted with organizational policies, new hires struggle to cope. Over 80% of employees say they’re overwhelmed with information throughout the onboarding process and lose about one full workday per week searching for the right information.

A corporate chatbot can facilitate knowledge discovery, giving instant replies to questions a new hire may find too silly to ask a new colleague or the ever-busy HR manager. Likewise, employees can use a corporate chatbot to save time in searching for the right policy document when questioning the correctness of their actions (when no immediate human help is in sight). This can eliminate otherwise inevitable first-day mistakes for new hires, shorten their time to productivity, and save senior staff from answering routine questions instead of discussing more important problems.

Intellias has recently helped a multinational company launch a GPT-powered assistant for employee skill assessments. The model analyzes internal and external data sources, such as SharePoint and LinkedIn, to get a 360-degree view of employees’ competency levels. The tool automatically creates detailed skill profiles and suggests areas for extra professional development.

CloudApper has used the GPT model to create an HR chatbot that provides contextually aware self-service to employees. The app can provide up-to-date information on company policies and various HR procedures, plus answer specific questions about vacation days and sick leave or other benefits.

Customer support

A stellar customer experience (CX) lowers customer service costs and increases future purchase intent. Yet, offering impeccable support across multiple channels is expensive. Current advances in NLP and generative AI can reduce customer service costs by 30% by answering 80% of customer queries.

Apart from providing contextual replies to common questions like “Where’s my package?” and “What does the warranty cover?”, RAG also makes it easier to identify flaws in conversations and re-route more complex problems to human agents.
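
One simple way such re-routing could work is to look at how confident the retrieval step was. The sketch below is an assumption rather than a description of any particular product: the `route_query` helper and the 0.75 threshold are hypothetical, and the scores stand in for the similarity values a vector database would return.

```python
def route_query(retrieval_scores, threshold=0.75):
    """Decide who should answer, based on retrieval confidence.

    Returns "bot" when at least one retrieved passage is a close match,
    and "human" when nothing relevant was found.
    """
    if not retrieval_scores or max(retrieval_scores) < threshold:
        return "human"
    return "bot"


print(route_query([0.91, 0.40]))  # a close match exists
print(route_query([0.30, 0.20]))  # only weak matches
```

In practice the threshold would be tuned on real conversations, and other signals (customer sentiment, repeated questions) could feed the same decision.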

LLMs with RAG can also empower employees with better knowledge, aiding them with issue resolution. Chima, for example, has created a customizable generative AI platform that can connect to different customer databases and support platforms to analyze historical interactions and provide accurate suggestions. Gusto, an online payroll and HR platform, has used Chima to completely overhaul its support function. The chatbot uses current interaction context and historical data to provide users with responses in text, audio, and video formats. Agents also receive rich insights for problem-solving based on corporate data. To ensure compliance, the system applies masking to sensitive customer data and uses industry-standard security practices.
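
Masking of this kind can be sketched with a couple of regular expressions applied before any text is indexed or sent to a model. This is a minimal illustration under assumptions, not how any vendor named above implements it: the patterns, the `mask_sensitive` helper, and the placeholder labels are hypothetical, and real systems usually rely on dedicated PII detection tools.

```python
import re

# Simplified patterns for illustration; they will miss many real-world formats.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b(?:\d[ -]?){13,16}\b")


def mask_sensitive(text):
    """Replace email addresses and card-like numbers with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return CARD_RE.sub("[CARD]", text)


message = "Contact jane.doe@example.com about card 4111 1111 1111 1111."
masked = mask_sensitive(message)
print(masked)
```

Running the masking step before embedding means the sensitive values never reach the vector database or the LLM prompt.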

Clinical decision support

While IBM Watson was a flop for healthcare, newer generative AI systems have shown better results. They can handle a wider range of data formats, adapt to different clinical protocols, and produce more reliable results.

Harman, a Samsung subsidiary, has been testing a HealthGPT model for analyzing clinical trial data. Using an automated framework for fine-tuning and output validation, the team has produced a system capable of providing rich, context-aware clinical insights from cohort studies, screening trials, and other types of observational studies.

The National University Health System (NUHS) in Singapore has also trained an advanced LLM. It can summarize patient case notes, write referral letters, and provide healthcare professionals with information related to medical conditions and clinical practice guidelines.

Apart from providing clinical decision support, LLMs can handle a wide range of back-office healthcare processes. Claims processing, health insurance authorizations, and resolution of claims denials are time-consuming, data-intensive processes that can be streamlined with a chatbot using retrieval augmented generation. According to McKinsey, “Gen AI represents a meaningful new tool that can help unlock a piece of the unrealized $1 trillion of improvement potential present in the [healthcare] industry.”

Regulatory compliance

New data privacy laws, ESG disclosures, and marketing policy updates — every organization has a mounting regulatory burden. According to the Thomson Reuters 2023 Cost of Compliance Report, 73% of compliance professionals expect the volume of regulatory information to increase in the coming year. This, in turn, drives up the cost of compliance.

Generative AI proves to be an effective assistant for tasks such as monitoring the regulatory landscape, assessing its impact, and making prescriptive implementation recommendations. Thanks to RAG, LLM chatbots can cite facts and sources, reducing the effort required of compliance managers. The models can also summarize complex regulatory language, compare different regulations across countries, and check them against current internal policies and procedures.

The Austrian legal publisher Manz, for example, fine-tuned an open-source language model (BERT) to better process legal documents. The system can now query the publisher’s database of 3 million documents with high accuracy. Based on one query, it can resurface relevant case law documents and find up to 30 different facets of a legal problem.

Corlytics, a risk intelligence platform, has been running internal tests to evaluate pretrained generative AI compliance models. According to the company’s CEO John Byrne, “when it comes to summarizing complex regulatory documents and rules, the off-the-shelf accuracy is below 20%.” In contrast, Corlytics’ model produced up to 85% accuracy during the first runs. With further fact-checking and feedback from lawyers, the model accuracy improved to 99%.

Deal management

The average sales cycle length across industries is 102 days, and a lot of knowledge work happens during that time. Sales teams need to identify requirements, request and fill in proposal templates, create contracts, check in with the legal team, and do myriad other text-related tasks. GenAI can add efficiencies at every step of the sales cycle, helping teams close new opportunities faster.

Robin AI has developed a copilot for legal contracts, offering 85% faster contract reviews through contextual recommendations and faster information lookup. Dialect has built a GenAI assistant for auto-filling RFPs, RFIs, DDQs, and other important questionnaires using data from connected corporate systems.

Eilla, in turn, promises to bring extra speed to M&A and VC workflows by researching, aggregating, and analyzing information from key industry sources and internal data to perform better due diligence. Govdash wants to help companies liaising with federal customers to create more relevant solicitation documentation by automatically identifying requirements and evaluation factors and then returning relevant customer information.

RAG brings a new degree of intelligence to companies with LLMs

The above use cases just scratch the surface of the possibilities for a corporate retrieval augmented generation chatbot. Because of the relatively low implementation cost and complexity, RAG-based models can be deployed for a wide range of text processing tasks across all business functions. Deploying the model on private cloud infrastructure limits the exposure of data to third parties, which helps keep data secure and private. High auditability and a greater degree of control over output generation also make it easier to spot inconsistencies or biases early. If you’re looking for a way to convert the massive potential of generative AI into measurable business profits, RAG may be the answer.

Intellias helps global businesses implement and productize AI applications. From automotive to telecom, we help clients deploy high-performance, responsible, and secure generative AI solutions. Contact us for a personalized consultation.

Deploy Your LLM Chatbot with Retrieval Augmented Generation (RAG)