Modern Large Language Models (LLMs) are impressive, but they have a major limitation: their knowledge is frozen in their weights, which makes it difficult to update or extend. Retrieval-Augmented Generation (RAG) is an approach designed to address this problem. Introduced by Meta in 2020, it connects a language model to an external knowledge base (for example, a set of documents) so that it can incorporate up-to-date and specific information into its responses. In practice, for each question asked, the RAG system first retrieves relevant content from its document base, then generates a response by combining this retrieved context with the linguistic capabilities of the LLM.
Note: The complete source code for the example project mentioned in this article is available on GitHub.
What is RAG and why use it?
Architecture of a RAG system
Practical implementation with TypeScript
Code analysis and best practices
Advantages of the technical stack
Going further
Retrieval-Augmented Generation (RAG) literally means "generation augmented by retrieval." The idea is to separate knowledge from the model. Instead of trying to incorporate all information into the parameters of an LLM (through costly fine-tuning) or designing a classical model that would predict responses from data, we let the main model generate text and augment it with an intermediate step of information retrieval. A typical RAG pipeline works as follows: the user's question is converted into a vector (an embedding), that vector is used to retrieve the most semantically similar passages from the indexed document base, and the retrieved passages are then injected into the prompt so that the LLM generates an answer grounded in this context.
This process allows the model to rely on specific external knowledge at the time of generation, without having to permanently memorize it. This can be compared to a human who, faced with a question, would consult books or reference documents before answering: the LLM "searches its library" before speaking.
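In code, this boils down to two calls: retrieve, then generate. The sketch below is purely illustrative, with hypothetical helper names that the rest of the article makes concrete.

```typescript
// Purely illustrative: searchKnowledgeBase and generateAnswer are hypothetical
// placeholders for the retrieval and generation steps implemented later.
declare function searchKnowledgeBase(question: string, topK: number): Promise<string[]>;
declare function generateAnswer(question: string, context: string[]): Promise<string>;

async function answerWithRag(question: string): Promise<string> {
  const context = await searchKnowledgeBase(question, 4); // semantic search over the document base
  return generateAnswer(question, context);               // LLM call grounded in the retrieved context
}
```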
The RAG approach is particularly useful whenever a conversational assistant needs to handle an evolving or voluminous knowledge base. Here are some examples of concrete use cases where RAG excels compared to classical methods:
Documentary chatbots: An assistant powered by a company's technical documentation, capable of answering questions from developers or customers by drawing directly from manuals, internal knowledge bases, or even source code. For example, the model can be connected to API specifications or open-source project code to explain how a function works or the reason for a certain design.
Dynamic FAQs: In a customer support context, a RAG chatbot can answer common questions (FAQs) based on the latest policies or product data. If a policy (e.g., return conditions) changes, you only need to update the reference document and the bot will take it into account instantly, without requiring retraining. This results in always up-to-date FAQs, with the ability to provide the source of information to support the answer.
Legal assistants: An assistant can help lawyers or legal professionals by finding relevant passages in a database of laws, case law, or contracts for a given question, then formulating the answer in natural language. The model doesn't need to know the entire Civil Code by heart; it just needs to look up the appropriate articles. The same applies to a medical assistant, which could query databases of scientific publications or medical protocols to provide answers based on the latest clinical knowledge.
Programming assistant: This is the case of our example project – an assistant that knows the content of a code repository and can answer questions about this code (architecture, role of a module, potential bugs, etc.). Rather than training a specialized programming model, we use a generalist LLM augmented by searching for relevant code files in the repository.
A complete RAG system typically includes the following components:
Indexing and storage: documents are loaded, split into chunks, embedded, and stored in a vector database.
Query pipeline: the user's question is embedded and used to retrieve the most relevant chunks from that database.
Generation and post-processing: the retrieved chunks are assembled into a prompt, the LLM generates the answer, and sources can be attached to the response.
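To make these three components more tangible, here is one possible way to describe them as TypeScript types. The names are illustrative and do not come from the example repository.

```typescript
// Illustrative types mapping the three components above onto code (hypothetical names)

// Indexing and storage: raw documents are cut into chunks and stored with their embeddings
interface DocumentChunk {
  id: string;
  text: string;
  source: string;       // e.g. the file path in the repository
  embedding?: number[];  // vector produced by the embedding model
}

// Query pipeline: embed the question and retrieve the closest chunks
interface Retriever {
  retrieve(question: string, topK: number): Promise<DocumentChunk[]>;
}

// Generation and post-processing: build a prompt from the chunks and call the LLM
interface AnswerGenerator {
  generate(question: string, context: DocumentChunk[]): Promise<string>;
}
```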
For our implementation, we've chosen a modern and performant stack: TypeScript for static typing, Bun as the runtime and package manager, LangChain to orchestrate the pipeline, Ollama to run the language and embedding models locally, and Qdrant as the vector database.
This combination offers an excellent balance between performance, ease of development, and flexibility.
Let's start by initializing our project:
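The exact dependencies are listed in the example repository; as a rough sketch, the setup could look like this (package names and model choices are assumptions to adapt to your environment):

```bash
# Create the project and add the main dependencies (check the repository for the exact list)
bun init -y
bun add langchain @langchain/core @langchain/community @qdrant/js-client-rest

# Start a local Qdrant instance
docker run -p 6333:6333 qdrant/qdrant

# Pull a chat model and an embedding model with Ollama
ollama pull llama3
ollama pull nomic-embed-text
```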
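It is convenient to instantiate the shared clients once in a small configuration module. The sketch below assumes the models pulled above and the default local ports of Ollama and Qdrant; import paths vary slightly between LangChain versions.

```typescript
// config.ts -- shared clients (a sketch; adjust import paths to your LangChain version)
import { OllamaEmbeddings } from "@langchain/community/embeddings/ollama";
import { ChatOllama } from "@langchain/community/chat_models/ollama";

export const OLLAMA_URL = "http://localhost:11434";
export const QDRANT_URL = "http://localhost:6333";
export const COLLECTION_NAME = "repo_chunks"; // hypothetical collection name

// Embedding model used both at indexing time and at query time
export const embeddings = new OllamaEmbeddings({
  model: "nomic-embed-text",
  baseUrl: OLLAMA_URL,
});

// Chat model used to generate the final answer
export const chatModel = new ChatOllama({
  model: "llama3",
  baseUrl: OLLAMA_URL,
  temperature: 0.1, // keep answers close to the retrieved context
});
```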
Indexing is a crucial step in a RAG system. It involves transforming raw documents into appropriately sized chunks, then generating embeddings for each chunk.
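The repository contains the full indexing code; the sketch below only shows the general shape of the step, assuming the config module above and a directory of TypeScript source files (file and function names are illustrative).

```typescript
// indexer.ts -- sketch of the indexing step (not the repository's exact code)
import { join } from "node:path";
import { Document } from "@langchain/core/documents";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { QdrantVectorStore } from "@langchain/community/vectorstores/qdrant";
import { embeddings, QDRANT_URL, COLLECTION_NAME } from "./config";

// 1. Load raw documents (here: TypeScript files from a repository directory)
async function loadSourceFiles(dir: string): Promise<Document[]> {
  const docs: Document[] = [];
  const glob = new Bun.Glob("**/*.ts");
  for await (const relativePath of glob.scan(dir)) {
    const path = join(dir, relativePath);
    const text = await Bun.file(path).text();
    docs.push(new Document({ pageContent: text, metadata: { source: path } }));
  }
  return docs;
}

// 2. Split documents into appropriately sized, overlapping chunks
// 3. Embed each chunk and store it in Qdrant
export async function indexRepository(dir: string): Promise<void> {
  const rawDocs = await loadSourceFiles(dir);

  const splitter = new RecursiveCharacterTextSplitter({
    chunkSize: 1000,   // characters per chunk
    chunkOverlap: 200, // overlap so content spanning a boundary is not lost
  });
  const chunks = await splitter.splitDocuments(rawDocs);

  // fromDocuments embeds every chunk and upserts it into the collection
  await QdrantVectorStore.fromDocuments(chunks, embeddings, {
    url: QDRANT_URL,
    collectionName: COLLECTION_NAME,
  });

  console.log(`Indexed ${chunks.length} chunks from ${rawDocs.length} files`);
}
```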
Splitting documents into chunks is a critical step that directly influences the quality of results. Some best practices: keep chunks small enough to stay focused yet large enough to preserve meaning (a few hundred to a thousand characters is a common starting point), add an overlap between consecutive chunks so that information straddling a boundary is not lost, and cut along natural boundaries such as paragraphs, sections, or functions rather than at arbitrary positions, as in the sketch below.
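For source code specifically, LangChain's recursive splitter can be configured with language-aware separators so that chunks tend to follow function and class boundaries; a small sketch:

```typescript
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";

// Language-aware splitting: separators adapted to JavaScript/TypeScript syntax,
// so chunks tend to break on functions and classes rather than mid-statement.
const codeSplitter = RecursiveCharacterTextSplitter.fromLanguage("js", {
  chunkSize: 1000,
  chunkOverlap: 200,
});
```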
The quality of semantic search is essential: it depends on the embedding model, on the number of chunks retrieved for each question (too few and the context is incomplete, too many and the prompt drowns in noise), and on discarding results whose similarity score is too low. The helper sketched below illustrates the last two points.
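A possible retrieval helper, reusing the collection built during indexing; the topK value and the score threshold are arbitrary starting points to tune against your own data:

```typescript
// retriever.ts -- sketch of the retrieval step
import { Document } from "@langchain/core/documents";
import { QdrantVectorStore } from "@langchain/community/vectorstores/qdrant";
import { embeddings, QDRANT_URL, COLLECTION_NAME } from "./config";

export async function retrieveRelevantChunks(
  question: string,
  topK = 4,
  minScore = 0.5, // arbitrary threshold: tune for your embedding model and corpus
): Promise<Document[]> {
  const store = await QdrantVectorStore.fromExistingCollection(embeddings, {
    url: QDRANT_URL,
    collectionName: COLLECTION_NAME,
  });

  // similaritySearchWithScore returns [document, similarity score] pairs
  const results = await store.similaritySearchWithScore(question, topK);
  return results
    .filter(([, score]) => score >= minScore) // drop weak matches that would add noise
    .map(([doc]) => doc);
}
```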
Prompt construction is an art that strongly influences the quality of responses: the prompt should clearly separate the retrieved context from the question, instruct the model to answer only from that context, and ask it to say explicitly when the context does not contain the answer instead of inventing one. One possible query chain is sketched below.
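Here is one way to wire the retrieved context, the prompt, and the chat model together with LangChain; the prompt wording and the chain structure are illustrative rather than the repository's exact code.

```typescript
// ask.ts -- sketch of the generation step
import { PromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { chatModel } from "./config";
import { retrieveRelevantChunks } from "./retriever";

const prompt = PromptTemplate.fromTemplate(
  `You are an assistant answering questions about a code repository.
Answer ONLY from the context below. If the context does not contain the answer, say so explicitly.

Context:
{context}

Question: {question}

Answer:`,
);

export async function ask(question: string): Promise<string> {
  const chunks = await retrieveRelevantChunks(question);

  // Keep the source of each chunk so the answer can point back to the right file
  const context = chunks
    .map((doc) => `--- ${doc.metadata.source} ---\n${doc.pageContent}`)
    .join("\n\n");

  const chain = prompt.pipe(chatModel).pipe(new StringOutputParser());
  return chain.invoke({ context, question });
}

// Example usage:
// console.log(await ask("What does the indexing module do?"));
```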
Bun offers significant advantages for this type of application: it runs TypeScript natively without a separate compilation step, starts quickly, and ships a package manager and test runner in a single tool.
LangChain greatly facilitates the development of LLM-based applications: it provides ready-made abstractions for document loaders, text splitters, vector stores, and prompt chains, so the same pipeline can be pointed at another model or vector database with minimal changes.
Ollama allows running language models locally with great flexibility: models are downloaded and served through a simple local API, data never leaves your machine, there are no per-token API costs, and switching models only takes a pull command.
Qdrant is a modern vector database designed for semantic search: written in Rust for performance, it exposes straightforward REST and gRPC APIs, combines vector similarity with payloads and filtering, and runs locally with a single Docker command.
To measure the quality of a RAG system, evaluate its two stages separately: check that retrieval actually returns the passages containing the answer (for example against a set of questions whose relevant documents are known in advance), and check that the generated answers are faithful to the retrieved context and actually address the question; a minimal example of the first check follows.
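As a very small illustration of the retrieval-side check, one can maintain a hand-built list of questions together with the file expected to answer each of them, and measure how often that file is actually retrieved (the questions and sources below are hypothetical):

```typescript
// evaluate.ts -- naive retrieval hit-rate check (illustrative only)
import { retrieveRelevantChunks } from "./retriever";

// Hand-built evaluation set: each question lists the file expected to contain the answer
const evalSet = [
  { question: "Where are documents split into chunks?", expectedSource: "indexer.ts" },
  { question: "Which model generates the final answer?", expectedSource: "config.ts" },
];

// Hit rate: fraction of questions for which the expected source appears among the retrieved chunks
async function retrievalHitRate(): Promise<number> {
  let hits = 0;
  for (const { question, expectedSource } of evalSet) {
    const chunks = await retrieveRelevantChunks(question);
    if (chunks.some((doc) => String(doc.metadata.source).includes(expectedSource))) {
      hits++;
    }
  }
  return hits / evalSet.length;
}

const hitRate = await retrievalHitRate();
console.log(`Retrieval hit rate: ${(hitRate * 100).toFixed(0)}%`);
```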
Retrieval-Augmented Generation represents a major advance in how we can leverage language models for specific use cases. By separating knowledge from the generation model, RAG enables the creation of AI assistants that are more accurate, more up-to-date, and more transparent.
Our implementation with TypeScript, Bun, LangChain, Ollama, and Qdrant demonstrates that it is now possible to build performant RAG systems with modern and accessible technologies. This approach paves the way for a new generation of AI assistants capable of reasoning on specific knowledge bases while maintaining the fluidity and coherence of large language models.
Feel free to explore the complete source code on GitHub and adapt it to your own use cases. RAG is an evolving technology, and there are numerous opportunities for innovation in this exciting field.
Sébastien TIMONER
Expert in web development and team management, I specialize in creating and optimizing high-performance digital solutions. With extensive expertise in modern technologies like React.js, Node.js, TypeScript, Symfony, and Zephyr OS for IoT, I ensure the success of complex SaaS and IoT projects, from design to production, for companies across various sectors, at offroadLabs.
At offroadLabs, I offer custom development services that combine technical expertise with a collaborative approach. Whether creating an innovative SaaS solution, developing IoT systems with Zephyr OS, modernizing an existing application, or supporting the upskilling of a team, I am committed to delivering robust and high-performance solutions tailored to the specific needs of each project.
I am available for projects in the Aix-en-Provence area or fully remote.