🔗 Building a RAG-Enabled Assistant with OpenWebUI: Integrating Local and Cloud LLMs via API

Date Created: 2025-05-05
By: 16BitMiker

OpenWebUI offers a powerful and intuitive interface for interacting with large language models (LLMs), whether you're running them locally via Ollama or accessing cloud-based services like OpenAI. But what really sets it apart is its extensibility: you're not limited to the browser UI. OpenWebUI provides a robust API for customizing assistants, integrating tools, and enabling Retrieval-Augmented Generation (RAG).

In this post, we'll walk through how to build a RAG-enabled assistant in OpenWebUI using both local and cloud models. Then, we'll show how to access it programmatically using the OpenAI-compatible API.

Let's dive in. 🏊

🧠 What Is a Custom Assistant?

In OpenWebUI, a "custom assistant" is a user-defined configuration that wraps around a base model. Think of it as an intelligent persona with memory, tools, and specific knowledge sources.

Each assistant can include a base model, a system prompt, one or more knowledge bases for RAG, optional tools, and its own generation settings.

Once defined, the assistant can be used through the UI or via API.

📋 Step 1: Choose Your Model Backend

OpenWebUI is backend-agnostic: you can connect it to local or cloud-hosted models.

🖥️ Local with Ollama

Ollama makes it easy to run models like LLaMA 3, Mistral, or Qwen entirely on your machine.

✅ To get started:

  1. Install Ollama on your machine

  2. Pull a model you want to use (e.g., ollama pull llama3)

  3. Make sure the Ollama service is running

OpenWebUI will auto-detect your local Ollama instance (typically at http://localhost:11434) and make it available as a model backend.

๐Ÿ”๏ธ Ideal for: Air-gapped environments, privacy-conscious deployments, and tinkering.

☁️ Cloud with OpenAI, Azure, or Custom APIs

If you want access to high-performance models like GPT-4 or Claude, you can connect OpenWebUI to any OpenAI-compatible API.

✅ Example: Set up OpenAI

  1. Go to Workspace > Models

  2. Click + Create Model

  3. Choose "OpenAI-Compatible"

  4. Enter your endpoint (e.g., https://api.openai.com/v1) and your API key

OpenWebUI will treat this like any other backend, standardizing interactions.

📦 This same approach works with self-hosted APIs like vLLM, LM Studio, or FastChat.
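"OpenAI-compatible" simply means the same client code works against any of these backends once you swap the base URL. A hedged sketch using the official openai Python package; the base URL, key, and model name are placeholders for whatever your endpoint actually serves:

```python
from openai import OpenAI

# Point the client at whichever OpenAI-compatible server you plan to register.
client = OpenAI(base_url="https://api.openai.com/v1", api_key="sk-...")

reply = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use a model your endpoint serves
    messages=[{"role": "user", "content": "Say hello."}],
)
print(reply.choices[0].message.content)
```

If a request like this succeeds, the endpoint will work when registered as an OpenWebUI backend.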

📚 Step 2: Upload Documents for RAG

RAG enhances your assistant by grounding its answers in your documents. OpenWebUI embeds these documents and indexes them for semantic retrieval.

✅ To add a knowledge base:

  1. Go to Workspace > Knowledge

  2. Click + Create Knowledge Base

  3. Give it a name and description

  4. Upload PDFs, Markdown, or plain text files

  5. OpenWebUI will process them into embeddings using its vector store backend

🔍 These documents will be automatically referenced by the assistant during conversations.
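The UI is the simplest path, but if you prefer to script ingestion, recent OpenWebUI releases also expose file and knowledge endpoints over the same API. The sketch below is based on those endpoints and may differ across versions; the knowledge ID, API key, and file name are placeholders:

```python
import requests

BASE = "http://localhost:3000"
HEADERS = {"Authorization": "Bearer YOUR_OPENWEBUI_API_KEY"}

# 1) Upload the file so OpenWebUI can chunk and embed it.
with open("handbook.pdf", "rb") as f:
    upload = requests.post(
        f"{BASE}/api/v1/files/",
        headers=HEADERS,
        files={"file": f},
        timeout=120,
    )
upload.raise_for_status()
file_id = upload.json()["id"]

# 2) Attach the uploaded file to an existing knowledge base.
attach = requests.post(
    f"{BASE}/api/v1/knowledge/YOUR_KNOWLEDGE_ID/file/add",
    headers=HEADERS,
    json={"file_id": file_id},
    timeout=60,
)
attach.raise_for_status()
print("Added file", file_id)
```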

🧰 Step 3: Create and Configure a Custom Assistant (Model)

Now that your model backend and knowledge base are ready, let's create a custom assistant.

  1. Navigate to Workspace > Models

  2. Click + Create a model

🧩 Configuration Options:

  Name your assistant, pick the base model (a local Ollama model or a cloud backend you added in Step 1), and write a system prompt that defines its persona and behavior.

🔗 Enable RAG:

  In the Knowledge section, attach the knowledge base you created in Step 2 so the assistant grounds its answers in your documents.

🔌 Enable Tools (Optional):

  Attach any tools you want the assistant to be able to call, such as web search or custom functions.

🎛️ Advanced Settings:

  Tune generation parameters such as temperature, context window, and max tokens to match your use case.

✅ Save your assistant. It's now available in OpenWebUI and via API.

🔑 Step 4: Generate an API Key

To call your assistant from code, you'll need an access token.

  1. Go to Settings > Account > API Keys

  2. Click ➕ to generate a key

  3. Copy and store it securely

🔐 This API key will be used as a Bearer token in your HTTP requests.

▶️ Step 5: Call the Assistant via API

OpenWebUI exposes an OpenAI-style chat completions endpoint, typically:

  POST http://localhost:3000/api/chat/completions

(adjust the host and port to match your deployment)

📌 Notes:

  Use your assistant's model ID in the model field (you can copy it from Workspace > Models).

  Send the API key from Step 4 as a Bearer token in the Authorization header.

  Requests and responses follow the OpenAI chat completions schema, so existing OpenAI client libraries work once you point them at your OpenWebUI base URL.

🔄 Optional: Streaming Responses

For real-time interaction (e.g., live chat), set "stream": true in the request body.

Use a client that supports Server-Sent Events (SSE) to process streamed tokens.

📈 Great for dashboards, bots, and typing animations.
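Here is a hedged sketch of consuming the stream from Python with requests, assuming OpenWebUI emits OpenAI-style "data: ..." SSE chunks; the URL, key, and model ID are placeholders:

```python
import json
import requests

resp = requests.post(
    "http://localhost:3000/api/chat/completions",
    headers={"Authorization": "Bearer YOUR_OPENWEBUI_API_KEY"},
    json={
        "model": "my-rag-assistant",  # placeholder model ID
        "messages": [{"role": "user", "content": "Give me a one-line summary."}],
        "stream": True,
    },
    stream=True,
    timeout=300,
)
resp.raise_for_status()

# Print tokens as they arrive, stopping at the [DONE] sentinel.
for line in resp.iter_lines():
    if not line or not line.startswith(b"data: "):
        continue
    payload = line[len(b"data: "):]
    if payload == b"[DONE]":
        break
    chunk = json.loads(payload)
    delta = chunk["choices"][0]["delta"].get("content", "")
    print(delta, end="", flush=True)
print()
```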

🛠️ Example: Use OpenWebUI in Python

Here's a simple way to query your assistant from Python:
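This is a minimal sketch using the requests library; the base URL, API key, and model ID below are placeholders you will need to replace with your own values.

```python
import requests

OPENWEBUI_URL = "http://localhost:3000"  # adjust to your deployment
API_KEY = "YOUR_OPENWEBUI_API_KEY"       # the key from Step 4
MODEL_ID = "my-rag-assistant"            # your assistant's model ID (placeholder)

response = requests.post(
    f"{OPENWEBUI_URL}/api/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": MODEL_ID,
        "messages": [
            {"role": "user", "content": "Summarize the key points from the uploaded docs."}
        ],
    },
    timeout=120,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```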

📝 This works whether your assistant is backed by a local model (via Ollama) or a cloud model (like OpenAI's GPT).

🗂️ Use Cases

OpenWebUI + API opens up a wide range of possibilities:

👥 Internal Tools:

  Support chatbots, onboarding assistants, and documentation Q&A grounded in your own files.

🔌 System Integrations:

  Call the API from scripts, dashboards, chat platforms, or pipelines to bring the assistant into existing workflows.

๐Ÿ”๏ธ Offline-First:

🔚 Conclusion

OpenWebUI transforms how you interact with LLMs, whether through its sleek UI or its programmable API. By combining local and cloud models, integrating RAG, and enabling tool use, it becomes a powerful development environment for intelligent assistants.

With just a few steps, you can create domain-specific assistants that are fast, private, and deeply integrated into your workflows.

✅ Whether you're building a chatbot, automating support, or querying internal documents, OpenWebUI provides the flexibility and control you need.


Happy building! 🧠💻