RAG Architecture
A local-first retrieval-augmented generation pipeline with semantic search and grounded responses.
Request Flow
User question → React ChatWidget → FastAPI orchestration → Weaviate vector search → Ollama LLM generation
RAG Pipeline Steps
1. User Query: The user asks a question about their career through the React chat widget.
2. API Routing: FastAPI receives the POST request and routes it to the ChatService.
3. Semantic Search: Weaviate performs a near_text() search across all 4 collections using nomic-embed-text embeddings.
4. Context Assembly: The top matches from each collection are assembled into a JSON-formatted BACKGROUND section of the prompt.
5. LLM Generation: Ollama's llama3.2 generates a natural-language response grounded in the retrieved context.
6. Response Delivery: The answer and source attribution are returned to the user through the chat interface.
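Steps 3 through 5 come together in the ChatService. Below is a minimal sketch of that orchestration, assuming the weaviate-client v4 Python API and a direct HTTP call to Ollama; the collection names, prompt template, and response shape are illustrative rather than the project's actual code.

# Hypothetical sketch of services/chat_service.py
import json
import requests
import weaviate

OLLAMA_URL = "http://localhost:11434"
COLLECTIONS = ["Jobs", "Skills", "Projects", "Education"]  # assumed names for the 4 collections

def answer_question(question: str, limit: int = 3) -> dict:
    """Retrieve context from Weaviate, then generate a grounded answer with Ollama."""
    client = weaviate.connect_to_local()
    try:
        # Semantic search: near_text() against each collection (step 3).
        background = {}
        for name in COLLECTIONS:
            collection = client.collections.get(name)
            result = collection.query.near_text(query=question, limit=limit)
            background[name] = [obj.properties for obj in result.objects]
    finally:
        client.close()

    # Context assembly: JSON-formatted BACKGROUND section in the prompt (step 4).
    prompt = (
        "Answer the question using only the BACKGROUND below.\n\n"
        f"BACKGROUND:\n{json.dumps(background, indent=2, default=str)}\n\n"
        f"QUESTION: {question}"
    )

    # LLM generation: non-streaming call to the local Ollama API (step 5).
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": "llama3.2", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()

    # Answer plus source attribution for the chat widget (step 6).
    return {"answer": resp.json()["response"], "sources": list(background.keys())}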
Project Structure
# RAG Backend Structure
rag/
    api/
        server.py                   # FastAPI app + endpoints
        models.py                   # Pydantic schemas
    services/
        chat_service.py             # RAG orchestration
    clients/
        weaviate_client.py          # Vector DB wrapper
        ollama_client.py            # LLM client wrapper
    core/
        settings.py                 # Configuration
    schema/
        schema.py                   # Collection definitions
    ingest/
        ingest_jobs_from_json.py    # Data loader
        candidate_data.json         # Career data
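The data loader's job is to get candidate_data.json into Weaviate before any queries run. A rough sketch, assuming weaviate-client v4 and that the JSON maps collection names to lists of records; the actual file layout and field names are not shown in this document.

# Hypothetical sketch of ingest/ingest_jobs_from_json.py
import json
import weaviate

def ingest(path: str = "ingest/candidate_data.json") -> None:
    with open(path, encoding="utf-8") as f:
        data = json.load(f)

    client = weaviate.connect_to_local()
    try:
        # Assumes each top-level key in the JSON maps to one Weaviate collection.
        for collection_name, records in data.items():
            collection = client.collections.get(collection_name)
            with collection.batch.dynamic() as batch:
                for record in records:
                    # text2vec-ollama embeds the object's text properties on insert.
                    batch.add_object(properties=record)
    finally:
        client.close()

if __name__ == "__main__":
    ingest()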
Core Components
Weaviate
v1.34.4
- text2vec-ollama module for embeddings
- Multi-collection semantic search
- Persistent volumes in Docker
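A sketch of how one of the collections in schema/schema.py might be defined so that Weaviate embeds objects through the local Ollama instance; the collection name, properties, and API endpoint are assumptions.

# Hypothetical collection definition using weaviate-client v4 and text2vec-ollama
import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()
try:
    client.collections.create(
        name="Jobs",  # one of the 4 collections; name is illustrative
        vectorizer_config=Configure.Vectorizer.text2vec_ollama(
            # From inside the Weaviate container, the host's Ollama is usually
            # reached via host.docker.internal rather than localhost.
            api_endpoint="http://host.docker.internal:11434",
            model="nomic-embed-text",
        ),
        properties=[
            Property(name="title", data_type=DataType.TEXT),
            Property(name="description", data_type=DataType.TEXT),
        ],
    )
finally:
    client.close()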
Ollama
Local LLM
- llama3.2 for response generation
- nomic-embed-text for embeddings
- Runs locally on port 11434
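A minimal sketch of what clients/ollama_client.py might wrap, assuming plain HTTP calls to Ollama on port 11434; the function names and defaults are illustrative.

# Hypothetical thin wrapper around the local Ollama HTTP API
import requests

OLLAMA_URL = "http://localhost:11434"

def generate(prompt: str, model: str = "llama3.2") -> str:
    """Non-streaming completion from the local LLM."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]

def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embedding via nomic-embed-text (Weaviate's text2vec-ollama module normally handles this)."""
    resp = requests.post(
        f"{OLLAMA_URL}/api/embeddings",
        json={"model": model, "prompt": text},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()["embedding"]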
FastAPI
Backend
- POST /api/chat endpoint
- OpenAPI/Swagger at /docs
- CORS configured for React dev
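A sketch of how api/server.py and api/models.py could wire up the endpoint, assuming a simple question/answer schema; the field names, allowed origin, and import path are assumptions.

# Hypothetical sketch of the FastAPI layer (api/models.py + api/server.py)
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

from services.chat_service import answer_question  # hypothetical import path

class ChatRequest(BaseModel):
    question: str

class ChatResponse(BaseModel):
    answer: str
    sources: list[str]

# Swagger UI is served automatically at /docs.
app = FastAPI(title="RAG API")

# Allow the Vite dev server (default port 5173) to call the API during development.
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],
    allow_methods=["*"],
    allow_headers=["*"],
)

@app.post("/api/chat", response_model=ChatResponse)
def chat(request: ChatRequest) -> ChatResponse:
    result = answer_question(request.question)
    return ChatResponse(**result)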
React Frontend
v19 + TypeScript
- Floating ChatWidget component
- Vite 7 for fast HMR builds
- Animations and responsive design
Running Locally
Start Ollama
ollama run llama3.2

Start Weaviate
docker-compose up -d

Ingest Data
python ingest/ingest_jobs_from_json.py

Run API
uvicorn api.server:app
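With all four pieces running, a quick end-to-end check; the request field name is an assumption about the API schema, and uvicorn's default port 8000 is assumed.

# Hypothetical smoke test against the local API
import requests

resp = requests.post(
    "http://localhost:8000/api/chat",
    json={"question": "What was your most recent role?"},
    timeout=120,
)
resp.raise_for_status()
print(resp.json())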