RAG Architecture
A local-first retrieval-augmented generation pipeline with semantic search and grounded responses.
Request Flow
User (Question) → React (ChatWidget) → FastAPI (Orchestration) → Weaviate (Vector Search) → Ollama (LLM Generate), with the generated answer returned to the user along the same path.
RAG Pipeline Steps
User Query
User asks a question about the candidate's career through the React chat widget.
API Routing
FastAPI receives the POST request and routes it to the ChatService.
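A minimal sketch of this layer, assuming hypothetical field and helper names (the real schemas live in api/models.py and the real orchestration in services/chat_service.py):

# Sketch of api/server.py; names marked hypothetical are assumptions
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class ChatRequest(BaseModel):
    message: str  # hypothetical field name

class ChatResponse(BaseModel):
    answer: str
    sources: list[str]  # source attribution returned with the answer

@app.post("/api/chat", response_model=ChatResponse)
async def chat(req: ChatRequest) -> ChatResponse:
    # Delegate to the ChatService, which performs the retrieval,
    # assembly, and generation steps sketched below
    answer, sources = answer_question(req.message)
    return ChatResponse(answer=answer, sources=sources)

def answer_question(question: str) -> tuple[str, list[str]]:
    # Stub standing in for services/chat_service.py
    return "stub answer", []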
Semantic Search
Weaviate performs near_text() search across all 4 collections using nomic-embed-text embeddings.
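A sketch of this retrieval step with the v4 Weaviate Python client; the collection names below are placeholders, since the actual four collections are defined in schema/schema.py:

import weaviate

client = weaviate.connect_to_local()  # Weaviate from docker-compose

# Placeholder names; the real collections live in schema/schema.py
COLLECTIONS = ["Jobs", "Skills", "Projects", "Education"]

def retrieve(question: str, per_collection: int = 3) -> dict[str, list[dict]]:
    # near_text() lets Weaviate embed the query itself via the
    # text2vec-ollama module (nomic-embed-text) and rank by similarity
    hits: dict[str, list[dict]] = {}
    for name in COLLECTIONS:
        collection = client.collections.get(name)
        result = collection.query.near_text(query=question, limit=per_collection)
        hits[name] = [obj.properties for obj in result.objects]
    return hits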
Context Assembly
Top matches from each collection are assembled into a JSON-formatted BACKGROUND section.
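The exact prompt layout isn't shown here, but a plausible assembly of the BACKGROUND section might look like:

import json

def build_prompt(question: str, hits: dict[str, list[dict]]) -> str:
    # Serialize the top matches per collection as JSON so the model
    # sees clearly delimited, structured context
    background = json.dumps(hits, indent=2, default=str)
    return (
        "Answer using only the BACKGROUND below. "
        "If the answer is not there, say so.\n\n"
        f"BACKGROUND:\n{background}\n\n"
        f"QUESTION: {question}"
    )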
LLM Generation
Ollama's llama3.2 model generates a natural-language response grounded in the retrieved context.
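Generation goes through Ollama's local REST API on port 11434; a minimal non-streaming call:

import requests

def generate(prompt: str) -> str:
    # /api/generate with stream=False returns a single JSON object
    # whose "response" field holds the full completion
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": "llama3.2", "prompt": prompt, "stream": False},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["response"]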
Response Delivery
The answer and its source attribution are returned to the user through the chat interface.
Project Structure
# RAG Backend Structure
rag/
├── api/
│   ├── server.py                 # FastAPI app + endpoints
│   └── models.py                 # Pydantic schemas
├── services/
│   └── chat_service.py           # RAG orchestration
├── clients/
│   ├── weaviate_client.py        # Vector DB wrapper
│   └── ollama_client.py          # LLM client wrapper
├── core/
│   └── settings.py               # Configuration
├── schema/
│   └── schema.py                 # Collection definitions
└── ingest/
    ├── ingest_jobs_from_json.py  # Data loader
    └── candidate_data.json       # Career data
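core/settings.py centralizes configuration; a typical pydantic-settings shape, with hypothetical field names, would be:

from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    # Hypothetical fields; the real ones live in core/settings.py.
    # Each can be overridden via an environment variable of the same name.
    weaviate_host: str = "localhost"
    weaviate_port: int = 8080
    ollama_url: str = "http://localhost:11434"
    llm_model: str = "llama3.2"
    embed_model: str = "nomic-embed-text"

settings = Settings()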
Core Components
Weaviate (v1.34.4)
- text2vec-ollama module for embeddings
- Multi-collection semantic search
- Persistent volumes in Docker
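A sketch of how one of these collections might be declared with the text2vec-ollama module (v4 Python client); the collection and property names are placeholders for what schema/schema.py actually defines:

import weaviate
from weaviate.classes.config import Configure, DataType, Property

client = weaviate.connect_to_local()

# Placeholder collection; the real definitions are in schema/schema.py
client.collections.create(
    "Jobs",
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="description", data_type=DataType.TEXT),
    ],
    vectorizer_config=Configure.Vectorizer.text2vec_ollama(
        # Dockerized Weaviate reaches host Ollama via this address
        api_endpoint="http://host.docker.internal:11434",
        model="nomic-embed-text",
    ),
)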
Ollama (Local LLM)
- llama3.2 for response generation
- nomic-embed-text for embeddings
- Runs locally on port 11434
FastAPI (Backend)
- POST /api/chat endpoint
- OpenAPI/Swagger at /docs
- CORS configured for React dev
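The CORS setup is standard FastAPI middleware; assuming Vite's default dev port 5173:

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
app.add_middleware(
    CORSMiddleware,
    allow_origins=["http://localhost:5173"],  # Vite dev server default
    allow_methods=["POST"],
    allow_headers=["*"],
)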
React Frontend (v19 + TypeScript)
- Floating ChatWidget component
- Vite 7 for fast HMR builds
- Animations and responsive design
Running Locally
Start Ollama
ollama run llama3.2

Start Weaviate
docker-compose up -d

Ingest Data
python ingest/ingest_jobs_from_json.py

Run API
uvicorn api.server:app
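Note: ollama run llama3.2 pulls only the chat model. The text2vec-ollama module also needs the embedding model available locally, so pull it once before ingesting:

ollama pull nomic-embed-text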