Planit AI: semantic search across a UK planning corpus
The challenge
Planit Consulting is a UK planning consultancy with an archive of 18,429 .docx documents: planning statements, design & access, heritage, appeal, pre-app, cover letters and conditions. Consultants had to search this archive manually for precedents and relevant passages, which meant slow, error-prone Ctrl+F work that cost hours per research task.
The analysis
The corpus is heterogeneous: 14 document types spread across 3 tiers, from primary to metadata-only. Not every type calls for the same chunking strategy.
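As a rough illustration of tier-dependent chunking, the sketch below splits a document at detected section headings and skips chunking entirely for metadata-only documents. The heading pattern, the `metadata_only` tier label, the `max_chars` limit and all function names are assumptions made for this example; the actual pipeline's rules are not described here.

```python
import re
from dataclasses import dataclass

# Hypothetical heading pattern: numbered headings such as "1. Introduction"
# or "2.1 Heritage context". It will also match body lines that start with a number.
HEADING = re.compile(r"^\d+(\.\d+)*\.?\s+\S")


@dataclass
class Chunk:
    section: str  # heading the chunk belongs to
    text: str     # chunk body


def chunk_by_section(text: str, max_chars: int = 2000) -> list[Chunk]:
    """Split at detected headings, then split long sections on line boundaries."""
    sections: list[tuple[str, list[str]]] = [("(preamble)", [])]
    for line in text.splitlines():
        stripped = line.strip()
        if HEADING.match(stripped):
            sections.append((stripped, []))
        elif stripped:
            sections[-1][1].append(stripped)

    chunks: list[Chunk] = []
    for title, paragraphs in sections:
        buffer = ""
        for para in paragraphs:
            # Start a new chunk when adding this paragraph would exceed the limit.
            # A single paragraph longer than max_chars is kept whole in this sketch.
            if buffer and len(buffer) + len(para) + 1 > max_chars:
                chunks.append(Chunk(title, buffer))
                buffer = ""
            buffer = f"{buffer}\n{para}" if buffer else para
        if buffer:
            chunks.append(Chunk(title, buffer))
    return chunks


def chunk_document(text: str, tier: str) -> list[Chunk]:
    """Metadata-only documents (assumed label) produce no chunks."""
    if tier == "metadata_only":
        return []
    return chunk_by_section(text)
```

Keeping the section title on every chunk means a retrieved passage can later be traced back to its place in the source document.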
Document classification was step one: without reliable document typing, retrieval cannot return relevant results. Of the 18,429 documents, 514 remained unclassified and were handled separately.
At 95,000+ chunks, embedding costs are substantial. We chose OpenAI text-embedding-3-large (3072 dimensions) over cheaper alternatives for precision, and tracked cost at the chunk level.
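Chunk-level cost tracking can be sketched as a per-chunk token estimate multiplied by a per-token price. The price constant and the four-characters-per-token heuristic below are assumptions for illustration only; a production version would use a real tokenizer and the provider's current price list.

```python
from dataclasses import dataclass

# Assumed values for illustration; verify against current pricing before use.
PRICE_PER_MILLION_TOKENS_USD = 0.13
CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a real tokenizer


@dataclass
class ChunkCost:
    chunk_id: str
    tokens: int
    usd: float


def estimate_tokens(text: str) -> int:
    """Ceiling of len(text) / CHARS_PER_TOKEN, with a minimum of one token."""
    return max(1, -(-len(text) // CHARS_PER_TOKEN))


def chunk_cost(
    chunk_id: str,
    text: str,
    price_per_million: float = PRICE_PER_MILLION_TOKENS_USD,
) -> ChunkCost:
    """Estimate the embedding cost of a single chunk."""
    tokens = estimate_tokens(text)
    return ChunkCost(chunk_id, tokens, tokens * price_per_million / 1_000_000)


def total_cost(costs: list[ChunkCost]) -> float:
    """Sum the estimated cost over all chunks."""
    return sum(cost.usd for cost in costs)
```

Recording a cost entry per chunk makes it possible to see which document types drive spend before committing to a full re-embedding run.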
RAG without conversation history is unusable for consultants who iterate on questions. SQLite persistence for conversations and messages was part of the architecture from day one.
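A minimal sketch of conversation persistence using Python's built-in `sqlite3` module. The table layout, column names and helper functions are assumptions for this example, not the project's actual schema.

```python
import sqlite3

# Hypothetical schema: one row per conversation, one row per message.
SCHEMA = """
CREATE TABLE IF NOT EXISTS conversations (
    id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY,
    conversation_id INTEGER NOT NULL
        REFERENCES conversations(id) ON DELETE CASCADE,
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (and cascades) when this is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def create_conversation(conn: sqlite3.Connection, title: str) -> int:
    cur = conn.execute("INSERT INTO conversations (title) VALUES (?)", (title,))
    conn.commit()
    return cur.lastrowid


def add_message(conn: sqlite3.Connection, conversation_id: int, role: str, content: str) -> None:
    conn.execute(
        "INSERT INTO messages (conversation_id, role, content) VALUES (?, ?, ?)",
        (conversation_id, role, content),
    )
    conn.commit()


def history(conn: sqlite3.Connection, conversation_id: int) -> list[tuple[str, str]]:
    """Return (role, content) pairs in insertion order, ready to replay to the model."""
    rows = conn.execute(
        "SELECT role, content FROM messages WHERE conversation_id = ? ORDER BY id",
        (conversation_id,),
    )
    return list(rows)
```

With the history stored this way, each follow-up question can be sent to the model together with the earlier turns, which is what lets a consultant refine a query instead of starting over.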
Our solution
- Document classification into 14 types across 3 tiers via a Python pipeline
- Semantic chunking with section detection: 95,592 chunks from 15,801 documents
- Embedding via OpenAI text-embedding-3-large with separate handling for oversized chunks
- ChromaDB vector store with persistent storage
- FastAPI backend with REST endpoints and CORS
- React TypeScript chat UI (Planit AI) with sidebar, conversation grouping, rename and delete
- RAG via Claude with source citations back to original documents
- SQLite persistence for conversations and messages
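The citation step in the list above can be sketched as follows: retrieved chunks are numbered in the prompt, and the `[n]` markers in the model's answer are mapped back to source documents. The `RetrievedChunk` type, the prompt wording and the `[n]` marker convention are assumptions for this example; the vector store query and the model call are left out.

```python
import re
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    source: str   # original document, e.g. a .docx filename
    section: str  # section heading the chunk came from
    text: str


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Number each excerpt so the model can cite it as [n]."""
    context = "\n\n".join(
        f"[{i}] {chunk.source} ({chunk.section})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered excerpts below. "
        "Cite excerpts as [n] after each claim.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )


def cited_sources(answer: str, chunks: list[RetrievedChunk]) -> list[str]:
    """Map [n] markers in the answer to source documents, in order of first use."""
    sources: list[str] = []
    for match in re.findall(r"\[(\d+)\]", answer):
        index = int(match)
        # Ignore markers that do not correspond to a supplied excerpt.
        if 1 <= index <= len(chunks):
            source = chunks[index - 1].source
            if source not in sources:
                sources.append(source)
    return sources
```

Discarding out-of-range markers guards against the model citing an excerpt that was never supplied, so every link shown in the UI points to a real document.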
Results

“We are genuinely amazed at how well this works. Roughly 40% of our daily consultant work is now automated. Finding relevant precedents that used to take hours now takes seconds, with source citations back to the original document so we can always verify.”
