Technology

Hybrid AI architecture, engineered for legal stakes.

Frontier models where reasoning matters. Local inference where privacy matters. Retrieval and provenance at every step — because a finding without a citation isn't a finding.

Architecture

One pipeline. Two inference paths. Your policy decides.

1. Document Intake: PDF · DOCX · scan · email
2. Parse & Chunk: OCR · layout · semantic chunking
3. Embed & Index: vector store · full provenance
4. Cloud Reasoning (Anthropic Claude) or Local Inference (Llama · Mistral · Gemma)
5. Structured Output: citations · scores · audit trail

Core Components

Each layer does one job, well.

Ingestion & Parsing

Native PDF, scanned image, and DOCX ingestion. OCR for degraded scans; layout detection preserves tables, schedules, and exhibits. Headings, clause boundaries, and defined terms are identified structurally — not inferred from formatting alone.

  • PyMuPDF
  • Tesseract / Azure OCR
  • LayoutLM
  • Unstructured.io
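
To make "identified structurally" concrete, here is a minimal, illustrative sketch of clause-boundary chunking on numbered headings. The regex and the sample contract text are hypothetical, not the production parser, which also uses OCR and layout models:

```python
import re

# Illustrative sketch: split contract text into clause-level chunks on
# structural markers (numbered headings), rather than relying on visual
# formatting alone. The heading pattern here is a simplified stand-in.
CLAUSE_HEADING = re.compile(
    r"^\s*(\d+(?:\.\d+)*)\.?\s+([A-Z][^\n]{0,80})$", re.MULTILINE
)

def chunk_by_clause(text: str) -> list[dict]:
    """Return clause chunks with heading number, title, and body."""
    matches = list(CLAUSE_HEADING.finditer(text))
    chunks = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chunks.append({
            "number": m.group(1),
            "title": m.group(2).strip(),
            "body": text[m.end():end].strip(),
        })
    return chunks

sample = """1. Definitions
"Confidential Information" means any non-public information.
2. Term
This Agreement remains in effect for two (2) years.
"""
chunks = chunk_by_clause(sample)
```

A layout-aware model takes over where this kind of pattern fails, such as scanned exhibits and multi-column schedules.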

Embeddings & Vector Store

Semantic chunking at clause granularity. Embeddings from Voyage, or from local encoders for sensitive corpora. Stored with document, page, and paragraph provenance so every retrieval can be cited.

  • Voyage Embeddings
  • BGE / E5 (local)
  • pgvector
  • Qdrant
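
As a sketch of the provenance carried with each stored chunk (field names here are illustrative, not the actual schema), every embedded passage keeps enough metadata to be cited back to an exact location:

```python
from dataclasses import dataclass, asdict

# Illustrative sketch: every embedded chunk carries provenance pointing
# to an exact location in the source document, so retrievals are citable.
@dataclass(frozen=True)
class ChunkRecord:
    doc_id: str           # source document identifier
    page: int             # 1-based page number
    paragraph: int        # paragraph index on that page
    text: str             # the chunk text that was embedded
    embedding_model: str  # which encoder produced the vector

    def citation(self) -> str:
        """Human-readable citation string for a report."""
        return f"{self.doc_id}, p. {self.page} ¶{self.paragraph}"

rec = ChunkRecord("msa_2021.pdf", 12, 3, "Either party may terminate…", "voyage-3")
```

The record, not just the vector, is what lands in the store; the report layer only ever quotes text it can cite.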

Retrieval-Augmented Reasoning

Hybrid BM25 + dense retrieval surfaces candidate passages; the reasoning model applies the playbook against retrieved context, not against model memory. This is how hallucinations are kept out of the report.

  • Anthropic Claude (Opus / Sonnet / Haiku)
  • Tool use & structured output
  • Prompt caching
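
One common way to merge the lexical and dense rankings is reciprocal rank fusion; a toy sketch, with illustrative passage IDs and the conventional smoothing constant k = 60:

```python
# Illustrative sketch: reciprocal rank fusion (RRF) merges a BM25 ranking
# and a dense-embedding ranking into a single candidate list. Each passage
# scores 1/(k + rank) per ranking; scores are summed across rankings.
def rrf_merge(bm25_ranked: list[str], dense_ranked: list[str], k: int = 60) -> list[str]:
    scores: dict[str, float] = {}
    for ranking in (bm25_ranked, dense_ranked):
        for rank, passage_id in enumerate(ranking, start=1):
            scores[passage_id] = scores.get(passage_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25 = ["clause_17", "clause_03", "clause_42"]   # lexical hits
dense = ["clause_03", "clause_08", "clause_17"]  # semantic hits
merged = rrf_merge(bm25, dense)
```

Only the fused top-k passages reach the reasoning model, which is what keeps the playbook grounded in retrieved context rather than model memory.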

Local Inference Path

Open-weights models deployed in-VPC or on-prem for sensitive matters. Same pipeline, same output schema — the routing layer decides which inference backend a document hits based on classification and client policy.

  • Gemma
  • Mistral
  • Ollama
  • vLLM
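
The routing decision can be sketched as a pure policy function. The classification labels and backend names below are hypothetical, chosen only to show the shape of the rule:

```python
from enum import Enum

class Backend(Enum):
    CLOUD = "anthropic"  # frontier model, deep reasoning
    LOCAL = "vllm"       # open-weights model, in-VPC / on-prem

# Hypothetical sensitivity labels that force the local path.
SENSITIVE = {"privileged", "trade_secret", "merger"}

def route(doc_classification: str, client_allows_cloud: bool) -> Backend:
    """Pick an inference backend from document class + client policy."""
    if doc_classification in SENSITIVE or not client_allows_cloud:
        return Backend.LOCAL
    return Backend.CLOUD

backend = route("privileged", client_allows_cloud=True)
```

Because both paths share one pipeline and one output schema, nothing downstream needs to know which backend answered.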

Evaluation & Observability

Gold-set evaluation harnesses measure extraction precision and recall per clause type. Production monitoring tracks latency, token spend, model drift, and flagged reviewer overrides — feedback loops close automatically.

  • Custom eval harness
  • Human-in-the-loop labeling
  • Langfuse-style tracing
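
A minimal sketch of per-clause-type precision and recall against a gold set, assuming each extraction is keyed as (document, clause type, normalized span); the keys and sample data are illustrative:

```python
from collections import defaultdict

# Illustrative sketch: score extractions against a gold set, broken out
# by clause type. Each item is a (doc_id, clause_type, normalized_span) key.
def per_type_metrics(predicted: set, gold: set) -> dict:
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for item in predicted:
        counts[item[1]]["tp" if item in gold else "fp"] += 1
    for item in gold - predicted:
        counts[item[1]]["fn"] += 1
    out = {}
    for ctype, c in counts.items():
        p = c["tp"] / (c["tp"] + c["fp"]) if c["tp"] + c["fp"] else 0.0
        r = c["tp"] / (c["tp"] + c["fn"]) if c["tp"] + c["fn"] else 0.0
        out[ctype] = {"precision": p, "recall": r}
    return out

pred = {("d1", "termination", "s1"), ("d1", "termination", "s2"), ("d1", "indemnity", "s3")}
gold = {("d1", "termination", "s1"), ("d1", "indemnity", "s3"), ("d1", "indemnity", "s4")}
m = per_type_metrics(pred, gold)
```

Reviewer overrides flow back into the gold set, which is what closes the feedback loop automatically.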

Storage & Infrastructure

PostgreSQL for operational data, SQLite for lightweight deployments, S3 for document storage with client-side encryption. Containerized and deployable to AWS, Azure, GCP, or fully air-gapped on-prem Kubernetes.

  • PostgreSQL
  • SQLite
  • S3 / MinIO
  • Docker

Why Hybrid

Three axes that force the design.

I.

Privacy

Merger documents, trade secrets, privileged communications — these do not belong in third-party cloud logs. Local inference keeps them on your hardware without sacrificing the analysis.

II.

Cost

Bulk classification — "is this an NDA? is this exhibit relevant?" — runs cheaply on local or small cloud models. Frontier models are reserved for the 5% of work that actually needs their reasoning depth.

III.

Accuracy

Different models are better at different things. The architecture routes each task to the model that performs best on it in your evaluation suite — not the one that sounded best in a demo.

Curious whether this fits your stack?

I'll review your workflow, constraints, and sample documents, and tell you honestly whether this approach is a fit.

Schedule a Technical Conversation