Case · 01 / 05

Anonymized

Real estate · Operations · Tier A · 8 weeks

Tenant-ops copilot for a US property management platform.

Multi-agent system handling lease ingest, maintenance triage, and tenant communications. Three specialist agents wired through a routing orchestrator with eval gating before any tenant-facing action.

Engagement: Tier A · 8 weeks

Team: 1 founder + 1 contractor

Stack: Next.js, Claude, LangGraph, pgvector, AWS

Status: Shipped · in production

01 · The problem

Lease admin and maintenance ops were eating the team's day.

The client runs a property management platform with thousands of units under management. Operations were drowning in three high-volume, low-margin tasks: ingesting new lease documents into structured data, triaging maintenance requests by severity and routing them to the right vendor, and answering tenant questions over email and chat.

Each task on its own was a candidate for automation. Together they overwhelmed a small ops team — a tenant question would surface a maintenance issue, which would surface a lease-clause question, and the chain would break across three tools and two human handoffs. Off-the-shelf RPA couldn't reason across the three. A single LLM agent couldn't keep state across the chain. They needed something that could route, specialize, and evaluate before acting.

Time-to-resolution on a typical maintenance ticket was running roughly 38 hours from intake to vendor confirmation. The client wanted that under a day, ideally under half a day, without hiring more ops staff.

02 · What we built

Three specialist agents behind a routing orchestrator. Every action eval-gated.

Fig. 02.A · System topology (production)

The system has four moving pieces. Intake normalizes everything coming in — email threads, chat messages, portal forms, freshly uploaded lease PDFs — into a typed event stream. The router is the orchestrator: it inspects each event, decides which specialist agent owns it, and hands off with the relevant context window already loaded.
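As a rough illustration of the intake-to-router handoff described above, the sketch below models a normalized event as a small dataclass and picks an owning agent for it. Everything here is an assumption for illustration: the `Event`, `Channel`, and `Agent` names, the keyword hints, and the rule order. The production router's actual decision logic is not described in this write-up.

```python
from dataclasses import dataclass
from enum import Enum


class Channel(Enum):
    EMAIL = "email"
    CHAT = "chat"
    PORTAL_FORM = "portal_form"
    LEASE_UPLOAD = "lease_upload"


class Agent(Enum):
    LEASE = "lease"
    MAINTENANCE = "maintenance"
    COMMS = "comms"


@dataclass(frozen=True)
class Event:
    """One normalized intake event (hypothetical shape)."""
    channel: Channel
    tenant_id: str
    text: str


# Hypothetical keyword hints; a real router would use a richer classifier.
MAINTENANCE_HINTS = ("leak", "broken", "heating", "repair")
LEASE_HINTS = ("lease", "clause", "renewal", "deposit")


def route(event: Event) -> Agent:
    """Pick the specialist agent that owns this event."""
    # Uploaded lease documents always go to the lease agent.
    if event.channel is Channel.LEASE_UPLOAD:
        return Agent.LEASE
    text = event.text.lower()
    if any(word in text for word in MAINTENANCE_HINTS):
        return Agent.MAINTENANCE
    if any(word in text for word in LEASE_HINTS):
        return Agent.LEASE
    # Anything else is treated as a general tenant question.
    return Agent.COMMS
```

Because every channel is normalized into the same `Event` type first, the routing function never needs to know whether a message arrived by email, chat, or portal form.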

Three specialist agents handle the actual work. The lease agent reads PDFs, pulls structured fields with pgvector-backed retrieval over prior leases, and answers clause-level questions. The maintenance agent classifies severity, picks a vendor from the network, and drafts a work order with a quoted ETA. The comms agent drafts tenant-facing replies in the building's voice — formal for one portfolio, casual for another, multilingual where needed.
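For the lease agent's pgvector-backed retrieval, a nearest-neighbor lookup might be expressed roughly as below. The table and column names (`lease_chunks`, `lease_id`, `content`, `embedding`) are hypothetical, and the choice of cosine distance is an assumption; `<=>` is pgvector's cosine-distance operator. The sketch only builds the parameterized query; executing it would need a Postgres driver that accepts `%s` placeholders, such as psycopg, and is not shown.

```python
from typing import Sequence

# Hypothetical table and column names; adjust to the real schema.
NEAREST_CHUNKS_SQL = """
SELECT lease_id, content, embedding <=> %s::vector AS distance
FROM lease_chunks
ORDER BY distance
LIMIT %s
"""


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format an embedding in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


def nearest_chunks_query(
    embedding: Sequence[float], k: int = 5
) -> tuple[str, tuple[str, int]]:
    """Return the SQL text and its parameters for a top-k similarity lookup."""
    return NEAREST_CHUNKS_SQL, (to_vector_literal(embedding), k)
```

Keeping the embedding as a bound parameter rather than interpolating it into the SQL string leaves quoting to the driver.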

Every agent's output passes through an eval gate before reaching a tenant. The gate runs a small evaluator LLM that scores each output against a rubric covering tone, factuality, and policy compliance. If the score falls below threshold, the action retries with a corrected prompt; if it falls below threshold a second time, it escalates to a human reviewer. Nothing tenant-facing ships without passing the gate or being approved by a reviewer.
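The gate's control flow (score, retry once, then escalate) can be sketched as follows. This is a minimal illustration under stated assumptions: the `0.8` threshold, the rule that every rubric dimension must meet it, and the `score` and `revise` callables are all hypothetical stand-ins. In the real system the scoring is done by the evaluator LLM and the retry re-runs the agent with a corrected prompt.

```python
from dataclasses import dataclass
from typing import Callable

RUBRIC = ("tone", "factuality", "policy_compliance")
THRESHOLD = 0.8  # assumed value; the real threshold is not stated


@dataclass
class GateResult:
    status: str  # "approved" or "escalated"
    draft: str
    attempts: int


def passes(scores: dict[str, float], threshold: float = THRESHOLD) -> bool:
    """A draft passes only if every rubric dimension meets the threshold."""
    return all(scores[name] >= threshold for name in RUBRIC)


def eval_gate(
    draft: str,
    score: Callable[[str], dict[str, float]],
    revise: Callable[[str, dict[str, float]], str],
) -> GateResult:
    """Score a draft, retry once with a revision, escalate on a second failure."""
    scores = score(draft)
    if passes(scores):
        return GateResult("approved", draft, attempts=1)
    # First failure: produce a revised draft using the failing scores as feedback.
    retry = revise(draft, scores)
    if passes(score(retry)):
        return GateResult("approved", retry, attempts=2)
    # Second failure: hand off to a human reviewer instead of sending.
    return GateResult("escalated", retry, attempts=2)
```

Passing `score` and `revise` in as callables keeps the gate testable with canned scorers before any evaluator model is wired in.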

03 · How we built it

Four phases. Eight weeks.

01 · Map

Two-day discovery

Walked the existing ops workflow with the head of operations. Identified the three highest-cost task families and the handoff points between them. Output: agent architecture spec + 8-week SOW.

Days 1 — 2

02 · Build

Spine first

Stood up the intake → router → eval-gate skeleton end-to-end with stub agents returning canned responses. Wired retrieval, persistence, observability before any real agent code.

Weeks 1 — 3

03 · Wire

Real agents land

Replaced stubs with the lease, maintenance, and comms agents one at a time. Each came with its own eval suite and rollback path. Shadow-tested against historical tickets before any went live.

Weeks 3 — 6

04 · Ship

Production deploy

Deployed to the client's AWS account with full observability dashboards. Loom walkthrough, runbook, 30-day support tail. The ops team had final approval on every tenant-facing message for the first two weeks.

Weeks 7 — 8
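The shadow-testing step in phase 03 can be sketched as a replay harness: historical tickets are run through an agent without acting on the result, and the agent's answer is compared with what the ops team actually decided. The function name, the `(text, human_label)` ticket shape, and the use of a simple agreement rate are assumptions for illustration; the write-up does not describe the real eval suites.

```python
from typing import Callable, Iterable


def shadow_agreement(
    tickets: Iterable[tuple[str, str]],
    classify: Callable[[str], str],
) -> tuple[float, list[tuple[str, str, str]]]:
    """Replay (text, human_label) pairs through an agent without acting on them.

    Returns the agreement rate and the disagreements as
    (text, human_label, agent_label) tuples for manual review.
    """
    total = 0
    disagreements = []
    for text, human_label in tickets:
        total += 1
        agent_label = classify(text)
        if agent_label != human_label:
            disagreements.append((text, human_label, agent_label))
    # An empty history yields 0.0 rather than dividing by zero.
    rate = (total - len(disagreements)) / total if total else 0.0
    return rate, disagreements
```

The disagreement list is usually the more useful output, since it points at the specific tickets a reviewer should inspect before an agent goes live.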

04 · Stack & tradeoffs

Why these tools.

Claude for the agents. Stronger on tone control and document reasoning than the alternatives at the time of build, and the eval gate's rubric scoring was tighter on Claude than on the OpenAI models we benchmarked with the same prompts. LangGraph for the orchestration: explicit state machine, easy to inspect, easy to add the eval node without rewriting the routing layer.

pgvector on Postgres for retrieval. The client already ran Postgres in production, and a separate vector DB would have been one more system to operate. pgvector held up fine at the lease corpus size (~50K documents) and let us colocate retrieval with the existing relational schema. Next.js for the small ops console: server components for the data-heavy views, nothing fancy.

Considered and rejected: a pure RAG-over-everything single-agent setup (loses specialization), a workflow-platform-based approach (couldn't express the eval gate cleanly), and routing inside the LLM context with no orchestrator (worked in demos, fell apart on edge cases in the third week of testing).

05 · Outcomes

What changed after deploy.

Resolution time: 38 hrs → 6 hrs (−84%)

Tickets handled fully autonomously: ~65% (no human touch)

Eval-gate save rate: ~12% (caught + retried before tenant)

Ops team headcount: same (no new hires)

Illustrative ranges. Specific client metrics are confirmed under NDA. Numbers shown reflect reported outcomes from the engagement and are representative of what the system delivered at handover.
