The problem
Every revenue team has the same problem. Leads come in from your website, from events, from referrals. They all land in the same inbox. A rep opens each one and makes a judgment call: is this real? Is it urgent? Who should handle it?
This process takes 2-3 hours a day. The decisions are inconsistent. Warm leads sit for days while reps sort through noise.
This walkthrough covers the architecture of a system that eliminates manual triage entirely.
The architecture
The system has four stages. Each stage has a clear input and a clear output, and each handoff between stages follows a clear contract.
Stage 1: Intake
Leads arrive from any channel — form submissions, email forwarding, CRM API, Zapier webhooks. They get normalized into a standard format.
The intake layer is deliberately simple. It accepts whatever format the source sends and maps it to a canonical schema:
- Required fields: name, email, message
- Optional fields: company, phone, timeline, budget, source channel
- Generated fields: intake timestamp, unique lead ID, raw source payload (stored for audit)
The key design decision: store the raw payload alongside the normalized version. You'll need it later for debugging classification issues. When the model misclassifies a lead, the first question is always "what did the original submission look like?"
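As a rough illustration of the intake contract, here is a minimal sketch of the canonical schema and a normalizer. The `Lead` class, the `FIELD_MAPS` table, and the source-side field names (`full_name`, `comments`, and so on) are hypothetical; a real deployment would define one mapping per channel.

```python
import json
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional

REQUIRED = ("name", "email", "message")
CANONICAL_FIELDS = REQUIRED + ("company", "phone", "timeline", "budget")

# Hypothetical per-channel mappings: source key -> canonical key.
FIELD_MAPS = {
    "web_form": {
        "full_name": "name",
        "email_address": "email",
        "comments": "message",
        "org": "company",
    },
}


@dataclass
class Lead:
    # Required fields
    name: str
    email: str
    message: str
    # Optional fields
    company: Optional[str] = None
    phone: Optional[str] = None
    timeline: Optional[str] = None
    budget: Optional[str] = None
    source_channel: Optional[str] = None
    # Generated fields
    lead_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    intake_timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )
    raw_payload: str = ""  # original submission, kept verbatim for audit


def normalize(payload: dict[str, Any], source_channel: str) -> Lead:
    """Map a source-specific payload onto the canonical schema."""
    mapping = FIELD_MAPS.get(source_channel, {})
    canonical: dict[str, str] = {}
    for key, value in payload.items():
        target = mapping.get(key, key)  # unmapped keys pass through by name
        if target in CANONICAL_FIELDS and value not in (None, ""):
            canonical[target] = str(value).strip()
    missing = [name for name in REQUIRED if name not in canonical]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    return Lead(
        **canonical,
        source_channel=source_channel,
        # Unknown keys are dropped from the normalized record but survive here.
        raw_payload=json.dumps(payload, sort_keys=True, default=str),
    )
```

Fields the mapping does not recognize are excluded from the normalized lead but remain in `raw_payload`, so the original submission can still be inspected when a classification looks wrong.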
Stage 2: Classification
This is where the AI earns its keep. The model evaluates each lead on three independent dimensions:
Intent — Is this person actively looking to buy, casually researching, or just filling out a form? The model looks at language specificity. "We need a lead triage system for our 50-person sales team by Q3" signals high intent. "Just checking out what you offer" signals low intent.
Fit — Does this lead match your ideal customer profile? Company size, industry, role, geography — whatever your ICP defines. The model cross-references available information against your qualification criteria.
Urgency — Is there a timeline? Budget pressure? A competing evaluation? Urgency signals determine whether this lead needs a response in hours or can wait for the weekly batch.
Each dimension returns a score (0-100) and reasoning. The reasoning matters more than the score. A rep who sees "Intent: 85 — Lead described a specific workflow bottleneck with timeline pressure" trusts the system. A rep who sees "Intent: 85" doesn't.
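One way to enforce "score plus reasoning" is to make the reasoning a required part of the result type. The sketch below is an assumption about shape, not the product's actual data model; `DimensionResult` and `display` are illustrative names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DimensionResult:
    dimension: str  # "intent", "fit", or "urgency"
    score: int      # 0-100
    reasoning: str  # short explanation shown to the rep

    def __post_init__(self) -> None:
        if not 0 <= self.score <= 100:
            raise ValueError(f"score must be 0-100, got {self.score}")
        if not self.reasoning.strip():
            # A bare score is rejected: the rep always sees the "why".
            raise ValueError("reasoning is required alongside the score")

    def display(self) -> str:
        """Render the line a rep sees in the review screen."""
        return f"{self.dimension.capitalize()}: {self.score} - {self.reasoning}"
```

Rejecting an empty `reasoning` at construction time means a score can never reach a rep without its explanation.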
Implementation detail: Run the three evaluations in parallel, not sequentially. Each model call adds latency, and three parallel calls finish in roughly the time of the slowest one rather than the sum of all three. Speed decides adoption: reps will abandon a system that takes 30 seconds to classify a lead and adopt one that takes 3 seconds.
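A minimal sketch of the parallel fan-out using Python's `asyncio.gather`. The `evaluate_dimension` function is a stand-in: it sleeps to simulate a model call and returns a fixed score, since the actual model client is not described here.

```python
import asyncio
import time

DIMENSIONS = ("intent", "fit", "urgency")


async def evaluate_dimension(dimension: str, lead_text: str) -> dict:
    """Placeholder for one model call; the sleep simulates network latency."""
    await asyncio.sleep(0.2)
    return {
        "dimension": dimension,
        "score": 50,  # stub value, a real call would return the model's score
        "reasoning": f"stub reasoning for {dimension}",
    }


async def classify(lead_text: str) -> dict:
    """Run all three evaluations concurrently and index results by dimension."""
    results = await asyncio.gather(
        *(evaluate_dimension(d, lead_text) for d in DIMENSIONS)
    )
    return {r["dimension"]: r for r in results}


def timed_classify(lead_text: str) -> tuple[dict, float]:
    """Classify one lead and report wall-clock seconds spent."""
    start = time.perf_counter()
    result = asyncio.run(classify(lead_text))
    return result, time.perf_counter() - start
```

With the simulated 0.2-second calls, the three evaluations finish in about 0.2 seconds total instead of about 0.6 seconds when awaited one after another.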
Stage 3: Routing
Based on classification scores, the lead gets assigned to the right rep. This stage is pure deterministic logic. No AI.
Rules might look like:
- Intent > 80 AND Fit > 70 → Enterprise team, priority queue
- Intent > 60 AND Fit > 50 → Mid-market team, standard queue
- Intent < 40 OR Fit < 30 → Nurture sequence, no rep assignment
- Urgency > 80 (any intent/fit) → Immediate alert to on-call rep
The rules live in configuration, not code. Sales ops should be able to adjust routing without a deploy. This is a critical design choice — the people who understand lead routing best are not the people who write the code.
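To make "configuration, not code" concrete, the sketch below stores the example rules as JSON and evaluates them with a small generic matcher. The JSON shape, the route names, and the `route` function are hypothetical. Two assumptions go beyond the rules as listed: rules are checked top to bottom with the first match winning (so the urgency rule is placed first), and a lead that matches nothing falls back to a `manual_review` queue.

```python
import json
import operator

# In practice this would live in a file or admin UI that sales ops can edit.
RULES_JSON = """
[
  {"name": "urgent",     "match": "all", "when": [["urgency", ">", 80]],
   "route": "on_call_alert"},
  {"name": "enterprise", "match": "all", "when": [["intent", ">", 80], ["fit", ">", 70]],
   "route": "enterprise_priority"},
  {"name": "mid_market", "match": "all", "when": [["intent", ">", 60], ["fit", ">", 50]],
   "route": "mid_market_standard"},
  {"name": "nurture",    "match": "any", "when": [["intent", "<", 40], ["fit", "<", 30]],
   "route": "nurture_sequence"}
]
"""

OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def route(scores: dict[str, int], rules: list[dict], default: str = "manual_review") -> str:
    """Return the route of the first rule whose conditions hold."""
    for rule in rules:
        checks = [
            OPS[op](scores[field], threshold)
            for field, op, threshold in rule["when"]
        ]
        combine = all if rule["match"] == "all" else any  # AND vs OR
        if combine(checks):
            return rule["route"]
    return default  # assumption: unmatched leads go to a human


RULES = json.loads(RULES_JSON)
```

For example, `route({"intent": 90, "fit": 80, "urgency": 10}, RULES)` returns `"enterprise_priority"`, while a lead with intent 50 and fit 60 matches none of the example rules and lands in the fallback queue. Changing a threshold means editing the JSON, not the matcher.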
Stage 4: Response generation
The system generates a structured first response. The response references what the lead actually asked about. It proposes a specific next step based on the classification.
High-intent, high-fit: Direct calendar link to a discovery call with the assigned rep. The message acknowledges their specific problem.
Medium-intent, medium-fit: Educational content relevant to their stated interest. A softer ask — "would a 15-minute overview be useful?"
Low-intent or low-fit: Automated nurture. Add to a drip sequence. No rep time spent.
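The three tiers can be sketched as a small selection function. The thresholds here are an assumption: they reuse the example routing thresholds, and anything that is neither high nor low is treated as medium. `ResponsePlan` and the `next_step` labels are illustrative names only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponsePlan:
    tier: str       # "high", "medium", or "low"
    next_step: str  # what the first response should propose


def plan_response(intent: int, fit: int) -> ResponsePlan:
    """Pick a response strategy from the intent and fit scores."""
    # Low intent OR low fit: no rep time, straight to automated nurture.
    if intent < 40 or fit < 30:
        return ResponsePlan("low", "add_to_drip_sequence")
    # High intent AND high fit: offer a discovery call directly.
    if intent > 80 and fit > 70:
        return ResponsePlan("high", "send_calendar_link_for_discovery_call")
    # Everything in between gets the softer ask.
    return ResponsePlan("medium", "send_educational_content_and_offer_overview")
```

The low tier is checked first so that a high-intent lead with poor fit still goes to nurture rather than to a rep's calendar.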
The response draft goes through a review gate before sending. The rep sees the lead data, the classification with reasoning, and the proposed response. They approve, edit, or override. Every edit is logged as feedback.
The review gate in detail
The review gate is not optional. It's the mechanism that builds trust in the system.
In the first week, expect reps to edit around 60% of responses. That's normal. The edits teach the system what "good" looks like for your team's voice and standards.
By week four, the edit rate drops to 15-20%. By month three, most reps approve without changes for standard leads and only edit edge cases.
The key: log every edit with a diff. What did the rep change? Did they soften the language? Add a detail? Remove an assumption? This data is the feedback loop that makes classification and response generation sharper over time.
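A minimal sketch of diff logging with Python's standard `difflib`. The record shape and the `record_review` name are assumptions, and the sketch covers only the approve and edit paths.

```python
import difflib
from datetime import datetime, timezone


def record_review(lead_id: str, draft: str, final: str, reviewer: str) -> dict:
    """Build a feedback record comparing the generated draft to what was sent."""
    diff = list(
        difflib.unified_diff(
            draft.splitlines(),
            final.splitlines(),
            fromfile="draft",
            tofile="sent",
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )
    return {
        "lead_id": lead_id,
        "reviewer": reviewer,
        "action": "approved" if final == draft else "edited",
        "diff": diff,  # empty list when the rep approved without changes
        "reviewed_at": datetime.now(timezone.utc).isoformat(),
    }
```

Lines prefixed with `-` in the stored diff are what the rep removed and lines prefixed with `+` are what they wrote instead, which is the raw material for spotting patterns such as softened language or removed assumptions.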
What changes after deployment
Response time drops from hours to minutes. Classification is consistent across every lead: no more quality gap between the rep who triages on Monday morning (caffeinated, focused) and the rep who triages on Friday afternoon (checked out).
Reps spend zero time on sorting and all their time on qualified conversations. The system handles the volume that would take a team of 3 reps to sort manually.
But the biggest change is less visible: the data. After a month, you have structured classification data on every lead. You know which channels produce high-intent leads. You know which ICPs convert. You know your actual response time distribution.
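As an illustration of what that structured data makes possible, the sketch below summarizes leads per channel. The record keys (`source_channel`, `intent`, `response_minutes`) and the high-intent threshold of 80 are assumptions for the example.

```python
from collections import defaultdict
from statistics import median


def channel_summary(records: list[dict], high_intent_threshold: int = 80) -> dict:
    """Per-channel lead count, share of high-intent leads, and median response time."""
    by_channel: dict[str, list[dict]] = defaultdict(list)
    for record in records:
        by_channel[record["source_channel"]].append(record)

    summary = {}
    for channel, rows in by_channel.items():
        high = sum(1 for r in rows if r["intent"] > high_intent_threshold)
        summary[channel] = {
            "leads": len(rows),
            "high_intent_share": high / len(rows),
            "median_response_minutes": median(r["response_minutes"] for r in rows),
        }
    return summary
```

The same grouping works for any other stored field, for example grouping by ICP segment instead of channel to compare conversion.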
That data was always there in theory. In practice, it was trapped in rep judgment calls that nobody tracked.
This is First Lead Inbox
The system described here is First Lead Inbox. It's in private release and serving revenue teams now.
If your team is still manually triaging inbound leads, we'd like to understand your workflow. The system adapts to your routing rules, your ICP definition, and your team's communication standards.