Building a Better Training Protocol for AI Review

How ReviewPartner turns protocol gaps into structured questions for the review manager

By John Tredennick

Every document review starts with a training protocol. The review manager uses it to teach reviewers about the case, the legal issues, the key people, entities, and time periods, and what is being requested for production.

For an AI review, the training protocol is a set of instructions (a prompt) that an AI reviewer applies to every document in the collection. Get it right, and the review works. Get it wrong, and every document pays the price.

The problem is that the training protocol is usually written before anyone has worked through the documents. Legal teams do their best, but every collection has its own language and contains documents no one could have anticipated. The real questions surface once reviewers begin work, and by then the protocol is already in use. Changing it means retraining the team and rechecking earlier calls.

Generic AI review makes this worse, not better. To be sure, the AI will apply whatever criteria it is given, faster and more consistently than a human. But when the protocol is wrong, the wrong rules will be applied to every document in the collection. The question is not whether AI can read and classify documents. The question is whether the criteria it is applying have been tested against the documents being reviewed.

A Different Approach

ReviewPartner starts with an initial training protocol, like every other review. The difference is what happens next. 

Sampling

Before the AI reviewers are turned loose on the full collection, ReviewPartner samples and analyzes hundreds of documents against the protocol instructions. The sample is built to include clearly responsive, clearly non-responsive, and the gray-zone documents in between, where the protocol is most likely to be tested.
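The stratification logic can be sketched in a few lines. The following is a minimal illustration, not ReviewPartner's actual implementation: it assumes preliminary relevance scores already exist, and the band cutoffs, function name, and sample sizes are all hypothetical.

```python
import random

def build_calibration_sample(scored_docs, n_per_band=150, seed=7):
    """Draw a stratified calibration sample across relevance bands.

    scored_docs: list of (doc_id, preliminary_score) pairs, score 0-100.
    Hypothetical cutoffs: clearly non-responsive (<20), gray zone
    (20-79), clearly responsive (>=80).
    """
    rng = random.Random(seed)
    bands = {"non_responsive": [], "gray_zone": [], "responsive": []}
    for doc_id, score in scored_docs:
        if score < 20:
            bands["non_responsive"].append(doc_id)
        elif score < 80:
            bands["gray_zone"].append(doc_id)
        else:
            bands["responsive"].append(doc_id)
    # Draw up to n_per_band documents from each band so the gray zone
    # is represented far beyond its share of the collection.
    return {band: rng.sample(ids, min(n_per_band, len(ids)))
            for band, ids in bands.items()}
```

Deliberately oversampling the gray zone concentrates the calibration effort where the protocol is most likely to break down.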

Review Reports

For each sampled document, ReviewPartner provides a relevance score, a document summary, responsiveness comments, and a request for clarification where the training protocol is ambiguous or hard to apply.

As an example, here is a report on a document sampled from a hypothetical matter involving Jeb Bush and the Florida hurricane insurance crisis.

Document Summary [rel 72]

This is a November 3, 2006 email from Mike Francisco to Governor Jeb Bush regarding “Major Issues: Post election.” Most of the message concerns Iraq and Secretary Rumsfeld [¶2]. One paragraph separately addresses “the insurance crisis,” stating that the sender has been told a proposed “business wind pool” has been rejected if secondary-market insurance is available, and complaining that he was quoted an allegedly excessive premium of $29,000 per year with a 6% deductible for a commercial building. He characterizes the pricing as unfair, says businesses would be left with no alternatives, and warns that this “will hurt Florida and Floridians” [¶3].

Responsiveness Summary

This document is responsive, although not as strongly as documents directly concerning homeowners or formal state action. It falls within the 2004–2006 period and contains substantive content about the Florida insurance crisis rather than a passing mention [¶3].

The strongest basis for responsiveness is:

Issue 3 — Executive Action and Inaction: The email is directed to Governor Bush and communicates information and complaint about a policy choice — specifically, that “the notion of a business wind pool has been rejected” — in the context of “the insurance crisis” [¶3]. Because Issue 3 expressly includes advice to the Governor and contemporaneous external criticism bearing on the insurance crisis, this paragraph is responsive.

Issue 5 — Economic Consequences: The sender asserts that the lack of alternatives and high pricing for business wind coverage “will hurt Florida and Floridians” [¶3], which bears on business impact and broader economic consequences.

There is also a possible but weaker connection to Issue 1 because the email complains about an allegedly excessive premium quote and lack of alternatives [¶3]. However, Issue 1 is framed around homeowners and consumer harm, while this paragraph concerns commercial/property business coverage rather than homeowner policies. I therefore would not rely primarily on Issue 1.

Clarification Requests

The following aspects of the definition create some uncertainty for this document:

  1. Homeowner-focused case versus business-insurance content. The case description and Issue 1 focus on Florida homeowners, but this email concerns a “business wind pool” and insurance for a 3,800-square-foot building used in a business context [¶3]. Clarification would help on whether commercial-property insurance documents are intended to be responsive when they bear on the same market crisis.
  2. Scope of Issue 5. Issue 5 includes “business impact” and broader economic consequences, which suggests inclusion here, but it is unclear how direct or substantial the economic nexus must be. This document offers only a brief assertion that the situation “will hurt Florida and Floridians” [¶3], without detailed economic analysis.
  3. Scope of Issue 3 where the document references a rejected policy proposal but does not identify the decisionmaker. The email says the sender was told a “business wind pool has been rejected” [¶3], but it does not say by whom. Clarification would help on whether a communication to the Governor criticizing an unidentified policy rejection is sufficient for Issue 3, or whether the document must more clearly tie the decision to the Governor’s Office or executive branch action.

ReviewPartner provides this level of analysis on each document it reviews.
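Conceptually, each per-document report is a small structured record. Here is a minimal sketch in Python, with hypothetical field names rather than ReviewPartner's actual schema:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DocumentReport:
    """One sampled document's review output (illustrative fields only)."""
    doc_id: str
    relevance_score: int            # 0-100 relevance score
    summary: str                    # what the document is
    responsiveness_comments: str    # why it scored as it did
    clarification_requests: List[str] = field(default_factory=list)

    @property
    def needs_clarification(self) -> bool:
        # Any open question about the protocol flags the document
        # for the review manager's attention.
        return bool(self.clarification_requests)
```

Collecting the clarification requests across the whole sample is what feeds the assessment report described next.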

Responsiveness Assessment Report

After finishing the sample set, ReviewPartner analyzes the results and creates a responsiveness assessment for the review manager. It typically starts with an overview of its findings.

Responsiveness Assessment Report
Overall Pattern

Across 440 assessed documents, the results skew strongly non-responsive by count, though with a substantial core of high-confidence responsive materials. The score distribution is: 294 documents at 0–19, 23 at 20–39, 3 at 40–59, 32 at 60–79, and 88 at 80–100.

The responsive core is consistent across the intermediate reports. Documents were most often treated as responsive when they contained one or more of the following:

  1. homeowner or constituent complaints about premium spikes, cancellations, non-renewals, inability to obtain private coverage, or forced placement into Citizens;

  2. communications to or from Governor Bush concerning requests for intervention, insurance reform, executive awareness, or responses to complaints;

  3. substantive materials about hurricane losses, insured-loss reporting, catastrophe funding/modeling, or 2004–2005 storm impacts in an insurance sense;

  4. Citizens-specific discussions about rates, deficits, claims handling, or insurer-of-last-resort function; and

  5. broader business, housing, mortgage, or economic harm tied to insurance disruption (Document 9193419062022¶2,6-18 [rel 96], Document 9193419032090¶1,3,5 [rel 95], Document 9193419096814¶6-21 [rel 95], Document 9193419105570 [rel 95], Document 9193419060020¶1-2 [rel 95], Document 9193419105876¶4-7 [rel 94], Document 9193419067003¶1-5 [rel 93], Document 9193419152868¶1-4 [rel 88], Document 9193419123852¶1-2 [rel 86], Document 9193419026004¶1-8 [rel 88]).

The non-responsive set is also consistent. Most low-scored documents concern disaster logistics, general hurricane relief, scheduling, praise/thank-you notes, personal or political correspondence, unrelated insurance lines, non-insurance housing or infrastructure issues, or vague materials whose relevance depends on missing context (Document 9193419024838 [rel 12], Document 9193419040709 [rel 12], Document 9193419227114¶1-4 [rel 12], Document 9193419181502¶3-6 [rel 14], Document 9193419045474 [rel 10], Document 9193419116473¶1 [rel 8], Document 9193419173016¶1-2 [rel 5], Document 9193419125834 [rel 3], Document 9193419224573¶1-3 [rel 2], Document 9193419217935 [rel 12]).

There are very few formal 40–59 “unclear” calls. The principal review risk lies instead in the 20–39 likely-not-responsive band and the 60–79 likely-responsive band, where the definition leaves recurring scope questions.

The report then turns to definitional deficiencies that should be addressed before the review can proceed.

Definitional Deficiencies

1. Issue 3 is under-specified for third-party communications to the Governor

The most frequently recurring ambiguity is whether ordinary incoming communications to the Governor qualify under Issue 3 when they urge action, criticize inaction, or convey insurance-crisis complaints, but do not show internal executive deliberation or actual policy action. Many responsive calls appear to assume yes, but the definition does not clearly say so (Document 9193419062022¶2 [rel 96], Document 9193419005788¶1-3 [rel 95], Document 9193419242724¶1-3 [rel 95], Document 9193419032564¶1,11 [rel 93], Document 9193419105651¶1-5 [rel 72], Document 9193419205509¶3-4 [rel 63], Document 9193419192110¶1-4 [rel 68], Document 9193419093839¶1-5 [rel 72], Document 9193419114211¶1-2 [rel 78], Document 9193419118077¶1-3 [rel 68]).

Needed clarification: whether Issue 3 categorically includes communications to the Governor that request intervention, criticize executive inaction, or offer policy recommendations, even absent evidence of internal consideration or response.

2. Issue 3 does not clearly state how much gubernatorial involvement is enough

Relatedly, the definition does not say whether forwarding, routing, referral, requests for a response, or requests for briefing are sufficient executive involvement, or whether substantive policy content is required. This uncertainty is especially important in low-information emails and likely-not-responsive documents (Document 9193419114413¶1-2 [rel 96], Document 9193419090322¶1-2 [rel 95], Document 9193419288064¶1-2 [rel 92], Document 9193419063419 [rel 28], Document 9193419123717 [rel 24], Document 9193419169152 [rel 35], Document 9193419226235¶1 [rel 68], Document 9193419287935¶1-4 [rel 72], Document 9193419125632 [rel 28], Document 9193419024146¶1-8 [rel 33]).

Needed clarification: whether ministerial handling alone satisfies Issue 3, or whether the document must itself contain substantive insurance-crisis analysis, recommendation, decision, or criticism.

3. Issue 3 does not clearly define how directly a document must bear on the property-insurance crisis, as opposed to general hurricane/disaster response

A major false-negative risk appears in documents involving gubernatorial action during hurricane response, post-storm recovery, housing, building-code changes, FEMA problems, power restoration, accessibility, contractor abuse, or similar matters. Reviewers repeatedly questioned whether such materials qualify if they are crisis-adjacent but not expressly about insurance-market disruption (Document 9193419123717 [rel 24], Document 9193419169152 [rel 35], Document 9193419106000¶32-35 [rel 68], Document 9193419116944¶9-21 [rel 68], Document 9193419024146¶2-4,7 [rel 33], Document 9193419097886¶5-12 [rel 12], Document 9193419106643¶1 [rel 8], Document 9193419045474 [rel 10], Document 9193419024356 [rel 14], Document 9193419095099 [rel 12]).

Needed clarification: whether Issue 3 is limited to executive action expressly concerning insurance regulation, Citizens, rates, insurer exits, reforms, and consumer protection, or also includes broader hurricane-response actions that materially relate to the crisis.

Each section of the report includes links to representative documents along with their relevance scores. 

Refinement Questions

Based on the issues identified in its report, ReviewPartner generates a set of questions for the review manager and the legal team to answer. Each is built around a specific protocol weakness, with the analytical work already done. The system asks focused questions anchored in documents from the sample. The review manager is not being asked to figure out what is wrong. The review manager is given options to fix it.

A typical question describes the ambiguity in plain language, shows which part of the existing protocol is unclear, links to the specific documents that illustrate the issue, and offers pre-formulated answer options representing the reasonable interpretations the system identified. 

Here is an example from our hypothetical review project:

Q5. Do you want to add an explicit global exclusion for non-property insurance lines?

Context

The analysis notes that non-property insurance lines are easy no-calls in practice, but the definition still does not say so expressly. That omission matters chiefly under Issue 3, where communications to the Governor about “insurance” or insurance reform can look superficially responsive until the reviewer determines from the substance that they concern a different line. The cited examples are all clearly non-responsive, but they recur across health insurance, medical malpractice, and related insurance-policy materials: state employee health insurance procurement Document 9193419231835¶1-5 [rel 6], a health-insurance startup article Document 9193419044030¶1-20 [rel 8], multiple medical malpractice insurance critiques sent to the Governor Document 9193419185318¶1-8 [rel 4], Document 9193419141593¶1-3,8-20 [rel 5], Document 9193419209887¶3-10 [rel 6], Document 9193419070309¶1-12 [rel 4], Document 9193419274677¶1-9 [rel 5], and a malpractice-rate email Document 9193419225012¶1-4 [rel 6]. Because these are all clear-band non-responsive documents, this is not outcome-critical on the current set, but it is a clean drafting clarification if you want to eliminate avoidable hesitation.

Relevant definition language

The definition repeatedly refers to the “property-insurance crisis,” Florida homeowners, homeowner/residential coverage, hurricane losses, Citizens, and related economic consequences. But it nowhere says in one sentence that non-property insurance lines—such as health, medical malpractice, title, life, long-term care, or similar lines—are outside the definition unless the document itself directly bears on a stated issue in the property-insurance-crisis sense.

○ Yes. Add a sentence expressly stating that documents about non-property insurance lines are not responsive unless the document itself directly bears on one of the defined issues in the property-insurance-crisis sense.

Change magnitude

Minor. This would not materially change substantive scope but would improve reviewer efficiency and reduce superficial false positives under Issue 3.

Proposed edit

Add to the General Scope and Time Rule or Responsiveness Rule: “Documents concerning non-property insurance lines—such as health, medical malpractice, life, title, long-term care, or similar insurance lines—are not responsive unless the document itself directly bears on one of the defined issues in the homeowner/property-insurance-crisis sense.” No other substantive revisions should be necessary.

○ No. Leave the definition as written and rely on the existing property-insurance framing and issue language.

Change magnitude

None to minor. This would leave current treatment unchanged and rely on reviewer judgment to exclude obviously different insurance lines.

Proposed edit

No change to the definition.

○ Other: ____________________________________________________________

The review manager can pick from the choices offered or insert their own answer. The pre-formulated options are not a substitute for judgment. They are a starting point that lets the review manager move quickly to the substantive call.

These are the questions that surface late in a traditional review and get resolved inconsistently across a large reviewer team. In ReviewPartner, they surface before full review begins and get resolved once, by the people who should be making the call.

Refining the Training Protocol

Once the review manager and the legal team answer the questions, ReviewPartner again takes the lead. It uses the answers, backed by its analysis, to refine the training protocol. It presents a revised protocol to the review manager, with the option to view it in redline format or edit directly.

Issue 3 — Executive Action and Inaction

Governor's Office decisions, executive orders, task forces, and public statements bearing on the insurance crisis. Also responsive are internal deliberations about whether to intervene, decisions not to act, advice to the Governor recommending action or restraint, and contemporaneous internal or external criticism of executive inaction.

This includes substantive communications sent to the Governor or Governor's Office by third parties, including constituents, legislators, trade groups, reporters, or other outsiders, if the document itself requests intervention, recommends action or restraint, criticizes executive inaction, or otherwise advocates a position regarding the insurance crisis. This also includes neutral third-party communications to the Governor or Governor's Office that substantively convey information about the insurance crisis or focus on potential executive action concerning it, even if they do not expressly advocate a position, so long as they do more than make a bare inquiry or request for comment. A third-party communication need not show an internal response to qualify, but it must itself contain affirmative substantive insurance-crisis content; a bare inquiry, routing note, or request for comment with no substantive crisis content does not qualify on that basis alone.

A document may also qualify under this issue if it reflects ministerial Governor's Office handling of a substantively insurance-crisis-related matter, including forwarding for response, requesting a briefing, routing to staff, requesting follow-up, or similar handling. Such handling is sufficient only if the document, viewed as a whole, contains substantive insurance-crisis content. A bare administrative act without substantive crisis content is not enough.

A document evidencing inaction must contain affirmative content such as a decision, recommendation, complaint, analysis, or substantive criticism; the mere absence of action is not itself a document.

Responsive if dated 2004–2006. Pre-2004 documents are responsive only as provided under the General Scope and Time Rule.

Testing the Refined Protocol

The next step is to test the refined protocol against the sampled documents to confirm the changes. ReviewPartner provides a new report and, if appropriate, asks further questions to clarify the protocol. The cycle typically repeats two or three times. With each pass, the number of uncertain determinations decreases and the protocol becomes more precise. The cycle ends when additional rounds produce diminishing improvement and the protocol converges.
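The refinement cycle described above is, at bottom, a convergence loop. Here is a hedged sketch: the stopping rule (halt when the share of uncertain calls improves by less than a threshold), the function name, and the caller-supplied review function standing in for the sample-review and refinement steps are all illustrative, not ReviewPartner's actual logic.

```python
def refine_until_converged(review_fn, protocol, max_cycles=5, min_gain=0.02):
    """Run sample review + refinement cycles until uncertainty stops improving.

    review_fn(protocol) -> (uncertain_fraction, refined_protocol).
    Both the callable and the min_gain threshold are illustrative.
    """
    prev_uncertain = 1.0  # treat everything as uncertain before the first pass
    uncertain = prev_uncertain
    for cycle in range(1, max_cycles + 1):
        uncertain, protocol = review_fn(protocol)
        if prev_uncertain - uncertain < min_gain:
            break  # diminishing improvement: the protocol has converged
        prev_uncertain = uncertain
    return protocol, uncertain, cycle
```

The two-or-three-cycle pattern the article describes falls out naturally: early passes produce large drops in uncertain calls, and the loop stops once a pass yields only marginal gains.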

At this point, the protocol is ready for validation. It has been refined against the documents and proven through repeated engagement with the actual collection. It is no longer a starting point. It is a finished training protocol built specifically for the matter.

Validation Confirms the Protocol Works

Once the training protocol is ready, ReviewPartner runs a formal validation step against a fresh sample of documents, typically around 1,000 documents drawn from the collection. The AI analyzes each document and produces a determination with reasoning. A human reviewer, typically the review manager or another senior reviewer, reads each document and either agrees or disagrees with the AI’s call.

The validation process produces accuracy figures the review manager can act on: overall agreement between the human reviewer and the AI, plus precision and recall against the validation sample. If the validation results are satisfactory, the review proceeds to full scale. If not, the review manager runs another refinement cycle. The cycle continues until the protocol meets the requirements of the matter and the review is ready to proceed at full scale.
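The validation arithmetic is straightforward. Here is a minimal sketch of the metrics described above, treating the human reviewer's call as ground truth and "responsive" as the positive class; the function name and input shape are assumptions, not ReviewPartner's API.

```python
def validation_metrics(calls):
    """Compute agreement, precision, and recall for an AI reviewer.

    calls: list of (ai_responsive, human_responsive) boolean pairs,
    one per validation document, with the human call as ground truth.
    """
    tp = sum(1 for ai, h in calls if ai and h)          # both say responsive
    fp = sum(1 for ai, h in calls if ai and not h)      # AI over-calls
    fn = sum(1 for ai, h in calls if not ai and h)      # AI misses
    agree = sum(1 for ai, h in calls if ai == h)
    return {
        "agreement": agree / len(calls),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }
```

Recall is usually the metric that matters most here, since it measures how many responsive documents the AI would miss at full scale.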

What This Means for the Review

The question-and-answer process is what makes everything else work. Without it, AI document review is just a faster way to apply whatever criteria the legal team drafted at the start of the matter. With it, the training protocol gets built against the actual documents, refined in response to real ambiguities, and validated before a single production document is coded.

Human judgment is concentrated where it matters most. The review manager and the trial team are not supervising a large reviewer team. They are answering the questions that determine what the criteria should be. Everything downstream, the application of those criteria across hundreds of thousands of documents, happens with the speed and consistency that AI provides.

The questions are not a feature of the platform. They are the mechanism through which a better training protocol gets built, and a better training protocol is what makes a better review.

Want to learn more about ReviewPartner?
See how ReviewPartner helps review managers refine training protocols before full review begins. Visit the ReviewPartner page to learn more.

Ready to talk through your own review workflow?
Contact us to discuss how Merlin can help you bring more control, consistency, and efficiency to your next document review.

About the Author

John Tredennick (jt@merlin.tech) is CEO and Founder of Merlin Search Technologies, a company pioneering AI-powered document intelligence for legal professionals. A former trial lawyer and founder of Catalyst Repository Systems, he is recognized by the American Lawyer as a top six ediscovery pioneer and has been involved in legal technology and document review for more than 30 years.
