Merlin Unitize™
Intelligent Document Separation for Large PDF Collections
Turn large, unseparated PDF files into organized, individual documents, using AI-powered boundary detection at a fraction of the cost of manual processing.
The Unitization Problem
Document productions and scanned collections routinely arrive as large, unseparated PDF files, in some cases with tens of thousands of pages. Without unitization, these files can’t be loaded into a review platform, coded, searched effectively, or produced in any useful way.
Manual unitization requires staff to review every page, identify where one document ends and another begins, then split and rename files by hand. For large collections, this takes days or weeks and can cost thousands of dollars. The work can also be inconsistent. Different reviewers make different judgment calls about document boundaries, especially with mixed document types like emails followed by attachments, multi-page letters, or fax cover sheets.
Why Does It Matter?
Until documents are properly separated, every downstream task (OCR, coding, review, production) is either blocked or degraded. Unitization isn’t optional. It’s the prerequisite for everything else.
Traditional approaches to this problem haven’t changed in decades: hire staff, review pages, split files. Unitize replaces that manual process with AI that identifies document boundaries automatically, processing large collections overnight at a fraction of the cost.
How Unitize™ Works
Unitize uses AI-powered boundary detection to identify where individual documents begin and end within large PDF files. The system analyzes the text extracted by our OCR Vision service, along with changes in formatting, headers, letterheads, page numbering resets, and document type transitions, to determine where one document ends and the next begins.
The system handles mixed document types within a single file: correspondence, reports, forms, exhibits, fax transmissions, handwritten notes. It applies the same kind of judgment a human reviewer would use, but does it consistently across the entire collection without fatigue or variability.
The output is a set of individual, properly separated PDF files with sequential naming, ready for loading into any review or document management platform. Collections of any size can be processed quickly. Typical turnaround is two days for most projects.
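To make the idea concrete, here is a minimal sketch of one of the signals mentioned above, a page numbering reset, and of how detected boundaries map to sequentially named output files. It is an illustration only, not the Unitize implementation: the function names, the "Page 1" footer pattern, and the DOC-000001.pdf naming scheme are all assumptions, and a production system would combine many more signals.

```python
import re
from typing import List, Tuple

# Assumed footer convention: "Page 1" or "Page 1 of 4" marks a first page.
PAGE_ONE = re.compile(r"\bpage\s+1\b", re.IGNORECASE)


def find_boundaries(page_texts: List[str]) -> List[int]:
    """Return 0-based indexes of pages that appear to start a new document.

    Only one heuristic is used here: a page numbering reset.
    """
    if not page_texts:
        return []
    starts = [0]  # the first page always starts a document
    for i in range(1, len(page_texts)):
        if PAGE_ONE.search(page_texts[i]):
            starts.append(i)
    return starts


def to_ranges(starts: List[int], total_pages: int) -> List[Tuple[str, int, int]]:
    """Turn start indexes into (file name, first page, last page) tuples."""
    ranges = []
    for n, start in enumerate(starts, start=1):
        # A document ends just before the next one starts, or at the last page.
        end = starts[n] - 1 if n < len(starts) else total_pages - 1
        ranges.append((f"DOC-{n:06d}.pdf", start, end))  # hypothetical naming
    return ranges


pages = [
    "Dear Counsel ... Page 1 of 2",
    "Sincerely ... Page 2 of 2",
    "FAX COVER SHEET Page 1 of 1",
    "Quarterly Report Page 1 of 3",
    "Findings Page 2 of 3",
    "Appendix Page 3 of 3",
]
for name, first, last in to_ranges(find_boundaries(pages), len(pages)):
    print(name, first, last)
```

The page ranges produced this way could then be handed to any PDF library to write out the individual files. A single heuristic like this would miss documents without page numbers, which is why the description above lists several signals evaluated together.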
Key Capabilities
- Detects document boundaries using AI analysis of text content and visual layout
- Separates and organizes documents automatically with sequential naming
- Handles mixed document types and formats within a single file
- Processes large collections in a matter of days
- Delivers output compatible with any review or document management platform
- Works standalone or as part of the full DocPartner processing pipeline
Part of the DocPartner™ Pipeline
Unitize is one of three services in the DocPartner document intelligence pipeline. Use it standalone or combine it with OCR Vision and MetaSummary for a complete document processing workflow.
Unitize™
Advanced NLP algorithms identify document types, extract entities, and understand contextual relationships within complex legal documents.
OCR Vision™
Powered by Claude (Anthropic) and GPT-4 (OpenAI) for superior document comprehension and metadata extraction accuracy.
MetaSummary™
AI generates both concise overviews and detailed summaries, capturing key facts and themes human coders typically miss.
For collections that already have searchable text but lack document breaks, Unitize can run independently. Output integrates with any review platform using standard file structures.
Unitize™ in Action
Insurance Defense Production | 50,000 Pages, No Document Breaks
A large insurance defense firm received 50,000 pages of scanned documents to produce — no text, no document breaks, no metadata. The full DocPartner pipeline unitized the collection into individual documents, extracted text including handwriting, pulled metadata, and generated summaries.
Result: From limited-value images to organized, actionable intelligence overnight.
Who We Serve
Law Firms
Replace expensive offshore coding services with intelligent AI processing that delivers superior results overnight.
Litigation Support Companies
Enhance your service offerings with AI-powered coding capabilities that differentiate you from traditional providers.
Corporate Legal Departments
Streamline internal document processing and reduce dependency on external coding vendors.