From the dawn of the digital age, keyword search has been the only means to find relevant information in unstructured text. The rub is that most keyword searches don’t work very well. They typically bring back too many irrelevant documents, often as many as ten for every relevant one, which can substantially increase review costs. They can also miss many relevant documents. Getting only part of the story can be a problem, and a costly one in many cases.
In short, keyword search is less than perfect. But what’s the alternative?
We built Merlin Integrated Search (IS) to address this problem and to provide a better way to find information in large document populations. By integrating keyword and algorithmic search, we can help people find relevant documents without mastering complex keyword search syntax, even in massive document sets.
We call this new way to find documents Search 2.0, the integration of keyword and algorithmic search in a single platform. Search 2.0 gives you the best of both worlds, the utility of simple keywords and the power of machine learning (AI). With Search 2.0, we can change how people find great documents the way Pandora and its progeny changed how we find and listen to great music.
Watch this short video to see how Search 2.0 can be a game changer for document search.
Exploring Sherlock and Search 2.0
Here is a look at how we are reinventing search. The screens below were taken from our demo site with just under two million documents, primarily emails from Jeb Bush’s two terms as governor. The topic for this exercise was a disputed claim that two commercial bottling companies were extracting too much water from their wells, resulting in depletion of a key Florida resource.
The request (RFP) was this:
Bottled Water — All documents concerning the extraction of water in Florida for bottling by commercial enterprises.
It was created by NIST, the National Institute of Standards and Technology, for its annual Text REtrieval Conference (TREC). The screens we present were taken directly from the site. You can duplicate these results yourself if you have access to our site.
There are a number of approaches you could take to begin this exercise. The most obvious in a Search 1.0 world is to use Boolean syntax to create a keyword search. For many of us, that would be a daunting prospect. Not surprisingly, we would recommend taking a different approach.
With Merlin IS, use our unique freeform search capability to get started. Rather than craft a complicated Boolean search, simply enter the text of the request itself, punctuation and all. The only requirement is that you frame the request in square brackets.
Merlin will ignore the lack of syntax and go after documents containing the highest number of terms in your statement. To be sure, you would probably remove a lot of the “noise” words from the request, but that is not required. The idea is to make the starting search as simple as possible. The beauty is you can forget about mastering complex keyword syntax.
Just enter every term you can think of that might be associated with your topic.
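To picture what a freeform search might do with a bracketed request, here is a minimal Python sketch. It is an illustration only, not Merlin's actual code: the noise-word list, the function names and the simple "count the matching terms" scoring are all assumptions made for this example.

```python
import re

# Hypothetical noise-word list, chosen for illustration only.
NOISE_WORDS = {"a", "an", "all", "and", "by", "for", "in", "of", "or", "the", "to"}


def parse_freeform(query: str) -> list[str]:
    """Pull the search terms out of a request framed in square brackets."""
    match = re.fullmatch(r"\s*\[(.*)\]\s*", query, flags=re.DOTALL)
    if match is None:
        raise ValueError("freeform requests must be framed in square brackets")
    terms = []
    for word in re.findall(r"[a-z0-9]+", match.group(1).lower()):
        # Punctuation is already gone; drop noise words and repeats.
        if word not in NOISE_WORDS and word not in terms:
            terms.append(word)
    return terms


def count_matching_terms(terms: list[str], document: str) -> int:
    """Count how many distinct search terms appear in a document."""
    words = set(re.findall(r"[a-z0-9]+", document.lower()))
    return sum(1 for term in terms if term in words)
```

Under these assumptions, a document that mentions more of the terms from the request scores higher, which is the behavior described above.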
For this type of search, choose “Sort By: Relevance (High to Low).” Merlin IS will present the most likely relevant documents at the top of the results list.
We recommend viewing the results in our unique snippet view. It is particularly helpful if you sort results in relevance order. Unlike the typical grid display (which we also have), the snippet view shows extracted text along with fields in a Google-like view. It can help you get a quick idea of your search results before you take next steps.
Relevance sorting uses a special algorithm that analyzes the density of the keywords from your search in each document. It brings the documents with the most keywords, which are often the most useful, to the top.
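As a rough illustration of sorting by keyword density, the Python sketch below scores each document by the share of its words that are search terms. This is an assumed, simplified definition of density; the actual relevance algorithm is not described here, and the function names are hypothetical.

```python
import re


def keyword_density(terms: list[str], text: str) -> float:
    """Fraction of the words in a document that are search terms."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if not words:
        return 0.0
    wanted = set(terms)
    hits = sum(1 for word in words if word in wanted)
    return hits / len(words)


def sort_by_relevance(terms: list[str], documents: dict[str, str]) -> list[str]:
    """Return document ids ordered from highest to lowest keyword density."""
    return sorted(
        documents,
        key=lambda doc_id: keyword_density(terms, documents[doc_id]),
        reverse=True,
    )
```

A short document made up mostly of search terms will outrank a long one that mentions them only in passing, which is one reason a density measure can surface useful documents early.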
View Document Text
From the snippet view you can see the text of individual documents. This helps ensure you are on the right track.
Or click on the record itself to view the individual document and its associated tags and metadata.
Selecting Multiple Documents
Scroll down the page to review and select relevant documents from your search.
You can folder the documents or send them directly to Sherlock and use the power of machine learning to find more.
Naming Your Session
Sherlock gives you the option to name your session and to have Sherlock place positive (“Thumbs Up”) documents in a selected folder.
Naming a session allows you to return to it at a later time or let others join the search. Sherlock will retain the earlier training as it looks for more good documents.
Reviewing New Documents
In milliseconds, Sherlock will analyze the document you sent, build an AI model around its key terms and apply that model to the other documents on your site.
It will then rank the documents in relevance order, bringing back the next most likely relevant document with key terms highlighted.
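For readers who want a feel for how a model can be built around one document's key terms, here is a minimal Python sketch. It swaps in a standard textbook technique, TF-IDF weighting with cosine similarity, which is an assumption on our part and not a description of Sherlock's actual algorithm. All names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(documents: dict[str, str]) -> dict[str, dict[str, float]]:
    """Weight each term by its frequency times a smoothed inverse document frequency."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in documents.items()}
    total = len(documents)
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())
    return {
        doc_id: {
            term: tf * (math.log((1 + total) / (1 + doc_freq[term])) + 1.0)
            for term, tf in term_counts.items()
        }
        for doc_id, term_counts in counts.items()
    }


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_by_seed(documents: dict[str, str], seed_id: str) -> list[str]:
    """Rank every other document by similarity to the seed document."""
    vectors = tfidf_vectors(documents)
    seed = vectors[seed_id]
    others = [doc_id for doc_id in documents if doc_id != seed_id]
    return sorted(others, key=lambda d: cosine(seed, vectors[d]), reverse=True)
```

The first id returned by rank_by_seed plays the role of the "next most likely relevant document" in this simplified picture.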
Thumbs Up / Thumbs Down
Take a look at the document from Sherlock. If it is relevant to your inquiry, give it a Thumbs Up. If not, give it a Thumbs Down to help Sherlock get on the right track.
You may need to review a couple of documents before Sherlock zeroes in on your target.
Sherlock provides key terms that may be important to your search. You can view them in a Word Cloud. The more important terms are larger in size.
Or view the top key terms in list format. They too are ranked in order of importance. You can boost terms you think are especially important and suppress terms that Sherlock should ignore.
Either way, Sherlock will take these into account the next time you send a document judgment.
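The interplay of Thumbs Up, Thumbs Down, boosted terms and suppressed terms can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Sherlock's implementation: each judgment simply adds or subtracts one point per term, a boost doubles a term's weight (the factor 2.0 is arbitrary) and a suppressed term counts for nothing. The class and method names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class FeedbackModel:
    """Toy term-weight model updated by reviewer judgments."""

    BOOST_FACTOR = 2.0  # assumed value, for illustration only

    def __init__(self) -> None:
        self.weights = Counter()  # term -> learned weight
        self.boosted: set[str] = set()
        self.suppressed: set[str] = set()

    def judge(self, text: str, thumbs_up: bool) -> None:
        """Raise term weights on Thumbs Up, lower them on Thumbs Down."""
        delta = 1.0 if thumbs_up else -1.0
        for term in tokenize(text):
            self.weights[term] += delta

    def boost(self, term: str) -> None:
        self.boosted.add(term)
        self.suppressed.discard(term)

    def suppress(self, term: str) -> None:
        self.suppressed.add(term)
        self.boosted.discard(term)

    def term_weight(self, term: str) -> float:
        if term in self.suppressed:
            return 0.0
        weight = self.weights[term]
        return weight * self.BOOST_FACTOR if term in self.boosted else weight

    def score(self, text: str) -> float:
        """Score an unseen document by summing the weights of its terms."""
        return sum(self.term_weight(term) for term in tokenize(text))

    def top_terms(self, k: int = 5) -> list[str]:
        """Key terms ranked by weight, ties broken alphabetically."""
        return sorted(self.weights, key=lambda t: (-self.term_weight(t), t))[:k]
```

In this sketch, boosts and suppressions change the scores immediately, while the learned weights only move when a new judgment arrives.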
Continue Your Review
As you keep judging documents, Sherlock gets smarter, much like Pandora gets better at finding new music. Continue with Sherlock until you find what you need, or until Sherlock stops finding relevant documents. You can stop at any time and return to the session later.
We track your progress at the top of the screen.
Sherlock Keeps Getting Smarter
As the review progresses, Sherlock just gets smarter about the information you seek. You can see the key terms evolve as you continue your review.
Continue your document-by-document review or consider this next unique aspect of Search 2.0: Cluster Batching.
Use Cluster Batching
You can enhance review efficiency by using Sherlock’s unique Cluster Batching algorithm. Rather than review documents one by one, ask Sherlock to send the next 50 documents in ranked order for your review.
Sherlock will group your results in individual clusters of documents with similar content. You can then review documents individually or tag them by cluster.
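The general idea of batching by cluster can be illustrated with a short Python sketch. It assumes a simple greedy grouping based on word overlap (Jaccard similarity) with an arbitrary 0.5 threshold. Sherlock's actual Cluster Batching algorithm is not described here, so treat the function names and parameters as hypothetical.

```python
import re


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of words two documents have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_batch(
    ranked_docs: list[tuple[str, str]],
    batch_size: int = 50,
    threshold: float = 0.5,
) -> list[list[str]]:
    """Take the next batch of ranked documents and group similar ones.

    ranked_docs holds (doc_id, text) pairs already in relevance order.
    Each document joins the first cluster whose seed document it
    resembles closely enough; otherwise it starts a new cluster.
    """
    clusters: list[dict] = []
    for doc_id, text in ranked_docs[:batch_size]:
        doc_tokens = tokens(text)
        for cluster in clusters:
            if jaccard(doc_tokens, cluster["seed"]) >= threshold:
                cluster["ids"].append(doc_id)
                break
        else:
            clusters.append({"seed": doc_tokens, "ids": [doc_id]})
    return [cluster["ids"] for cluster in clusters]
```

Because the input is already ranked, each cluster in this sketch is seeded by its highest-ranked document, and a reviewer could tag a whole cluster after reading only a few of its members.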
Search 2.0 in Action
Want to see a few more examples of Search 2.0 in action? Watch as Sherlock tackles two of his toughest cases.
Back in the early 2000s, citizens of Florida were concerned about felon voting rights. Can Search 2.0 help us find documents on this thorny issue? Let’s find out.
Climate change was an important issue during Jeb Bush’s terms as Governor of Florida. Can Search 2.0 help us find documents about this issue? Watch and see.
Why Search 2.0?
The blinding speed of our Sherlock machine learning algorithm is the cornerstone of Search 2.0. It allowed us to create a seamless integration between keyword and algorithmic search for the first time ever. Suddenly, people can find what they need in large document populations—quickly and easily—without having to master complex keyword search syntax.
Why would I use Search 2.0? The answer is simple. It represents a quantum leap forward in a world where digital content is expanding at light speed. We happily went from yellow pads to word processors, flip phones to smartphones and mail to email. All were powered by technology advances that changed how we communicate and work. None of us looked back when these innovations arrived.
None of us will go back to Search 1.0 either once we have the chance to use something better.