We have created this page to introduce Professor Hamilton’s class to Merlin Integrated Search and to provide assistance as you work through your class exercise.
What is Merlin Integrated Search?
Merlin IS is a new, full-featured investigation and e-discovery platform built using the latest open source technology. It is the first to integrate keyword and algorithmic search into a single, seamless user experience. We call it Search 2.0.
Here are some helpful links to get you started on Merlin IS and your class project.
- Here is the URL for the site: https://edu.merlin.tech.
- Your username is your first initial followed by your last name, as in JSmith.
- Your initial password is FloridaLaw2021* (including the *). Feel free to reset it by clicking on your profile at the top right.
- You can learn more about Search 2.0 in this short video.
- You can find several help guides on the Search page. But here is the Quick Guide to Keyword Search Syntax.
- And here is a Quick Guide to Using Sherlock.
- Here is a simple guide to creating private and shared folders.
- Lastly, here is your assignment for this class.
Sherlock Hounds is the first AI-powered, virtual document bloodhound. He has a knack for finding relevant documents. Wave a virtual hanky under his nose (or point to one or more relevant documents) and he will come to life, chasing after the scent until he finds what you are looking for, whether you are an investigator, e-discovery specialist or just need to find what you are looking for in a large document population.
You can meet Sherlock in person and send him off to find relevant documents in a collection of about 290,000 emails collected from Jeb Bush’s archives during his two terms as Governor of Florida.
Working with Sherlock
Start with a keyword search to find an initial document for Sherlock that relates to your topic. When you find one, let Sherlock loose to find more like your starter document. Sherlock will immediately take off, rank all 290,000 documents and present you with the next most relevant document. Tell Sherlock “Good Boy” by clicking on the Thumbs Up symbol and Sherlock will go at it again, instantly ranking all of the documents based on what you have marked before and finding the next candidate for your review. If he misses, click Thumbs Down and he will go at it again.
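For readers who want a feel for what a feedback loop like this involves, here is a minimal Python sketch. It is not Sherlock's actual algorithm, which this page does not describe. It assumes a toy bag-of-words model in which documents that share words with your Thumbs Up documents score higher and documents that resemble your Thumbs Down documents score lower. The names `vectorize`, `cosine` and `next_candidate` are hypothetical.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def next_candidate(docs, thumbs_up, thumbs_down):
    """Re-rank every unreviewed document and return the id of the top one.

    docs maps a document id to its text. thumbs_up and thumbs_down are
    lists of ids the reviewer has already marked. This scoring rule is
    an assumption for illustration, not the product's real method.
    """
    vectors = {doc_id: vectorize(text) for doc_id, text in docs.items()}
    reviewed = set(thumbs_up) | set(thumbs_down)

    def score(doc_id):
        v = vectors[doc_id]
        liked = sum(cosine(v, vectors[d]) for d in thumbs_up)
        disliked = sum(cosine(v, vectors[d]) for d in thumbs_down)
        return liked - disliked

    unreviewed = [doc_id for doc_id in docs if doc_id not in reviewed]
    return max(unreviewed, key=score) if unreviewed else None
```

Each call re-scores the whole collection using every judgment made so far, which mirrors the cycle described above: mark a document, get the next candidate, repeat.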
What you will see may amaze you. Unlike humans, Sherlock possesses almost supernatural capabilities to find relevant documents once you tell him what you want to find. Some might even call it Artificial Intelligence but we just say that “Sherlock is pretty smart for a virtual dog.” Either way you look at it, Sherlock will find those relevant documents much faster than you could armed only with keyword search.
Sherlock not only finds relevant documents, he will highlight the top terms he associates with relevant documents. Watch as they change over time. Sherlock keeps learning and will find key terms or names that might be helpful to your investigation. Click on the Key Words link if you want to see more of the terms Sherlock finds important.
Just wave a virtual hanky in front of Sherlock’s nose and say “Go get ’em, boy.”
A Quick Introduction to Sherlock
Sherlock is unique in that it can analyze and rank one million documents in a hundred milliseconds. To our knowledge we are the first to integrate keyword and algorithmic search. Our goal was to revolutionize document search the way Pandora Internet Radio changed how people find new music.
You can see a number of helpful Sherlock videos here. Here is a quick introduction to get you started:
Our CEO and Founder, John Tredennick, also introduced Sherlock to the Stanford Codex community in this 25-minute video. It will give you a better idea of how Search 2.0 works, although we added the Cluster Batching feature after that video was recorded.
About the Documents
The site contains a collection of about 290,000 emails taken from Governor Jeb Bush’s archives during his two terms as governor of Florida (1999 to 2007). The emails reflect communications to and from Governor Bush and others in his administration. Not surprisingly, they contain plenty of information about issues that his administration faced during his years in office—for example, a bid for the 2000 Olympics, the Bush v. Gore election, political battles over keeping or sending Elian Gonzalez back to Cuba and a whole lot more.
Feel free to search the demo site for any information that interests you using our unique combination of an initial keyword search and user-directed relevance feedback to the AI engine. Just click thumbs up or thumbs down to let the AI algorithm know how it is doing.
Used at TREC
This same email corpus was used as a basis for testing AI engines for the Total Recall Track during the 2016 TREC (Text Retrieval Conference) program, which is sponsored by NIST (National Institute of Standards and Technology). The program brought together academics, legal and AI professionals from around the world who want to test their algorithms and search techniques in a controlled environment against other algorithms and approaches. The test used a locked-down server to record each participant’s progress in finding relevant documents pertaining to any one of 34 separate topics. The goal was to see which methods or algorithms are the most effective at finding all of the relevant documents in the population.
You can read about the participants and their successes in the final report from the 2016 conference here.
Search Exercises (Hypothetical)
Just for fun, imagine you are an associate at a new firm. You are called in for a meeting with a litigation partner who needs help with a case. Specifically, depositions start on Monday and the partner needs your help getting prepared. If you aren’t efficient at this task, you had better plan on working this weekend.
The partner advises that opposing counsel just produced 290,000 emails in the case that need review and analysis. As in most cases, the relevant emails are far outweighed by the number of irrelevant files caught in the discovery net. The partner is counting on you to quickly find documents relating to the following problems. Let’s see if Sherlock can help.
1. Trademark Dispute
Our client, Havana Club Holdings S.A. (a joint venture between Cuba and the French company Pernod), is in a dispute with Bacardi over the right to use the Havana Club name in the United States. We believe Governor Bush has been illegally putting pressure on the U.S. Patent and Trademark Office and, in particular, the Trademark Trial and Appeal Board to rule in favor of Bacardi in this dispute. You can read more about this dispute here.
We sent out this request for production: “All documents evidencing, reflecting or pertaining to actions taken by Jeb Bush or his administration regarding a trademark dispute between Bacardi and the U.S. Patent and Trademark Office.”
We need as many relevant documents as possible to use as exhibits for Governor Bush’s deposition. I need you to search the emails we have received to find what you can.
2. Bottled Water Problem
Our client is a citizens group concerned about how Florida administrators are allowing private companies to extract water from the aquifer and sell it as bottled water. There are concerns that the companies are depleting the aquifer, causing sinkholes to swallow homes and more.
We sent out this request for production: “All documents evidencing, reflecting or pertaining to the extraction of water in Florida for bottling by commercial enterprises.” Depositions are not yet scheduled but we need to know whom to depose from which companies and which government officials are involved in the decision-making process. Ideally, we would like to know who is in whose pocket in this business.
Here is your assignment.
There are two goals for this exercise:
- To provide a real-world opportunity to use a modern litigation support platform to search for and identify documents responsive to a Rule 34 request for production.
- To illustrate the difference between keyword search alone (Search 1.0) and the integration of keyword and algorithmic search (Search 2.0).
To carry out the exercise, your Professor has chosen two topics for the class. We will split the students into two groups. Members of one group will tackle the first topic; members of the second group will tackle the second. One group will be asked to complete the topic using keyword search methods only. The other will be allowed to use both keyword and algorithmic methods. You will be given one hour to complete your exercise and are expected to record your time much as an associate would. You will also be given a deadline to complete the work.
For the second exercise, you will be given a second topic to work on and be asked to switch from keyword only to keyword and algorithmic or vice versa. The idea is to give you a feel for both methods of search and to get your reactions to both. After the exercises are completed, we will report back on how the class did using the two different methods.
Before beginning the exercises, we ask that each student create two shared folders, one for each exercise. Each folder should be named with your unique ID, the exercise number (1 or 2) and whether you used keyword search only or had access to Sherlock. Here is an example of how you should name your folders:
027-1-Sherlock or 041-2-Keyword.
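If it helps to double-check a folder name before you share it, the naming pattern can be sketched in a few lines of Python. This is only an illustration. It assumes, based on the two examples above, that the unique ID is a three-digit number padded with zeros. The helper names `folder_name` and `is_valid_folder_name` are hypothetical and are not part of Merlin IS.

```python
import re

# Assumed pattern: three-digit ID, exercise 1 or 2, then the search method.
FOLDER_NAME = re.compile(r"\d{3}-[12]-(Sherlock|Keyword)")


def folder_name(student_id, exercise, used_sherlock):
    """Build a folder name such as 027-1-Sherlock or 041-2-Keyword."""
    if exercise not in (1, 2):
        raise ValueError("exercise must be 1 or 2")
    method = "Sherlock" if used_sherlock else "Keyword"
    return f"{student_id:03d}-{exercise}-{method}"


def is_valid_folder_name(name):
    """Return True if the whole name matches the assumed pattern."""
    return FOLDER_NAME.fullmatch(name) is not None
```

For example, `folder_name(27, 1, True)` builds the first name shown above, and `is_valid_folder_name` rejects a name with a missing leading zero or an exercise number other than 1 or 2.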
The folder should be created under the Private/Shared main folder and it should be shared with the User-Admin Role. Here is a simple guide to setting up a Shared folder. Here is a quick guide to creating folders.
Once you have created your folders, copy relevant documents into the appropriate folder for each exercise. This will allow us to match the documents selected against the TREC coding for those topics. We can then report on how many relevant documents you found using the different methods.
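As a rough illustration of how such a comparison might work, the Python sketch below counts how many of the documents in a folder appear in a set of documents coded relevant. The exact report format is not described on this page, so the recall and precision figures, the function name `score_folder` and the document IDs in the usage note are assumptions made for the example.

```python
def score_folder(selected, relevant):
    """Compare the documents a student selected with those coded relevant.

    selected: ids of the documents copied into the student's folder.
    relevant: ids of the documents coded relevant for the topic.
    Returns the number of relevant documents found, plus recall (the
    share of all relevant documents that were found) and precision
    (the share of selected documents that are relevant).
    """
    selected, relevant = set(selected), set(relevant)
    hits = selected & relevant
    recall = len(hits) / len(relevant) if relevant else 0.0
    precision = len(hits) / len(selected) if selected else 0.0
    return {"found": len(hits), "recall": recall, "precision": precision}
```

With made-up IDs, a folder holding `d1`, `d2`, `d3` and `d9` scored against relevant documents `d1` through `d6` yields 3 found, a recall of 0.5 and a precision of 0.75.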