A Revolution in Search Technology
Welcome to the Sherlock Evaluation Site. We’re taking search to the next level by combining the familiarity of keywords with the speed and power of artificial intelligence. It’s a new and better way to find information for investigations, discovery and early case assessment (ECA).
Sherlock, our AI-powered search, investigation and review platform, is built around a set of smart machine-learning algorithms. These algorithms work in tandem with keywords to provide a faster, easier and more cost-effective way to find relevant information in large document sets.
This is the first Search 2.0 platform—the first to integrate keyword and algorithmic search seamlessly in a powerful search, investigation and review platform. Once you experience the power of algorithmic search, you will never again be satisfied with keywords alone.
Meet Sherlock, the world’s first digital document bloodhound and the heart of our Search 2.0 platform. In milliseconds, he will analyze documents you send and find others for further review. Fast and scalable, Sherlock can rank a million documents in 100 milliseconds, literally the blink of an eye.
Sherlock eliminates the need to craft complex keyword searches. Instead, we’ve programmed Sherlock’s smart machine-learning algorithms to key off your judgments, retrieving relevant documents faster, more easily and at lower cost than was ever possible with traditional keyword search alone. Just click “Thumbs Up” and let Sherlock do the rest, quickly and effectively finding relevant documents to further your investigation or review.
Sherlock works like Pandora Radio, only he finds great documents rather than great music.
Wave a virtual hankie in front of Sherlock’s nose and say, “Go get ’em, boy!”
How Sherlock Works
Along with full-featured keyword search, Sherlock offers several new, AI-enabled ways to find information in large document sets.
Start with a keyword search to find a relevant document for Sherlock. When you find one, send it to Sherlock to find more relevant documents. Within milliseconds, Sherlock will analyze your starting document, build a machine-learning model, apply it to all 2.7 million documents on the site and then return likely relevant documents along with key terms.
Tell Sherlock “Good Boy” by clicking the Thumbs Up symbol, and Sherlock will go at it again, quickly re-ranking all of the documents based on the documents you have sent so far. If Sherlock misses the mark, click Thumbs Down and he will quickly get back on track.
What you see may amaze you. Unlike human reviewers, Sherlock possesses almost supernatural capabilities to find relevant documents once you alert him to what you want to find. Some might even call it artificial intelligence, but we just say that “Sherlock is pretty smart for a virtual dog.” Either way, Sherlock will find those relevant documents faster than you can when armed only with keyword search syntax.
Sherlock not only finds relevant documents; he also highlights the top terms he associates with them. Watch as those terms change over time: Sherlock keeps learning and will surface key terms or names that might help your investigation. Click the Keywords link to see more of the terms Sherlock finds important.
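Sherlock’s actual algorithms are not described on this page, so the following is only a generic sketch of the kind of relevance-feedback loop described above: send a starting document, rank the collection, then re-rank after Thumbs Up and Thumbs Down judgments and report the top terms. It assumes a TF-IDF document representation and a Rocchio-style model update, which are stand-in techniques rather than Sherlock’s method. The function names (`tfidf_vectors`, `rocchio_model`, `rank`, `top_terms`), the `beta`/`gamma` weights and the toy documents are all hypothetical illustrations, not part of any Sherlock API.

```python
import math
from collections import Counter

# Tiny stop-word list for the toy example; real systems use far larger lists.
STOPWORDS = {"a", "and", "in", "of", "on", "the", "to", "with"}


def tokenize(text: str) -> list[str]:
    """Lowercase, split on non-alphanumeric characters and drop stop words."""
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return [t for t in cleaned.split() if t not in STOPWORDS]


def tfidf_vectors(docs: dict[str, str]) -> dict[str, dict[str, float]]:
    """Build an L2-normalised TF-IDF vector for every document."""
    counts = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    n = len(docs)
    df = Counter(term for c in counts.values() for term in c)  # document frequency
    vectors = {}
    for doc_id, c in counts.items():
        # Smoothed IDF keeps every weight positive.
        vec = {t: tf * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors[doc_id] = {t: w / norm for t, w in vec.items()}
    return vectors


def rocchio_model(vectors, relevant, non_relevant, beta=1.0, gamma=0.5):
    """Centroid of thumbs-up documents minus a damped centroid of thumbs-down ones."""
    model = Counter()
    for doc_id in relevant:
        for t, w in vectors[doc_id].items():
            model[t] += beta * w / len(relevant)
    for doc_id in non_relevant:
        for t, w in vectors[doc_id].items():
            model[t] -= gamma * w / len(non_relevant)
    # Negative weights are clipped to zero, as is common in Rocchio feedback.
    return {t: w for t, w in model.items() if w > 0}


def rank(vectors, model, exclude=()):
    """Score unjudged documents by dot product with the model, best first."""
    scores = {
        doc_id: sum(w * model.get(t, 0.0) for t, w in vec.items())
        for doc_id, vec in vectors.items()
        if doc_id not in exclude
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def top_terms(model, k=5):
    """The k highest-weighted terms in the model (the 'key terms' idea)."""
    return [t for t, _ in sorted(model.items(), key=lambda kv: kv[1], reverse=True)[:k]]


# Toy collection standing in for a real review set.
docs = {
    "d1": "Bacardi Havana Club trademark dispute with the patent office",
    "d2": "Havana Club rum trademark appeal board ruling",
    "d3": "Bottled water extraction permits and the Florida aquifer",
    "d4": "Trademark board hearing on the Havana Club name",
    "d5": "Olympic bid committee meeting schedule",
}
vectors = tfidf_vectors(docs)

# Round 1: a single starting document is sent to the ranker.
model = rocchio_model(vectors, relevant=["d1"], non_relevant=[])
first_round = rank(vectors, model, exclude={"d1"})

# Round 2: thumbs up on d2 and d4, thumbs down on d3, then re-rank what is left.
judged = {"d1", "d2", "d3", "d4"}
model = rocchio_model(vectors, relevant=["d1", "d2", "d4"], non_relevant=["d3"])
second_round = rank(vectors, model, exclude=judged)

print(first_round)
print(top_terms(model, k=3))
```

Because every document vector is normalised, ranking by the dot product orders documents the same way cosine similarity would for a fixed model. Rocchio is used here only because it is simple and transparent; a production system might instead train a classifier on the judged documents.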
More Ways to Find Documents
That is only the starting point for using Sherlock. Here are several more search options:
- Send Multiple Documents to Sherlock: Send a folder of documents to Sherlock. In milliseconds, Sherlock will analyze the documents you have sent and bring back more for your review. This is a good way to surface relevant documents that you might otherwise miss.
- Send Search Results to Sherlock: Start with a keyword search and view results in the Snippet View. Sort by relevance and then send selected results (or even the first page) to Sherlock. Much like before, he will analyze the documents you send and quickly find other good ones for your review.
- Use Freeform Search: Our new Freeform search allows you to paste in text or enter keywords without worrying about syntax or even punctuation. Just wrap your Freeform query in [square brackets] to let Sherlock know this is a Freeform search. Then sort by relevance and review the top returns in Snippet View.
From there, select several documents (or a page of them) and send them to Sherlock. In many cases, you will find that Freeform is an easier and more effective way to search than building complex keyword syntax. You can also combine a Freeform statement with other field and tag criteria.
A Quick Introduction to Sherlock
To get you started, here are two short video introductions to Sherlock and the power of Search 2.0.
There are more Sherlock videos on our website. Take a look to learn more about the different ways you can use Sherlock to find relevant information.
About the Site
The site contains more than 2.7 million documents from two different collections. The first consists of about two million emails from Governor Jeb Bush’s archives during his two terms as governor of Florida (1999 to 2007). The second, about 700,000 documents, consists of the native format version of the EDRM Enron V2 test collection. Here is more about each of these collections.
The Bush Collection
The site contains a collection of about 1.9 million emails taken from Governor Jeb Bush’s archives during his two terms as governor of Florida (1999 to 2007). The emails reflect communications to and from Governor Bush and others in his administration. Not surprisingly, they contain plenty of information about issues that his administration faced during his years in office: for example, a bid to host the Summer Olympics, the Bush v. Gore election, political battles over whether to keep Elian Gonzalez in the United States or send him back to Cuba, and a whole lot more.
A smaller set of these emails (about 290,000) was used as the basis for testing AI engines in the Total Recall Track of the 2016 TREC (Text Retrieval Conference) program, which is sponsored by NIST (National Institute of Standards and Technology). The program brought together academics and legal and AI professionals from around the world who wanted to test their algorithms and search techniques against other algorithms and approaches in a controlled environment.
The test used a locked-down server to record each participant’s progress in finding relevant documents pertaining to any one of 34 separate topics. The goal was to see which methods or algorithms were most effective at finding all of the relevant documents in the population. You can read about the participants and their successes in the final report from the TREC 2016 conference.
Search Exercises for the Bush Collection
Just for fun, imagine you are an associate at a new firm. You are called in for a meeting with a litigation partner who needs help with a case. Specifically, depositions start on Monday and the partner needs your help getting prepared. If you aren’t efficient at this task, you had better plan on working this weekend.
The partner advises that opposing counsel has just produced 1.9 million emails that need review and analysis. As in most cases, the relevant emails are far outnumbered by the irrelevant files caught in the discovery net. The partner is counting on you to quickly find documents relating to the following problems. Let’s see if Sherlock can help.
1. Trademark Dispute
Our client, Havana Club Holdings S.A. (a joint venture between Cuba and the French company Pernod), is in a dispute with Bacardi over the right to use the Havana Club name in the United States. We believe Governor Bush has been illegally putting pressure on the U.S. Patent and Trademark Office and, in particular, its Trademark Trial and Appeal Board to rule in favor of Bacardi in this dispute. You can read more about this dispute here.
We sent out this request for production: “All documents evidencing, reflecting or pertaining to actions taken by Jeb Bush or his administration regarding a trademark dispute between Bacardi and the U.S. Patent and Trademark Office.”
We need as many relevant documents as possible to use as exhibits for Governor Bush’s deposition. I need you to search the emails we have received and find what you can.
2. Bottled Water Problem
Our client is a citizens group concerned about how Florida administrators are allowing private companies to extract water from the aquifer and sell it as bottled water. There are concerns that the companies are depleting the aquifer, causing sinkholes to swallow homes and more.
We sent out this request for production: “All documents evidencing, reflecting or pertaining to the extraction of water in Florida for bottling by commercial enterprises.” Depositions are not yet scheduled, but we need to know whom to depose from which companies and which government officials are involved in the decision-making process. Ideally, we would like to know who is in whose pocket in this business.
Other Topics to Consider
In the 2016 TREC program, participants were given 34 different topics to explore, each of which could be treated like a Fed. R. Civ. P. Rule 34 request for production (for non-lawyers, that is simply a request to produce documents). Here are those topics; any one of them would make a good search exercise for the demo site.
- Bacardi Trademark Lobbying — Documents related to the Jeb Bush administration’s involvement in a trademark dispute between Bacardi and the U.S. Patent and Trademark Office.
- Bottled Water — All documents concerning the extraction of water in Florida for bottling by commercial enterprises.
- Summer Olympics — All documents concerning a bid to host the Summer Olympic Games in Florida.
- Save the Manatee–All documents about this program but not about Manatee County itself.
- Florida Horse Park–Documents about the establishment of a horse park in Florida and Bush’s veto of the program.
- Space — All documents concerning the space industry, the space program, space travel (whether manned or unmanned, public or private), and the study or exploration of space in Florida.
- Eminent Domain — All documents concerning the legality or morality of expropriating land in Florida for commercial development.
- Newt Gingrich — All documents concerning House Speaker Newt Gingrich or any entities or personnel associated with Newt Gingrich.
- Felon Disenfranchisement — All documents concerning the right of felons to vote in Florida, including but not limited to voter purges and reinstatement of voter rights. Individual clemency cases in Florida are not relevant.
- Faith-Based Initiatives — All documents concerning grants or other initiatives in Florida to offload social services to so-called faith-based agencies. Services include but are not limited to education, prisons, and emergency relief.
- Invasive Species — All documents concerning the problem of invasive species in Florida, that is, non-native plants or animals that threaten the Florida ecosystem.
- Climate Change — All documents concerning climate change, global warming, or carbon emissions, whether in Florida or otherwise.
- Condominiums — All documents concerning the rules and organizations governing Florida condominium associations and conflicts between owners and managers in Florida. Relevant documents include those concerning the establishment of the Florida office of ombudsman, and issues relating to hiring and firing the ombudsman.
- “Stand Your Ground” — All documents concerning a Florida bill permitting the use of deadly force to protect oneself or one’s property.
- 2000 Recount — All documents concerning the contested result of the 2000 presidential election.
- James V. Crosby — All documents concerning James V. Crosby, including but not limited to his relationship with Governor Bush before being appointed as Florida Secretary of Corrections, his role as Secretary, his firing, and any criminal allegations against Mr. Crosby.
- Medicaid Reform — All documents concerning efforts to reform Medicaid.
- George W. Bush — All documents concerning George W. Bush, whether by explicit reference or by his relationship to Governor Bush.
- Marketing — All documents concerning advertising or marketing efforts undertaken by the Florida Governor’s office or any other institution of the State of Florida.
- Movie Gallery — All documents concerning investments or divestments by the State of Florida in Movie Gallery.
- War Preparations — All documents concerning preparations for the Iraq War undertaken in Florida before the March 20, 2003 invasion.
- Lost Foster Child Rilya Wilson — All documents concerning the disappearance of foster child Rilya Wilson and the impact or aftermath in Florida resulting from the loss.
- Billboards — All documents concerning rights and control of billboards in Florida. Different legislative efforts should be considered to be separate sub-categories.
- Traffic Cameras — All documents involving discussions of the use of unattended cameras to enforce traffic laws in Florida.
- Non-Resident Aliens (NRA) — All documents involving discussions of the non-resident alien issue. Documents concerning the National Rifle Association are not relevant.
- National Rifle Association (NRA) — All documents concerning the National Rifle Association, its members, and its influences. Documents concerning the non-resident alien issue are not relevant.
- Gulf Drilling — All documents involving discussions of off-shore drilling for oil or gas. Drilling of wells for water is not relevant.
- Civil Rights Act of 2003 — All documents involving discussions of the Florida Civil Rights Act of 2003.
- Jeffrey Goldhagen — All documents related to Jeffrey Goldhagen’s role in the Bush administration, his firing, and reinstatement.
- Slot Machines — All documents concerning the definition, legality, and licensing of “slot machines” in Florida.
- New Stadiums and Arenas — All documents involving discussions of the construction of new sports stadiums or arenas in Florida.
- Cuban Child, Elian Gonzalez — All documents involving discussions of the Cuban child, Elian Gonzalez, and his whereabouts or status.
- Restraints and Helmets — All documents involving discussions of seat belts, child seats, and helmet mandates.
- Agency Credit Ratings — All documents involving discussions of credit ratings of Florida institutions, including but not limited to those undertaken by Standard and Poor’s, Fitch, and Moody’s.
- Gay Adoption — All documents involving discussions of the gay adoption issue in Florida.
- Abstinence — All discussions of abstinence and abstinence-only programs in Florida to supplant birth control or sex education.
The Enron Collection
We have also loaded the native format version of the EDRM Enron V2 test collection. The EDRM Enron V2 collection was first used in the 2010 TREC Legal Track. It was derived from the EDRM Enron Dataset V2 prepared by ZL Technologies in consultation with the Legal Track coordinators, and hosted by EDRM.
ZL acquired the full collection of 1.3 million Enron email messages from Lockheed Martin (formerly Aspen Systems), which captured and maintained the dataset on behalf of FERC (the Federal Energy Regulatory Commission). After deduplication, the collection came to 455,499 messages plus 230,143 attachments.
Search Exercises for the Enron Collection
The Enron documents and topics are well known in the e-discovery world because the corpus has been widely used for software demonstrations. While everyone has a favorite topic to explore, here are some of the topics (in the form of legal document requests) used in the 2009 and 2010 Legal Track programs.
- All documents or communications that describe, discuss, refer to, report on, or relate to the Company’s engagement in structured commodity transactions known as “prepay transactions.”
- All documents or communications that describe, discuss, refer to, report on, or relate to the Company’s engagement in transactions that the Company characterized as compliant with FAS 140 (or its predecessor FAS 125).
- All documents or communications that describe, discuss, refer to, report on, or relate to whether the Company had met, or could, would, or might meet its financial forecasts, models, projections, or plans at any time after January 1, 1999.
- All documents or communications that describe, discuss, refer to, report on, or relate to any intentions, plans, efforts, or activities involving the alteration, destruction, retention, lack of retention, deletion, or shredding of documents or other evidence, whether in hard‐copy or electronic form.
- All documents or communications that describe, discuss, refer to, report on, or relate to energy schedules and bids, including but not limited to, estimates, forecasts, descriptions, characterizations, analyses, evaluations, projections, plans, and reports on the volume(s) or geographic location(s) of energy loads.
- All documents or communications that describe, discuss, refer to, report on, or relate to any discussion(s), communication(s), or contact(s) with financial analyst(s), or with the firm(s) that employ them, regarding (i) the Company’s financial condition, (ii) analysts’ coverage of the Company and/or its financial condition, (iii) analysts’ rating of the Company’s stock, or (iv) the impact of an analyst’s coverage of the Company on the business relationship between the Company and the firm that employs the analyst.
- All documents or communications that describe, discuss, refer to, report on, or relate to fantasy football, gambling on football, and related activities, including but not limited to, football teams, football players, football games, football statistics, and football performance.
- All documents or communications that describe, discuss, refer to, report on, or relate to onshore or offshore oil and gas drilling or extraction activities, whether past, present or future, actual, anticipated, possible or potential, including, but not limited to, all business and other plans relating thereto, all anticipated revenues therefrom, and all risk calculations or risk management analyses in connection therewith.
- All documents or communications that describe, discuss, refer to, report on, or relate to actual, anticipated, possible or potential responses to oil and gas spills, blowouts or releases, or pipeline eruptions, whether past, present or future, including, but not limited to, any assessment, evaluation, remediation or repair activities, contingency plans and/or environmental disaster, recovery or clean-up efforts.
- All documents or communications that describe, discuss, refer to, report on, or relate to activities, plans or efforts (whether past, present or future) aimed, intended or directed at lobbying public or other officials regarding any actual, pending, anticipated, possible or potential legislation, including but not limited to, activities aimed, intended or directed at influencing or affecting any actual, pending, anticipated, possible or potential rule, regulation, standard, policy, law or amendment thereto.
- All documents or communications that describe, discuss, refer to, report on, or relate to the design, development, operation, or marketing of EnronOnline, or any other online service offered, provided, or used by the Company (or any of its subsidiaries, predecessors, or successors-in-interest), for the purchase, sale, trading, or exchange of financial or other instruments or products, including but not limited to, derivative instruments, commodities, futures, and swaps.
- All documents or communications that describe, discuss, refer to, report on, or relate to whether the purchase, sale, trading, or exchange of over-the-counter derivatives, or any other actual or contemplated financial instruments or products, is, was, would be, or will be legal or illegal, or permitted or prohibited, under any existing or proposed rule(s), regulation(s), law(s), standard(s), or other proscription(s), whether domestic or foreign.
- All documents or communications that describe, discuss, refer to, report on, or relate to the environmental impact of any activity or activities undertaken by the Company, including but not limited to, any measures taken to conform to, comply with, avoid, circumvent, or influence any existing or proposed rule(s), regulation(s), law(s), standard(s), or other proscription(s), such as those governing environmental emissions, spills, pollution, noise, and/or animal habitats.