
FAQs Regarding Merlin's GenAI Integration with LLMs

DiscoveryPartner is unique in integrating multiple Large Language Models (LLMs) into its architecture. We found early on that certain models did an excellent job of summarizing documents, matching the larger models in quality while being substantially faster and more cost-effective. In contrast, we recommend the larger, more intelligent models for more complex work such as synthesizing across hundreds of summarized documents and creating complex reports, chronologies, and analyses.

That said, our recommendations change from time to time as new LLMs are released, which is another advantage of our multi-LLM architecture. At present, we support LLMs from OpenAI and Anthropic, employing Anthropic's Claude 3 models and OpenAI's Turbo models. When new and improved models are released, we test them and promote them to DiscoveryPartner once we are satisfied that they are equal to or better than the models we currently use. In almost all cases the newer models are faster and less expensive than their predecessors, which provides an additional benefit to our clients.

Our DiscoveryPartner architecture allows administrators and users to choose among models for summarizing and for reporting. These choices can be changed at any time, and administrators can set defaults for each function or limit the choices available to users.
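As a rough illustration only (the function names, model names, and defaults below are hypothetical, not DiscoveryPartner's actual configuration schema), per-function model selection with administrator defaults might look something like this:

```python
# Hypothetical sketch of per-function model selection with admin defaults.
# Names and values are illustrative, not DiscoveryPartner's actual schema.

ALLOWED_MODELS = {
    "summarize": ["claude-3-haiku", "gpt-3.5-turbo"],   # fast, low-cost models
    "report":    ["claude-3-opus", "gpt-4-turbo"],       # larger models for synthesis
}

ADMIN_DEFAULTS = {
    "summarize": "claude-3-haiku",
    "report":    "claude-3-opus",
}

def resolve_model(function: str, user_choice: str | None = None) -> str:
    """Return the user's choice if it is allowed for this function,
    otherwise fall back to the administrator's default."""
    if user_choice in ALLOWED_MODELS[function]:
        return user_choice
    return ADMIN_DEFAULTS[function]

print(resolve_model("summarize"))              # claude-3-haiku
print(resolve_model("report", "gpt-4-turbo"))  # gpt-4-turbo
```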

In DiscoveryPartner's model-selection interface, users have the flexibility to choose from a range of models for different tasks. We recommend what we believe are the best and most cost-effective models for summarizing documents and for synthesizing and reporting on their contents.

At present DiscoveryPartner sends 100 documents or document segments to the LLM at a time. Why? Because our research shows that 100 segments, which translates to roughly 30,000 tokens, is optimal for the best-quality answers from even the most powerful LLMs. Although many LLMs can accept more than 30,000 tokens, our researchers found that beyond that point the quality of the answers declined, along with the amount of detail and the number of links to relevant documents.
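A minimal sketch of that batching idea follows. The 100-segment and ~30,000-token figures come from the description above; the `count_tokens` helper and everything else are hypothetical, not DiscoveryPartner's implementation:

```python
# Illustrative batching sketch: cap each LLM request at roughly 100 segments
# or ~30,000 tokens, whichever limit is reached first. `count_tokens` is a
# hypothetical tokenizer helper, not part of DiscoveryPartner.

MAX_SEGMENTS = 100
MAX_TOKENS = 30_000

def batch_segments(segments: list[str], count_tokens) -> list[list[str]]:
    batches, current, current_tokens = [], [], 0
    for seg in segments:
        tokens = count_tokens(seg)
        if current and (len(current) >= MAX_SEGMENTS or current_tokens + tokens > MAX_TOKENS):
            batches.append(current)          # flush the full batch
            current, current_tokens = [], 0
        current.append(seg)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```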

Rather than settle for lower-quality answers, we added a unique “Extend” feature that allows the user to dig deeper into a result set by finding the next 100 most relevant segments for summarization and reporting. The LLM is asked to review the additional documents and then report any new information relevant to the topic question being extended.

This process can continue until no new information is found. We also created a first-of-its-kind search report that allows the user to quickly gauge the effectiveness of the searches over the course of the inquiry.
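Conceptually, the “Extend” loop and its stopping condition might be sketched as below; the `rank_segments`, `summarize_batch`, and `has_new_information` helpers are hypothetical placeholders, and only the idea of reviewing the next 100 most relevant segments until nothing new is found comes from the text above:

```python
# Illustrative "Extend"-style loop: work through the ranked segments 100 at a
# time and stop once a batch contributes no new information. All helper
# functions are hypothetical.

def extend_inquiry(question, segments, rank_segments, summarize_batch,
                   has_new_information, batch_size=100):
    ranked = rank_segments(question, segments)   # most relevant segments first
    reports = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        report = summarize_batch(question, batch, prior_reports=reports)
        if not has_new_information(report):
            break                                # nothing new found; stop extending
        reports.append(report)
    return reports
```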

DiscoveryPartner was built on the premise that users should have full control over the documents selected and prioritized for sending to the LLM. Users start by selecting documents based on keyword search, algorithmic search or any other criteria desired. Documents of interest can be copied to an Analyze folder for submission to chosen LLMs for summarizing and reporting.

Users can copy as many documents as desired into one or more Analyze folders and can create sub-folders for different document sets. When larger volumes of documents are foldered, DiscoveryPartner runs keyword search, semantic search, and a powerful classifier to find the documents most relevant to your specific topic inquiries. Our search capabilities are unique in the market and are designed to find and promote relevant documents for LLM review far more quickly and efficiently than most simple chatbot systems. We would be happy to discuss our patent-pending techniques if you would like to learn more.
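As a very rough sketch of how keyword, semantic, and classifier signals can be blended into one relevance ranking (the weights and helper functions below are purely illustrative and are not our patent-pending techniques):

```python
# Hypothetical blend of keyword, semantic, and classifier scores into a single
# relevance ranking. Weights and scoring helpers are illustrative only; the
# actual DiscoveryPartner techniques are patent-pending and not shown here.

def rank_documents(query, docs, keyword_score, semantic_score, classifier_score,
                   weights=(0.3, 0.4, 0.3)):
    w_kw, w_sem, w_clf = weights
    scored = [
        (w_kw * keyword_score(query, d)
         + w_sem * semantic_score(query, d)
         + w_clf * classifier_score(query, d), d)
        for d in docs
    ]
    # Highest combined score first
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```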

Our DiscoveryPartner platform accesses LLMs via designated APIs, passing credentials through an encrypted tunnel. We can support custom as well as publicly available LLMs, as requested by our clients, so long as they can be accessed via a secure API connection.
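For context, this is how the public OpenAI and Anthropic Python SDKs are typically called over HTTPS with API keys kept in environment variables. This is an illustration of the general pattern, not Merlin's integration code; a private model exposed through its own secure API could be called in a similar way:

```python
# Illustrative only: calling the public OpenAI and Anthropic SDKs over TLS,
# with credentials read from environment variables rather than hard-coded.

import os
from openai import OpenAI
import anthropic

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
anthropic_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def summarize_with_openai(text: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def summarize_with_anthropic(text: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return msg.content[0].text
```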

Fine-tuning of private LLMs is typically done by our clients and their GenAI engineers.

The answer depends on the model and the client's needs. In general, we believe the client should decide which LLMs to use for which purposes. We study the market and test the models we believe provide the best combination of efficiency, functionality, and cost-effectiveness. In that regard, we pay close attention to comparative ratings and available research on the models' capabilities from reliable sources.

In making our recommendations we consider all of the above factors and do our own testing against known sources to ensure that new models offer advantages over those we currently support. Advantages include speed, quality of results, and cost-effectiveness. Our goal is always to provide a range of choices for different functions and to make sure clients have the best possible LLM options on the site.

Yes, all information can be copied or downloaded to Word, Excel, or CSV formats. There are no restrictions on this functionality.

Yes, DiscoveryPartner can handle all of these types of data either separately or in combination. We break longer files into sections for search and summarization, allowing us to effectively normalize data across any text-based file type. Thus a user could place transcripts, SMS data, and other formats into one folder for LLM analysis or group different combinations of these files into separate folders.
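A minimal sketch of that sectioning idea follows; the section size and overlap are hypothetical values chosen for illustration, not DiscoveryPartner's actual settings:

```python
# Illustrative chunking sketch: split any text-based file (transcripts, SMS
# exports, documents) into fixed-size, lightly overlapping sections so they
# can all be searched and summarized the same way. Sizes are hypothetical.

def split_into_sections(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    sections = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        sections.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap   # small overlap so context isn't cut off mid-thought
    return sections
```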

Multimedia content is a broad category. At present our system is optimized for text. With the emerging capabilities of the most advanced LLMs we will soon be able to directly analyze non-text media such as video and audio files. We would be happy to discuss other multimedia formats with you as well.

DiscoveryPartner doesn’t currently have these limits but we are considering adding these to the site. We do provide real-time information about the number of tokens the user has processed, including the type and purpose of the use and the model used. Thus, our users can see on a daily basis the volume of information that is being sent to the LLM per case and per month.
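The kind of per-request usage record that rolls up into per-case and per-month totals might look like the sketch below; the field names are illustrative, not DiscoveryPartner's actual reporting schema:

```python
# Hypothetical usage-tracking record. Each LLM request logs a row like this,
# and per-case / per-month totals are simple aggregations over the rows.

from dataclasses import dataclass
from datetime import date

@dataclass
class TokenUsage:
    day: date
    case_id: str
    user_id: str
    purpose: str        # e.g. "summarize" or "report"
    model: str          # e.g. "claude-3-haiku"
    input_tokens: int
    output_tokens: int

def monthly_tokens(rows: list[TokenUsage], case_id: str, year: int, month: int) -> int:
    """Total tokens sent and received for one case in a given month."""
    return sum(r.input_tokens + r.output_tokens
               for r in rows
               if r.case_id == case_id and r.day.year == year and r.day.month == month)
```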

This is a difficult question to answer without some discussion. As a general matter, we believe we offer the fastest summarization and reporting capabilities in the industry, probably by a factor of 100. GenAI scalability is an amorphous concept: an LLM is not the same as a traditional search or TAR engine. It is not designed to sift through millions of documents in milliseconds like ElasticSearch or our lightning-fast machine algorithms. Rather, it is designed to quickly summarize and process a discrete volume of data and produce reports.

That said, there are no limits on the volume of data that can be analyzed other than time and costs.

We typically include options for different providers and different models for both summarization and reporting as a hedge against this problem. If a vendor's LLM is offline for a period, the user can quickly switch to another model from the same vendor or to one from another provider. Currently we support OpenAI's and Anthropic's models and can source them from different locations.
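A minimal failover sketch follows; the model names, fallback order, and `call_model` helper are hypothetical and shown only to illustrate the idea of switching models when a provider is unavailable:

```python
# Illustrative failover sketch: try the preferred model first, then fall back
# to alternatives from the same or another provider. Everything here is
# hypothetical, including the fallback order and the `call_model` helper.

FALLBACK_CHAIN = {
    "claude-3-opus": ["claude-3-sonnet", "gpt-4-turbo"],
    "gpt-4-turbo":   ["gpt-3.5-turbo", "claude-3-sonnet"],
}

def generate_with_failover(prompt: str, preferred: str, call_model) -> tuple[str, str]:
    for model in [preferred, *FALLBACK_CHAIN.get(preferred, [])]:
        try:
            return model, call_model(model, prompt)
        except Exception:        # provider outage, rate limit, etc.
            continue
    raise RuntimeError("All configured models are currently unavailable")
```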

We take several measures to protect the privacy and confidentiality of the information we send to LLMs for analysis. First, we only work with large, well-funded and reputable LLM providers such as OpenAI, Microsoft, AWS and Anthropic. It is important to work with companies that understand the need for confidentiality, have the requisite security practices and accompanying ISO, SOC and HIPAA certifications and can be expected to protect the data they analyze and host.

Second, we only access these LLMs through hardened, secure APIs and locked down commercial licenses with strict and clear provisions that include the following requirements:

  • The External Service Providers do not use the input prompts and output responses to train their generative AI or machine learning models;
  • The External Service Providers agree to keep input prompts and output responses confidential;
  • The External Service Providers will not store input prompts and output responses on their servers for longer than reasonably necessary to provide the services; and
  • The External Service Providers do not claim ownership of the input prompts and output responses and agree that Merlin, on behalf of our Clients, retains ownership of all user-generated content and all system-generated output resulting from the Client's use of the site.

Lastly, we constantly monitor the market to ensure there are no reported security incidents involving our providers that might raise concerns about the security and confidentiality of client data.

We provide answers and information directly from each LLM that the client chooses to use. Neither we nor anyone else has any access or view into the internal workings of the model.

However, we do instruct the models to base their answers solely on the information provided to them, which is controlled by our clients. Going further, we instruct the models to always base summaries and answers on specific documents and to provide links to those documents so that clients can confirm the information provided.

Thus, a client can go from an answer to a document summary to the linked text of the underlying document and then to the document itself with a series of linked clicks. In that way the answers can be verified and audited, which provides the best possible level of transparency and explainability.
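A rough sketch of the kind of grounding instruction described above is shown below; the wording, document-ID format, and prompt layout are illustrative, not the exact prompt DiscoveryPartner sends:

```python
# Rough sketch of a grounding instruction: answer only from the supplied
# documents and cite them. Wording and formatting are illustrative only.

GROUNDING_INSTRUCTION = (
    "Answer the question using ONLY the documents provided below. "
    "Support every statement with a citation to the document it came from, "
    "in the form [DOC-123]. If the documents do not contain the answer, say so."
)

def build_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble a grounded prompt from a question and {doc_id: text} pairs."""
    doc_text = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in documents.items())
    return f"{GROUNDING_INSTRUCTION}\n\nQuestion: {question}\n\nDocuments:\n{doc_text}"
```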

LLM updates and training are in the hands of the companies that provide them. With our unique multi-LLM system, we can test and move to newer models quickly when we conclude they provide better, faster, or more cost-effective capabilities for our clients.


Not for those LLMs provided under commercial license. However, we can integrate a client's private model, which can be fine-tuned or trained on client-specific data, through a secure API if the client wants to make it available to our system.

We can't speculate on the capabilities of newer models or how they might respond to similar questions, but we can say that we instruct the models to limit the variability of their answers and to base them on the data provided. Thus, while we can't guarantee that a model will answer the same question the same way every time, much as we wouldn't expect two associates to write the same memo, we do see strong consistency in successive answers from the same model.


We could try, but the models and their capabilities change seemingly by the week. For example, Anthropic's Claude 3 comes in three models: Haiku, Sonnet, and Opus. Haiku is the fastest and least expensive; Opus is the most intelligent and therefore the most expensive (and the slowest); Sonnet sits in the middle, a strong LLM that is faster and cheaper than Opus.

We provide different models to give our clients choices based on speed, intelligence and cost so they can make the best possible decisions about analyzing their data.
