
FAQs Regarding Merlin's GenAI Integration with LLMs

DiscoveryPartner is unique in integrating multiple Large Language Models (LLMs) into its architecture. We found early on that certain models did an excellent job of summarizing documents, matching the larger models in quality while being substantially faster and more cost-effective. In contrast, we recommend the larger, more intelligent models for more complex work such as synthesizing across hundreds of summarized documents and creating complex reports, chronologies, and analyses.

That said, our recommendations change from time to time as new LLMs are released, which is another advantage of our multi-LLM architecture. At present, we support LLMs from OpenAI and Anthropic, employing Anthropic's Claude 3 models and OpenAI's Turbo models. When new and improved models are released, we test them and promote them to DiscoveryPartner once we are satisfied that they are equal to or better than the models we currently use. In almost all cases the newer models are faster and less expensive than their predecessors, which provides an additional benefit to our clients.

Our DiscoveryPartner architecture allows administrators and users to choose among models for summarizing and for reporting. These choices can be changed at any time, and administrators can set defaults for each function or limit the choices available to users.
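As a rough illustration only (the function names, model names, and defaults below are hypothetical, not DiscoveryPartner's actual configuration schema), per-function model selection with administrator defaults might look something like this:

```python
# Hypothetical sketch of per-function model selection with admin defaults.
# Names and values are illustrative, not DiscoveryPartner's actual schema.

ALLOWED_MODELS = {
    "summarize": ["claude-3-haiku", "gpt-3.5-turbo"],   # fast, low-cost models
    "report":    ["claude-3-opus", "gpt-4-turbo"],       # larger models for synthesis
}

ADMIN_DEFAULTS = {
    "summarize": "claude-3-haiku",
    "report":    "claude-3-opus",
}

def resolve_model(function: str, user_choice: str | None = None) -> str:
    """Return the user's choice if it is allowed for this function,
    otherwise fall back to the administrator's default."""
    if user_choice in ALLOWED_MODELS[function]:
        return user_choice
    return ADMIN_DEFAULTS[function]

print(resolve_model("summarize"))              # claude-3-haiku
print(resolve_model("report", "gpt-4-turbo"))  # gpt-4-turbo
```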

In DiscoveryPartner's model-selection interface, users have the flexibility to choose from a range of models for different tasks. We recommend what we believe are the best and most cost-effective models for summarizing documents and for synthesizing and reporting on their contents.

At present DiscoveryPartner sends 100 documents or document segments to the LLM at a time. Why? Because our research shows that 100 segments, which translates to roughly 30,000 tokens, is optimal for the best-quality answers from even the most powerful LLMs. Although many LLMs can accept more than 30,000 tokens, our researchers found that beyond that point the quality of the answers declined, along with the amount of detail and the number of links to relevant documents.
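A minimal sketch of that batching idea follows. The 100-segment and ~30,000-token figures come from the description above; the `count_tokens` helper and everything else are hypothetical, not DiscoveryPartner's implementation:

```python
# Illustrative batching sketch: cap each LLM request at roughly 100 segments
# or ~30,000 tokens, whichever limit is reached first. `count_tokens` is a
# hypothetical tokenizer helper, not part of DiscoveryPartner.

MAX_SEGMENTS = 100
MAX_TOKENS = 30_000

def batch_segments(segments: list[str], count_tokens) -> list[list[str]]:
    batches, current, current_tokens = [], [], 0
    for seg in segments:
        tokens = count_tokens(seg)
        if current and (len(current) >= MAX_SEGMENTS or current_tokens + tokens > MAX_TOKENS):
            batches.append(current)          # flush the full batch
            current, current_tokens = [], 0
        current.append(seg)
        current_tokens += tokens
    if current:
        batches.append(current)
    return batches
```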

Rather than settle for lower-quality answers, we added a unique “Extend” feature that allows the user to dig deeper into a result set by finding the next 100 most relevant segments for summarization and reporting. The LLM is asked to review the additional documents and then report any new information relevant to the topic question being extended.

This process can continue until no new information is found. We also created a first-of-its-kind search report that allows the user to quickly gauge the effectiveness of the searches over the course of the inquiry.
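Conceptually, the “Extend” loop and its stopping condition might be sketched as below; the `rank_segments`, `summarize_batch`, and `has_new_information` helpers are hypothetical placeholders, and only the idea of reviewing the next 100 most relevant segments until nothing new is found comes from the text above:

```python
# Illustrative "Extend"-style loop: work through the ranked segments 100 at a
# time and stop once a batch contributes no new information. All helper
# functions are hypothetical.

def extend_inquiry(question, segments, rank_segments, summarize_batch,
                   has_new_information, batch_size=100):
    ranked = rank_segments(question, segments)   # most relevant segments first
    reports = []
    for start in range(0, len(ranked), batch_size):
        batch = ranked[start:start + batch_size]
        report = summarize_batch(question, batch, prior_reports=reports)
        if not has_new_information(report):
            break                                # nothing new found; stop extending
        reports.append(report)
    return reports
```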

DiscoveryPartner was built on the premise that users should have full control over the documents selected and prioritized for sending to the LLM. Users start by selecting documents based on keyword search, algorithmic search or any other criteria desired. Documents of interest can be copied to an Analyze folder for submission to chosen LLMs for summarizing and reporting.

Users can copy as many documents as desired into one or more Analyze folders and can create sub-folders for different document sets. When larger volumes of documents are foldered, DiscoveryPartner runs keyword search, semantic search, and a powerful classifier to find the documents most relevant to your specific topic inquiries. Our search capabilities are unique in the market and are designed to find and promote relevant documents for LLM review far more quickly and efficiently than most simple chatbot systems. We would be happy to discuss our patent-pending techniques if you would like to learn more.
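As a very rough sketch of how keyword, semantic, and classifier signals can be blended into one relevance ranking (the weights and helper functions below are purely illustrative and are not our patent-pending techniques):

```python
# Hypothetical blend of keyword, semantic, and classifier scores into a single
# relevance ranking. Weights and scoring helpers are illustrative only; the
# actual DiscoveryPartner techniques are patent-pending and not shown here.

def rank_documents(query, docs, keyword_score, semantic_score, classifier_score,
                   weights=(0.3, 0.4, 0.3)):
    w_kw, w_sem, w_clf = weights
    scored = [
        (w_kw * keyword_score(query, d)
         + w_sem * semantic_score(query, d)
         + w_clf * classifier_score(query, d), d)
        for d in docs
    ]
    # Highest combined score first
    return [d for _, d in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```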

Our DiscoveryPartner platform accesses LLMs via designated APIs, passing credentials through an encrypted tunnel. We can support custom as well as publicly available LLMs, as requested by our clients, so long as they can be accessed via a secure API connection.
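For context, this is how the public OpenAI and Anthropic Python SDKs are typically called over HTTPS with API keys kept in environment variables. This is an illustration of the general pattern, not Merlin's integration code; a private model exposed through its own secure API could be called in a similar way:

```python
# Illustrative only: calling the public OpenAI and Anthropic SDKs over TLS,
# with credentials read from environment variables rather than hard-coded.

import os
from openai import OpenAI
import anthropic

openai_client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])
anthropic_client = anthropic.Anthropic(api_key=os.environ["ANTHROPIC_API_KEY"])

def summarize_with_openai(text: str) -> str:
    resp = openai_client.chat.completions.create(
        model="gpt-4-turbo",
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return resp.choices[0].message.content

def summarize_with_anthropic(text: str) -> str:
    msg = anthropic_client.messages.create(
        model="claude-3-haiku-20240307",
        max_tokens=1024,
        messages=[{"role": "user", "content": f"Summarize:\n\n{text}"}],
    )
    return msg.content[0].text
```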

Fine-tuning of private LLMs is typically done by our clients and their GenAI engineers.

The answer depends on the model and the client's needs. In general, we believe the client should decide which LLMs to use for which purposes. We study the market and test the models we believe provide the best combination of efficiency, functionality, and cost-effectiveness. In that regard, we pay close attention to comparative ratings and available research on the models' capabilities from reliable sources.

In making our recommendations we consider all of the above factors and do our own testing against known sources to ensure that new models offer advantages over those we currently support. Advantages include speed, quality of results, and cost-effectiveness. Our goal is always to provide a range of choices for different functions and to make sure clients have the best possible LLM options on the site.

Yes, all information can be copied or downloaded to Word, Excel, or CSV formats. There are no restrictions on this functionality.

Yes, DiscoveryPartner can handle all of these types of data either separately or in combination. We break longer files into sections for search and summarization, allowing us to effectively normalize data across any text-based file type. Thus a user could place transcripts, SMS data, and other formats into one folder for LLM analysis or group different combinations of these files into separate folders.
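A minimal sketch of that sectioning idea follows; the section size and overlap are hypothetical values chosen for illustration, not DiscoveryPartner's actual settings:

```python
# Illustrative chunking sketch: split any text-based file (transcripts, SMS
# exports, documents) into fixed-size, lightly overlapping sections so they
# can all be searched and summarized the same way. Sizes are hypothetical.

def split_into_sections(text: str, max_chars: int = 4000, overlap: int = 200) -> list[str]:
    sections = []
    start = 0
    while start < len(text):
        end = min(start + max_chars, len(text))
        sections.append(text[start:end])
        if end == len(text):
            break
        start = end - overlap   # small overlap so context isn't cut off mid-thought
    return sections
```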

Multimedia content is a broad category. At present our system is optimized for text. With the emerging capabilities of the most advanced LLMs we will soon be able to directly analyze non-text media such as video and audio files. We would be happy to discuss other multimedia formats with you as well.

DiscoveryPartner doesn’t currently have these limits but we are considering adding these to the site. We do provide real-time information about the number of tokens the user has processed, including the type and purpose of the use and the model used. Thus, our users can see on a daily basis the volume of information that is being sent to the LLM per case and per month.
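The kind of per-request usage record that rolls up into per-case and per-month totals might look like the sketch below; the field names are illustrative, not DiscoveryPartner's actual reporting schema:

```python
# Hypothetical usage-tracking record. Each LLM request logs a row like this,
# and per-case / per-month totals are simple aggregations over the rows.

from dataclasses import dataclass
from datetime import date

@dataclass
class TokenUsage:
    day: date
    case_id: str
    user_id: str
    purpose: str        # e.g. "summarize" or "report"
    model: str          # e.g. "claude-3-haiku"
    input_tokens: int
    output_tokens: int

def monthly_tokens(rows: list[TokenUsage], case_id: str, year: int, month: int) -> int:
    """Total tokens sent and received for one case in a given month."""
    return sum(r.input_tokens + r.output_tokens
               for r in rows
               if r.case_id == case_id and r.day.year == year and r.day.month == month)
```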

This is a difficult question to answer without some discussion. As a general matter, we believe we offer the fastest summarization and reporting capabilities in the industry, probably by a factor of 100. GenAI scalability is an amorphous concept: an LLM is not the same as a traditional search or TAR engine. It is not designed to sift through millions of documents in milliseconds like ElasticSearch or our lightning-fast machine algorithms. Rather, it is designed to quickly summarize and process a discrete volume of data and produce reports.

That said, there are no limits on the volume of data that can be analyzed other than time and costs.

We typically include options for different providers and different models for both summarization and reporting as a hedge against this problem. If a vendor's LLM is offline for a period, the user can quickly switch to another model from the same vendor or to one from another provider. Currently we support OpenAI's and Anthropic's models and can source them from different locations.
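A minimal failover sketch follows; the model names, fallback order, and `call_model` helper are hypothetical and shown only to illustrate the idea of switching models when a provider is unavailable:

```python
# Illustrative failover sketch: try the preferred model first, then fall back
# to alternatives from the same or another provider. Everything here is
# hypothetical, including the fallback order and the `call_model` helper.

FALLBACK_CHAIN = {
    "claude-3-opus": ["claude-3-sonnet", "gpt-4-turbo"],
    "gpt-4-turbo":   ["gpt-3.5-turbo", "claude-3-sonnet"],
}

def generate_with_failover(prompt: str, preferred: str, call_model) -> tuple[str, str]:
    for model in [preferred, *FALLBACK_CHAIN.get(preferred, [])]:
        try:
            return model, call_model(model, prompt)
        except Exception:        # provider outage, rate limit, etc.
            continue
    raise RuntimeError("All configured models are currently unavailable")
```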

We take several measures to protect the privacy and confidentiality of the information we send to LLMs for analysis. First, we only work with large, well-funded and reputable LLM providers such as OpenAI, Microsoft, AWS and Anthropic. It is important to work with companies that understand the need for confidentiality, have the requisite security practices and accompanying ISO, SOC and HIPAA certifications and can be expected to protect the data they analyze and host.

Second, we only access these LLMs through hardened, secure APIs and locked down commercial licenses with strict and clear provisions that include the following requirements:

  • The External Service Providers do not use the input prompts and output responses to train their generative AI or machine learning models;
  • The External Service Providers agree to keep input prompts and output responses confidential;
  • The External Service Providers will not store input prompts and output responses on their servers for longer than reasonably necessary to provide the services; and
  • The External Service Providers do not claim ownership of the input prompts and output responses and agree that Merlin, on behalf of our Clients, retains ownership of all user-generated content and all system-generated output resulting from the Client's use of the site.

Lastly, we constantly monitor the market to ensure there are no reported security incidents involving our providers that might raise concerns about the security and confidentiality of client data.

We provide answers and information directly from each LLM that the client chooses to use. Neither we nor anyone else has any access or view into the internal workings of the model.

However, we do instruct the models to base their answers solely on the information provided to them, which is controlled by our clients. Going further, we instruct the models to always base summaries and answers on specific documents and to provide links to those documents so that clients can confirm the information provided.

Thus, a client can go from an answer to a document summary to the linked text of the underlying document and then to the document itself with a series of linked clicks. In that way the answers can be verified and audited, which provides the best possible level of transparency and explainability.
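A rough sketch of the kind of grounding instruction described above is shown below; the wording, document-ID format, and prompt layout are illustrative, not the exact prompt DiscoveryPartner sends:

```python
# Rough sketch of a grounding instruction: answer only from the supplied
# documents and cite them. Wording and formatting are illustrative only.

GROUNDING_INSTRUCTION = (
    "Answer the question using ONLY the documents provided below. "
    "Support every statement with a citation to the document it came from, "
    "in the form [DOC-123]. If the documents do not contain the answer, say so."
)

def build_prompt(question: str, documents: dict[str, str]) -> str:
    """Assemble a grounded prompt from a question and {doc_id: text} pairs."""
    doc_text = "\n\n".join(f"[{doc_id}]\n{text}" for doc_id, text in documents.items())
    return f"{GROUNDING_INSTRUCTION}\n\nQuestion: {question}\n\nDocuments:\n{doc_text}"
```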

LLM updates and training are in the hands of the companies that provide them. With our unique multi-LLM system, we can test and move to newer models quickly when we conclude they provide better, faster, or more cost-effective capabilities for our clients.


Not for those LLMs provided under commercial license. However, we can integrate a client's private model, which can be fine-tuned or trained on client-specific data, through a secure API if the client wants to make it available to our system.

We can't speculate on the capabilities of newer models or how they might respond to similar questions, but we can say that we instruct the models to limit the variability of their answers and to base them on the data provided. Thus, while we can't guarantee that a model will answer the same question the same way every time, much as we wouldn't expect two associates to write the same memo, we do see strong consistency in successive answers from the same model.


We could try, but the models and their capabilities change seemingly by the week. For example, Anthropic's Claude 3 comes in three models: Haiku, Sonnet, and Opus. Haiku is the fastest and least expensive; Opus is the most intelligent and therefore the most expensive (and the slowest); Sonnet sits in the middle, a strong LLM that is faster and cheaper than Opus.

We provide different models to give our clients choices based on speed, intelligence and cost so they can make the best possible decisions about analyzing their data.
