Agents have emerged as a major AI trend for 2025. Deep Research agents are among the most notable examples, with releases from companies such as Google, OpenAI, Perplexity, X.AI, and most recently Manus. These agents let users generate detailed reports on any topic of their choosing, as long as the data is publicly accessible. However, enterprises face three critical barriers when using these tools today: speed, cost, and access to their own data.
Today, SambaNova has answered the call, helping enterprises conduct deep research 3X faster than the best GPU providers and more efficiently on their own data. Together with CrewAI, we have built and open-sourced a new deep research framework that helps enterprises solve these challenges.
Deep Research, by definition, is meant to go in depth on a topic and return a compelling report and analysis that would typically take a human days to produce. Unlike traditional LLM chat applications, Deep Research and Agentic AI require 10X, sometimes up to 100X, the tokens to generate compelling responses. Instead of days of human research, current Deep Research implementations aim to generate that same report in about the time it takes to get a coffee at Starbucks.
So what does faster inference with SambaNova unlock? In short, iterative deep research at blazing speeds! With our Deep Research framework, you can generate reports in seconds thanks to SambaNova's 10X faster speeds, enabled by our RDU chips rather than GPUs. This means users can do their work 10X more efficiently, saving time and cost. Imagine still having to wait minutes for a video to load instead of streaming it instantly.
For enterprises, time and efficiency are critical to doing more with less. We worked with one of the largest investment firms in North America, where hundreds of trading desk analysts conduct research every day. These analysts must run many queries very quickly to keep up with the day's market fluctuations. Performing Deep Research 10X faster gives them an edge over competitors, with more accurate information for better real-time decisions.
As part of the framework, we have also included an Agentic Router that plans and routes requests to different agents, each able to deliver results with a higher degree of accuracy.
The framework ships with three agents by default: a general search agent, a Deep Research agent, and a Financial Analyst agent. Because the framework is open source, anyone can add additional agents connected to their own data sources as well.
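To make the routing concrete, here is a minimal, hypothetical sketch of how an agentic router could pick between the three agents. It is not the framework's actual code: the endpoint, model name, environment variable, and agent descriptions are all assumptions, and it simply asks a SambaNova-hosted Llama model (via the OpenAI-compatible API) to choose an agent by name.

```python
# Minimal, hypothetical router sketch -- not the framework's actual implementation.
# Assumes SambaNova Cloud's OpenAI-compatible endpoint and a Llama 3.3 70B model id.
import os
from openai import OpenAI

client = OpenAI(
    api_key=os.environ["SAMBANOVA_API_KEY"],   # assumed environment variable
    base_url="https://api.sambanova.ai/v1",    # assumed endpoint
)

AGENTS = {
    "search": "Quick general-purpose search with RAG over live web results.",
    "financial_analysis": "In-depth financial analysis of a company or sector.",
    "deep_research": "Long-form, multi-source report with citations.",
}

def route(query: str) -> str:
    """Ask the LLM which agent should handle the query; fall back to 'search'."""
    menu = "\n".join(f"- {name}: {desc}" for name, desc in AGENTS.items())
    resp = client.chat.completions.create(
        model="Meta-Llama-3.3-70B-Instruct",   # assumed model id
        messages=[
            {"role": "system",
             "content": f"Pick exactly one agent name from the list below and reply with the name only.\n{menu}"},
            {"role": "user", "content": query},
        ],
        temperature=0,
    )
    choice = resp.choices[0].message.content.strip().lower()
    return choice if choice in AGENTS else "search"
```

Letting the LLM plan the route, rather than hard-coding keywords, is what lets each query land on the agent best equipped to answer it accurately.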
To see how these agents work in action, let's assume we're an analyst on a trading desk who needs to generate a report on the latest market trends. The analyst might start the day with a quick check on what happened to the market overnight, with a query like "Summarize the latest market news about Amazon". The query is routed to the general-purpose RAG agent, whose search tools need only three queries to find an accurate, real-time answer using about 1K total tokens.
After getting this basic information, the analyst would move on to more detailed research with a query such as "Generate a financial analysis of Amazon". This routes the query to the Financial Analyst agent, which is responsible for more in-depth research. This agent provides far more detail, using about 15 prompts and more than 20X the tokens to generate its answer.
Finally, based on this analysis, the analyst would want to generate a comprehensive report summarizing and citing findings from various articles. Using the Deep Research agent, the system compiles information from a wide range of sources and uses it to generate a final report, which the analyst can clean up and submit for review. Generating the final report requires more than 50K tokens.
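Continuing the hypothetical router sketch above, the analyst's three queries could flow through the system like this. `run_agent` is a placeholder for whichever agent pipeline is selected, and the token counts in the comments are the rough figures quoted above.

```python
# Hypothetical walkthrough of the analyst's day using the route() sketch above.
queries = [
    "Summarize the latest market news about Amazon",                    # ~3 search calls, ~1K tokens
    "Generate a financial analysis of Amazon",                          # ~15 prompts, ~20X the tokens
    "Write a comprehensive, cited report on Amazon's market position",  # 50K+ tokens
]

for q in queries:
    agent = route(q)                      # route() from the router sketch above
    print(f"{agent!r} will handle: {q}")
    # result = run_agent(agent, q)        # placeholder for the selected agent's pipeline
```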
Even though tens of thousands of tokens are generated, each step is lightning fast, taking seconds rather than minutes to produce useful output and allowing quick iteration on the way to the final report. Because the analyst is in the loop, they can ensure tokens are not wasted on inaccurate reports, saving time and money in the process. Having different agents involved further improves the efficiency and accuracy of the entire system, allowing the analyst to complete the task.
Because Deep Research agents require so many more tokens and multiple models, enterprises need a more efficient way to deploy these solutions. The price of using models from closed-source providers keeps rising, while open-source models like Meta's Llama and DeepSeek R1 are matching, if not exceeding, the performance of closed models, offering far less expensive alternatives. See the table below for a price comparison.
| Model | Input Token Price ($/million tokens) | Output Token Price ($/million tokens) | Artificial Analysis Intelligence Ranking |
|---|---|---|---|
| **Reasoning Models** | | | |
| DeepSeek R1 on SambaNova Cloud | $5 | $7 | 60 |
| OpenAI o1 | $15 | $60 | 62 |
| **Text Models** | | | |
| Meta Llama 3.3 70B on SambaNova Cloud | $0.6 | $1.2 | 41 |
| OpenAI GPT-4o | $2.5 | $10 | 41 |
Assume an employee at an enterprise performs just 20 Deep Research queries per day, with each query averaging 20K output tokens. At a small scale of just 200 employees, that comes to around 80 million output tokens per day. This roughly matches the traffic seen on OpenRouter today, which processes about 3 billion input tokens and 90 million output tokens per day on its Llama 3.3 70B model. These numbers imply potential savings of more than $1 million per year simply by using Llama 3.3 on SambaNova instead of OpenAI's GPT-4o.
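The savings estimate can be reproduced from the table above. The back-of-the-envelope sketch below uses the listed per-million-token prices and the daily volumes quoted in the text; the exact workload mix is an assumption.

```python
# Back-of-the-envelope yearly cost comparison using the prices in the table above.
INPUT_TOKENS_PER_DAY = 3_000_000_000    # ~3B input tokens/day (OpenRouter Llama 3.3 70B traffic)
OUTPUT_TOKENS_PER_DAY = 80_000_000      # 200 employees x 20 queries x ~20K output tokens

def yearly_cost(input_price_per_m: float, output_price_per_m: float) -> float:
    """Yearly cost given $/million-token input and output prices."""
    daily = (INPUT_TOKENS_PER_DAY / 1e6) * input_price_per_m \
          + (OUTPUT_TOKENS_PER_DAY / 1e6) * output_price_per_m
    return daily * 365

llama_on_sambanova = yearly_cost(0.6, 1.2)   # ~$0.69M per year
gpt_4o = yearly_cost(2.5, 10.0)              # ~$3.03M per year
print(f"Estimated savings: ${gpt_4o - llama_on_sambanova:,.0f} per year")  # well over $1M
```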
Deep Research on SambaNova is an exciting advancement that helps enterprises solve many of the pain points of using AI today. To try the demo, log in to the web app and start researching with lightning-fast deep research. To use the application, you will need free API keys for some of the tools used in the demo, such as SambaNova Cloud, Exa, and Serper. Enterprises interested in deploying the solution with their own data can clone the GitHub repo and start integrating their agents.
Contributions to the project are highly encouraged to make this the best open-source Deep Research tool on the planet. Code your genius, "Bring Your Own Agent" AI (BYOAI) style! Deploy your SambaNova-powered agent, share your breakthroughs online (tag us on social), and let your AI agents shine. Stay tuned for upcoming contests and prizes for the best contributions to the project!
We are excited to see more enterprises adopt open source and build fast, affordable agents that deliver value for their customers and employees.
#AIGameOn