MongoDB 8.3 for AI Agents: Voyage AI, LangGraph & RAG Costs

On 7 May 2026 at MongoDB.local London, MongoDB announced a set of features that sound modest on paper but are genuinely significant: automated Voyage AI embeddings inside Atlas Vector Search, a generally available LangGraph.js long-term memory store, first-class integrations with the major agent frameworks, and a new MongoDB 8.3 release of the core database tuned for sub-100ms retrieval.

The headline framing from MongoDB’s Chief Product Officer for AI, Pablo Stern, is worth quoting directly:

“The infrastructure for autonomous AI doesn’t look like a smarter LLM. It is a data platform.”

That captures the strategic bet. According to MongoDB’s own numbers, 79% of enterprises are building AI agents and only 11% have one in production. The gap is not the model. The gap is data: getting the right context to the model fast enough, remembering what happened in previous conversations, and keeping all of that synchronised when the underlying business data changes.

This post walks through what was actually shipped, how it works, where it fits in a real-world Retrieval Augmented Generation (RAG) architecture, and what it costs to run.

Key takeaways

  • Automated Voyage AI embeddings turn embedding generation into a database concern, like an index. No more external embedding pipelines to build and monitor.
  • LangGraph.js long-term memory store is now generally available, giving JavaScript and TypeScript agents persistent memory across sessions, channels, and users.
  • Framework-agnostic: MongoDB is an official memory and state backend for LangGraph, CrewAI, Mastra, Spring AI, and Semantic Kernel.
  • MongoDB 8.3 delivers ~45% more reads and ~35% more writes than 8.0, with sub-100ms retrieval targets for agent workloads.
  • Real cost for a 50,000-claim-per-month insurance RAG workload: roughly $6,000/month all-in (Atlas + Vector Search + Voyage + LLM), compared with a £375,000-£500,000/month skilled human cost being offset.

The problem in one paragraph (for business leaders)

When your customer talks to an AI assistant on Monday and a different one on Wednesday, the second conversation has no memory of the first. When your AI agent answers a question by pulling data from your knowledge base, the search is only as good as how well that data has been “indexed” for AI to find. Today, most enterprises solve these two problems by gluing together half a dozen different tools: a database for source data, a separate vector database for AI search, an embedding API to translate text into numbers, a memory service, and a pipeline that keeps everything in sync when data changes. That stack is slow to build, expensive to run, and breaks frequently. MongoDB is making the case that all of it should live inside a single managed platform.

What MongoDB announced at MongoDB.local London 2026

Four things matter most.

1. Automated Voyage AI embeddings inside Vector Search (public preview)

Vector embeddings are how AI systems “understand” text. A sentence becomes a list of numbers (a vector), and similar meanings produce vectors that are close together in mathematical space. This is what makes Retrieval Augmented Generation (RAG) work: you embed your documents, embed the user’s question, and ask the database “show me the documents whose embeddings are closest to this question’s embedding.”
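To make "close together" concrete, here is a minimal sketch of cosine similarity, one common closeness measure for embeddings. The three-dimensional vectors and document ids are invented for illustration. Real embedding models produce far more dimensions, and none of these numbers come from a Voyage model.

```javascript
// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a, b) {
  if (a.length !== b.length) throw new Error("vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Toy vectors, invented for illustration only.
const question = [0.9, 0.1, 0.0];
const documents = [
  { id: "travel-delay-clause", vector: [0.0, 0.3, 0.9] },
  { id: "impact-clause", vector: [0.8, 0.2, 0.1] },
];

// Rank documents by similarity to the question, most similar first.
const ranked = documents
  .map((d) => ({ id: d.id, score: cosineSimilarity(question, d.vector) }))
  .sort((x, y) => y.score - x.score);
```

Here `ranked[0]` is the impact clause, because its vector points in nearly the same direction as the question's. A vector index does the same ranking, but approximately and at scale.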

The traditional workflow looked like this:

  1. Pick an embedding model (OpenAI, Cohere, etc).
  2. Build a pipeline that calls the model’s API every time a document is inserted or updated.
  3. Store the resulting vector alongside the document in a separate vector database.
  4. Build a second pipeline for query-time embedding.
  5. Monitor it all, and handle failures, rate limits, and model version changes.
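The steps above can be sketched with in-memory stand-ins. Everything here is hypothetical: `fakeEmbed` is not a real model, and the two Maps merely play the roles of the document database and the separate vector database. The point is only to show how a write that bypasses the pipeline leaves a stale vector behind.

```javascript
// Hypothetical stand-ins: two Maps play the document DB and the vector DB.
const docStore = new Map();
const vectorStore = new Map();

// Placeholder "embedding": NOT a real model, just a deterministic function of the text.
function fakeEmbed(text) {
  return [text.length, text.split(" ").length];
}

// Steps 2-3: every write must also call the embedding model and store the vector.
function upsertWithEmbedding(id, text) {
  docStore.set(id, text);
  vectorStore.set(id, { vector: fakeEmbed(text), sourceText: text });
}

// A write path that skips the pipeline (a bug, a failed job, a rate limit).
function upsertWithoutEmbedding(id, text) {
  docStore.set(id, text);
}

// Step 5: monitoring has to detect vectors that no longer match their source text.
function findStaleIds() {
  const stale = [];
  for (const [id, text] of docStore) {
    const entry = vectorStore.get(id);
    if (!entry || entry.sourceText !== text) stale.push(id);
  }
  return stale;
}

upsertWithEmbedding("policy-1", "Covers impact with stationary objects");
upsertWithoutEmbedding("policy-1", "Excludes impact with stationary objects on private land");
const staleIds = findStaleIds();
```

After the second write, `staleIds` contains `policy-1`: the document says one thing and the vector still represents the old wording.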

MongoDB has acquired Voyage AI, which consistently sits at the top of the Retrieval Embedding Benchmark, and folded it directly into Atlas. You now declare an autoEmbed field on a Vector Search index, and MongoDB handles the rest:

db.policies.createSearchIndex({
  name: "policy_vector_index",
  type: "vectorSearch",
  definition: {
    fields: [
      {
        type: "autoEmbed",
        path: "section_text",
        model: "voyage-4",
        modality: "text"
      },
      { type: "filter", path: "policy_number" },
      { type: "filter", path: "product_line" },
      { type: "filter", path: "region" }
    ]
  }
})

When you insert or update a document, Atlas calls Voyage AI behind the scenes, stores the vector, and keeps it in sync as the source field changes. The available models are voyage-4-large, voyage-4, voyage-4-lite, and voyage-code-3 for source code retrieval.

The point that’s easy to miss: this is not just convenience. Without auto-embedding, every database write requires an out-of-band call to your embedding pipeline, with its own failure modes. With it, embedding becomes a database concern, like an index. That changes the operational model entirely.

2. LangGraph.js Long-Term Memory Store (generally available)

LangGraph is the dominant framework for building stateful AI agents in JavaScript and Python. Up until this announcement, the JavaScript version had short-term memory (so an agent could remember what was said within a single conversation), but it had no standard way to remember across conversations. If a user came back the next day, the agent started fresh.

The new MongoDB Store for LangGraph gives JS and TypeScript developers a production-grade backend for cross-session memory. Memories are stored as JSON documents organised by namespace and key, with optional semantic search powered by Voyage AI embeddings, and TTL indexes for automatic cleanup of stale data.

The framework distinguishes four functional memory types, and the store supports all of them:

  • Episodic memory: specific events (“the customer raised a complaint about delivery timing on 12 March”).
  • Semantic memory: general facts learned over time (“this customer prefers email over phone”).
  • Procedural memory: rules and instructions (“always confirm policy number before discussing claims”).
  • Associative memory: relationships between entities (“this customer is linked to these three claims”).

The technical structure is straightforward. A memory has a namespace (logical grouping, often by user or topic), a key, a JSON value, and optionally a vector embedding of that value:

import { MongoDBStore } from "@langchain/langgraph-store-mongodb";

const store = new MongoDBStore({
  connectionString: process.env.MONGODB_URI,
  databaseName: "agent_memory",
  index: {
    embed: "voyage-4-lite",  // semantic recall on stored memories
    dims: 1024,
    fields: ["content"]
  }
});

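await store.put(
  ["user:42", "preferences"],
  "communication",
  {
    // Assumption: only the "content" field named in index.fields is embedded,
    // so the text we want to recall semantically goes there.
    content: "Prefers to be contacted by email, in British English, in a formal tone",
    channel: "email",
    language: "en-GB",
    tone: "formal"
  }
);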

const memories = await store.search(
  ["user:42"],
  { query: "how does this customer like to be contacted?", limit: 3 }
);

That last call is the interesting one. It searches across all of user 42’s memories semantically — meaning the query doesn’t need to match the stored words, only the meaning. This is how an agent answers “what’s their preferred contact method?” without that exact phrase ever being written down.
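The namespace and key structure can be sketched in memory. This assumes, as the `store.search(["user:42"], …)` call suggests, that searching on a namespace prefix reaches every memory stored beneath it. `TinyMemoryStore` is a hypothetical stand-in that only lists by prefix and does none of the semantic ranking the real store provides.

```javascript
// Hypothetical in-memory stand-in, for illustration only.
class TinyMemoryStore {
  constructor() {
    this.items = [];
  }

  // True when two namespaces have the same parts in the same order.
  static sameNamespace(a, b) {
    return a.length === b.length && a.every((part, i) => part === b[i]);
  }

  // Store (or overwrite) a JSON value under namespace + key.
  put(namespace, key, value) {
    this.items = this.items.filter(
      (m) => !(m.key === key && TinyMemoryStore.sameNamespace(m.namespace, namespace))
    );
    this.items.push({ namespace, key, value });
  }

  // Return every memory whose namespace starts with the given prefix.
  listByPrefix(prefix) {
    return this.items.filter((m) => prefix.every((part, i) => m.namespace[i] === part));
  }
}

const tiny = new TinyMemoryStore();
tiny.put(["user:42", "preferences"], "communication", { channel: "email" });
tiny.put(["user:42", "complaints"], "delivery-timing", { topic: "delivery timing" });
tiny.put(["user:7", "preferences"], "communication", { channel: "phone" });

// Everything known about user 42, regardless of sub-namespace.
const user42 = tiny.listByPrefix(["user:42"]);
```

`user42` holds both of user 42's memories and nothing from user 7, which is the isolation a per-user namespace is meant to provide.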

3. First-class framework integrations

MongoDB is now an officially supported memory and state backend for LangGraph (Python and JS), CrewAI, Mastra, Spring AI, and Microsoft’s Semantic Kernel. The strategic move here is to be framework-agnostic. Whichever framework your team prefers, the data layer underneath is the same — which matters if your organisation has multiple teams making different choices, or if you expect to swap frameworks as the space matures.

This complements the broader trend toward standardised tool access through protocols like MCP, covered in detail in “MCP: The Future of Agentic AI, or an Unnecessary Abstraction?”

4. MongoDB 8.3 itself

The underlying database release delivers 45% more reads, 35% more writes, 15% more ACID transactions, and 30% more complex operations than 8.0, with no application code changes. None of this is AI-specific, but sub-100ms retrieval matters because agents make many database calls per user message, and latency compounds.
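A rough illustration of why latency compounds, assuming the agent's database calls run one after another and each takes the same time. The figure of eight calls per message is an illustrative assumption, not a MongoDB measurement.

```javascript
// Total database time for one user message when calls run sequentially.
function sequentialLatencyMs(callsPerMessage, perCallMs) {
  return callsPerMessage * perCallMs;
}

// Illustrative numbers only: 8 calls per message is an assumption.
const atTarget = sequentialLatencyMs(8, 100); // 800 ms of database time
const slower = sequentialLatencyMs(8, 400);   // 3200 ms of database time
```

At 100ms per call the database adds under a second to the response. At 400ms it adds more than three seconds before the LLM has produced a single token.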

RAG example: an AI claims triage assistant for a UK insurer

Let me make this concrete with a use case that maps to actual UK enterprise work: a claims triage assistant for a general insurer.

The business problem

The insurer receives ~50,000 claim notifications a month across motor, home, and travel. Each comes in through a different channel (web form, broker email, phone-to-text transcription, mobile app). A first-line claims handler currently spends 15-20 minutes per claim assessing severity, checking the policy wording, looking up the customer’s history, and routing to the right team. The business wants an AI assistant that drafts a triage summary and routing recommendation, leaving the handler to review and approve rather than to research.

What needs to happen on every claim

The agent needs to:

  1. Read the new claim notification.
  2. Find the relevant clauses in this specific policy document (RAG).
  3. Find similar past claims by this customer or for this risk type (RAG plus filters).
  4. Remember anything we’ve learned about this customer or this broker over time (long-term memory).
  5. Draft a summary and routing decision.

The architecture, mapped to the new MongoDB features

Three collections in a single Atlas cluster:

Collection | Purpose | Indexed how
policies | Full policy wordings, broken into sections | autoEmbed on section_text with voyage-4
claims | All historical claim notifications and outcomes | autoEmbed on claim_summary plus filters on customer_id, product_line, severity
agent_memory | LangGraph long-term memory store | namespace by customer_id and broker_id, voyage-4-lite embedding for semantic recall

A new claim arrives. The LangGraph agent does the following:

// 1. Retrieve relevant policy clauses
const policyContext = await db.collection("policies").aggregate([
  {
    $vectorSearch: {
      index: "policy_vector_index",
      path: "section_text",
      query: { text: newClaim.description },
      numCandidates: 50, // candidates considered before the top `limit` are returned
      limit: 5,
      filter: { policy_number: newClaim.policy_number }
    }
  }
]).toArray();

// 2. Retrieve similar historical claims for the same customer
const customerHistory = await db.collection("claims").aggregate([
  {
    $vectorSearch: {
      index: "claim_vector_index",
      path: "claim_summary",
      query: { text: newClaim.description },
      numCandidates: 100, // candidates considered before the top `limit` are returned
      limit: 10,
      filter: { customer_id: newClaim.customer_id }
    }
  }
]).toArray();

// 3. Pull anything we've previously learned about this customer
const customerMemory = await store.search(
  ["customer:" + newClaim.customer_id],
  { query: newClaim.description, limit: 5 }
);

// 4. Compose prompt and call the LLM
const triage = await llm.invoke({
  systemPrompt: TRIAGE_SYSTEM_PROMPT,
  context: { policyContext, customerHistory, customerMemory },
  task: newClaim
});

// 5. Persist a new memory if the agent learned something useful
if (triage.notable_observation) {
  await store.put(
    ["customer:" + newClaim.customer_id, "observations"],
    crypto.randomUUID(),
    { text: triage.notable_observation, claim_id: newClaim._id }
  );
}

What used to be three separate systems (a document database, a vector database, a memory service) is one Atlas connection, accessed through the same query language anyone familiar with MongoDB already knows — see MongoDB Aggregation Pipelines: The Power You Might Not Be Using for a deeper dive into the $vectorSearch family of stages. The embeddings update themselves when a policy is reworded or a claim is resolved. The agent’s memory persists across handlers, channels, and weeks of elapsed time.

Why this specifically matters for an insurer

  • Policy wordings change. When legal updates an exclusion clause, the embedding regenerates automatically. The old vector database approach would have left stale vectors in place until someone rebuilt the index.
  • Claim summaries are written in everyday English. Vector search finds “I bumped into a car park bollard” against a policy clause about “impact with stationary objects” without anyone having to tag either.
  • Brokers have preferences. Some want phone calls, some want emails, some never want anyone contacting their client directly. Long-term memory lets the agent remember these preferences across thousands of claims without retraining the model.

MongoDB Atlas AI cost analysis

This is where the technical case meets the budget case. The full cost has four components: the database, the embedding API, optional dedicated search nodes, and the LLM calls themselves. I’ll cost a realistic mid-market deployment serving the insurance scenario above.

Component 1: Voyage AI embedding costs

Voyage’s pricing is token-based, with the first 200 million tokens free per account.

Model | Cost per million tokens | Use case
voyage-4-large | $0.12 | Highest accuracy, complex semantics
voyage-4 | $0.06 | Recommended default for general text
voyage-4-lite | $0.02 | Cost-sensitive, high-volume
voyage-code-3 | $0.06 | Code retrieval

The Batch API gives a further 33% discount.

For the insurance example, assume:

  • Policy corpus: 50,000 policy documents × ~5,000 tokens each = 250M tokens. One-off indexing: $15 with voyage-4, or about $3 if the 200M-token free allowance is applied first (the remaining 50M tokens × $0.06).
  • New claims: 50,000/month × ~500 tokens each = 25M tokens/month. Ongoing indexing: $1.50/month with voyage-4.
  • Query embedding: ~3 retrieval calls per claim × 50,000 claims × ~200 tokens = 30M tokens/month. $1.80/month.
  • Memory writes: ~2 observations per claim × 50,000 × 100 tokens = 10M tokens/month with voyage-4-lite. $0.20/month.

Total Voyage AI: roughly $4/month at this volume. The first month is essentially free thanks to the 200M-token allowance.
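The arithmetic behind these figures, as a small sketch. Prices and volumes are copied from the table and bullets above. The function and variable names are my own, and the free allowance and Batch API discount are deliberately ignored.

```javascript
// Per-million-token prices in USD, copied from the pricing table above.
const PRICE_PER_MILLION = { "voyage-4": 0.06, "voyage-4-lite": 0.02 };

// Cost in USD for embedding a number of tokens with a given model.
function embeddingCost(model, tokens) {
  return (tokens / 1_000_000) * PRICE_PER_MILLION[model];
}

const CLAIMS_PER_MONTH = 50_000;

// Recurring monthly embedding costs for the insurance example.
const monthly = {
  claimIndexing: embeddingCost("voyage-4", CLAIMS_PER_MONTH * 500),
  queryEmbedding: embeddingCost("voyage-4", 3 * CLAIMS_PER_MONTH * 200),
  memoryWrites: embeddingCost("voyage-4-lite", 2 * CLAIMS_PER_MONTH * 100),
};
const monthlyTotal = Object.values(monthly).reduce((sum, cost) => sum + cost, 0);

// One-off cost of embedding the 50,000-document policy corpus.
const oneOffPolicyIndexing = embeddingCost("voyage-4", 50_000 * 5_000);
```

`monthlyTotal` comes to about $3.50, which rounds to the "roughly $4" above, and `oneOffPolicyIndexing` comes to about $15. Swapping in your own token counts is the quickest way to sanity-check a different workload.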

Component 2: MongoDB Atlas cluster

For 50,000 claims/month plus historical data, a production deployment of perhaps:

  • M30 dedicated cluster (8 GB RAM, 2 vCPU, 40 GB+ storage): ~$388/month per region.
  • Three-region replication for resilience: ~$1,164/month.

For larger insurers with hundreds of thousands of claims, you’d scale to M50 or M60 (~$1,440 to $2,844/month per region).

Component 3: Dedicated Vector Search nodes

Vector Search workloads can run on the main cluster, but isolating them on dedicated search nodes prevents query spikes from affecting transactional performance. Atlas pricing for Vector Search nodes:

Node tier | RAM | Hourly | Monthly
S20 (high CPU) | 4 GB | $0.12 | $86
S30 (high CPU) | 8 GB | $0.24 | $173
S40 (high CPU) | 16 GB | $0.48 | $346
S50 (high CPU) | 32 GB | $0.99 | $713

For the insurance example, an S30 or S40 in two regions is realistic: roughly $350-700/month.
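For anyone checking the node figures, here is a sketch of the hourly-to-monthly conversion. The 720-hour month (30 days × 24 hours) is my assumption. It reproduces the monthly column in the table, but Atlas bills by actual hours used.

```javascript
// Assumption: a 30-day month of continuous running.
const HOURS_PER_MONTH = 720;

// Monthly cost in whole USD for a number of nodes at a given hourly rate.
function monthlyNodeCost(hourlyUsd, nodeCount = 1) {
  return Math.round(hourlyUsd * HOURS_PER_MONTH * nodeCount);
}

const s30TwoRegions = monthlyNodeCost(0.24, 2); // 346
const s40TwoRegions = monthlyNodeCost(0.48, 2); // 691
```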

Component 4: LLM API costs

Out of MongoDB’s scope, but worth listing. Reasoning-quality LLM calls for triage at ~5,000 input tokens and 1,000 output tokens, called once per claim:

  • 50,000 claims × ($0.015 input + $0.075 output per claim, approximate Claude Sonnet rates) = roughly $4,500/month.

Total cost of ownership, monthly

Component | Monthly cost
Atlas M30 × 3 regions | $1,164
Vector Search nodes (S30 × 2) | $350
Voyage AI embeddings | $4
LLM API calls | $4,500
Total | ~$6,018
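A short sketch that sums the table and shows where the money goes. The figures are copied from the rows above, and the share calculation is my own addition.

```javascript
// Monthly figures in USD, copied from the table above.
const components = {
  "Atlas M30 x 3 regions": 1164,
  "Vector Search nodes (S30 x 2)": 350,
  "Voyage AI embeddings": 4,
  "LLM API calls": 4500,
};

const total = Object.values(components).reduce((sum, cost) => sum + cost, 0);

// Share of the total per component, as whole percentages.
const share = Object.fromEntries(
  Object.entries(components).map(([name, cost]) => [name, Math.round((cost / total) * 100)])
);
```

LLM calls account for about three quarters of the bill and embeddings for well under 1%. On these numbers, trimming prompt size is likely to matter far more than choosing a cheaper embedding tier.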

For an insurer processing 50,000 claims/month where the existing process consumes 15-20 minutes per claim of skilled human time at £30-40/hour fully loaded, the human cost being offset is in the region of £375,000-£500,000 per month. Even if AI assistance only saves 30% of that time, the data infrastructure is paying for itself many times over.
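The low end of that range can be reproduced in a few lines. The inputs are the figures quoted above (15 minutes per claim at £30/hour), the function name is my own, and the result is in pounds, so it is not directly comparable with the dollar totals without a currency conversion.

```javascript
// Monthly handler cost in GBP for a given claim volume, handling time and hourly rate.
function monthlyHandlerCostGbp(claims, minutesPerClaim, hourlyRateGbp) {
  const hours = (claims * minutesPerClaim) / 60;
  return hours * hourlyRateGbp;
}

// Low end of the range: 15 minutes per claim at £30/hour.
const lowEndGbp = monthlyHandlerCostGbp(50_000, 15, 30);

// Value of the time saved if AI assistance removes 30% of that effort.
const savedAt30PercentGbp = Math.round(lowEndGbp * 0.3);
```

That is 12,500 handler hours and £375,000 a month at the low end, with £112,500 a month saved at 30%.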

MongoDB Atlas vs DIY vector stack: cost comparison

To make this concrete, here’s what a do-it-yourself stack looks like for the same workload:

Component | DIY stack | Monthly cost
Document database | PostgreSQL on AWS RDS, multi-AZ | $700
Vector database | Pinecone or Weaviate (managed) | $400-$1,200
Embedding API | OpenAI text-embedding-3-large | $20 (similar volume)
Memory store | Redis cluster or custom Postgres | $300
Sync pipeline | Engineering time (amortised) | $2,000-$5,000
Total | | $3,420-$7,220

The infrastructure cost is in the same range. The real difference is engineering time. MongoDB’s claim, supported by their case studies, is that the work to build and maintain the sync pipeline between document store, vector store, and memory store typically consumes around one engineering quarter. That’s £25,000-£50,000 of build cost that doesn’t appear on the AWS bill, plus ongoing maintenance.

Where MongoDB’s bet might not pay off

To be balanced about this:

  • If you’ve already invested in a separate vector database (Pinecone, Weaviate, Qdrant) and it’s working, the migration cost is real and the benefits are incremental.
  • If your team is deep in the Python ecosystem and not using Atlas, the Python long-term memory store is mature, but it is not the first to receive every new feature.
  • If your embeddings need to come from a specific model your business has standardised on (some regulated industries require this), you can still bring your own embeddings — but you give up the auto-sync benefit.
  • The autoEmbed index type is still in public preview. Pricing model and edge-case behaviour can change before GA.

When MongoDB’s AI stack is the right architecture

The strongest case for the MongoDB stack is when all of the following are true:

  1. You’re building agents (not just one-shot RAG) and need cross-session memory.
  2. Your source data changes (so static embeddings drift out of accuracy).
  3. You want to avoid running and maintaining a separate vector database and a separate memory service.
  4. You’re already on Atlas, or open to moving onto it.

If you’re just embedding a static knowledge base once and serving it to a chatbot, a simpler stack will work fine. The new features are aimed at the production agent problem, not the prototype-RAG problem.

For organisations weighing multi-model alternatives, MarkLogic 12: Has the Multi-Model Database Made Itself Relevant for RAG? covers the closest enterprise rival, and Retool Agents: Democratising AI covers a different layer of the stack — building the agent UI on top of whichever backend you choose.

Frequently asked questions

What is MongoDB autoEmbed?

autoEmbed is a new index field type in MongoDB Atlas Vector Search (public preview as of May 2026) that automatically generates Voyage AI vector embeddings for a text field whenever a document is inserted or updated. It removes the need to build an external embedding pipeline.

How much do Voyage AI embeddings cost?

Voyage 4 embedding models on MongoDB are priced per million tokens: voyage-4-large at $0.12, voyage-4 at $0.06, voyage-4-lite at $0.02, and voyage-code-3 at $0.06. The first 200 million tokens are free per account, and the Batch API gives a further 33% discount.

Does the MongoDB long-term memory store replace a vector database?

For agent workloads, often yes. The MongoDB Store for LangGraph combines persistent JSON memory, namespace-based organisation, semantic search via Voyage embeddings, and TTL-based cleanup in a single backend. If your existing vector database is purely serving a separate, static knowledge base, it can still coexist — but the case for keeping it weakens once Atlas is already in the stack.

Is MongoDB 8.3 generally available?

MongoDB 8.3 is generally available as of May 2026. The LangGraph.js long-term memory store is also GA. The automated Voyage AI embeddings feature (autoEmbed) is in public preview, with pricing and edge-case behaviour subject to change before its own GA.

Which agent frameworks does MongoDB officially support?

LangGraph (Python and JavaScript/TypeScript), CrewAI, Mastra, Spring AI, and Microsoft Semantic Kernel. MongoDB serves as the memory and state backend underneath each.

Can I use my own embedding model instead of Voyage AI?

Yes. You can still write your own vectors to a Vector Search index and skip autoEmbed entirely. You give up the automatic sync benefit, but it lets you keep using a model you’ve already standardised on — which matters for some regulated industries.

Verdict: should you adopt MongoDB’s AI stack?

The interesting strategic observation in this announcement is the framing. MongoDB is not pitching itself as a faster vector database or a cheaper embedding provider, even though it is now both. It’s pitching itself as the data layer that the agentic AI era will be built on, with the database, the embeddings, the memory, and the framework integrations all being parts of one platform.

Whether that platform bet pays off depends on whether enough enterprises agree that the bottleneck really is data rather than the model. The 50,000-claim insurer in the example above doesn’t need a smarter LLM to make this work. It needs the policy wording to be findable, the claim history to be recallable, and the broker preferences to be remembered. All three of those are data problems, and they’re the problems the new stack is built to solve.

If your team is in the 11% that has agents in production, none of this is news. If you’re in the 79% trying to get there, the announcement is worth taking seriously. If you’d like a second opinion on whether this is the right architecture for your specific workload, get in touch.
