# GraphRAG or vector RAG: a decision guide

The claim that travels fastest in RAG discussion is "GraphRAG beats vector RAG." It sounds like a benchmark result and gets repeated like one: a newer, more sophisticated architecture outperforming the plain baseline, the natural order of things. As an unconditional claim it is also false, and false in a way that costs real money to discover. GraphRAG beats vector RAG on two specific classes of question, loses to it on the most common class, and costs roughly three orders of magnitude more to index whatever the question. "Beats" is the wrong verb because it implies a single comparison with a single winner. There is no single comparison here, only a routing decision made per query class.

Consider what happens when the unconditional claim meets a real corpus. A team takes "GraphRAG wins" at face value and builds it over three thousand pages of product documentation. The knowledge graph takes a long, expensive indexing run to construct. Then the query logs come in, and nine in ten queries are fact lookups — what is the default timeout, which endpoint returns this field, does this plan include that feature. On those queries the graph loses to the plain vector retrieval it replaced: the answer was always sitting in one chunk, and building a graph of entities and relationships to locate one chunk is elaborate machinery the question never asked for. The team did not pick a bad architecture. It picked an architecture without first looking at the query mix — and the query mix is the only thing that decides this. This essay is the guide for making that decision deliberately.

## What each index costs to build

Vector RAG has one indexing step: split the corpus into chunks, embed each chunk with a model, store the vectors. Retrieval is a nearest-neighbour lookup over those vectors. The index is cheap to build — embedding is a single forward pass per chunk — and cheap to keep current: when a document changes, you re-embed the handful of chunks it touched and the rest of the index is untouched. It is a flat, additive structure, and flat additive structures are inexpensive to maintain.
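A minimal sketch of that flat, additive structure follows. It assumes a toy bag-of-words `embed` function standing in for a real embedding model, and the `VectorIndex` class, its method names, and the sample documents are all hypothetical, chosen only to show why a changed document touches nothing but its own chunks.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorIndex:
    def __init__(self) -> None:
        # (doc_id, chunk_no) -> (chunk text, vector)
        self.entries: dict[tuple[str, int], tuple[str, Counter]] = {}

    def upsert_document(self, doc_id: str, chunks: list[str]) -> None:
        """Re-embed only this document's chunks; every other entry is left alone."""
        for key in [k for k in self.entries if k[0] == doc_id]:
            del self.entries[key]
        for i, chunk in enumerate(chunks):
            self.entries[(doc_id, i)] = (chunk, embed(chunk))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Nearest-neighbour lookup: rank every chunk by similarity to the query."""
        q = embed(query)
        ranked = sorted(self.entries.values(), key=lambda e: cosine(q, e[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


# Made-up documentation snippets, for illustration only.
index = VectorIndex()
index.upsert_document("timeouts", ["The default timeout is 30 seconds.",
                                   "Retries use exponential backoff."])
index.upsert_document("plans", ["The Pro plan includes audit logs."])
```

Calling `index.upsert_document("timeouts", [...])` again replaces only the entries keyed by that document. A production system would swap the linear scan in `search` for an approximate nearest-neighbour structure, but the update path keeps the same shape.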

GraphRAG adds a heavier index, and the weight is worth being precise about because it is where the decision is actually settled. It runs the corpus through a model to extract entities and the relationships between them — a generation call over the text, far more expensive than an embedding pass — assembles those extractions into a knowledge graph, and then, in the Microsoft design that named the approach, runs another model pass to pre-compute a natural-language summary of each densely connected community in that graph. Retrieval can then do things the vector index cannot: traverse relationships from entity to entity, and read a community summary that describes a whole region of the corpus. That is a genuine reasoning structure, and it is genuinely useful for the right question. But it is bought with an extraction-and-summarization indexing stage that is the single most expensive part of the system — multiple model passes over the entire corpus before a single query is served. The cost section below puts the magnitude on it.

## What vanilla RAG is good at

For fact retrieval — the answer lives in one passage, and the task is to find that passage — vanilla vector RAG is not just adequate, it is often better than the graph, and the measurement says so directly. The GraphRAG-Bench study ([arXiv 2506.05690](https://arxiv.org/abs/2506.05690)) was built to test exactly this comparison and states the finding without hedging: "GraphRAG frequently underperforms vanilla RAG on many real-world tasks." On its simple fact-retrieval level the gap is wide — basic RAG scored 64.73% on a medical corpus where a Microsoft-style GraphRAG scored 38.63%, and on a novel corpus the two were 60.92% against 49.29%. The more elaborate system lost, and lost clearly, on the simplest and most common task.

The reason is structural, not a tuning artifact. When the answer sits in one chunk, the graph's entity extraction and relationship traversal are pure overhead — extra machinery interposed between the query and the chunk, and every extra step is an extra place to go wrong. The extraction can split or merge an entity incorrectly; the traversal can wander into a related but unhelpful region of the graph; the community summary can paraphrase away the exact detail the question needed. A precise retriever has none of those failure surfaces because it has none of those steps: it embeds the query, finds the nearest chunk, returns it. A fact lookup wants exactly that — a precise retriever, not a reasoning structure — and a fact lookup is the most common query class in most production corpora, which is why the team in the opening lost on nine queries out of ten.

## What the graph is actually for

The graph earns its keep on the two query classes vanilla RAG handles badly. Vanilla RAG fails on them for the same structural reason it wins fact lookup, just inverted: it fetches one similar chunk at a time and does nothing else.

The first is **multi-hop reasoning** — questions whose answer is not in any one document but must be assembled by connecting facts across several. Which supplier ships the component used in the product that a given customer bought: no chunk contains that chain, and a nearest-neighbour retriever, which fetches chunks one at a time by similarity to the query, has no way to follow it. HippoRAG ([arXiv 2405.14831](https://arxiv.org/abs/2405.14831)), a lighter-weight graph approach, reported up to 20% gains over strong baselines on multi-hop QA, and GraphRAG-Bench's complex-reasoning level showed graph methods beating vanilla RAG by roughly 3 to 10 points. The graph helps here because the relationships it extracted and stored _are_ the hops the question needs — traversing an edge from one entity to a related one is the operation a multi-hop question is made of, and the graph has those edges precomputed.
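The supplier question can be sketched as a walk over stored edges. The triples below are invented for illustration, and `build_graph` and `follow` are hypothetical helper names, not part of any GraphRAG library; in a real system the triples would come from the model-driven extraction pass and the relation chain from parsing the query.

```python
# Hypothetical (subject, relation, object) triples, as an extraction pass might emit them.
TRIPLES = [
    ("Acme Corp", "bought", "Model X"),
    ("Model X", "uses", "Sensor S1"),
    ("Sensor S1", "shipped_by", "Northwind Supply"),
    ("Model Y", "uses", "Sensor S2"),
    ("Sensor S2", "shipped_by", "Contoso Parts"),
]


def build_graph(triples):
    """Adjacency list: entity -> list of (relation, neighbour)."""
    graph = {}
    for subj, rel, obj in triples:
        graph.setdefault(subj, []).append((rel, obj))
    return graph


def follow(graph, start, relations):
    """Follow a fixed chain of relations from `start`; return the entities reached."""
    frontier = {start}
    for rel in relations:
        frontier = {obj
                    for node in frontier
                    for r, obj in graph.get(node, [])
                    if r == rel}
    return sorted(frontier)


graph = build_graph(TRIPLES)
# customer -> product -> component -> supplier: three hops, no single chunk holds them all.
suppliers = follow(graph, "Acme Corp", ["bought", "uses", "shipped_by"])
```

Each hop is a dictionary lookup over precomputed edges, which is the operation a similarity search over isolated chunks has no equivalent for.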

The second is **global, corpus-level questions** — "what are the main themes here," "summarize the position taken across all of these documents." The original GraphRAG paper ([arXiv 2404.16130](https://arxiv.org/abs/2404.16130)) is explicit that conventional RAG "fails on global questions directed at an entire text corpus." The failure is not a weakness to be tuned away; it is definitional. Nearest-neighbour retrieval works by fetching the chunks most similar to the query, and a global question has no single chunk that answers it — the answer is a property of the whole corpus, distributed across all of it, so there is nothing for a similarity search to retrieve. Pre-computed community summaries are built precisely for that case: each one already condenses a region of the corpus, and answering a global question becomes reading and combining summaries rather than retrieving chunks. That is a class vanilla RAG cannot serve at all, not merely one it serves poorly.

```
  query class            vanilla vector RAG     knowledge-graph RAG
  ────────────────────   ───────────────────    ────────────────────
  fact lookup            ✓ wins  (one chunk)    ✗ overhead, can wander
  multi-hop reasoning    ✗ misses the hops      ✓ wins  (~3–10+ pts)
  global / "the themes"  ✗ no chunk to fetch    ✓ wins  (its design case)
```

## The cost is the decision variable

If GraphRAG only ever helped, the query mix would not matter — you would build the graph regardless and accept that fact lookups got no better. But it does not only help; it loses outright on fact lookup, and it costs enough that the loss is expensive rather than free. Microsoft's own LazyGraphRAG analysis put numbers on the gap, and they are not small-multiple numbers. A full GraphRAG index costs on the order of **1,000× more to build** than a vector-RAG-level index — the extraction and community-summarization passes against one embedding pass. And answering a global query with full GraphRAG runs **about 700× more expensive** than the lazy variant at comparable quality. The overhead is not confined to indexing, either: GraphRAG-Bench measured Microsoft-style global-search prompts reaching tens of thousands of tokens, against roughly 900 tokens for a vanilla RAG prompt — so the heavier system costs more on indexing and again on every query it serves.

That is the trade laid bare, and it should be read as a single sentence. You are not buying better retrieval across the board. You are buying better retrieval _on two query classes_, multi-hop and global, at roughly a three-order-of-magnitude increase in indexing cost and a large increase in per-query cost. Whether that is a good purchase depends entirely on how much of your traffic falls in those two classes. If multi-hop and global questions are most of what users ask, the graph pays for itself many times over. If they are a small fraction, as they were for the team in the opening, where at most one query in ten could have been one, the graph is an enormous bill for a benefit almost no query collects. The cost does not change the architecture's strengths; it changes whether those strengths are worth their price for your specific query mix.
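One rough way to put a number on that purchase is to amortize the extra indexing bill over only the queries that can benefit from it. The sketch below is a back-of-envelope model, not a pricing method from any of the cited papers: the function name is hypothetical, the vector index cost is an arbitrary unit of 1, the query volume is invented, and only the 1,000× multiplier comes from the LazyGraphRAG figure quoted above. It ignores the per-query overhead entirely.

```python
import math


def index_cost_per_benefiting_query(vector_index_cost: float,
                                    index_multiplier: float,
                                    total_queries: int,
                                    graph_share: float) -> float:
    """Graph indexing cost divided by the queries in the multi-hop and global classes."""
    benefiting = total_queries * graph_share
    if benefiting == 0:
        return math.inf  # nobody collects the benefit
    return vector_index_cost * index_multiplier / benefiting


# Assumed inputs: unit vector-index cost, 1,000x multiplier, 100,000 queries over the index's life.
mostly_lookup = index_cost_per_benefiting_query(1.0, 1_000, 100_000, graph_share=0.10)
mostly_graph = index_cost_per_benefiting_query(1.0, 1_000, 100_000, graph_share=0.60)
```

Under these assumptions the same index costs six times more per useful query when the graph classes are a tenth of traffic than when they are six tenths, and the cost per useful query is unbounded when they are absent.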

## Not every graph carries the same price

"GraphRAG" names a family of systems, not one system, and the cost figures above are for the heaviest member of the family. Which variant you build moves the cost substantially:

- **Community-summary GraphRAG** (the original Microsoft design) is the most powerful on global, whole-corpus questions, because the community summaries are built exactly for them. It is also the most expensive to build and the most expensive to rebuild, since the summaries depend on the graph and the graph depends on the full extraction.
- **HippoRAG** is far lighter. Its authors report single-step retrieval that is comparable to or better than iterative retrieval methods while being 10–30× cheaper and 6–13× faster — a graph approach that captures much of the multi-hop benefit without the community-summarization stage.
- **LightRAG** ([arXiv 2410.05779](https://arxiv.org/abs/2410.05779)) attacks the maintenance side directly: it adds an incremental update algorithm, so a changed document updates the affected part of the graph instead of triggering a full rebuild.

The maintenance side is the half teams routinely forget when they price a graph. The build cost is a one-time number that shows up in a planning estimate; the maintenance cost is a recurring number that shows up only after the corpus starts changing. For a vector index, a changed document is a few re-embedded chunks — cheap, local, done. For a community-summary graph, a changed document can mean re-running entity extraction over the affected text and then re-summarizing every community that text belongs to, because a new relationship can shift a community's membership and therefore its summary. A graph is a living index, and on a corpus that changes often its upkeep is a real line item, not a footnote to the build.

## The honest answer is usually both

Because the strengths are genuinely disjoint — vanilla RAG owns fact lookup, the graph owns multi-hop and global, and neither encroaches on the other — the systematic comparison "RAG vs GraphRAG" ([arXiv 2502.11371](https://arxiv.org/abs/2502.11371)) concludes that the two have "distinct strengths" across task types and that strategies which "combine the strengths of both paradigms" deliver consistent improvements. That is the mature architecture, and it is neither graph nor vector but both, with a router in front. A vector index serves the fact-lookup majority cheaply and precisely. A graph index — the lightest variant that covers the query classes you actually have — serves the multi-hop and global minority. A classifier in front inspects each incoming query, decides which class it belongs to, and sends it to the index built for that class. It is the same per-class routing logic [the hybrid-search essay](/blog/hybrid-search-bm25/) applies to lexical and dense retrieval, lifted one level up: there the router chooses between two retrievers over the same corpus, here it chooses between two whole indexing strategies.
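The router itself can be very small. The sketch below assumes a crude keyword heuristic in place of a trained classifier; the cue lists, the class labels, and the `classify` and `route` names are all hypothetical, and the two backends are passed in as plain callables so that any vector or graph retriever can be plugged in.

```python
from typing import Callable

# Placeholder cues; a real deployment would more likely use a trained or LLM-based classifier.
GLOBAL_CUES = ("main themes", "summarize", "across all", "overall")
CHAIN_CUES = (" that ", " whose ", " used in ", " who ")


def classify(query: str) -> str:
    """Label a query as 'global', 'multi_hop', or 'fact_lookup'."""
    q = query.lower()
    if any(cue in q for cue in GLOBAL_CUES):
        return "global"
    # Crude proxy for a chained question: two or more linking phrases.
    if sum(q.count(cue) for cue in CHAIN_CUES) >= 2:
        return "multi_hop"
    return "fact_lookup"


def route(query: str,
          vector_search: Callable[[str], str],
          graph_search: Callable[[str], str]) -> tuple[str, str]:
    """Send fact lookups to the vector index and everything else to the graph index."""
    query_class = classify(query)
    backend = vector_search if query_class == "fact_lookup" else graph_search
    return query_class, backend(query)
```

The heuristic will misroute some queries, so logging the predicted class next to each answer keeps the routing accuracy measurable against the same query logs used to size the graph.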

## The decision guide

| If your query mix is…                   | Build…                                                                  |
| --------------------------------------- | ----------------------------------------------------------------------- |
| Mostly fact lookup (the common case)    | Vector RAG. Do not build a graph.                                       |
| A real share of multi-hop reasoning     | Add a graph index — start with a light one (HippoRAG-class)             |
| Global "summarize the corpus" questions | Community-summary GraphRAG for that class specifically                  |
| Mixed, and the corpus changes often     | Vector RAG plus a light, incrementally-updatable graph; route per query |

- [ ] Pulled the actual query logs and classified them — fact lookup vs multi-hop vs global.
- [ ] Confirmed multi-hop and global questions are a real share of traffic before building any graph.
- [ ] Priced the indexing run _and_ the re-indexing cost when documents change.
- [ ] Chosen the lightest graph variant that covers the query classes you actually have.
- [ ] Routed by query class instead of replacing vector RAG wholesale.
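The first two checklist items and the table above can be folded into one small function over classified logs. This is a sketch under loud assumptions: the `recommend` name is hypothetical, the labels are expected to be `"fact_lookup"`, `"multi_hop"`, or `"global"`, and the 20% threshold for "a real share" is an arbitrary placeholder to be replaced with whatever your cost estimate supports.

```python
from collections import Counter


def recommend(labels: list[str],
              corpus_changes_often: bool = False,
              min_graph_share: float = 0.2) -> tuple[dict[str, float], str]:
    """Turn classified query logs into a row of the decision table."""
    if not labels:
        raise ValueError("classify some real queries first")
    counts = Counter(labels)
    shares = {name: counts[name] / len(labels)
              for name in ("fact_lookup", "multi_hop", "global")}
    graph_share = shares["multi_hop"] + shares["global"]

    if graph_share < min_graph_share:  # assumed threshold, not a measured one
        advice = "Vector RAG. Do not build a graph."
    elif corpus_changes_often:
        advice = "Vector RAG plus a light, incrementally-updatable graph; route per query"
    elif shares["global"] > shares["multi_hop"]:
        advice = "Community-summary GraphRAG for the global class, behind a router"
    else:
        advice = "Add a light graph index (HippoRAG-class), behind a router"
    return shares, advice
```

For the team in the opening, a log of nine fact lookups in every ten queries lands on the first branch regardless of how the remaining tenth splits.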

## Reading list

- From Local to Global: A Graph RAG Approach to Query-Focused Summarization — the original GraphRAG, and why conventional RAG fails global questions: [arXiv 2404.16130](https://arxiv.org/abs/2404.16130)
- GraphRAG-Bench — measured proof that GraphRAG underperforms vanilla RAG on fact retrieval and wins on reasoning: [arXiv 2506.05690](https://arxiv.org/abs/2506.05690)
- LazyGraphRAG — Microsoft Research on the ~1,000× indexing and ~700× query cost gap: [microsoft.com](https://www.microsoft.com/en-us/research/blog/lazygraphrag-setting-a-new-standard-for-quality-and-cost/)
- RAG vs GraphRAG: A Systematic Evaluation — distinct strengths, and why combining beats choosing: [arXiv 2502.11371](https://arxiv.org/abs/2502.11371)
- HippoRAG — a lighter graph approach: up to 20% on multi-hop, 10–30× cheaper retrieval: [arXiv 2405.14831](https://arxiv.org/abs/2405.14831)
- LightRAG — graph RAG with incremental updates, attacking the maintenance cost: [arXiv 2410.05779](https://arxiv.org/abs/2410.05779)

A knowledge graph is not an upgrade to your retriever. It is a second retriever — good at the questions vanilla RAG fails, useless overhead on the questions it already answers, and far more expensive to keep current. Build it when those questions are in your logs, not when the claim is in your feed.