Article Information
Category: Development
Published: June 20, 2026
Author: Chris de Gruijter
Reading Time: 20 min

SEO at Scale: My Keyword Research Methodology and the Systems That Run It
SEO at scale is not about one keyword. It's about discovering thousands of them, figuring out which ones actually matter, and turning that pile of search demand into a content pipeline without drowning in spreadsheets. The hard part was never finding keywords — any tool spits out tens of thousands. The hard part is classification: knowing which 200 of those 15,000 terms translate into revenue, which 75 topic groups deserve a page, and which 8,000 are retailer noise you should never look at again.
This is the methodology I use to do exactly that — both for agency clients through Webfluentia and on freelance engagements outside the agency — and the engineering that makes it repeatable. I'll walk through how raw discovery becomes a classified, scored dataset, how that dataset feeds both content generation and the optimisation of existing pages across whole clusters, and the technology layer underneath it all: MCP servers, AI agents, multi-tenant dashboards with client-scoped auth, and the regression contracts that stop an SEO change from quietly breaking a site.
The real problem: classification, not discovery
Run a single broad seed through any keyword tool and you get a wall of terms — volume, CPC, competition, all of it. On a recent freelance engagement, one bathroom-renovation client started from a first-pass discovery of 2,631 keywords and, after a proper enrichment pass, grew to 9,514 unique keywords. A concrete-products client landed at 19,376. That volume is trivial to produce. It is also useless until you can answer one question for every single row: what is this keyword worth to this business, and what do I do about it?
That is the entire game. A keyword research dataset is only as good as its classification. If you can't separate the 992 Tier-A revenue terms from the 1,726 out-of-scope watchlist terms, you don't have a strategy — you have a CSV. So the methodology I'm about to describe spends almost no effort on discovery and almost all of it on turning discovery into decisions. The output is never "here are 15,000 keywords." It is "here are 79 scored content opportunities, ranked, each with a recommended page type and a coverage status."
I've written before about running SEO as an autonomous operation, in "building an autonomous SEO system with AI agents". This post is the layer beneath that: the data methodology that feeds the agents, and the systems that make it reproducible across clients and quarters.
Stage 1 — Discovery from multiple sources
Every research cycle starts with a seed list of 10–20 terms drawn from the client's actual product or service catalogue — their own site menu, not my guesses. I split seeds across five intents deliberately:
- Service-intent seeds — the bread-and-butter terms a customer types when they're ready to act
- Component seeds — the parts and sub-products inside the service domain
- Local seeds — city and region variants for local-service businesses
- A brand seed — the client's own name, to capture navigational demand
- Competitor seeds — three to six direct competitors, which is where the gap signal comes from
Each seed fans out through a keyword data API. On the freelance side that's a SeRanking-backed MCP server; at the agency it's an Ubersuggest-backed one wired into the wider pipeline (more on both later). The crucial move is crawling competitor domains, not just expanding seeds. When you pull every keyword a direct competitor ranks for, you discover the adjacent-trade and whole-category terms that pure seed expansion never surfaces. On that bathroom client, competitor-domain crawls grew the "lateral" niche by 52× versus the seed-only first pass — those are exactly the terms growing competitors' traffic that the client wasn't capturing.
I target 5,000–10,000 raw keywords per client and dump every API response to a raw JSON directory as an audit trail. Discovery is cheap and reproducible: the same script runs every quarter, so I can diff against last cycle and detect emerging terms.
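The quarter-over-quarter diff is simple enough to sketch. The snippet below is a minimal illustration rather than the actual script: the function name `emerging_terms`, the `min_volume` cut-off, and the `keyword`/`volume` row shape are assumptions made for the example.

```python
def emerging_terms(previous_rows, current_rows, min_volume=0):
    """Return rows whose keyword appears this cycle but not in the last one."""

    def norm(row):
        # Same normalisation as the dedupe step: trimmed, lowercased keyword.
        return row["keyword"].strip().lower()

    seen_before = {norm(r) for r in previous_rows}
    new_rows = {}
    for row in current_rows:
        kw = norm(row)
        volume = row.get("volume") or 0
        if kw in seen_before or volume < min_volume:
            continue
        # If the keyword shows up twice this cycle, keep the higher-volume row.
        best = new_rows.get(kw)
        if best is None or volume > (best.get("volume") or 0):
            new_rows[kw] = row
    # Highest-volume newcomers first.
    return sorted(new_rows.values(), key=lambda r: -(r.get("volume") or 0))
```

Feeding it last quarter's and this quarter's rows returns only the newcomers, which is the list worth reading first in a new cycle.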
Deduping and normalisation
A classifier reads every raw JSON file, auto-detecting each source's schema, and merges them. The merge rule is the part people get wrong, so here it is explicitly: dedupe by lowercased keyword, and on collision the highest non-null volume row wins as the base. Then, for every blank field on that base row, backfill from the loser. So a Google-bid row can hand a CPC to a SeRanking row that lacked one, and vice versa. You end up with the richest possible row per keyword instead of throwing away half your metadata on every dedupe.
    # Merge: highest-volume row is the base; backfill blanks from collisions.
    BACKFILL_FIELDS = ("cpc", "competition", "difficulty", "sr_intent")

    def merge(rows):
        by_kw = {}
        for r in rows:
            kw = r["keyword"].strip().lower()
            base = by_kw.get(kw)
            if base is None:
                by_kw[kw] = r
                continue
            # The row with the higher non-null volume becomes the base;
            # the other row only donates fields.
            if (r.get("volume") or 0) > (base.get("volume") or 0):
                base, donor = r, base
                by_kw[kw] = base
            else:
                donor = r
            # Backfill only blank fields (missing, None or empty string),
            # so a legitimate 0 on the base is never overwritten.
            for f in BACKFILL_FIELDS:
                if base.get(f) in (None, ""):
                    base[f] = donor.get(f)
        return list(by_kw.values())
Stage 2 — The classification framework
Once you have one clean row per keyword, every row gets tagged on four independent axes. These four axes are the spine of the entire methodology, and the vocabulary is deliberately locked — changing a niche or intent label would break every downstream pivot and score.
Axis 1 — Niche (the onion model)
I model the market as concentric rings of audience proximity to the client's core service. From the centre out: CORE (direct service intent, the bullseye), SPECIFIC (component-level searches inside the service domain), LATERAL (adjacent trades and whole-home problems), and GENERIC (broad category mentions with no real signal). Outside the rings sit two navigational buckets: BRAND (the client's own name) and COMPETITOR (everyone else's).
    ┌──────────────────────────────┐
    │ GENERIC                      │  broad category, no signal
    │  ┌────────────────────────┐  │
    │  │ LATERAL                │  │  adjacent trades / problems
    │  │  ┌──────────────────┐  │  │
    │  │  │ SPECIFIC         │  │  │  components in the domain
    │  │  │  ┌────────────┐  │  │  │
    │  │  │  │    CORE    │  │  │  │  service intent, highest value
    │  │  │  └────────────┘  │  │  │
    │  │  └──────────────────┘  │  │
    │  └────────────────────────┘  │
    ├───────────────┬──────────────┤
    │  COMPETITOR   │    BRAND     │  navigational
    └───────────────┴──────────────┘
Niche is the single most useful axis because it answers "how in-market is this searcher?" before you look at any number. A 40,000/month GENERIC head term is worth less than a 90/month CORE service query. The onion encodes that priority structurally.
Axis 2 — Cluster (topical sub-groups)
Inside each niche, keywords roll up into clusters — the exact topical sub-group a page would target. For a bathroom client, SPECIFIC splits into clusters like Showers, Baths, Vanities, Tiles, Lighting & Mirrors, Taps & Fittings, Toilets, Accessibility. Competitor clusters stay as the brand name. Clusters are what eventually become pages, so the granularity matters: too coarse and one page can't serve the demand, too fine and you're writing forty near-duplicate articles.
Axis 3 — Intent
Four locked buckets: Informational (learn, ideas, how-to), Commercial (research before buying — costs, comparisons, "best", reviews), Transactional (ready-to-act verbs: quote, install, hire, buy), and Navigational (brand-name searches). I prefer the data provider's native intent codes where they exist — on the bathroom client, 80% of rows carried native intent — and fall back to rule-based regex only for rows the API didn't label. One override always wins: any brand or competitor match forces Navigational, because providers routinely mislabel branded queries as "local."
The intent distribution itself is a sanity check. Commercial should dominate (50–80%) for most service businesses; if Transactional is suspiciously high you've probably let an over-greedy regex misfire. On that bathroom run, the split came out Commercial 50%, Informational 35%, Navigational 11%, Transactional 3% — a healthy shape for a research-heavy purchase.
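The precedence described above (brand/competitor override, then native codes, then regex fallback) and the distribution check can be sketched as follows. This is an illustration under stated assumptions, not the production classifier: the `C` code appears in the provider response shown later in this article, but the other codes, the fallback patterns, and the Informational default are invented for the example.

```python
import re
from collections import Counter

# Assumed mapping from provider intent codes to the four locked buckets.
NATIVE_INTENT = {
    "I": "Informational",
    "C": "Commercial",
    "T": "Transactional",
    "N": "Navigational",
}

# Illustrative fallback rules, checked in order; first match wins.
FALLBACK_RULES = [
    (re.compile(r"\b(quote|install|hire|buy)\w*", re.I), "Transactional"),
    (re.compile(r"\b(costs?|prices?|best|reviews?|vs)\b", re.I), "Commercial"),
    (re.compile(r"\b(how|what|why|ideas?)\b", re.I), "Informational"),
]


def classify_intent(keyword, native_codes, niche):
    # Override: brand and competitor matches are always Navigational.
    if niche in ("BRAND", "COMPETITOR"):
        return "Navigational"
    # Prefer the provider's own label when it exists.
    for code in native_codes or []:
        if code in NATIVE_INTENT:
            return NATIVE_INTENT[code]
    # Regex fallback only for rows the provider did not label.
    for rx, intent in FALLBACK_RULES:
        if rx.search(keyword):
            return intent
    return "Informational"  # assumed default for unmatched rows


def intent_shares(intents):
    counts = Counter(intents)
    total = sum(counts.values()) or 1
    return {name: n / total for name, n in counts.items()}


def commercial_share_ok(shares, low=0.5, high=0.8):
    # The 50-80% band is the rule of thumb from the text above.
    return low <= shares.get("Commercial", 0.0) <= high
```

Running `commercial_share_ok(intent_shares(labels))` after every classification pass turns the sanity check into something a script can fail on.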
Axis 4 — Relevance Tier
The planning axis. Every keyword gets a tier, A through E:
- A: direct revenue/service intent
- B: strong in-market component
- C: supporting informational
- D: competitor/conquest
- E: out-of-scope watchlist
Tier E is where the magic of not doing work happens: broad-retailer noise (the big-box DIY chains) is auto-marked E so it never enters the planning shortlist. No manual cleanup. On one client, the Tier-E split pulled 8,581 of 19,431 rows straight out of the planning file while keeping them in a separate excluded CSV for audit.
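A rough sketch of the Tier-E split follows. The niche-to-tier mapping here is an assumption made for illustration (the article does not spell out the exact rules), and the retailer names in `RETAILER_RX` are made up; only the idea of routing Tier E into a separate excluded file comes from the text.

```python
import csv
import io
import re

# Hypothetical retailer pattern; a real list would come from client config.
RETAILER_RX = re.compile(r"\b(megadiy|homebarn)\b", re.IGNORECASE)


def assign_tier(row):
    """Assumed mapping from niche/intent to tier, for illustration only."""
    if RETAILER_RX.search(row["keyword"]):
        return "E"  # broad-retailer noise never reaches planning
    if row["niche"] == "COMPETITOR":
        return "D"
    if row["niche"] == "CORE":
        return "A"
    if row["niche"] == "SPECIFIC":
        return "B"
    if row["intent"] == "Informational":
        return "C"
    return "E"


def split_planning(rows):
    """Return (planning_rows, excluded_rows); input rows are not mutated."""
    planning, excluded = [], []
    for row in rows:
        tiered = {**row, "tier": assign_tier(row)}
        (excluded if tiered["tier"] == "E" else planning).append(tiered)
    return planning, excluded


def to_csv(rows, fields=("keyword", "niche", "intent", "tier")):
    """Serialise rows to CSV text; the excluded file is kept for audit."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

The point of returning two lists is that nothing is deleted: the excluded rows are written out alongside the planning file instead of being dropped.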
The classifier itself is a short, ordered set of regex stages — brand first, then competitors, then CORE/SPECIFIC/LATERAL patterns where order matters (narrow before broad, first match wins), then intent fallback. Adding a new client is mostly swapping the brand regex and competitor list and rewriting the pattern lists for the new domain; the intent logic is language-driven and stays put.
    import re

    # Ordered, first-match-wins. Narrow patterns must precede broad ones.
    # Alternations are grouped so the \b anchor applies to every branch.
    CORE_PATTERNS = [
        (r"\b(renovat|verbouw|laten\s+plaats)\w*", "Complete renovation"),
        (r"\b(offert|prijs|kosten)\w*", "Pricing"),
        (r"\b(specialist|installateur|aannemer)\w*", "Installer"),
    ]

    # Defined per client: BRAND_RX (compiled regex), COMPETITORS (name -> compiled
    # regex), SPECIFIC_PATTERNS and LATERAL_PATTERNS (same shape as CORE_PATTERNS).
    def classify_niche(kw):
        if BRAND_RX.search(kw):
            return "BRAND", "Brand terms"
        for name, rx in COMPETITORS.items():
            if rx.search(kw):
                return "COMPETITOR", name
        for rx, cluster in CORE_PATTERNS:
            if re.search(rx, kw):
                return "CORE", cluster
        for rx, cluster in SPECIFIC_PATTERNS:
            if re.search(rx, kw):
                return "SPECIFIC", cluster
        for rx, cluster in LATERAL_PATTERNS:
            if re.search(rx, kw):
                return "LATERAL", cluster
        return "GENERIC", "Broad"
Stage 3 — Opportunity scoring
Individual keywords are the wrong unit to plan against. Nobody builds a page per keyword — you build a page per topic. So I group related keywords into topic opportunities and score the group, not the row. The score is a weighted blend of signals:
    Topic Score =
        25%  Relevance Fit            (niche proximity)
        20%  Local Fit                (matches client service area)
        15%  Search Demand            (log-scaled, so head terms don't crush niches)
        15%  Conversion Intent        (commercial/transactional weighting)
         5%  CPC Proxy                (capped, so one pricey term can't dominate)
        10%  Competition Opportunity  (organic difficulty + paid competition, 50/50)
         5%  Existing Coverage Gap
         5%  Competitive Gap          (competitor in top-10, client absent)
A few design decisions in there are load-bearing. Search demand is log-scaled on purpose: without it, a single 40,000/month head term flattens every good 200/month local service query, and you end up recommending the unwinnable terms. CPC influence is capped so one expensive keyword can't hijack the roadmap; for an SEO-first tool, paid-bid data is a noisy proxy for organic value, not the main event. And competition is split: organic difficulty drives organic strategy, paid competition drives paid strategy, and conflating them is a conceptual bug I removed early.
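To make the log-scaling and the CPC cap concrete, here is one way the two normalisers could look. The reference ceiling `max_volume` and the `cap` value are assumptions chosen for the example, not the values used in the real model.

```python
import math


def demand_score(volume, max_volume=100_000):
    """Log-scaled search demand in [0, 1]; max_volume is an assumed ceiling."""
    volume = max(volume or 0, 0)
    return min(math.log10(1 + volume) / math.log10(1 + max_volume), 1.0)


def cpc_score(cpc, cap=5.0):
    """CPC proxy in [0, 1]; anything above the assumed cap counts as the cap."""
    return min(max(cpc or 0.0, 0.0), cap) / cap
```

With these assumptions a 40,000/month head term scores about 0.92 and a 200/month local query about 0.46. The head term is still ahead, but by a factor of two rather than the factor of two hundred that a linear scale would give it.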
The weights aren't fixed. I run configurable weight profiles per client type. A national e-commerce client has no use for the 20% Local Fit weight, so that dead weight gets redistributed onto Relevance and Intent. Switching one client from the default local-service profile to a national-ecom profile took its count of High-priority topics from 0 to 22 — same data, correct weighting, and suddenly the editorial team has a clear hit list instead of a flat sheet.
Every score also decomposes. The opportunities file carries one sub-score column per component, each the weighted contribution, so the columns literally sum to the total. That transparency matters: when someone questions why a topic ranked where it did, I can point at the exact component that drove it instead of waving at a black box.
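Weight profiles and decomposable scores fit in a few lines. In this sketch the default weights are the ones listed above; the component key names, the even split of the freed Local Fit weight, and the 0 to 100 scale are assumptions for illustration.

```python
DEFAULT_WEIGHTS = {
    "relevance_fit": 0.25,
    "local_fit": 0.20,
    "search_demand": 0.15,
    "conversion_intent": 0.15,
    "cpc_proxy": 0.05,
    "competition_opportunity": 0.10,
    "coverage_gap": 0.05,
    "competitive_gap": 0.05,
}


def redistribute(weights, drop, onto):
    """Zero out one component and spread its weight evenly across others."""
    out = dict(weights)
    freed = out[drop]
    out[drop] = 0.0
    for name in onto:
        out[name] += freed / len(onto)
    return out


# Assumed national e-commerce profile: no Local Fit, more Relevance and Intent.
NATIONAL_ECOM = redistribute(
    DEFAULT_WEIGHTS, "local_fit", ["relevance_fit", "conversion_intent"]
)


def score_topic(signals, weights):
    """signals maps component -> value in [0, 1].

    Returns (total, contributions). Each contribution is the weighted
    sub-score, and the total is the sum of the rounded contributions,
    so the per-component columns add up to the total.
    """
    contributions = {
        name: round(100 * weight * signals.get(name, 0.0), 2)
        for name, weight in weights.items()
    }
    return round(sum(contributions.values()), 2), contributions
```

Because the total is computed from the contributions rather than alongside them, a question like "why did this topic rank here?" is answered by reading one row of the output.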
Stage 4 — From classified data to content
The whole point of classification is that it feeds two distinct content workflows — generating new pages and optimising existing ones across entire clusters — and the recommended-page-type field is what routes each opportunity to the right one.
New page generation
Topic groups with no strong existing page and a High/Medium priority become net-new content. The opportunities file already carries the focus keyword, secondary keywords, intent mix, SERP features, and recommended page type, which is exactly the brief a generator needs. I export a topic queue as JSON and hand it to a content generator with per-platform adapters — a markdown adapter for review-first workflows, a CMS-draft adapter for direct-to-CMS drafts. Client-site content always publishes as a draft, never auto-live; these are editorial workflows, not volume plays.
    {
      "topic_id": "client-planters-care-2026Q3",
      "topic_label": "Planters & containers (supporting blog)",
      "focus_keyword": "concrete planter maintenance",
      "secondary_keywords": ["sealing a concrete planter", "..."],
      "cluster": "Planters & containers",
      "niche": "SPECIFIC",
      "intent_mix": { "Informational": 0.7, "Commercial": 0.3 },
      "serp_features": ["people_also_ask", "images"],
      "platform_page_type": "blog article",
      "boost_url": "https://example.test/collections/planters",
      "priority": "High",
      "opportunity_score": 73.7
    }
There's a feedback loop too. After a topic publishes, the generator writes back a publish state, and the next research cycle reads it and drops published topics out of the queue. Without that loop, every quarter re-proposes things you already wrote; the loop is what keeps the pipeline from spinning its wheels.
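The feedback loop amounts to a filter on the next cycle's queue. A minimal sketch, assuming the publish state is stored as a mapping from `topic_id` to a state string; the state names used here are hypothetical.

```python
def prune_queue(queue, publish_states, done_states=("published",)):
    """Drop topics whose recorded publish state marks them as done.

    queue: list of topic dicts carrying a "topic_id".
    publish_states: dict of topic_id -> state string (assumed shape).
    """
    done = {tid for tid, state in publish_states.items() if state in done_states}
    return [topic for topic in queue if topic["topic_id"] not in done]
```

Topics with no recorded state, or with a state outside `done_states`, stay in the queue, so a failed or still-in-review draft gets proposed again rather than silently disappearing.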
Optimising existing pages across clusters
The more valuable half is rarely new pages — it's lifting pages you already have. Because every keyword is tagged with its cluster, I can take an underperforming page and enrich it with the Tier-A/B keywords from the same cluster that it isn't yet capturing. This is where the agency's SEO CLI earns its keep: a bulk-optimize pass reads an opportunity manifest and drives targeted rank-lift across a whole cluster of pages at once, rather than me editing them one by one. The competitive-gap signal is the trigger — keywords where a competitor sits in the top 10 and the client doesn't are the precise places where a cluster-wide enrichment pass moves the needle.
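Selecting what to enrich a page with is a filter over the classified dataset. The sketch below is illustrative only: the field names (`ranking_keywords`, `competitor_top10`) and the sort order are assumptions, not the CLI's actual manifest format.

```python
def enrichment_candidates(page, keywords):
    """Tier-A/B keywords from the page's cluster that it does not capture yet.

    page: dict with "cluster" and "ranking_keywords" (assumed shape).
    keywords: classified rows with "keyword", "cluster", "tier", "volume".
    """
    already = {kw.lower() for kw in page["ranking_keywords"]}
    picks = [
        row
        for row in keywords
        if row["cluster"] == page["cluster"]
        and row["tier"] in ("A", "B")
        and row["keyword"].lower() not in already
    ]
    # Competitive-gap rows first, then by descending volume.
    return sorted(
        picks,
        key=lambda row: (not row.get("competitor_top10", False), -(row.get("volume") or 0)),
    )
```

Sorting gap keywords to the top reflects the trigger described above: a term where a competitor ranks and the client does not is the first thing an enrichment pass should address.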
SERP-feature tags refine the angle. A topic tagged with a People-Also-Ask box gets an FAQ block; a featured-snippet opportunity gets an answer-box-shaped intro; a shopping-carousel signal tells me the intent is transactional even if the provider didn't label it so. These are deterministic editorial tags derived from the SERP feature codes, so the content brief carries them automatically.
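Because the mapping is deterministic, it can live in a lookup table. In this sketch `people_also_ask` is the feature code shown elsewhere in this article; the other two codes and all of the tag names are assumptions for illustration.

```python
# Feature code -> editorial tag. Only "people_also_ask" is taken from the
# article's examples; the other codes and the tag names are hypothetical.
EDITORIAL_TAGS = {
    "people_also_ask": "faq_block",
    "featured_snippet": "answer_box_intro",
    "shopping_carousel": "transactional_signal",
}


def editorial_tags(serp_features):
    """Translate SERP feature codes into brief-level editorial tags."""
    return [EDITORIAL_TAGS[f] for f in serp_features if f in EDITORIAL_TAGS]


def effective_intent(provider_intent, serp_features):
    """A shopping carousel implies Transactional, whatever the provider said."""
    if "transactional_signal" in editorial_tags(serp_features):
        return "Transactional"
    return provider_intent
```

Unknown feature codes are ignored rather than raising, so a provider adding a new SERP feature does not break brief generation.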
The technology layer — MCP servers and AI agents
None of this scales by hand. The methodology runs on a stack of MCP (Model Context Protocol) servers that expose keyword data as callable tools, AI agents that orchestrate the workflows, and a CLI that consumes the output offline. I've covered the general pattern in the programmatic marketer-engineer's arsenal; here's how it applies specifically to keyword work.
MCP servers as the data plane
On freelance engagements I drive a SeRanking MCP server. It exposes the discovery primitives as tools the agent calls directly — related keywords, domain keywords, similar keywords, long-tail, questions, and a bulk metrics enricher that takes up to 5,000 keywords per call. A typical call shape:
    // MCP tool call: topical expansion from a seed
    {
      "tool": "DATA_getRelatedKeywords",
      "arguments": {
        "keyword": "bathroom renovation",
        "source": "nl",
        "limit": 1000
      }
    }
    // → { "keywords": [
    //      { "keyword": "...", "volume": 1900, "cpc": 2.1,
    //        "competition": 0.74, "difficulty": 61,
    //        "intents": ["C"], "serp_features": ["people_also_ask"] }, ...
    //    ] }
The agent fans these calls out in parallel (one per seed, one per competitor domain, plus a question pull for the top clusters) and writes raw JSON for the classifier. Because the MCP server is a thin, typed boundary over the data provider, the same agent logic works whether the underlying provider is SeRanking, Ubersuggest, or a keyword-planner API; only the tool names and field mappings change. I keep a roundup of the MCP servers I actually run in useful MCP server configs for developers.
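The fan-out itself is ordinary concurrent I/O. The sketch below assumes a caller-supplied `call_tool(tool, arguments)` function standing in for whatever MCP client is in use; that function, the file-naming scheme, and the worker count are all hypothetical.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def fan_out(call_tool, jobs, raw_dir, max_workers=8):
    """Run (tool, arguments) jobs in parallel and dump each response to disk.

    call_tool: caller-supplied function (tool_name, arguments) -> response.
    jobs: list of (tool_name, arguments) tuples.
    Returns the written file paths, in job order.
    """
    raw_dir = Path(raw_dir)
    raw_dir.mkdir(parents=True, exist_ok=True)

    def run(indexed_job):
        index, (tool, arguments) = indexed_job
        response = call_tool(tool, arguments)
        path = raw_dir / f"{index:04d}_{tool}.json"
        # Store the request next to the response so the dump is auditable.
        record = {"tool": tool, "arguments": arguments, "response": response}
        path.write_text(json.dumps(record, ensure_ascii=False), encoding="utf-8")
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, enumerate(jobs)))
```

A job list would then be built from the seeds, for example `[("DATA_getRelatedKeywords", {"keyword": s, "source": "nl", "limit": 1000}) for s in seeds]`, with competitor-domain and question pulls appended using whatever tool names the server exposes.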
The agency agent — pooling four sources into one snapshot
At the agency, the star of the stack is a single orchestrator agent that owns the SEO data layer end-to-end. It pools four sources into one coherent snapshot per client: an Ubersuggest MCP for discovery and research, Google Search Console for ground truth on what the client already ranks for, a keyword-planner source for volume validation, and a competitor scraper. It's the only writer to the storage layer — everything else is a pure consumer — which means there's exactly one place where data shape and freshness are guaranteed.
That single-writer design is what makes "optimise across a whole cluster" safe. When the agent enriches keyword tracking, it fills difficulty, CPC, competition, and SERP features on existing rows additively — never destructively — and stamps every row with its source (ubersuggest, gsc, and so on) so the CLI can filter by provenance. The agent never deletes; it only ever writes today's snapshot. History accumulates.
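Additive enrichment means filling blanks and never overwriting what is already there. A minimal sketch of that rule, assuming a row-level `source` field as described above; the `enriched_by` field is a hypothetical way to record which research source filled the gaps.

```python
ENRICHABLE = ("difficulty", "cpc", "competition", "serp_features")


def enrich(tracked_row, research_row, research_source):
    """Return a copy of tracked_row with blank fields filled from research_row.

    Existing values always win; the input rows are not mutated.
    """
    out = dict(tracked_row)
    filled = [
        field
        for field in ENRICHABLE
        if out.get(field) is None and research_row.get(field) is not None
    ]
    for field in filled:
        out[field] = research_row[field]
    if filled:
        # Hypothetical provenance field: which sources contributed enrichment.
        out["enriched_by"] = sorted(set(out.get("enriched_by", [])) | {research_source})
    return out
```

The row's own `source` stamp is left untouched, so filtering by provenance still distinguishes ground-truth rows from research rows after enrichment.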
Hybrid storage — Supabase plus a per-client cache
Every snapshot lands in two sinks at once. The first is Supabase: the historical truth, normalised for query and time-series, and the source for the dashboard. The second is a per-client JSON cache, written into each client repo at seo-data/<YYYY-MM-DD>/ but never committed, and denormalised so the offline CLI can read it without a single network call. A latest/ symlink points at the most recent snapshot; the CLI only ever reads latest/.
The contract between those two sides is enforced, not assumed. Both sinks share a canonical set of Zod schemas, and the writer validates every file against its schema before writing — malformed data is logged and skipped, never persisted. The CLI validates on read and fails loudly if the schema version drifted, which is the signal that the cache is stale relative to the schema package. The database is long-term truth; the cache is a per-client convenience layer holding only the most recent snapshot, fully regeneratable from Supabase if a machine gets wiped.
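The real contract is a set of Zod schemas, which are TypeScript. To stay in the same language as the other code in this article, here is the same validate-before-write, validate-on-read idea as a stdlib-only Python sketch. The `schema_version` field in `_meta.json`, the version number, and the row checks are assumptions; the required file names follow the cache layout.

```python
import json
import logging
from pathlib import Path

log = logging.getLogger("seo-cache")
SCHEMA_VERSION = 3  # hypothetical version number
REQUIRED_FILES = ("_meta.json", "domain-overview.json", "keywords-tracked.json")


def problems(rows):
    """Minimal shape check for tracked-keyword rows; returns a list of issues."""
    found = []
    for i, row in enumerate(rows):
        if not isinstance(row.get("keyword"), str) or not row["keyword"].strip():
            found.append(f"row {i}: missing keyword")
        if not isinstance(row.get("source"), str):
            found.append(f"row {i}: missing source stamp")
    return found


def write_validated(path, rows):
    """Writer side: malformed data is logged and skipped, never persisted."""
    found = problems(rows)
    if found:
        log.warning("skipping %s: %s", path, "; ".join(found))
        return False
    Path(path).write_text(json.dumps(rows), encoding="utf-8")
    return True


def read_snapshot(latest_dir):
    """Reader side: fail loudly on missing files or schema drift."""
    latest = Path(latest_dir)
    missing = [name for name in REQUIRED_FILES if not (latest / name).is_file()]
    if missing:
        raise FileNotFoundError(f"snapshot incomplete, missing: {missing}")
    meta = json.loads((latest / "_meta.json").read_text(encoding="utf-8"))
    if meta.get("schema_version") != SCHEMA_VERSION:
        raise RuntimeError(
            f"schema drift: cache has {meta.get('schema_version')}, "
            f"expected {SCHEMA_VERSION}"
        )
    return json.loads((latest / "keywords-tracked.json").read_text(encoding="utf-8"))
```

The asymmetry is deliberate: the writer degrades gracefully so one bad file cannot block a snapshot, while the reader refuses to continue on a version mismatch because that means the cache needs regenerating.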
    <client-repo>/seo-data/
      latest/                       → symlink to most-recent YYYY-MM-DD/
      2026-06-13/
        _meta.json                  (required: sources used, file list)
        domain-overview.json        (required)
        keywords-tracked.json       (required: GSC truth + research enrichment)
        keyword-opportunities.json  [optional]
        competitors.json            [optional]
        content-ideas.json          [optional]
        audit-issues.json           [optional]
Bringing it together in deployed dashboards
A CSV nobody opens helps nobody. The output of all this surfaces in deployed, authenticated dashboards so clients and account managers can browse opportunities without touching a terminal. I run two.
On the freelance side, a Python dashboard sits over the SeRanking MCP and renders one client at a time across pages for Executive Overview, Rankings, Technical, Backlinks, Content, Competitors, AI Search, and an Action Plan. It deploys from a container to a self-hosted PaaS, with a persistent cache volume so the provider's unit-spending fetch cache survives redeploys — without it, every redeploy re-spends API units on the first fetch. Access is gated at the edge: the live URL sits behind Cloudflare Access with an allowlist, so only specific emails ever load it.
At the agency it's a proper multi-tenant dashboard — covered in depth in building a unified marketing dashboard and, for the paid side, the Google Ads client dashboard. The part that matters for SEO data is the auth model: every client's SEO data lives in the same Supabase project, isolated by row-level security keyed to a tenant anchor, with a per-user access table deciding who can read which client. The agent writes with service-role access; the dashboard reads under RLS. I wrote up that exact pattern in multi-tenant SaaS with Supabase row-level security — client-scoped access isn't a feature you bolt on, it's the thing the whole storage layer is designed around.
Not breaking the site while you optimise it
Here's the failure mode nobody talks about: the most dangerous moment for a client site is an SEO change. You're editing framework config, adding redirects, touching meta and tracking — and a stale editor buffer or a misaimed automated edit can silently revert a locale, drop a prerender route, or strip the analytics bootstrap. The build still passes. You find out weeks later when traffic craters.
My defence is a Site Integrity Contract — a declarative invariants file per project, checked by a tiny pure-Node runner. The contract asserts the things that must never silently disappear: required locales and locale files, prerender roots and redirects, runtime flags, framework modules, the GTM container ID and consent-default ordering, and the CSP wildcards that keep analytics domains allowlisted. It's gated at three layers — the build script, the deploy script, and a pre-push hook — plus a CI check on every PR, so a regression has to defeat four independent gates to ship.
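The actual runner is pure Node; the Python sketch below only illustrates the shape of a declarative invariants check. The contract format, the file name, the patterns, and the placeholder container ID are all assumptions made for the example.

```python
import re
from pathlib import Path

# Hypothetical contract: each rule names a file and what must stay in it.
CONTRACT = [
    {
        "file": "nuxt.config.ts",
        "must_contain": ["prerender", "GTM-XXXXXXX"],  # placeholder ID
        "must_match": [r"locales\s*:"],
    },
]


def check_contract(root, contract):
    """Return a list of human-readable violations; empty means intact."""
    failures = []
    for rule in contract:
        path = Path(root) / rule["file"]
        if not path.is_file():
            failures.append(f"{rule['file']}: file missing")
            continue
        text = path.read_text(encoding="utf-8")
        for needle in rule.get("must_contain", []):
            if needle not in text:
                failures.append(f"{rule['file']}: missing {needle!r}")
        for pattern in rule.get("must_match", []):
            if not re.search(pattern, text):
                failures.append(f"{rule['file']}: nothing matches {pattern!r}")
    return failures
```

A build script, deploy script, or hook would call `check_contract` and exit non-zero when the list is non-empty, which is what lets the same check sit behind several independent gates.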
Two companion pre-commit hooks catch the rest. A mass-delete guard blocks any commit where a single file loses 50-plus lines (the classic stale-buffer revert), and a tracking-files guard blocks commits that touch framework config, headers, or the root layout unless you explicitly opt in. The originating incident was real: a commit titled "small fixes for worker" silently removed 119 lines from a Nuxt config, including prerender routes and an LCP preload. The contract caught two of those at pre-push; the mass-delete guard would have caught all of them at commit time. When you're running content and optimisation changes at volume, this is the safety net that lets you move fast without betting the client's traffic on it every push.
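A mass-delete guard can be built on `git diff --cached --numstat`, which prints added and deleted line counts per staged file. This sketch is one possible reading of the rule, not the actual hook: it treats "loses 50-plus lines" as net loss (deleted minus added), which is an assumption.

```python
import subprocess
import sys


def oversized_deletions(numstat_text, limit=50):
    """Parse `git diff --numstat` output; return (path, net_loss) offenders."""
    offenders = []
    for line in numstat_text.splitlines():
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        added, deleted, path = parts
        if added == "-" or deleted == "-":
            continue  # binary files report "-" instead of counts
        net_loss = int(deleted) - int(added)
        if net_loss >= limit:
            offenders.append((path, net_loss))
    return offenders


def main():
    """Hook entry point: returns 1 to block the commit, 0 to allow it."""
    staged = subprocess.run(
        ["git", "diff", "--cached", "--numstat"],
        capture_output=True, text=True, check=True,
    ).stdout
    offenders = oversized_deletions(staged)
    for path, net_loss in offenders:
        print(f"blocked: {path} loses {net_loss} lines", file=sys.stderr)
    return 1 if offenders else 0
```

A pre-commit hook script would finish with `sys.exit(main())`. Keeping the parsing in `oversized_deletions` separate from the git call makes the rule testable without a repository.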
There's a deeper rule baked into the contract: analytics always wins over performance. A "performance optimization" that defers the GTM bootstrap or tightens a CSP to drop an analytics domain is forbidden — losing tracking fidelity is more expensive than a slower Lighthouse score, and the contract encodes that so a well-meaning optimisation pass can't quietly violate it.
SEO as an engineering discipline
Strip away the specific tools and the methodology is just software engineering applied to search. Discovery is data ingestion. Classification is a deterministic, version-controlled function. Scoring is a weighted model with configurable, decomposable parameters. Storage is a contract with a single writer and validated schemas. Content generation is a queue with a feedback loop. And the whole thing is wrapped in regression tests so a change can't silently break production.
That framing is what lets one person run keyword research across many clients without the quality degrading into guesswork. The classifier runs the same way every quarter, so growth is measurable and drift is detectable. The scoring is transparent, so recommendations are defensible. The storage is reproducible, so nothing is lost. And the contracts mean the optimisation work — the part that actually moves rankings — can run at volume without the constant low-grade fear that you just broke something. SEO at scale isn't a content treadmill. It's a system, and systems are engineered.
Frequently Asked Questions
How many keywords should a keyword research dataset contain?
For a typical service or e-commerce business I target 5,000–10,000 raw keywords per client, and the dataset usually ends up at around 10,000–20,000 unique terms once enrichment passes and all discovery sources are merged. But raw count is the wrong metric to optimise. The deliverable that matters is the classified output: usually a few hundred Tier-A revenue keywords and roughly 75 scored content opportunities. The rest is context and audit trail, not the plan.
What is the difference between a niche and a cluster in keyword classification?
Niche measures audience proximity to the core service — CORE, SPECIFIC, LATERAL, GENERIC, plus BRAND and COMPETITOR for navigational searches. It answers "how in-market is this searcher?" Cluster is the topical sub-group inside a niche — the exact topic a single page would target, like Showers or Pricing. Niche sets priority; cluster sets what page gets built.
Why classify keywords by search intent?
Intent decides what kind of content wins the click and whether a ranking converts. Informational queries need guides, Commercial queries need comparison and pricing content, Transactional queries need a page that lets the user act, and Navigational queries are brand searches. Ranking a transactional buyer on an informational blog post wastes the demand, so intent routing is what connects a keyword to the right page type.
How does competitor analysis improve keyword discovery?
Crawling the keywords a direct competitor already ranks for surfaces adjacent-trade and whole-category terms that pure seed expansion never finds. On one engagement, competitor-domain crawls grew the lateral niche by 52× versus a seed-only first pass. It also produces the competitive-gap signal — keywords where a competitor sits in the top 10 and the client is absent — which is the highest-value trigger for optimisation work.
What is an MCP server and why use one for SEO?
An MCP (Model Context Protocol) server exposes a data source as typed tools an AI agent can call directly. For SEO it turns keyword APIs like SeRanking or Ubersuggest into callable primitives — related keywords, domain keywords, bulk metrics enrichment. Because the MCP server is a thin typed boundary, the same agent orchestration works across different underlying providers; only the tool names and field mappings change.
How do you keep SEO data isolated per client in a multi-tenant dashboard?
All clients live in one Supabase project, isolated by row-level security keyed to a tenant anchor, with a per-user access table that decides who can read which client. The data-writer agent uses service-role access; the dashboard reads strictly under RLS. The dashboards themselves are also gated at the edge — for example behind Cloudflare Access with an email allowlist — so client-scoped access is enforced both in the database and at the network boundary.
How do you avoid breaking a client site while doing SEO work?
I use a Site Integrity Contract: a declarative invariants file per project that asserts the things which must never silently disappear — locales, prerender routes, runtime flags, tracking bootstrap, CSP wildcards. It is enforced at four layers (build script, deploy script, pre-push hook, and CI), backed by pre-commit guards that block large silent deletions and unauthorised edits to tracking-sensitive files. A regression has to defeat every gate to ship.