The measurement platform underneath our AI Search strategy.
RecommendationOS is the instrument we built because no existing tool measures the signals that actually matter for AI recommendations. Four modules that together explain why AI models do or don't mention your brand, based on repeated, statistically grounded measurements instead of one-off snapshots.
In development. Beta from summer 2026. Demos available for strategy clients and early access requests.
There are hundreds of tools tracking visibility in AI answers. We've used and tested many of them, and ultimately decided to build our own. Not just because we like building things, but mainly because nearly all GEO tools share the same fundamental shortcoming: they are SEO tools for an AI era, and that doesn't work. They're built on assumptions that held in classic SEO but no longer hold today.
Traditional SEO is essentially deterministic. Although personalisation and location play a role, Google relies on fixed algorithms, signals and an indexed snapshot of the web to rank the most relevant results for a specific query.
AI Search works fundamentally differently: it's probabilistic. Large language models (LLMs) generate text on the spot by sampling from probability distributions over tokens. They produce a likely, ‘best’ answer based on the prompt and the unique user context. Precisely because this process is dynamic, the answer to the exact same question can vary from run to run. There is simply no fixed ranking being retrieved from a database.
Almost all current tracking tools ignore this fundamental difference. They sample the output too sporadically to make statistically reliable statements about it. Periodically they send a list of artificial, context-free prompts to a model and then try to reverse-engineer the output. They measure visibility, sentiment and cited sources. Those are useful indicators, but measured this way they remain statistically unreliable, and they don't explain why you are or aren't recommended. A score without context gives you no lever for optimisation.
Anyone who doesn't understand how LLMs actually make decisions optimises with limited visibility.
That's why we built RecommendationOS: four modules that don't merely measure the outcome, but target the underlying signals that cause it.
On four points we've explicitly chosen to work differently from the rest of the market.
Most tools are glorified prompt trackers. They fire artificial prompts at a model, measure visibility and citations, and then come back with generic advice: be more active on Reddit, invest in trade press, do more PR, because those sources get cited a lot. The size of your dataset doesn't change that.
Compare it to this: someone records the ingredients of 10,000 supermarket products, notices that water and wheat appear most often, and concludes that a successful product should contain water and wheat.
It makes no sense. And yet this is exactly the logic behind much of the advice on AI visibility: a source gets cited a lot, so you have to be on it. It's correlation confused with causation, and it's the wrong reason to start investing in a platform.
We measure the signals that cause the recommendation: what AI knows about you, how your site is read, how your brand is represented elsewhere on the web, which topics define your market. These are signals you can act on; the outputs follow from them.
AI works with entities and associations, not with static rankings. AI platforms recommend brands they understand and trust. It's about who you are, who you're there for, why you're valuable in specific situations, and when you're not.
Whether you're mentioned is therefore just one question. How you're framed, in which competitive landscape you're placed, with which associations, and whether those come from model memory or live retrieval, are equally important. Tools that only measure “am I visible?” answer just a small part of what counts.
We measure the full reputation profile per model. Including the gap between what you want AI to say about you and what AI actually says. That gap is where the work sits. Our focus isn't on reverse-engineering prompts, but on building and strengthening your entity and the right associations.
An AI answer is not fixed reality. It's one chance outcome from a probability distribution. Tools that ask a prompt once and present that single outcome as truth are doing the same as rolling a die once and thinking they know the distribution.
We ask every question as often as needed to make statistically reliable statements, and report every measurement with a 95% confidence interval. A score with insufficient signal underneath is reported separately, not sold as advice.
Most tools dump data and dashboards on you. Pretty to look at, but also overwhelming, and the question “what do I need to improve to score better” is rarely answered. You figure it out yourself.
We flip that around. Every measurement comes with concrete recommendations: what needs to change, in what order, and why. Step by step. The marketing team or agency can act on it directly, without first pulling in a data analyst. The platform isn't called RecommendationOS for nothing.
RecommendationOS consists of four modules. Two measure where you stand, two show where you need to act. Together they form a complete picture of what you need to do to be recommended more often by AI.
What AI models actually know and say about your brand.
AI Reputation shows how ChatGPT, Claude, Gemini and Perplexity see your brand. Not isolated mention counts, but a structured profile across ten dimensions, clustered in three groups: presence (are you named, and in what position?), quality (is what the model says correct, and how well does it back it up?), and context (in which competitive landscape does the model place you?).
Three things make this measurement different from a prompt tracker. Per dimension we separately measure what the model says on its own and what it says when it's allowed to look things up live. That difference decides whether your investment should go towards reputation building (model memory) or towards content and findability (live retrieval). On top of that, we put side by side what you want AI to say about you and what AI actually says: the gap is where the work sits. And we measure not just whether you're mentioned, but who appears in your place when you're not mentioned.
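To make the last two comparisons concrete, here is a minimal sketch, assuming that the associations and brand mentions have already been extracted from each model answer (that extraction step isn't shown, and the function names and the 30% presence threshold are hypothetical, not the platform's actual logic). It counts an association as present when it shows up in enough runs, splits the desired profile into aligned, missing and unwanted associations, and counts which brands appear in the runs where yours doesn't.

```python
from collections import Counter


def association_gap(desired, runs, min_share=0.3):
    """Compare the associations you want with those the model actually makes.

    desired:   set of intended associations (hypothetical input).
    runs:      one set of extracted associations per model answer (assumed to be
               produced by an upstream extraction step, not shown here).
    min_share: hypothetical threshold; an association counts as present when it
               appears in at least this share of runs.
    """
    n = len(runs)
    counts = Counter(a for run in runs for a in run)
    observed = {a for a, c in counts.items() if c / n >= min_share}
    return {
        "aligned": sorted(desired & observed),   # wanted and actually said
        "missing": sorted(desired - observed),   # wanted, but rarely said
        "unwanted": sorted(observed - desired),  # said often, but not intended
    }


def who_replaces_you(brands_per_run, brand):
    """Count the brands mentioned in the runs where `brand` itself is absent."""
    return Counter(b for run in brands_per_run if brand not in run for b in run)
```

Working with per-run sets rather than raw text means each answer contributes at most one vote per association, so a single verbose answer can't dominate the profile.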
Every measurement is based on repeated queries and reported with a 95% confidence interval: the score plus how certain we are of it. A score with insufficient signal underneath is reported separately, not sold as advice.
Result: a reputation profile per model, with confidence labels per dimension, built up over time so you see a trajectory rather than a snapshot.
All relevant topics for your market mapped, and where you stand per topic.
Topic Universe maps what your market talks about, not just what you write about yourself. Topics come from search data, AI extraction and external trend sources. They're then classified by search intent (where is the user in their journey?) and by coverage status: claimed by you or not, claimed by competitors or not.
The most interesting status is “opportunity”: topics with high strategic value where no one in your market holds a strong position yet. Per topic we measure your visibility against the main competitors, alongside trend direction (rising, stable, declining) and channel fit (which platforms suit the topic best). When topics are already covered by others, we point out exactly how you can differentiate with your knowledge, expertise and perspective. Because nobody, certainly not AI, is waiting for yet another blog post with the same content as every competitor's.
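The coverage statuses above can be read as a small decision rule. The sketch below is a hypothetical illustration, not the platform's classifier: it assumes visibility is expressed as a share between 0 and 1, that strategic value is a 0 to 1 score, and that the thresholds for “strong” and “high value” are placeholders.

```python
def coverage_status(own, best_competitor, strategic_value,
                    strong=0.5, high_value=0.7):
    """Classify a topic from visibility shares and strategic value.

    own, best_competitor: visibility shares between 0 and 1 (assumed scale).
    strategic_value:      score between 0 and 1 (assumed scale).
    strong, high_value:   hypothetical thresholds, not the platform's values.
    """
    you_strong = own >= strong
    them_strong = best_competitor >= strong
    if you_strong and them_strong:
        return "contested"
    if you_strong:
        return "claimed by you"
    if them_strong:
        return "claimed by competitor"
    # Nobody holds the topic: only high-value topics count as opportunities.
    if strategic_value >= high_value:
        return "opportunity"
    return "unclaimed, low priority"
```

Checking competitor strength before strategic value matters here: a valuable topic that a competitor already owns is a differentiation question, not an open opportunity.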
For the topics that matter strategically, we also measure how your brand surfaces in AI answers. Statistically reliable, so every question is asked as often as needed to establish confidence intervals. That costs significantly more in API spend than a one-off measurement, and we're happy to pay it. A cheap measurement that's wrong gives no lever. An expensive measurement that's right does.
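Why repeated measurement is expensive follows from the standard sample-size formula for a proportion, n = z² · p(1 − p) / E². This sketch assumes a mention rate measured as a proportion and uses the worst case p = 0.5. The topic and model counts in the usage line are hypothetical, and the actual run counts the platform uses aren't stated here.

```python
import math


def runs_needed(margin, p=0.5, z=1.96):
    """Runs per question so the normal-approximation 95% margin is at most `margin`.

    p=0.5 is the worst case (largest variance); z=1.96 corresponds to 95%.
    """
    return math.ceil(z * z * p * (1 - p) / (margin * margin))


def total_calls(n_topics, n_models, margin):
    """API calls for one measurement round: runs per question x topics x models."""
    return runs_needed(margin) * n_topics * n_models


if __name__ == "__main__":
    # Hypothetical round: 20 strategic topics across 4 models, +/-5 points.
    print(runs_needed(0.05), runs_needed(0.10), total_calls(20, 4, 0.05))
```

Halving the margin roughly quadruples the number of runs, which is why a statistically sound measurement costs so much more than a one-off prompt.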
Result: a topic overview where the opportunities are visible to anyone who scans it. It's the direct input for your content strategy and decides which topics you actively want to monitor in AI Reputation.
Whether your own site can be read, used and cited as evidence by AI.
Website Intelligence analyses your site sitewide and per page, across multiple fronts that each measure something different.
Each page type gets its own criteria. A homepage behaves differently from a product page or a blog post. Per signal you get, alongside the score, a quote from the page as evidence, a concrete recommendation, and a confidence label so it's clear which recommendations are rock-solid and which are still hypotheses.
On top of that, you can test any page on a specific search query. Enter the query the page should win, and we break it down into the underlying question dimensions, score how well the page covers each dimension, and give one concrete improvement per dimension.
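The method for decomposing a query and scoring coverage isn't described here, so the sketch below is a deliberately naive stand-in. It assumes the query has already been broken into dimensions, each represented by a few hypothetical single-word key terms. For each dimension it scores the share of those terms found on the page and orders the dimensions from weakest to strongest, so the weakest is the first candidate for an improvement.

```python
import re


def dimension_coverage(page_text, dimensions):
    """Score, per question dimension, the share of its key terms found on the page.

    dimensions: mapping of dimension name -> list of single-word key terms
    (hypothetical input; decomposing a query into dimensions is assumed done).
    """
    words = set(re.findall(r"[a-z0-9]+", page_text.lower()))
    scores = {}
    for name, terms in dimensions.items():
        hits = sum(1 for t in terms if t.lower() in words)
        scores[name] = hits / len(terms) if terms else 0.0
    return scores


def weakest_first(scores):
    """Dimensions ordered from least to best covered."""
    return sorted(scores, key=scores.get)
```

Keyword overlap only says whether a topic is mentioned, not whether it's answered well. A real coverage score would need a semantic judgement, so treat this as the shape of the output rather than the scoring itself.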
Result: a list of page-level actions that's directly executable. Not a generic SEO report, but concrete interventions in the places where they make a difference. It's as if an experienced strategist were looking over your shoulder, telling you exactly what could be better, page by page.
Figure: Website Score (example rating: Moderate). Technical SEO, content quality and AI agent readability at a glance.
How your brand shows up in the places beyond your own site where AI forms its picture of you.
AI models don't form their picture of your brand out of nothing. They draw it from the open internet: encyclopaedic sources, review platforms, trade press, social platforms, conversation forums. External Footprint scans those external places for your brand and for the main competitors.
We measure the signals AI models weigh most heavily, grouped into four Trust Tiers: independent authority (Wikidata, Wikipedia, Common Crawl), validated reviews (G2, Trustpilot, Google Reviews), trade press, and public conversations (Reddit, YouTube, forums). Each tier carries a different weight, and the weighting is adjustable per industry, because what confers authority in B2B software doesn't carry the same weight in retail.
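A per-industry weighting like this is typically a weighted average. The sketch below assumes each tier already has a 0 to 100 score. The tier keys and weight values are hypothetical placeholders, not the platform's actual weights. It also returns per-tier differences against a competitor, which is where the gaps mentioned below would show up.

```python
# Hypothetical weights per industry; the platform's real values are not public.
TIER_WEIGHTS = {
    "b2b_software": {"authority": 0.35, "reviews": 0.30,
                     "trade_press": 0.20, "conversation": 0.15},
    "retail": {"authority": 0.20, "reviews": 0.40,
               "trade_press": 0.10, "conversation": 0.30},
}


def external_trust_score(tier_scores, industry):
    """Weighted average of per-tier scores (0-100) using the industry's weights."""
    weights = TIER_WEIGHTS[industry]
    total = sum(weights.values())
    return sum(w * tier_scores.get(tier, 0.0) for tier, w in weights.items()) / total


def tier_gaps(yours, competitor):
    """Per-tier difference; negative values mark tiers where the competitor leads."""
    return {tier: yours.get(tier, 0.0) - competitor.get(tier, 0.0)
            for tier in set(yours) | set(competitor)}
```

Dividing by the weight total keeps the score on the same 0 to 100 scale even if an industry profile's weights don't sum exactly to one.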
Result: an External Trust Score that shows on which platforms you stand stronger than the competition, and where the gaps are that still hold the recommendation back. This is the slow, sticky layer of AI Search, and at the same time the layer with the biggest leverage.
An AI answer is not fixed reality. It's one chance outcome from a probability distribution. That changes how you have to measure. Three choices make our measurements different from what the market does today.
Most GEO tools ask a prompt once and present that single outcome as the truth. That's the same as rolling a die once and thinking you know the distribution. We run every probe dozens of times, and adapt the count based on the variance in the answers. Only when the measurement stabilises does it count.
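One common way to adapt the run count to the variance is sequential sampling. Run the probe in batches and stop once the interval around the estimate is narrow enough, or once a budget cap is hit. The sketch below is a hypothetical version using a simple normal-approximation half-width as the stopping rule. The `probe` callable stands in for a real model call and is assumed to return True when the brand is mentioned. All thresholds are placeholders.

```python
import math
import random


def adaptive_rate(probe, min_runs=30, batch=10, max_runs=300,
                  target_half_width=0.05, z=1.96):
    """Repeat a yes/no probe until the estimate stabilises or the budget runs out.

    Returns (estimated rate, runs used, whether the target precision was reached).
    Note: with an estimate of exactly 0 or 1 this normal-approximation half-width
    collapses to 0, so such results stop at `min_runs`.
    """
    if max_runs < 1:
        raise ValueError("max_runs must be at least 1")
    results = []
    while len(results) < max_runs:
        step = min(batch, max_runs - len(results))
        results.extend(bool(probe()) for _ in range(step))
        n = len(results)
        if n < min_runs:
            continue
        p = sum(results) / n
        if z * math.sqrt(p * (1 - p) / n) <= target_half_width:
            return p, n, True
    return sum(results) / len(results), len(results), False


if __name__ == "__main__":
    rng = random.Random(7)
    # Simulated probe with a true mention rate of 30%, standing in for a model call.
    print(adaptive_rate(lambda: rng.random() < 0.3, target_half_width=0.08))
```

The minimum run count prevents a lucky early streak from ending the measurement, and the cap keeps API spend bounded when answers stay highly variable.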
No score without a confidence interval. A measured 78 with a narrow interval means something different from the same 78 with a wide one. We report both, and use Wilson score intervals to be honest about how certain we are of a proportion. A score with a wide interval gets no advice attached; it is reported separately as ‘insufficient signal’.
Figure: Score with confidence interval. The same score measured over 240 runs (Robust) and over 18 runs (Minimal): same score, different certainty.
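For reference, this is the standard Wilson score interval for a proportion, sketched in Python. It assumes the score is a mention rate, meaning the share of runs in which the brand appears. The run counts in the usage loop (187 of 240 and 14 of 18, both about 78%) are illustrative, not measured data.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("need at least one run")
    p = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p + z2 / (2 * n)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / n + z2 / (4 * n * n))
    return centre - half, centre + half


if __name__ == "__main__":
    # Illustrative counts: roughly the same 78% rate, very different certainty.
    for k, n in [(187, 240), (14, 18)]:
        lo, hi = wilson_interval(k, n)
        print(f"{k}/{n} runs: {k / n:.0%}, 95% CI {lo:.0%} to {hi:.0%}")
```

Unlike the simple normal approximation, the Wilson interval stays inside 0 to 1 and doesn't collapse to zero width when a brand is mentioned in all or none of the runs. That makes it a sensible choice for small run counts.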
AI models can talk about your brand in two ways. From their memory, built up during training. Or by looking things up live during a conversation. We separately measure what the model says on its own and what it says when it's allowed to look things up. The difference between those two decides whether your investment should go towards long-term reputation or short-term content. As far as we know, RecommendationOS is the only measurement platform that makes this distinction explicit per association.
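A minimal way to test whether an association really differs between the two modes is to compare two proportions. The sketch below assumes you have mention counts from runs without browsing (model memory) and with browsing (live retrieval), and it uses a normal-approximation interval for the difference. The function name and verdict labels are hypothetical, and how the platform turns the difference into investment advice isn't shown.

```python
import math


def mode_gap(k_mem, n_mem, k_live, n_live, z=1.96):
    """Difference in mention rate between live retrieval and model memory.

    k_mem/n_mem:   mentions and runs without live lookup (parametric).
    k_live/n_live: mentions and runs with live lookup allowed (retrieval).
    Returns (difference, (low, high) ~95% interval, verdict).
    """
    p_mem, p_live = k_mem / n_mem, k_live / n_live
    diff = p_live - p_mem
    se = math.sqrt(p_mem * (1 - p_mem) / n_mem + p_live * (1 - p_live) / n_live)
    lo, hi = diff - z * se, diff + z * se
    if lo > 0:
        verdict = "stronger with live retrieval"
    elif hi < 0:
        verdict = "stronger from model memory"
    else:
        verdict = "no clear difference"  # interval includes zero
    return diff, (lo, hi), verdict
```

Requiring the whole interval to sit on one side of zero prevents a difference of a few points, which could easily be sampling noise, from being read as a reason to shift investment.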
Confidence labels
We don't sell a score as advice if there's not enough signal underneath. Every metric gets an explicit confidence label:
Tens to hundreds of runs, low variance. You can base strategy on this.
Enough runs for direction, not rock-solid. Useful for hypothesis forming.
Insufficient signal. Reported separately, not sold as advice.
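The three labels above can be assigned from two quantities: the number of runs and the width of the confidence interval. In the sketch below, the cut-offs and the label names "robust" and "directional" are hypothetical placeholders, because this list doesn't name the first two labels or give their thresholds. The interval bounds are assumed to come from a proportion interval such as Wilson's.

```python
def confidence_label(n_runs, lo, hi,
                     robust_runs=100, robust_width=0.15,
                     directional_runs=15, directional_width=0.40):
    """Map run count and interval width (hi - lo, on a 0-1 scale) to a label.

    All thresholds and the first two label names are hypothetical.
    """
    width = hi - lo
    if n_runs >= robust_runs and width <= robust_width:
        return "robust"          # strategy can be based on this
    if n_runs >= directional_runs and width <= directional_width:
        return "directional"     # good enough for hypothesis forming
    return "insufficient_signal"  # reported separately, no advice attached
```

Both conditions have to hold. Many runs on a highly variable answer still yield a wide interval, and a narrow interval from very few runs is more likely luck than stability.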
Not just scores and charts, but answers your marketing team can act on directly. Four examples from live engagements.
The reputation overview. How ChatGPT, Claude, Gemini and Perplexity see your brand. Per platform: do they know you, which associations do they make, and does that come from memory or from live retrieval.
The topic prioritisation. The topics your market talks about, ranked by strategic value. Per topic your visibility versus the main competitors, with confidence label.
The website analysis. Overall and per page, a report on content, visual quality, SEO foundation and agent readability. Per signal a quote as evidence, a concrete recommendation and a confidence label. Not a list of scores, but a list of interventions you can work through page by page.
The prioritised action plan. Concrete actions, ranked on impact versus effort. Not ‘improve your SEO’, but ‘add these three pages, restructure this section, rewrite this CTA because it isn't working now, claim this topic from this angle on the platform where it fits best’. As concrete as possible, so you can act on it directly.
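A simple way to rank actions on impact versus effort is to sort by their ratio and then assign priority letters. The sketch below assumes both impact and effort are scored on a 1 to 10 scale. The scale, the `Action` type and the example actions are hypothetical, and the platform's actual ranking model isn't described here.

```python
from dataclasses import dataclass
from string import ascii_uppercase


@dataclass
class Action:
    description: str
    impact: float  # hypothetical 1-10 scale
    effort: float  # hypothetical 1-10 scale; must be > 0


def prioritise(actions):
    """Sort by impact per unit of effort and label as Priority A, B, C, ..."""
    ranked = sorted(actions, key=lambda a: a.impact / a.effort, reverse=True)
    return [(f"Priority {ascii_uppercase[i] if i < 26 else i + 1}", a)
            for i, a in enumerate(ranked)]


if __name__ == "__main__":
    plan = prioritise([
        Action("add three pages", impact=8, effort=4),
        Action("rewrite CTA", impact=4, effort=1),
        Action("restructure section", impact=6, effort=2),
    ])
    for label, action in plan:
        print(label, action.description)
```

A ratio favours quick wins. If large structural changes should surface earlier, a weighted sum or a minimum-impact filter would be the obvious alternative.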
Figure: Prioritised action plan (example), ranked by ROI with the top priority first: Priority A (53), Priority B (51), Priority C (36), Priority D (26), Priority E (15), Priority F (9).
RecommendationOS is built as part of our strategy engagements. It is not a standalone consumer tool.

RecommendationOS isn't finished yet; it will be this summer. We're testing it with our first clients, shipping improvements daily, and building it in the open. Early access partners help us decide what needs to be ready first.
Curious to see where we stand and how your brand could benefit? We're happy to give a demo of about 45 minutes in which we run RecommendationOS on your brand. You get a live reputation measurement across the four models, the top associations AI currently attaches to your brand with model memory (parametric) and live retrieval side by side, a first topic prioritisation for your market, and an honest assessment of where the biggest opportunities lie and whether RecommendationOS is the right instrument for your situation.
Or email Peter directly: peter@thinkagain.nl
Roadmap
Concept
First four modules specified.
Strategy pilots
Live measurements for early access partners.
Beta · summer 2026
Open to all strategy clients.
Wider access
Via selected partner agencies.
A few recurring questions about status, cost and how RecommendationOS relates to other tools in the market.
Prefer to call?
Peter: 06-52717644 · peter@thinkagain.nl