Why ChatGPT doesn't recommend your brand, and what you can do about it
by Peter Minkjan · April 2026
Prompt tracking is a thermometer, not a compass. AI Search demands positioning, entity, and evidence, not just more prompts.
Published: 7 April 2026
TL;DR
Prompt tracking shows whether you are mentioned, not why. AI Search is not a faster Google: answers are unstable, dashboards suffer from selection and survivor bias, and real decisions happen in full conversations, not isolated test prompts. When a competitor is recommended, it almost always comes down to five layers: technical readability, a sharp entity with consistent associations, internal evidence (comparisons and scenarios), valuable external presence, and external proof from others. Start there, in that order; use trackers afterwards as feedback, not as strategy.
Introduction
A marketer I spoke to was proud to say they were "doing GEO" now. Their agency had bought a prompt tracking tool, and they had been busy with it for a client: a list of eighty prompts, fired daily at ChatGPT, Perplexity, and Gemini. They proudly showed me the dashboard with the visibility scores.
"We score quite well on some prompts, thankfully, but on some specific prompts and topics we do really badly. So we need to create more content about that."
I asked: do you also know why you score badly there?
"Content, right?" they said, at a loss.
Silence.
That is the problem with prompt tracking as a strategy. It measures whether you are mentioned, not why. It shows you an outcome but tells you nothing about the cause. As long as you do not know that, you are optimising in the dark, busy with dashboards while the real question stays unanswered.
Prompt tracking measures whether you are mentioned, not why. It shows an outcome, not a cause.
In this article I explain why prompt tracking is not a solid foundation for your AI Search strategy, what really happens when ChatGPT or other AI platforms recommend your competitor and not you, and what you should start with instead if you want AI tools to recommend you more often.
AI Search is not classic search in a new skin
Before we get to why prompt tracking falls short, it is worth pausing on what AI Search or GEO (Generative Engine Optimization) actually is. Most of the measurement problems I describe below stem from one misconception: we treat ChatGPT and Perplexity like a faster Google, while they serve fundamentally different search behaviour.
Where Google gives links, AI Search gives answers. An average AI prompt runs about 60 words versus 3.4 words for a Google query. Users do not type a keyword; they issue a task. Because AI often gives a better, more complete, and faster answer than Google, traffic on broad informational queries has dropped sharply. Sites large and small are affected and now receive significantly less traffic than they used to. Research by Growtika, for example, shows that ten major tech publishers together lost 65 million monthly Google visits in one year, a drop of 58%. The reason: those queries are now answered directly by AI, so there is simply less reason to click through to sites for the answer.
Where Google gives links, AI Search gives answers.
Google returns a list of links to click through; AI Search often gives one answer inside the interface.
But people do not use AI for everything. As we explain in the Search Intent Framework and detail in Platform Scores, search happens everywhere, but everywhere differently. AI is strong for deep information, orientation, analysis, and generation. Especially in the orientation phase, when people discover new brands, compare options, and decide what is best for their situation, you want to be visible as a brand, not as a link or citation, but as a recommendation. AI Search overlaps heavily with classic SEO, but success takes a different approach. And that approach does not start with tracking prompts.
SEO started operational, became tactical. AI Search demands brand strategy.
SEO has always been about understanding the system in which you want to be visible. You see that clearly when you look at how it evolved:
| Era | Level | Goal | Input | Mechanism |
|---|---|---|---|---|
| Classic SEO | Operational | Rank pages | Keywords and links | Reverse-engineer algorithms |
| Mature SEO | Tactical | Match intent | Search intent | Reverse-engineer SERPs |
| AI Search (GEO) | Strategic | Get recommended | Entities and associations | Build and strengthen brand associations |
AI Search is the first discipline that truly demands strategic choices: who are you really for, what makes you different, what story do you tell consistently across channels. Those questions belong with the CMO, not (only) the SEO specialist. Most organisations still treat it as an operational task and staff it that way. That is exactly why they get stuck.
There is an immediate pitfall. The reflex is to approach AI Search like classic SEO: study the system through its output. Test prompts, analyse responses, spot patterns. Building and strengthening brand associations does not work primarily by studying the model through its output. It works by building what the model can know about you, through your site, external sources, and the consistency of your story. Prompt tracking can later tell you whether that works. Not whether you got the fundamentals right.
Yet that is what marketers are now doing at scale: tracking prompts and their output. Hence column names like "AI visibility per prompt" instead of "position in Google". For anyone with an SEO background, that feels familiar. You define a set of questions around your brand and category, fire them periodically at AI platforms, analyse answers for brand mentions and website citations, track which sites get cited, and put it all in a dashboard with visibility scores and share of voice.
That familiarity is exactly the problem. Keyword tracking worked because results for the same query stayed largely stable for days or weeks. You could earn a position and then defend it. AI answers work fundamentally differently, but the measurement logic we layered on top pretends they do not.
You see the same pattern in circulating advice. "For SEO you optimise a page for a keyword. For GEO you optimise a paragraph for a prompt." Or: "Where SEO was about driving traffic, now you want as many citations as possible." It sounds concrete, but it is the same logic in a new coat. Google is replaced by ChatGPT, the page by a paragraph, the keyword by a prompt, clicks by citations. What is missing is the question that comes first: why would a model want to recommend your paragraph at all? That is not a matter of optimisation. It is a matter of authority, positioning, and trust: things you do not fix with technical tweaks to content alone.
And there is something rarely said out loud in SEO: the same traffic obsession that haunted SEO for decades is showing up in AI Search. Screenshots of rising lines in Ahrefs and Google Search Console were trophies on social media. Now it is dashboards with citation rates.
SEO was never about traffic; not then, not now, and certainly not in the future.
And it really does not matter how often your content is cited. The goal is to be recommended when your audience makes a choice: you want to influence purchase decisions and drive revenue. Don't let anyone tell you it was ever any different.
Correlation is not causation
Suppose a GEO tool shows you were mentioned this week for the prompt "best tool for [your category]" and not next week. What do you actually know for sure?
Only one thing: that for that synthetic prompt, in that test setup, at that moment, the output changed.
You do not know why you were mentioned at all. Which reviews, comparison articles, and discussions caused it. Whether real users phrase the question that way. What context plays into their real conversations with AI. Whether you are actually recommended consistently on other, more concrete questions.
Yet in practice that correlation is often sold as causation. "After our GEO optimisation, AI visibility rose X% in Y weeks." While much of that visibility in reality comes from years of SEO, off-site reputation, and brand building: things that existed long before the GEO tool.
The dashboard tells you that you are mentioned. Not that a specific intervention caused it. The question nobody wants to answer: was that page cited because it had expert quotes, statistics, and structured layout? Or did it have those traits because it was already authoritative content that performed everywhere, and AI Search simply picks that up too?
A rising line in a dashboard does not prove your intervention caused the change.
The dashboard tells you that you are mentioned, not that a specific intervention caused it.
What dashboards show, and what they hide
AI visibility dashboards have built-in selection bias: they only show what is already visible in AI output at that moment. That is why you keep seeing the same subset of brands in some markets: players with strong SEO, lots of external mentions, and a clear category position. Not because they have the best GEO strategy, but because they are rich brands with a matching history. You cannot catch up to that by copying their content strategy, and that is probably not the best move anyway.
The prompt tracker illusion: dashboards show whether you are mentioned, not why.
With top lists of cited sources, a related but slightly different effect appears: survivorship bias. GEO research keeps pointing to the same sources as "most cited by AI": Reddit, Wikipedia, YouTube, major publishers. Logical: they have dominated the web for years and survived the authority battles of the past. But the conclusion drawn ("you must be on Reddit", "you must get into this specific top list") is again correlation sold as causation. Those sources are cited often because they built authority and breadth of content over years, not because they know a secret GEO trick. Moreover, such a list only measures what already exists. A trade publication that reaches your audience and is relevant but has not yet published on the topic simply does not appear in the report, even though it could. The advice is backward-looking by definition.
Agencies and specialists focused on highly cited channels like Reddit, YouTube, and LinkedIn happily reinforce that reflex. Before you know it, serious budget shifts to "doing something with Reddit" or "building a YouTube strategy" simply because that channel ranks high in an AI citation report. The problem is not that Reddit, YouTube, or LinkedIn are unimportant: on the contrary, in many markets they are where the conversation happens.
The problem is choosing a channel only because a tool says it is often cited, and jumping straight into tactics without first deciding what credible contribution and value you can add there. And the practical question: is that channel the best choice when you weigh investment against impact? Organic Reddit success takes a long game and real community presence. Without that, you might buy reach at best, not authority.
The risk is micro-optimising visibility on one platform while the underlying issue is simply that AI does not yet know enough about you to recommend you with confidence. That is a different problem, with a different solution.
You measure the isolated question, not the whole conversation
There is an even deeper problem with prompt tracking: the prompts you measure are not the prompts on which decisions turn. They are synthetic fragments. Nobody goes to ChatGPT, asks "What is the best solution in industry X?", reads the answer, and happily closes their laptop to order from the first brand mentioned.
People do not enter isolated queries in ChatGPT. They have full conversations with the tool. They explain their situation, share what they have already tried, refine their criteria, and only then ask for a concrete recommendation. That conversation is a chain of messages, context, refinement, and trade-offs before any name appears. If you only put the first or last question into a GEO tool, you miss everything that came before. Grow and Convert aptly coined the term "invisible prompts" for this: the context that actually determines the answer, but never lands in a dashboard, and never will.
What you see in the chat is the tip of the iceberg; most of the context steering the answer stays invisible to prompt trackers.
What you measure is a bare, synthetic version of a rich, personal conversation. If you and I ask ChatGPT the same question, we probably get different answers. Your ChatGPT knows your chat history, context, and preferences; mine does not. A GEO tool always tests from an empty, anonymous session, and that context simply does not exist outside the AI interface itself. Even for a short question, ChatGPT can draw on your full history of prior chats to shape its answer. A prompt tracker never has that context.
Prompt tracking is looking through a keyhole. You see something real, but it is a small, arbitrary slice of what is actually happening in the room. No matter how often you look through it, the keyhole stays the same size. Tracking more or more often does not give you a better picture of the room. It gives you more glimpses through the same hole.
Prompt tracking is looking through a keyhole: you see something real, but only a small, arbitrary slice of what is actually happening.
Amanda Natividad at SparkToro asked eight mothers with the exact same intent (finding a basketball league for their child) to each phrase a prompt. No two prompts looked alike: one asked ChatGPT to act as a basketball coach, another only gave a postcode, another called the AI "Dolly" without naming where they lived. In the same piece, SparkToro calculated the average semantic similarity between those prompts at just 0.081: about as much overlap as between recipes for pea soup and tiramisu.
That illustrates the structural problem with prompt tracking: you pick a handful of phrasings that seem logical to you, but they barely overlap with how your audience asks the same question. The questions where you would be the best answer rarely appear in that set: too specific to generate at scale, too dependent on real customer language to invent behind a desk.
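SparkToro measured that overlap with embedding-based similarity, but even a toy bag-of-words version makes the point. The sketch below computes cosine similarity over plain word counts for two prompts with the same intent; the two phrasings are paraphrased stand-ins inspired by the examples above, not SparkToro's originals:

```python
from collections import Counter
import math

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over raw word counts: a crude lexical proxy,
    not the embedding-based similarity SparkToro actually measured."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Two invented phrasings of the same "find a basketball league" intent:
p1 = "act as a basketball coach and find a youth league near me"
p2 = "90210 rec basketball for 8 year olds"
print(round(cosine_similarity(p1, p2), 3))
```

Even this crude lexical measure lands near zero: the only shared token is "basketball". Whatever prompt set you pick for tracking, the real phrasings of your audience will mostly sit outside it.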
Prompt tracking misses conversation context, user context, and phrasing context: three layers that all matter in practice, and all three are invisible in a dashboard.
AI answers are not rankings
Classic SEO had one big advantage as a measurement object: positions were relatively stable. "We rank third for this keyword" was meaningful, because tomorrow you were probably still third.
AI answers work differently. Research by SparkToro and Gumshoe, with 600 volunteers firing twelve different prompts nearly 3,000 times across ChatGPT, Claude, and Google AI, shows less than a 1 in 100 chance that an AI tool returns the same brand list across a hundred runs. For exact order, it is less than 1 in 1,000, regardless of topic. Rank position in a single AI answer is mostly noise. That says more about how probabilistic models work than about your real position in the market.
Same type of question asked three times: brand order shifts heavily.
Rank position in a single AI answer is mostly noise. It says more about the model than about your real position in the market.
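A toy simulation illustrates why exact-order matches are so rare even when the underlying prominence of each brand is perfectly stable. The brand names and prominence weights below are invented; the only point is the gap between "same set of brands" and "same exact order":

```python
import random

random.seed(42)

# Toy model: each answer samples 5 brands without replacement,
# weighted by a hypothetical "prominence" in the model's corpus.
brands = ["A", "B", "C", "D", "E", "F", "G", "H"]
weights = [30, 25, 15, 10, 8, 6, 4, 2]  # invented prominence scores

def sample_answer():
    pool, w = list(brands), list(weights)
    picked = []
    for _ in range(5):
        choice = random.choices(pool, weights=w, k=1)[0]
        i = pool.index(choice)
        pool.pop(i)
        w.pop(i)
        picked.append(choice)
    return picked

runs = [sample_answer() for _ in range(1000)]
reference = runs[0]
same_set = sum(set(r) == set(reference) for r in runs) / len(runs)
same_order = sum(r == reference for r in runs) / len(runs)
print(f"same brand set: {same_set:.1%}, same exact order: {same_order:.1%}")
```

Even in this deliberately stable toy world, the exact ordering almost never repeats, while the candidate set recurs far more often. That is the structural reason to ignore rank position in single answers.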
Does that mean you cannot measure anything? Not entirely. The same SparkToro research suggests visibility percentage (how often a brand appears at all across dozens or hundreds of runs) does have some statistical validity. If a brand appears in 85 of 95 answers, that says something about how prominent it is in the model's corpus. That is a defensible metric, but only with enough volume, and it still does not tell you why the brand sits there. More useful than position or sentiment is tracking which topics and attributes the model associates with your brand: what you actually represent in AI answers, not only how often you show up.
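If you do use visibility percentage, treat it like any other proportion and put a confidence interval around it before acting on changes. A minimal sketch using the Wilson score interval; the 85-of-95 figure comes from the example above, while the 9-of-10 run count is invented for contrast:

```python
import math

def wilson_interval(hits: int, runs: int, z: float = 1.96):
    """95% Wilson score interval for a visibility proportion."""
    p = hits / runs
    denom = 1 + z * z / runs
    centre = (p + z * z / (2 * runs)) / denom
    margin = (z / denom) * math.sqrt(
        p * (1 - p) / runs + z * z / (4 * runs * runs)
    )
    return centre - margin, centre + margin

lo, hi = wilson_interval(85, 95)
print(f"85/95 runs: {lo:.0%}-{hi:.0%}")  # a reasonably tight band
lo, hi = wilson_interval(9, 10)
print(f"9/10 runs:  {lo:.0%}-{hi:.0%}")  # far too wide to act on
```

At 95 runs the band is tight enough to compare month over month; at 10 runs the same apparent rate is compatible with almost anything, which is exactly the "only with enough volume" caveat.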
What is more stable than rank position is which brands and solution directions routinely come back for a given kind of question: the candidate set a model has built around a category or problem. You want to be part of that. But you do not get there by tracking prompts.
That brings us to the question the proud marketer from the intro could not answer: why does AI not actually recommend my brand?
Five reasons ChatGPT recommends your competitor and not you
If ChatGPT, Perplexity, or Gemini names your competitor and not you, it rarely comes down to a smarter GEO strategy. Almost always it is because the model has a clearer, more consistent picture of them: who they are, who they are for, and why they are worth recommending. That picture is built in five layers, each building on the last.
1. The technical foundation: can a model read you at all?
The first layer is the most basic and the most underestimated. AI models that query the live web lean heavily on the same technical signals as search engines: crawlability, information architecture, structured data, clear product and service pages.
If your site hides critical information behind forms or JavaScript mazes, lacks clear structure, and offers little clear product context or author information, the model simply has fewer reliable facts to fill in your brand. Not because you are bad, but because you are hard to read. This is boring hygiene, but without this layer most of your AI Search work stays theoretical.
Layer 1: crawlers extract facts from a clear structure; messy sites lose signals.
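One concrete piece of this hygiene layer is schema.org structured data, which hands a crawler unambiguous facts instead of leaving it to infer them. A minimal, hypothetical sketch that generates a JSON-LD snippet for a page's head; the organisation name, URLs, and topics are invented placeholders:

```python
import json

# Hypothetical schema.org Organization markup: unambiguous facts about
# who you are, what you do, and for whom. All names/URLs are placeholders.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "ExampleHR",
    "url": "https://www.example-hr.example",
    "description": "HR software for healthcare organisations.",
    "sameAs": [
        "https://www.linkedin.com/company/example-hr",
        "https://en.wikipedia.org/wiki/ExampleHR",
    ],
    "knowsAbout": ["HR software", "healthcare staffing", "shift planning"],
}

# Embed as JSON-LD in the page's <head>:
snippet = (
    '<script type="application/ld+json">'
    + json.dumps(org, indent=2)
    + "</script>"
)
print(snippet)
```

The `sameAs` links do double duty here: they tie the entity on your own site to the same entity on external platforms, which is precisely the consistency the next layer depends on.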
2. The entity: does the model understand who you are?
Technical accessibility alone is not enough. The model also has to understand what you are, and that is harder than it sounds.
AI models work with entities and associations. An entity is your brand as a recognisable object in the model. The associations attached to it (categories, use cases, audiences, traits) define your footprint. The richer and more consistent that footprint, the better your chance of surfacing for the right question. If someone asks for "the best HR software for healthcare organisations", the model looks at which vendors are consistently associated with that specific mix of category, audience, and context.
Here is the deepest reason many brands are not recommended: their entity and its associations are vague, inconsistent, or too weak. The website tells one story, reviews another. Product pages speak to a different audience than LinkedIn posts. Case studies cover other sectors than the homepage promises. Together those signals reconstruct the brand image a model builds. If each signal tells a slightly different story, you do not get a solid entity: you get a cloud of associations. A vague cloud does not get recommended.
AI bases its picture not on your identity but on your reputation: what it finds about you across the ecosystem.
Layer 2, the entity: diffuse signals versus a sharp, recognisable position.
An important distinction: identity is what you say you are, defined internally and on purpose. Reputation is what others have built about you over time, external and earned. AI bases its picture not on your identity but on your reputation: what it finds about you across the information ecosystem. When those two diverge, AI builds a picture you do not recognise.
The cause runs deeper than content alone; a phenomenon called ghost citations makes that visible. Your content clears the bar to be used as a source, but your brand name does not appear in the recommendation: your content does the work, your competitor gets the recommendation. Research by Seer Interactive across more than 540,000 LLM answers shows this is not a rare exception. It is a pattern that emerges when a model knows your content is relevant but does not strongly associate your brand with what it is and who it is for. Being cited as a source is fundamentally different from being recommended as the solution.
Ghost citation: your content is cited as a source, but the recommendation names another brand.
You cannot build a clear entity without sharp positioning. If your brand tries to be everything for everyone (broadly usable, flexible, suited to every sector), you inevitably build diffuse associations. The model may know you exist, but not when you are the best choice. If a model does not know that, it will not recommend you.
Sharp positioning means: define for which audiences and situations you are demonstrably the best choice, dare to say when you are not (often harder for marketing managers), and make sure that story shows up consistently on your site, in your cases, in external comms, and in what others write about you. Consistency is not only marketing advice: it is a technical requirement for a strong entity.
But claiming a strong entity is one thing. A model also needs evidence to support that claim in a recommendation. That is where the next layer comes in.
3. Internal evidence: do you supply the arguments?
A sharp entity tells a model who you are. But a model also needs reasons to recommend you over others. You must supply that evidence explicitly, for humans and for models, through comparisons, alternatives, scenarios, and cases that show when and why you are the best choice.
Layer 3: after broad AI answers you need concrete comparisons, scenarios, and cases.
AI tools are absorbing more and more broad orientation questions. People get the high-level answer inside the AI interface and click through less for general information. Where they still go deep is the lower funnel: comparisons, alternatives, fit for a specific situation, backed by clear case studies and proof. Questions like "[competitor] vs [you]", "[you] for [specific sector]", "is [you] right for [concrete scenario]".
This content has always converted well; now it has a second job: it feeds AI models the nuance and scenarios they need to make targeted recommendations. Without this evidence, the model has fewer concrete arguments to recommend you in specific situations. The logic is strong and matches what we see in practice: whoever fills in this layer well gives the model more to base an answer on.
4. External presence with value: are you where the conversation happens?
Internal evidence on your own site is a start, but not enough. AI models look for signals outside your domain, and it is not only about being present, but about being present with value.
You can have a thousand posts on Reddit or LinkedIn and still never get recommended as a solution, because none of those posts substantively prove your expertise or positioning. Engagement bait and giveaways do not count. What counts is substantive contribution that ties you to the right categories, use cases, and audiences: guest articles in trade media, deep contributions in relevant communities, podcasts where you share expertise, content that leads others to name and recommend you.
It is also about platform choice. If your audience orients on YouTube, niche forums, or podcasts, but you are completely absent there, AI has little to synthesise about you. Presence on the wrong platforms, however active, builds the wrong associations.
Layer 4: add substance where your audience actually talks, not only where tools show high citations.
5. External evidence: do others say it too?
The fifth layer is the strongest and the hardest to steer. AI models weight most what independent sources say about you: not what you claim, but what others confirm without you controlling it.
McKinsey research among 1,927 consumers suggests a brand's own domain accounts for only 5-10% of sources AI Search platforms consult. HubSpot reaches a similar conclusion: 92% of AI mentions do not come from your own content but from external sites that mention you. Reviews on relevant platforms, comparison articles where you emerge as the best option, analyst reports that name you, news that positions you in your category: that is the kind of external evidence that gives a model the confidence to recommend you outright.
Layer 5, external proof: AI leans heavily on what others say about you (HubSpot, McKinsey).
This is also the layer most brands neglect, while it is the layer AI models lean on hardest. Your website is the foundation; the discovery platforms around it (YouTube, Reddit, LinkedIn, trade media, forums) are the amplifier. AI synthesises what the whole ecosystem says about you. Whoever is consistently positioned as an authority on those external platforms has a structural advantage that is hard to copy with on-site content tricks.
So: why is your brand not recommended?
Back to the question the marketer could not answer. The reason ChatGPT recommends your competitor and not you almost always comes down to one or more of these five points:
Your site is hard to read. Critical information is buried, structure is unclear, or a model cannot reliably infer who you are and what you do.
Your entity is vague or inconsistent; there is a gap between identity and reputation. Your brand speaks to too many audiences, tells a slightly different story in different places, or lacks a clear category and use case it is consistently tied to. Sharp positioning is the prerequisite. Without it you are building on sand.
You lack internal evidence that you are the best choice. You have little or no comparisons, alternatives, or scenario-specific content that gives the model concrete arguments to recommend you in a specific situation.
You are not present with value where the conversation happens. Your audience orients on platforms where you are absent, or your presence does not add substantive depth that reinforces your positioning.
Your external evidence is thin. Others do not independently confirm that you are the best choice: no reviews, no trade media mentions, no external sources that position you in your category.
These are also the five things to start with, in that order, not with a prompt set of eighty questions.
Prompt tracking has value, but serious limitations
This is not an argument to stop using prompt tracking tools. They can be very useful for tracking whether visibility rises over time by topic, seeing which sources AI uses when it does recommend you, and spotting new competitors or sources. But be critical of tips and recommendations based on reverse-engineering output, survivorship bias, and the misconception that LLMs behave deterministically like a search engine.
Use LLM trackers for what they should be: a feedback layer on an existing strategy. They are not a compass but a thermometer. The marketer with eighty prompts had built an excellent thermometer. They knew exactly how high the fever was. What they did not know was what caused the fever, and that is the question to start with. The answer is not in a dashboard. It is in your positioning, your entity, your external presence, and the content a model needs to recommend you with conviction.
Only once that foundation is in place does a thermometer tell you something useful; it does not cure the fever. GEO tools measure whether you get recommendations, but the work to earn those recommendations happens outside them.
About the authors
Peter Minkjan
Co-founder
Peter has been working in search since 2008. What started with SEA and SEO grew with the profession: from Facebook marketing and YouTube strategy to a broader focus on discoverability beyond the beaten paths of Google. What remained is the fascination with how platforms work, and how that keeps changing. That fascination has only grown now that AI is fundamentally changing the landscape: not just how people search, but how we as marketers do our work.