The End of “Local by Default”
How AI Search Broke the Implicit Filter and Why Businesses Are Paying for It
For nearly twenty-five years, search engines trained users to be confident in their ambiguity.
People learned they did not need to be precise. They could ask broad, incomplete questions and trust that the system would quietly fill in the missing context. When someone in Warsaw searched for “best auction sites,” Google understood that “best” meant best in Poland. When a user in Spain searched for cosmetics brands, the results reflected what was relevant, available, and culturally familiar in that market.
This behavior was not accidental. It was enforced by an invisible layer of logic comprising IP signals, language detection, regional indexes, and serving-layer eligibility rules. Together, these systems made locality the default assumption.
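To make that invisible layer concrete, here is a minimal sketch of how a classic serving layer might inject locality before ranking ever runs. All signal names and rules below are illustrative assumptions, not any search engine's actual implementation.

```python
# Hypothetical sketch of serving-layer localization in a traditional
# search stack: IP signal, language detection, regional index, and an
# eligibility rule, applied before ranking. Illustrative only.

def localize(query: str, ip_country: str, accept_language: str) -> dict:
    """Locality is injected into the request before any ranking runs."""
    market = ip_country                        # IP signal
    language = accept_language.split(",")[0]   # language detection
    return {
        "query": query,
        "index": f"regional-{market.lower()}",          # regional index
        "eligibility": f"must serve market {market}",   # serving-layer rule
        "language": language,
    }

plan = localize("best auction sites", ip_country="PL",
                accept_language="pl-PL")
print(plan["index"])  # regional-pl
```

The point of the sketch is that the user's ambiguity never reaches the ranker: by the time results are scored, the candidate set has already been narrowed to one market.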
AI-powered search removed that assumption.
Not temporarily.
Not accidentally.
Structurally.
And the cost of that change is now appearing as cross-market traffic leakage, broken customer journeys, and revenue loss that is difficult to diagnose because nothing appears obviously broken.
A Subtle Shift in the Question Being Answered
The shift becomes obvious when you look at how AI systems interpret deceptively simple queries.
Take the question: “What are the best cosmetics brands?”
This is the kind of query people have asked search engines for decades. Yet the way AI systems handle it is fundamentally different.
Traditional search engines treated this as a logistical question and silently localized the answer. Generative systems treat it as a conceptual question and resolve it globally. That single change in interpretation alters everything that follows.
What Peec.ai Observed in Poland
New research from Peec.ai, published under the title “ChatGPT Searches in English Even When You Don’t,” provides a concrete view of how this plays out in practice.
In their large-scale analysis of ChatGPT’s query fan-out and source selection behavior, Peec.ai tested a Polish-language query using a Polish IP address:
“Jakie są najlepsze portale aukcyjne?”
(“What are the best auction portals?”)
What Peec.ai observed was not a failure to detect language or location. The system clearly understood the query was Polish. The notable behavior occurred during the research phase. When ChatGPT assembled the answer, its underlying source exploration frequently reverted to English-language material.
As a result, global auction platforms were often prioritized over Allegro, Poland’s dominant and operationally most relevant marketplace.
Peec.ai presents this as an observation of system behavior, not as a claim of intent or preference. Their analysis documents what the system did during answer construction, not why it made those choices.
What Peec.ai Observed in Spain—and Why This Case Is Even Clearer
Peec.ai documented the same pattern in a Spanish-language test, and in this case, the internal mechanics make the issue even more explicit.
They asked ChatGPT:
“¿Cuáles son las mejores marcas de cosméticos?”
(“What are the best cosmetics brands?”)
The resulting answer did not include Spanish cosmetics brands, despite Spain having a well-established domestic cosmetics industry. On its own, that result already raises questions. But Peec.ai went further by analyzing how ChatGPT researched the query behind the scenes.
Source: Peec.ai, “ChatGPT Searches in English Even When You Don’t”
Using Peec.ai’s tooling, they identified the internal search queries, referred to as query fan-outs, that ChatGPT used to assemble its response.
The first fan-out was issued in English:
“best cosmetic brands skincare makeup top brands.”
This initial step naturally surfaces global sources and English-language rankings, already tilting the answer set toward internationally dominant brands.
The second fan-out was issued in Spanish:
“Mejores marcas de cosméticos globales alta calidad”
(“Top global high-quality cosmetic brands.”)
This detail matters.
The original user did not ask about global brands. To be precise, they did not ask about Spain-based or Spanish-language brands either. They simply asked about cosmetics brands, in Spanish. The word “global” was introduced by the system itself as part of its interpretation of the query.
At that moment, the eligibility decision was effectively made.
Interpreting a Spanish-language query, from Spain, as a request for global cosmetic brands is not a neutral translation choice. It is a design choice. And that choice structurally excludes brands that dominate the Spanish market in favor of brands that dominate English-language content and global rankings.
Peec.ai documents this behavior without assigning a motive. The fan-outs simply show us how the system framed the question internally.
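The two fan-outs Peec.ai surfaced can be mimicked in a toy planner. This is a hypothetical sketch illustrating the observed behavior, not ChatGPT's actual pipeline; the function names and the reframing rule are assumptions for illustration only.

```python
# Toy fan-out planner reproducing the *shape* of the behavior Peec.ai
# observed: one English sub-query, one native-language sub-query reframed
# as "global". Illustrative assumption, not a real system.

def translate_to_english(query: str) -> str:
    # Stand-in for a real translation/keyword-expansion step.
    return {"¿Cuáles son las mejores marcas de cosméticos?":
            "best cosmetic brands skincare makeup top brands"}.get(query, query)

def reframe_as_global(query: str) -> str:
    # Stand-in for the step that injected "globales" into the Spanish fan-out.
    return {"¿Cuáles son las mejores marcas de cosméticos?":
            "Mejores marcas de cosméticos globales alta calidad"}.get(query, query)

def fan_out(query: str, query_language: str) -> list[dict]:
    """Expand a user query into internal research sub-queries.

    The planner chooses both the language and the framing of each
    sub-query; the user never sees these choices.
    """
    return [
        {"lang": "en", "text": translate_to_english(query)},
        {"lang": query_language, "text": reframe_as_global(query)},
    ]

subs = fan_out("¿Cuáles son las mejores marcas de cosméticos?", "es")
print(subs[1]["text"])  # Mejores marcas de cosméticos globales alta calidad
```

The eligibility decision lives entirely inside `fan_out`: by the time retrieval runs, “global” is already part of the question.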
Interpreting the Pattern: Defensibility Under Uncertainty
What Peec.ai shows us is the behavior.
The explanation for why this happens is my inference.
The issue is not primarily language. Language influences which corpus dominates once the frame is set, but it does not explain why local relevance disappears altogether. The deeper issue is how AI systems resolve ambiguity when a query lacks explicit constraints.
When a user asks, “What are the best cosmetics brands?”, the query itself does not imply global evaluation, international recognition, or cross-market consensus. Those criteria are introduced by the system, not by the user.
Faced with an unconstrained superlative like “best,” the AI must choose a frame. In the absence of logistical context, it defaults to defensibility under uncertainty. The system optimizes for answers it can justify, claims it can support, and comparisons it can defend using readily available evidence.
This is why, as I argued previously, “best” no longer behaves like a human qualifier and instead acts as a trigger for global, conceptual evaluation rather than local, practical guidance.
Once the question is framed conceptually rather than logistically, the outcome becomes predictable. Entities with the largest bodies of comparative content, repeated citations, and widely referenced claims rise to the top. Global brands dominate not because the question asked for them, but because they are safer for the system to defend.
The Spanish cosmetics fan-outs make this visible. The system did not merely translate the query. It reframed it to make the answer easier to justify.
Conceptual Answers vs. Usable Answers
This is where AI search quietly breaks the user contract.
A Polish user asking about auction portals or a Spanish user asking about cosmetics brands is almost certainly thinking in practical terms: platforms they can use, brands they recognize, products available in their market. The AI, however, is not optimizing for usability or operability. It is optimizing for conceptual defensibility.
Once the system commits to answering the abstract question of which entities most convincingly satisfy the idea of “best,” locality, availability, and real-world constraints fall out of consideration entirely.
The answer may be correct in an abstract sense. It may also be useless.
Why This Is Not Just a Multilingual Problem
This same failure mode appears across markets and industries. It shows up when Canadian buyers are sent to U.S. product specification pages, when regional SaaS providers are displaced by global documentation hubs, and when local market leaders vanish from AI-generated answers despite strong demand.
This is not primarily a multilingual problem. It is a context-resolution problem.
Language determines which sources dominate after the frame is chosen. Defensibility determines which entities are eligible in the first place. Eligibility decisions have moved upstream, out of the serving layer and into the reasoning layer itself.
Tokenization Explains Understanding — Not Defensibility
At this point, it’s worth addressing an apparent contradiction that often surfaces in discussions about multilingual AI search.
For years, we’ve explained how modern models tokenize and vectorize language in ways that make meaning largely language-agnostic. From a comprehension standpoint, these systems are exceptionally good at understanding that a concept expressed in Spanish, German, Polish, or Japanese maps to the same underlying idea.
But understanding a concept and standing behind an answer are not the same thing.
Tokenization and vectorization solve the problem of semantic equivalence. They allow the model to recognize that X in one language is the same X in another. What they do not solve is the problem of evidentiary weight.
Once the system moves from understanding a question to answering it, it enters a different mode entirely. Retrieval and synthesis require the model to decide which entities, facts, and claims it is willing to assert. That decision is governed not by meaning alone, but by defensibility.
When that choice is unconstrained—when the system is not given clear logistical or contextual boundaries—it gravitates toward where evidence is densest, most redundant, and easiest to corroborate.
Right now, that is overwhelmingly English-language content.
Not because English is inherently better.
Not because the model prefers it.
But because evidence density still matters after vectorization.
Vectors collapse the meaning.
They do not collapse evidentiary weight.
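The distinction can be shown with toy numbers. The vectors and document counts below are invented; the sketch assumes only the well-known property that multilingual embeddings place translations of the same concept at nearly the same point.

```python
import math

# Toy illustration: two queries in different languages embed to nearly
# identical vectors (semantic equivalence solved), yet the corpus of
# documents near that concept is lopsided (evidentiary weight unsolved).
# All vectors and counts are invented for illustration.

emb = {
    "best cosmetics brands":        [0.81, 0.58, 0.05],
    "mejores marcas de cosméticos": [0.80, 0.59, 0.06],
}

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Meaning collapses: the two queries are effectively the same vector.
sim = cosine(emb["best cosmetics brands"],
             emb["mejores marcas de cosméticos"])
print(round(sim, 3))  # ~1.0

# Evidence does not collapse: what sits *near* that vector is uneven.
docs_near_concept = {"en": 48_000, "es": 3_200}  # invented counts
```

Understanding the Spanish query costs the system nothing. Defending an answer from the Spanish corpus costs it corroboration, and that is the asymmetry the fan-outs reveal.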
This is why Peec.ai’s findings do not contradict what we know about multilingual embeddings. The system can understand local-language content perfectly well. The problem emerges when it must justify an answer under uncertainty, and the safest place to do that is where the most comparative material exists.
This behavior is not a flaw. It is a design choice.
As I have argued previously, global search misalignment is an engineering feature—even when it becomes a business bug. The system is doing exactly what it was built to do. It is just no longer doing what users, markets, and companies were conditioned to expect.
Is This Bias or a Structural Outcome?
Peec.ai goes further by quantifying the pattern they observed at scale. Across more than 20 million query fan-outs, they found that 43% of the background research steps for non-English prompts were conducted in English, even when users searched in their native language from their home market.
Peec.ai is careful not to sensationalize this finding. They attribute the behavior to two practical forces: authority signals and risk minimization. Global pages tend to have more backlinks, citations, and repetition across independent sources, making them easier for AI systems to evaluate at scale. At the same time, with roughly half of the web’s content written in English, querying English sources reduces uncertainty by increasing the probability of finding corroborating material.
Seen through the framework developed earlier in this article, those two forces collapse into a single concept that I have consistently referred to as defensibility.
I explored this dynamic in more detail previously in Why Global Search Misalignment Is an Engineering Feature and a Business Bug, where I argued that the same mechanisms that make AI systems reliable at scale can simultaneously undermine local relevance.
What is often labeled as language bias is better understood as a structural outcome of how AI systems manage uncertainty. English-language content participates in every market, while local-language content participates in only one. When the system must choose what it can confidently stand behind, that asymmetry inevitably shapes the answer set.
From a business perspective, the effect feels like bias, and the consequences to local businesses are real. Local companies lose visibility to global competitors even in their home markets. But from an engineering standpoint, this is not a flaw waiting to be patched. It is the predictable result of optimizing for defensibility in an uneven information landscape.
What Comes Next: Forcing Locality Back Into the System
The uncomfortable implication of all this is that locality is no longer implicit. AI systems will not assume it on the user’s behalf. If businesses want local relevance to survive in generative answers, they have to make logistical reality explicit and unavoidable.
That means forcing locality back into the system through data, structure, and routing rather than relying on it to be inferred. What that looks like in practice and how organizations can do it without fighting the underlying architecture of AI systems themselves are the subjects of my next article.
Because in a world where ambiguity is resolved by defensibility rather than usefulness, local by default is no longer a given. It is a requirement that has to be engineered.