Repeated Prompts as an Entity Test

February 6, 2026

One prompt is a snapshot. Repeated prompts are closer to a stress test, because they show where the company record bends when the machine approaches it from different doors.

In a composite Singapore case, the first answer looked harmless. A founder of a 22-person compliance advisory firm pasted the company name into an AI assistant and received a tidy paragraph: payments compliance, fintech clients, regulatory advisory, Singapore-based. There was one odd stitch in the cloth. The answer called the firm a “legal consultancy,” which had been true only in the loosest directory sense, years before the firm had narrowed its work around payments compliance.

Then he asked again with a slightly different prompt. This time the assistant called the same firm a “risk platform.” A third version described it as “a boutique law practice,” and in one run the founding year came from a partner page that was actually describing a joint programme, not the firm. None of the errors looked dramatic on its own. Put together, they made a small diagnostic picture: the machine was not deciding what the company was. It was sampling from a box of half-compatible labels.

One prompt is too polite

A single prompt asks the machine to behave. It gives the assistant one path into the public record and then tempts us to overread the result. If the answer is correct, we relax. If the answer is wrong, we may treat it as a one-off hallucination. Both reactions are too quick.

In my work I use repeated prompts because the first answer often tells me less than the fifth. The early answer may be the clean one, especially when the prompt uses the company’s preferred wording. A founder who asks, “What does [Company] do as a fintech compliance adviser in Singapore?” has already handed the machine the category. It is like asking a distracted clerk to find the brown envelope while holding the brown envelope in the air.

The more useful test is to vary the entry point without feeding the model the conclusion. I might ask what the company does, who it serves, whether it is a law firm, which category it belongs to, how it compares with nearby firms, or what public sources support the description. I may ask from a buyer’s point of view, then from a procurement point of view, then from a partner ecosystem point of view. The point is not to trick the model. The point is to see which facts stay fixed when the wording changes.

An AI company-summary test, as I use the term, is a repeated-prompt comparison: the risk shows up in how the answers vary, not in any single output. That is the working definition I use in my entity ledger. The test is not a popularity contest among answers. It is a way of finding the seams where names, categories, citations, and relationships have not been made stable enough.

A human buyer can forgive an old label. Machines tend to preserve it if the old label is easier to retrieve.

The three fractures I look for

After enough runs, the answers usually fracture in one of three ways. I call them category wobble, source borrowing, and relation leakage. The terms are my own shorthand, and they are ugly enough to be useful. They keep me from treating every wrong answer as the same kind of wrong.

Category wobble is the most common. The machine keeps the company name but changes the box around it. The compliance advisory firm becomes a legal consultancy, a management consultancy, a risk platform, or a fintech adviser depending on which source is retrieved first. In the composite Singapore case, the category wobble came from old directory entries, partner descriptions written by ecosystem teams, and service pages that had been revised in different eras. The firm’s own site was clearer than the rest of the web, yet it was not always the strongest or most repeated evidence.

Source borrowing is quieter. The assistant uses a valid source, then borrows surrounding meaning from that source. A partner page may list the firm under a programme built for risk technology vendors. The machine sees the partner’s authority and absorbs the page’s frame. Suddenly the advisory firm sounds like a software platform. No one lied. The source just described the relationship in the partner’s house style, and the machine carried that house style into the company summary.

Relation leakage is stranger. Facts that belong to an adjacent entity leak into the company record. A founder's interview, a sister brand, an old programme, or a similarly named firm becomes part of the explanation. In one run from the composite case, the assistant described a compliance adviser as if it sold a recurring SaaS tool. The tool existed, but it belonged to a partner mentioned on an event page. The model had not invented from nothing. It had crossed a thin bridge between entities and failed to notice it was no longer standing on the same entity.

These fractures matter because each one asks for a different cleaning move. Category wobble needs category stabilization. Source borrowing needs source hierarchy and better framing on partner pages. Relation leakage needs entity boundaries, founder-to-company connections, and clearer distinctions between brand, product, programme, and partner.

When everything is called “visibility,” the repair becomes vague. I would rather name the fracture.

How I run the test without pretending it is science

There is a temptation to make repeated prompting look more exact than it is. Ten prompts, three models, two browsers, a spreadsheet, and suddenly the work wears a lab coat. I resist that performance. The test is diagnostic, not a clinical trial. It does not prove how all assistants will describe the company forever. It shows where the current public evidence fails under ordinary retrieval pressure.

In practice, I keep the prompts plain. I avoid clever adversarial phrasing unless there is a specific reason. A cautious buyer does not usually write a prompt like a researcher. They ask basic questions: “What does this company do?” “Is it a law firm?” “Who founded it?” “Is it a software vendor?” “Can I trust it for payments compliance?” The assistant then works with whatever public record it can assemble.

For the Singapore advisory composite, I would run prompts around the firm name alone, the firm name plus Singapore, the firm name plus fintech, the firm name plus compliance, and the firm name plus legal. I would ask for a description, then for citations, then for a category. I would also ask the model to distinguish the firm from similarly named companies. The imperfect detail matters here: sometimes the assistant gives a mostly correct answer but retrieves a directory profile with an old phone number or stale office wording. That is still evidence. Small debris tells you where the public record has not been swept.
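Written out, the prompt set above is a small grid: each name variant crossed with each kind of question. The sketch below builds that grid so every run starts from the same list. The firm name is a hypothetical placeholder, and the exact question wording is my assumption; the qualifiers and the four question types follow the paragraph above.

```python
from itertools import product

# Hypothetical placeholder; substitute the real company name when running the test.
FIRM = "Example Advisory"

# Name alone, then the name plus Singapore, fintech, compliance, and legal.
QUALIFIERS = ["", "Singapore", "fintech", "compliance", "legal"]

# Assumed wording for the four question types: description, citations,
# category, and disambiguation from similarly named companies.
TEMPLATES = {
    "description": "What does {subject} do?",
    "citations": "What public sources describe {subject}? Please cite them.",
    "category": "What category of company is {subject}?",
    "disambiguation": "How is {subject} different from similarly named companies?",
}


def build_prompts(firm: str) -> list[dict]:
    """Return one prompt per (qualifier, question type) pair, labelled for the ledger."""
    prompts = []
    for qualifier, (kind, template) in product(QUALIFIERS, TEMPLATES.items()):
        subject = f"{firm} {qualifier}".strip()
        prompts.append({
            "qualifier": qualifier or "(name only)",
            "kind": kind,
            "prompt": template.format(subject=subject),
        })
    return prompts


if __name__ == "__main__":
    for p in build_prompts(FIRM):
        print(f"[{p['qualifier']} / {p['kind']}] {p['prompt']}")
```

Keeping the qualifier and question type next to each prompt matters later: when a wrong label shows up, you can see whether it arrived with "legal" in the prompt or without it.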

I keep each run in the entity ledger with date, prompt, model, summary, category used, sources cited where visible, obvious conflicts, and any repeated phrase. The repeated phrase is often the best clue. If “legal consultancy” appears in several answers, I go looking for the public source that keeps injecting it. If “risk platform” appears only when the prompt mentions fintech ecosystems, I look at partner and event pages. If the founder appears in one answer but not another, I check whether the founder relationship is marked and repeated clearly enough.
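The ledger itself can be a plain list of records. The sketch below mirrors the fields named above and adds two small helpers: one finds watched phrases that recur across summaries, the other counts which cited sources sit behind those runs. The field names, the watchlist idea, and the two-run threshold are my assumptions for illustration, not a fixed method; a spreadsheet does the same job.

```python
from collections import Counter
from dataclasses import dataclass, field
from datetime import date


@dataclass
class LedgerRun:
    """One repeated-prompt run. Field names are illustrative."""
    run_date: date
    prompt: str
    model: str
    summary: str
    category: str
    sources: list[str] = field(default_factory=list)    # only where the assistant shows them
    conflicts: list[str] = field(default_factory=list)  # obvious contradictions noted by hand


def repeated_phrases(runs: list[LedgerRun], watchlist: list[str],
                     min_runs: int = 2) -> dict[str, list[LedgerRun]]:
    """Return watched phrases found in at least `min_runs` summaries, with those runs."""
    hits: dict[str, list[LedgerRun]] = {}
    for phrase in watchlist:
        matching = [r for r in runs if phrase.lower() in r.summary.lower()]
        if len(matching) >= min_runs:
            hits[phrase] = matching
    return hits


def sources_behind(runs: list[LedgerRun]) -> Counter:
    """Count how often each cited source appears across the given runs."""
    return Counter(src for r in runs for src in r.sources)
```

If "legal consultancy" comes back from `repeated_phrases`, passing its runs to `sources_behind` gives a short list of pages to inspect first.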

This is where the work becomes less glamorous and more useful. The machine’s error is treated as a symptom. The source is treated as the site of repair.

The answer variation is the finding

Founders often ask me whether a specific AI answer is “wrong enough” to care about. I understand the question. A small firm cannot spend its week chasing every odd sentence a model produces. The better question is whether the wrongness repeats, shifts, or attaches to a buyer-relevant category.

If one answer out of fifteen calls the firm a law practice and the rest describe it correctly as a compliance adviser, I would mark the issue but not panic. If seven answers alternate between law practice, risk consultancy, and compliance adviser, the pattern deserves attention. If the wrong category appears when the prompt resembles a procurement query, the commercial risk is higher. A procurement team searching for vendors may not care about the poetry of category language. It may care very much whether the firm looks regulated, advisory, technical, legal, or software-based.
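The triage above can be written down as a few plain rules that return notes rather than a number. In this sketch, the cut-off between "isolated" and "pattern" (a single off-category run) and the "procurement" prompt-style label are my assumptions; the point is only that the output describes the shape of the wobble instead of scoring it.

```python
from collections import Counter


def triage(runs: list[tuple[str, str]], expected: str) -> list[str]:
    """
    runs: (prompt_style, category_used) pairs, e.g. ("procurement", "law practice").
    expected: the category the firm actually uses for itself.
    Returns plain-language notes, not a score.
    """
    wrong = Counter(cat for _, cat in runs if cat != expected)
    if not wrong:
        return ["stable: every run used the expected category"]

    notes = []
    total_wrong = sum(wrong.values())
    if total_wrong == 1:
        # One stray label among many correct answers: record it, do not chase it.
        notes.append(f"isolated: one run said {next(iter(wrong))!r}; mark it, do not chase it")
    else:
        notes.append(f"pattern: {total_wrong} runs used {sorted(wrong)} instead of {expected!r}")

    # A wrong category in buyer-style prompts carries more commercial weight.
    if any(style == "procurement" and cat != expected for style, cat in runs):
        notes.append("higher commercial risk: wrong category appears in procurement-style prompts")
    return notes
```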

Repeated prompts also show whether citations support the answer or merely decorate it. An assistant may describe the company correctly but cite thin sources: a stale directory, an event listing, a partner page that barely explains the firm. This is a different kind of weakness. The answer is right today, by luck or by pattern, yet the evidence beneath it is brittle. One more prominent source carrying the wrong frame could push the next answer sideways.

The reverse also happens. An answer can contain one wrong sentence while citing the right sources. That usually points to interpretation trouble rather than retrieval trouble. The machine found the firm’s pages, then misread the relationships between service, founder, category, and client type. For that, I would inspect headings, schema, page titles, and repeated phrases on the company’s own site. The machine may have had the right ingredients and still cooked a grey soup.
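The two cases above, right answer on thin sources and wrong answer on the right sources, can be separated per run with a rough check on the cited domains. In this sketch the owned and "thin" domain sets are hypothetical placeholders you would fill in per firm, and treating any owned-domain citation on a wrong answer as interpretation trouble is a simplifying assumption.

```python
from urllib.parse import urlparse

# Hypothetical domains for illustration only.
OWNED_DOMAINS = {"example-advisory.sg"}
THIN_DOMAINS = {"directory.example", "events.example", "partner.example"}


def domain(url: str) -> str:
    """Host part of a URL, lower-cased, without a leading 'www.'."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def diagnose(answer_correct: bool, cited: list[str]) -> str:
    """Label one run by answer quality and the kind of sources behind it."""
    domains = {domain(u) for u in cited}
    if not domains:
        return "unknown: no citations visible in this run"
    if answer_correct:
        if domains <= THIN_DOMAINS:
            return "brittle evidence: right answer, thin sources"
        return "supported: right answer, sources can carry the weight"
    if domains & OWNED_DOMAINS:
        # The machine found the firm's own pages and still misread them.
        return "interpretation trouble: check headings, schema, page titles, repeated phrases"
    return "retrieval trouble: trace which outside source is supplying the wrong frame"
```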

This is why I do not score these tests with a neat percentage. I mark the shape of instability. A stable company record gives boring answers from several angles. A shaky one gives answers that sound confident in different ways.

Prompting does not replace cleaning

Repeated prompting can become a hobby if you let it. There is a small theatre to it: ask, screenshot, complain, ask again, share the strangest answer. I have no interest in turning a firm’s identity into a collection of amusing mistakes. The test only earns its keep if it leads back to the public record.

For the composite compliance firm, the likely cleaning moves would start with the sources that machines are most likely to retrieve. The firm’s own site would need a stable category phrase repeated in the right places: homepage, about page, service overview, structured data, title tags, and any durable company facts page. The founder’s public mentions would need to connect back to the firm as a compliance adviser, not vaguely as a consultant or regulatory specialist. Old directory entries would need correction where possible. Partner pages might need a supplied description that is precise enough to resist the partner’s broader category.
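For the structured-data step, one option is schema.org Organization markup in JSON-LD that repeats the same category phrase used on the homepage and about page. The sketch below generates it with Python's standard `json` module. Every name, URL, and the category phrase itself are hypothetical placeholders; the property names (`name`, `url`, `description`, `founder`, `address`, `knowsAbout`, `sameAs`) are standard schema.org Organization properties.

```python
import json

# Hypothetical values. The category phrase should match the wording
# used on the homepage, about page, and service overview.
CATEGORY_PHRASE = "payments compliance advisory firm"

org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Advisory Pte. Ltd.",
    "url": "https://example-advisory.sg/",
    "description": f"Singapore-based {CATEGORY_PHRASE} serving fintech clients.",
    "founder": {"@type": "Person", "name": "Placeholder Founder"},
    "address": {"@type": "PostalAddress", "addressCountry": "SG"},
    "knowsAbout": ["payments compliance", "regulatory advisory"],
    "sameAs": ["https://directory.example/example-advisory"],
}


def json_ld_script(data: dict) -> str:
    """Render the dict as a JSON-LD script tag for the page head."""
    body = json.dumps(data, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{body}\n</script>'


if __name__ == "__main__":
    print(json_ld_script(org))
```

Markup like this does not override a stale directory, but it gives the firm's own pages one more consistent statement of name, category, and founder relationship.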

Some errors will remain outside easy control. A directory may not update. A partner may keep old copy. A model may keep retrieving a weak source because it sits on a stronger domain. The goal is to reduce the number of places where the machine has to guess.

There is also a discipline in deciding what not to fix. If a run produces a clumsy sentence with no repeated pattern behind it, I usually leave it alone. A founder-led firm can waste a lot of time polishing shadows. The ledger is there to separate noise from structural confusion.

A good repeated-prompt test leaves you with fewer mysteries. It tells you which label keeps returning, which source keeps pulling rank, which adjacent entity keeps leaking into the answer, and which company fact does not survive contact with a plain question. That is enough to begin.

The quiet value of a boring answer

When entity hygiene works, the AI answer becomes less interesting. It gives the name. It gives the category. It connects the founder without turning the person into the whole company. It describes the service boundary without inflating the firm into a platform, a law practice, or a general consultancy. The citations, when shown, point to sources that can carry the weight.

That kind of answer will not impress anyone at a marketing meeting. Good. The buyer was not looking for fireworks. The buyer wanted to know whether the company is what it says it is.

Repeated prompts help because they show whether the public record can hold its shape under ordinary pressure. One prompt is too polite. Five or ten prompts start to behave like fingers tapping along a cracked tile. You hear where it sounds hollow.

The Entity Ledger Note — Observed name: a Singapore compliance advisory firm tested through repeated AI prompts. Machine risk: the firm shifts between law practice, risk platform, and management consultancy as different sources surface. Cleaning move: compare prompt variants, trace repeated category phrases to their sources, and stabilize the firm’s own category language first. Residual fog: older partner and directory pages may still outrank cleaner owned evidence in some retrieval paths.