Marlowe Finch

Lecture 3

Audit what ChatGPT currently gets wrong

Before this lecture, you should be comfortable with the distinctions drawn in lecture 1: ChatGPT optimisation, training memory, live browsing and the separation from SEO. You should also have the evidence-map habit from lecture 2: when an answer looks wrong, inspect the firm pages, registers, directories and public mentions that may have shaped it.

A teaching example I use in this lesson begins with a shared document that makes the partners uneasy. A boutique immigration practice has copied several ChatGPT answers into it after testing ordinary client questions about Belgian residence, family reunification and Dutch-speaking legal help. The first answer names two larger Brussels firms and leaves the small practice out. The second places the practice in a city where it no longer presents itself. The third finally gives the firm’s name, then describes the work as general application support rather than regulated immigration-law advice.

At this point the team wants to repair the public pages immediately. I usually slow them down. Not because repair is unimportant, but because the first audit is a small autopsy, and the scalpel should not be a rewrite. We need to know whether ChatGPT cannot find the firm, cannot place it, cannot name it accurately, or cannot describe it without borrowing the wrong public wording. Those are different failures. A tidy audit makes the differences visible before anyone starts adding more paragraphs to an already confused record.

Start with the answer, not the accusation

A visibility audit is a structured check of how ChatGPT names, omits, misplaces and describes a firm. The word “structured” is doing real work here. It means the student does not ask one irritated question, dislike the answer, and then declare that ChatGPT is biased toward big firms. That may be emotionally satisfying for five minutes. It does not teach you what to fix.

The audit begins with the answer as an observable object. Copy the prompt. Copy the date. Copy the answer. Note whether the answer appears to use live browsing or not. Note whether the firm is named, omitted, replaced by a competitor, described generically, or placed in the wrong city, language, legal category or client problem. Do not correct the answer in the cell. Do not tidy the wording. Preserve the mess.

This is harder than it sounds. Lawyers and legal communicators are trained to improve text as they touch it. The audit asks for a colder habit: record first, interpret second. A wrong answer about a firm is like a letter returned with three postal marks on it. You do not peel them off because they are ugly. You read them.

For a boutique immigration law firm, the key audit question is not simply “Are we mentioned?” Mention can be weak. If ChatGPT names the firm but says it handles general relocation help, the firm has not won visibility in any meaningful professional sense. If it places a Dutch-speaking lawyer in the wrong Belgian city, the answer may send the wrong client or weaken trust before intake. If it names a larger competitor first and gives the small firm a vague “also” line, the pattern deserves recording.

Use repeated prompt runs without turning them into theatre

One prompt is a photograph taken through a dirty window. It may show something true, but you should not build the whole file on it. A repeated prompt run means asking the same or a related question several times to see which answer patterns hold. This does not mean pestering the model until it says the firm’s name. It means testing whether the same placement, omission or description survives small changes in wording.

For a composite Belgian practice, the first prompt might be plain: “Which boutique immigration lawyers in Belgium help with family reunification?” The second adds place: “Which Dutch-speaking immigration lawyer near Antwerp helps with spouse residence questions?” The third changes the client voice: “My partner is outside the EU and I live in Belgium; what kind of law firm should I contact?” A fourth asks from a referral-partner angle: “Which small Belgian practice handles residence questions for cross-border families?” The prompts are related, but each tests a different handle.

The rough detail matters. Suppose ChatGPT names the target firm only when the city is included, but not when the client problem is included. That suggests the public evidence may connect the firm to place more strongly than to service. Suppose it names the firm in Dutch-like phrasing but mislabels the service in English. That does not yet prove a language problem, but it gives you a trail to inspect later. Suppose it never names the firm, while repeatedly naming a larger Brussels competitor. That may point toward clearer public evidence around the competitor.

Keep the number of prompts modest. For a first audit, six to ten well-designed prompts can teach more than fifty frantic variations. Too many prompts become a fog bank. The student starts hunting for a favourable answer instead of learning which answer patterns hold. I also avoid leading prompts in the first pass. “Why is this firm a good choice?” is not a fair test of ordinary discoverability. Ask as a client would ask, as a referral partner might ask, and as a cautious family member might ask.

Build an answer log that can survive your mood

An answer log is a record of prompts, dates, answers, named firms, descriptions and source clues. It sounds clerical. It is clerical. That is part of its value. Without a log, the audit becomes a story told by whoever was most annoyed in the meeting.

The basic fields are simple: date, language, prompt, visible browsing, firms named, target firm status, description used, location used, service category used, possible source clue, and short note. The “target firm status” field can use plain language: named accurately, named but wrong, omitted, replaced by competitor, generic answer only. You do not need a complicated scoring system at this stage. A score can arrive later in the course. Here we need faithful observation.
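For students who keep the log in a script rather than a spreadsheet, the fields above could be sketched as a small record type. This is only an illustration: the class and field names, the example date, and the firm names are hypothetical, and the full answer text is included as a field because the audit asks you to copy the answer unedited.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum


class TargetFirmStatus(Enum):
    # Plain-language statuses taken from the lesson text.
    NAMED_ACCURATELY = "named accurately"
    NAMED_BUT_WRONG = "named but wrong"
    OMITTED = "omitted"
    REPLACED_BY_COMPETITOR = "replaced by competitor"
    GENERIC_ANSWER_ONLY = "generic answer only"


@dataclass
class AnswerLogEntry:
    # One row of the answer log: record first, interpret second.
    run_date: date
    language: str
    prompt: str
    visible_browsing: bool
    answer_text: str  # copied verbatim, never tidied
    firms_named: list[str]
    target_firm_status: TargetFirmStatus
    description_used: str = ""
    location_used: str = ""
    service_category_used: str = ""
    possible_source_clue: str = ""
    short_note: str = ""


# Hypothetical example row; every value here is invented for illustration.
entry = AnswerLogEntry(
    run_date=date(2025, 1, 15),
    language="English",
    prompt="Which Dutch-speaking immigration lawyer near Antwerp helps "
           "with spouse residence questions?",
    visible_browsing=False,
    answer_text="(full answer pasted here, unedited)",
    firms_named=["Larger Firm A", "Target Firm"],
    target_firm_status=TargetFirmStatus.NAMED_BUT_WRONG,
    description_used="general application support",
    location_used="Antwerp",
    service_category_used="relocation help",
    possible_source_clue="wording resembles a directory category",
    short_note="Named after a larger competitor; service label is off.",
)
print(entry.target_firm_status.value)
```

Restricting the status to a fixed set of values keeps later counts honest, because nobody can invent a kinder label for a bad answer halfway through the audit.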

Source clues should be copied carefully. If the answer uses a phrase that resembles a directory category from lecture 2’s source-trail work, write that down. If the answer gives an old office area, note whether that old area appears in any public profile. If the answer describes a firm as “international mobility” while the firm’s own page says “immigration law,” do not decide the cause yet. Mark the phrase as a clue.

There is a nice, irritating human weakness here: after three bad answers, the fourth half-good answer feels like relief. The team wants to save that one and ignore the rest. The log prevents selective memory. It lets you say, “In two runs the firm was omitted, in one it was named with the wrong city, and in one it was named accurately but after a larger competitor.” That sentence is less dramatic than “ChatGPT hates us.” It is also useful.
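The undramatic sentence above is just a count of statuses across runs. A minimal sketch, assuming each run has already been given one of the plain-language status labels, might look like this; the function name and the sample run list are hypothetical.

```python
from collections import Counter


def summarise_statuses(statuses):
    """Count how often each target-firm status appears across the runs."""
    counts = Counter(statuses)
    total = len(statuses)
    lines = [f"{label}: {n} of {total} runs" for label, n in counts.most_common()]
    return counts, lines


# Hypothetical run: four prompts, one status recorded per answer.
runs = ["omitted", "named but wrong", "omitted", "named accurately"]
counts, lines = summarise_statuses(runs)
for line in lines:
    print(line)
```

The count does not explain anything. It only stops the fourth half-good answer from outweighing the three that came before it.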

Separate what ChatGPT knows, omits and misplaces

The audit should divide failures into three rough families: what the answer seems to know, what it omits, and what it misplaces. “Seems” is the careful word. We are observing outputs, not opening the model.

What ChatGPT seems to know may include a firm name, a broad practice area, a city, or a relationship to a client problem. Sometimes the knowledge is partial but still useful. If the model names the firm only when asked about Antwerp-linked immigration lawyers, it may have some place association. If it names the firm for residence permits but not family reunification, that tells you something about the public wording available around services.

Omission is not always silence. The answer may omit the target firm while confidently recommending larger practices. It may answer with general legal advice and no names. It may say users should consult a local bar or an immigration lawyer without naming anyone. Each type of omission carries a different possible meaning. A cautious no-name answer may reflect lack of confidence. A competitor-heavy answer suggests the target firm is being crowded out by clearer public evidence elsewhere.

Misplacement is more visible and often more painful. A firm can be placed in the wrong city, assigned the wrong category, attached to the wrong client problem, or described through an adjacent provider. In one recurrent pattern, a small practice is described through the language of a relocation consultancy because the consultancy has clearer public pages. The law firm exists and the work exists; the public description is the weak link.

Do not rush to cause. It is tempting to say, “The model used the bad directory.” Maybe it did. Maybe the phrase came from several sources. Maybe the answer formed a smooth average from weak public wording. The discipline is to say, “This answer repeats a directory-like label,” rather than “This directory caused the whole problem.”

Read competitor names as evidence and choose the first repair question

The lecture 3 audit includes competitors only in a narrow sense. We are not doing a full competitor exercise yet. Competitors matter here because ChatGPT names them instead of the target firm, or because their descriptions show what the answer system finds easy to reuse.

If a larger Brussels firm appears in six of eight answers, copy how ChatGPT describes it. Does it mention immigration law, international mobility, family reunification, work permits, languages, or offices? Does the answer give crisp facts or just a reputation phrase? Then compare that description with the target firm’s public evidence map from lecture 2. The goal is not imitation. A small practice should not try to sound large for the sake of being named. The goal is to understand what kind of public clarity the answer is rewarding.

A composite Brussels scenario makes this concrete. The target practice has strong private expertise and decent intake material, but its public pages blur “mobility,” “relocation” and “immigration law.” A nearby larger firm has a page that states Belgian immigration, work authorisation and residence matters in clean factual sentences. When ChatGPT answers a question about cross-border hires, it reaches for the larger firm first. It also describes the small practice as “mobility support” when it appears at all. The audit does not prove preference. It shows a public-record imbalance.

By the end of lecture 3, the student should have a small answer log and a rough classification of errors. A useful final page of the audit groups observations into probable repair areas. If the firm is omitted across many prompts, the public evidence may be too thin or too poorly connected to the client problem. If the firm is named with the wrong city, inspect old pages and profiles. If the firm is named but described as a consultant, inspect category wording on directories and the firm’s own service pages. If the answer keeps favouring a larger competitor, note which public facts make that competitor easier to describe.
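The grouping on that final page can be sketched as a simple lookup from observed pattern to the first thing to inspect. The pattern labels and the function name below are hypothetical shorthand; the inspection notes paraphrase the paragraph above, and the fallback uses the restrained phrase "needs source inspection" rather than guessing a cause.

```python
# Hypothetical pattern labels mapped to the probable repair area to inspect.
REPAIR_AREAS = {
    "omitted across many prompts": (
        "Public evidence may be too thin or poorly connected to the client problem."
    ),
    "named with wrong city": "Inspect old pages and profiles.",
    "described as consultant": (
        "Inspect category wording on directories and the firm's own service pages."
    ),
    "larger competitor favoured": (
        "Note which public facts make the competitor easier to describe."
    ),
}


def first_inspection(pattern):
    """Return the probable repair area, or a cautious default for unknown patterns."""
    return REPAIR_AREAS.get(pattern, "Needs source inspection.")


print(first_inspection("named with wrong city"))
print(first_inspection("named in the wrong language"))
```

The lookup deliberately returns an area to inspect, not a confirmed cause, which keeps the report inside what the log can support.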

Use restrained language in the audit report. “Likely source clue,” “observed answer pattern,” “needs source inspection,” and “possible public-evidence gap” are honest phrases. “Cause confirmed” is usually too strong unless a browsed answer points clearly to a source and matching wording. The last move is to choose the first repair question: “Which public source repeats the wrong category?” “Where does the old location still appear?” “Which service page fails to state family reunification in plain terms?” The answer log should point to these questions. It should not become a decorative document that everyone admires and nobody uses.

What to remember

A visibility audit should preserve the answer before judging it. Copy the prompt, date, wording, named firms, descriptions and source clues while the mistake is still intact.

A repeated prompt run is useful only when variations are deliberate. Changing client problem, place, language or referral voice shows which answer patterns hold.

An answer log is a record of prompts, dates, answers, named firms, descriptions and source clues.

Do not treat every mention as success. A firm named with the wrong city, category or client problem still has a visibility problem.

ChatGPT can place an immigration law firm in four ways: by jurisdiction, by client problem, by public source, or by nearest stronger neighbour.

The first audit does not promise repair. It gives the firm a disciplined starting point for inspecting public evidence and choosing the next repair question.

Check yourself

Describe in your own words what a visibility audit should capture before anyone edits the firm’s public pages.

A visibility audit should capture the answer as it appeared, not the cleaned-up version the team wishes it had received. I would record the prompt, date, language, visible browsing, full answer text, firms named, whether the target firm appeared, and how it was described. I would also note source clues, such as old place names or awkward service labels that resemble public profiles. This matters because the repair should respond to observed mistakes. Without a record, the team may remember only the most annoying answer and miss the actual pattern.

Give an example of a repeated prompt run for a Belgian immigration firm and explain what the variations test.

I might test a firm with four related prompts: one asking for boutique immigration lawyers in Belgium, one asking for a Dutch-speaking lawyer near Antwerp, one asking about family reunification for a non-EU spouse, and one asking from a referral-partner angle about residence matters. The variations test whether ChatGPT connects the firm to place, language, service category and client problem. If the firm appears only in the place-based prompt, the public evidence may be stronger around location than around services. If it never appears, the evidence problem may be broader.

How would you distinguish omission from misplacement in a concrete ChatGPT answer?

Omission means the firm does not appear where it plausibly should. ChatGPT may name competitors, give general advice or avoid firm names completely. Misplacement means the firm appears, but the answer attaches the wrong city, category, service or neighbouring identity to it. For example, if a Brussels practice is not named in a question about cross-border work permits, that is omission. If it is named but described as a relocation consultant in Antwerp, that is misplacement. Both matter, but they point to different evidence problems.

When should competitor names in an audit be treated as useful evidence rather than a distraction?

Competitor names are useful when ChatGPT repeatedly names them instead of the target firm or describes them with much clearer facts. In that case, the audit should copy how the competitor is placed: by city, service category, jurisdiction or client problem. The point is not to imitate a bigger firm’s tone or volume. It is to see what public facts make the competitor easier for ChatGPT to mention. If the competitor appears only once in an unstable answer, I would record it but avoid making it the centre of the diagnosis.

How would you explain the purpose of an answer log to a lawyer who thinks the audit is just clerical work?

I would say the answer log protects the firm from making repairs based on irritation. A single wrong ChatGPT answer feels vivid, but it may not show the pattern. The log records prompts, dates, answers, named firms, descriptions and source clues, so the team can see whether the problem is omission, wrong placement or weak description. That clerical structure is what makes the work professional. It turns a complaint about the model into evidence the firm can inspect before changing pages, profiles or directory text.