Ad Budget Normalization Skill

Reconcile and normalise ad spend data across messy, inconsistent labels before answering any question about totals, channel performance, or budget.
---
name: ad-spend-normalisation
description: Reconcile and normalise ad spend data across messy, inconsistent labels before answering any question about totals, channel performance, or budget. Use this skill whenever a user asks "how much did we spend on X", "what's our total ad spend", "compare spend across channels", or hands over a spreadsheet, export, or pasted table of ad costs. Trigger especially when the data has free-text channel/source/campaign columns (the kind that get typed by humans across teams), when totals need to match a bank statement or invoice, or when the same platform appears under several names (Facebook, fb, FB-Ads, Meta, meta_global, etc.). Also trigger for finance reconciliation, marketing-finance variance investigations, and any question where the answer is a number the user will act on. Do NOT trigger for budget allocation decisions (use ad-budget-allocation) or creative analysis.
---

# Ad Spend Data Normalisation

A skill for one specific failure mode: giving a confident answer to "how much did we spend on X" when the data has the same thing labelled three different ways and you only counted one of them.

The post puts it well. A bank looking for "Abhishek" will not find "Chintu", even though both names point to the same person. An LLM looking for rows where channel equals "Facebook" will not find rows labelled "fb_pro" or "FB-Ads" or "Meta_Global", even though all four are the same platform. The number you return looks clean. It is wrong by 50%.

The whole job of this skill is to never let that happen.

## When to use this skill

Use it whenever the user asks a question that sums, filters, or groups ad spend data — and the data has any free-text column (channel, source, platform, campaign, account name) that humans typed.

Specifically:

- "How much did we spend on [platform] last [period]?"
- "Compare our spend across channels"
- "What's our total ad spend this month?"
- "Why doesn't this match the invoice / bank statement / finance report?"
- Any question where the user will act on the number

If the data is already a single clean export from one platform's ad manager, the risk is lower but not zero. Sub-account names and campaign labels still drift.

## The first move, every time

Before answering any quantitative question, do a label audit. Show the user every distinct value in the column they are asking you to filter or group on. Do not summarise. Do not skip values that look the same as ones above. Show the raw list.

This is the single most important step. Skipping it is what produces the ₹5,00,000 answer when the truth is ₹7,50,000.


Distinct values in the "channel" column:
- Facebook (124 rows)
- facebook (8 rows)
- FB (12 rows)
- fb_pro (47 rows)
- FB-Ads (31 rows)
- Meta (19 rows)
- Meta_Global (6 rows)
- meta global (2 rows)
- Google (88 rows)
- google ads (4 rows)
- GoogleAds (15 rows)
- ...


Then pause. Do not group, do not sum, do not answer. Show this to the user and ask them to confirm the mapping before you go further.

## The mapping step

Once the user has seen the distinct values, propose a mapping from raw labels to a canonical set. Keep the canonical set short and obvious:


Raw label          → Canonical
Facebook           → Meta
facebook           → Meta
FB                 → Meta
fb_pro             → Meta
FB-Ads             → Meta
Meta               → Meta
Meta_Global        → Meta
meta global        → Meta
Google             → Google
google ads         → Google
GoogleAds          → Google


Rules for the mapping:

- *Use one canonical name per platform.* Pick the current brand name (Meta, not Facebook; X, not Twitter, unless the user prefers otherwise — ask).
- *Do not guess on ambiguous labels.* If you see fb_pro_test_v2, ask the user what fb_pro refers to. It might be Meta. It might be a sub-product. It might be a colleague's nickname for something unrelated.
- *Show the row count for each raw label.* This tells the user which mismatches matter and which are rounding errors.
- *Flag anything you cannot map.* Do not silently drop rows. If a label does not fit any canonical bucket, show it and ask.

Save the mapping somewhere visible in the answer so the user can audit it. If the user is in a spreadsheet, suggest adding a channel_canonical column rather than overwriting the original.

## The reconciliation step

Once the mapping is agreed, sum the spend and check the total against an external source — bank statement, finance export, platform invoice, anything the user has.

If the totals match within a reasonable tolerance (say, 1–2%), the mapping is probably right. If they do not match, the mapping is probably missing something. Do not fudge the difference. Investigate:

- Are there rows with no channel label at all?
- Are there rows with currency mismatches (USD spend in an INR sheet, or vice versa)?
- Are there rows outside the date window that got included?
- Are there refunds or credits being double-counted?
- Is the bank statement showing platform fees that are not in the ad manager?

Surface the gap to the user with the candidate explanations. Let them decide which one applies.

## The workflow

### Step 1: Identify the question and the column

What is the user actually asking? Pin it down to one of:

- A total ("how much did we spend on Meta")
- A comparison ("Meta vs Google")
- A trend ("Meta spend over the last six months")
- A breakdown ("spend by platform")

Then identify which column in the data has the answer. Usually channel, source, platform, account_name, or some combination.

### Step 2: Audit the labels

Show every distinct value in that column with row counts. Do not skip ahead.

### Step 3: Propose the mapping

Build the raw → canonical table. Flag ambiguous labels. Wait for user confirmation before applying it.

### Step 4: Apply and sum

Once confirmed, group by canonical label and produce the answer.

### Step 5: Reconcile

Check against an external source if one exists. If totals do not match, surface the gap and propose explanations.

### Step 6: Write the answer

Give the number. Then immediately show:

- The mapping that produced it
- The reconciliation result (matched / off by X / no external source available)
- Any rows that were dropped or flagged as unmappable
- Any caveats about the time window, currency, or data source

## Things to avoid

- *Do not answer a sum question without a label audit.* Even if the data looks clean, do the audit. The cost is low, the upside is the difference between a right number and a wrong one.
- *Do not assume case-insensitivity is the only issue.* "Facebook" vs "facebook" is the easy case. "fb_pro" vs "Meta_Global" is the actual problem.
- *Do not silently merge labels.* Every merge is a decision the user should see.
- *Do not silently drop rows.* Unmappable rows are information, not noise.
- *Do not answer in a single number with no working shown.* The whole point of this skill is that the working is what makes the number trustworthy.
- *Do not treat reconciliation as optional.* A number that has not been checked against something external is a guess in formal clothes.

## When the data really is too messy

Sometimes the labels are so chaotic that no honest mapping is possible without the user's help. Signs:

- Free-text campaign names with no consistent channel column at all
- Multiple currencies with no currency column
- Rows from different teams using different conventions, with no team column to disambiguate
- Spend in different units (some rows in rupees, some in lakhs)

When this happens, do not produce a number. Tell the user the data needs cleaning first, and offer the smallest useful next step:

- Add a channel_canonical column based on a mapping the user confirms
- Add a currency column and convert to one base currency
- Filter to one team or one source first, answer for that, then expand

A confident wrong number is worse than no number. The post calls these "expensive hallucinations" for a reason — someone acts on them.

## Example output shape

*Question*: How much did we spend on Meta last week?

*Label audit* — distinct values in the channel column for the last 7 days:


Facebook       (124 rows, ₹2,10,000)
fb_pro         (47 rows,  ₹1,80,000)
FB-Ads         (31 rows,    ₹95,000)
Meta_Global    (6 rows,     ₹65,000)
Google         (88 rows,  ₹3,20,000)
GoogleAds      (15 rows,    ₹40,000)
(blank)        (4 rows,     ₹15,000) — flagged


*Proposed mapping*:


Facebook, fb_pro, FB-Ads, Meta_Global → Meta
Google, GoogleAds                     → Google
(blank)                               → unmapped, please confirm


*Answer (pending your confirmation of the mapping)*:
- Meta: ₹5,50,000
- Google: ₹3,60,000
- Unmapped: ₹15,000

*Reconciliation*:
- Bank statement total for Meta last week: ₹5,50,000. Match.
- The ₹15,000 in unmapped rows does not appear on the Meta or Google invoices. Possibly a third platform or a manual entry error. Please check.

*Caveats*