Proof · before & after

Before and after: a product catalog, cleaned line by line.

Below is a messy product catalog export turned into a clean, deduplicated, import-ready file. You can see each fix — caps and whitespace, mixed units, a bad GTIN check-digit, duplicate SKUs, and inconsistent date and currency formats — and exactly what lands in the change-log and findings report you receive. Nothing is merged, dropped, or guessed silently; anything uncertain is flagged for you to decide.

Deduplication Unit & GTIN standardization Format normalization Per-cell change-log Findings report

The same catalog, raw and cleaned

FIG. 01 — RAW → CLEANED
supplier_export.csv ⚠ 6 issue types
SKUNameSizeGTINUpdatedPrice
1024 VALVE, brass 33cl050123456789003/4/24$12.5
1024valve brass330ML50123456789002024-03-0412,50
2087Hose Clamp1,5L050123456789104-03-2024EUR 8
2087Hose Clamp1.5 l050123456789172024/03/048.00
3391Gasket Kit(blank)2024-03-0515
cleaned.csv ✓ import-ready
SKUNameSizeGTINUpdatedPrice
1024Valve, Brass330 ml050123456789002024-03-0412.50 USD
2087Hose Clamp1.5 L⚑ flagged2024-03-048.00 EUR
3391Gasket Kit⚑ flagged⚑ flagged2024-03-0515.00 USD
5 rows → 3 SKUs · 2 deduped · units & dates standardized · 3 cells flagged · 0 rows dropped, 0 values invented

Every edit above is recorded in the change-log; every flag is in the findings report. Nothing is changed silently.

What each fix was — and why

SECTION 02

Caps & whitespace

" VALVE, brass " and "valve brass" became one consistent "Valve, Brass" — leading/trailing spaces trimmed, casing normalized, the missing comma reconciled against the duplicate row.

Mixed units

"33cl", "330ML", "1,5L" and "1.5 l" were resolved to a single unit-of-measure convention — 330 ml and 1.5 L — so the same size never appears two ways.

GTIN check-digit

Each barcode was run through the GS1 GTIN check-digit calculation. The valid GTINs passed and were zero-padded to 14 digits; one that failed the check-digit was flagged, not silently corrected; the blank one was flagged as missing.

Duplicate SKUs

SKU 1024 and SKU 2087 each appeared twice. They were merged with survivorship rules — keeping the most complete, most recent value per field — never blindly. Five raw rows reconciled to three real products.

Date formats

"3/4/24", "04-03-2024" and "2024/03/04" all referred to the same day but were ambiguous. They were normalized to ISO 8601 (2024-03-04), with the day/month order confirmed against the catalog rather than assumed.

Currency & price

"$12.5", "12,50", "EUR 8" and bare numbers became consistent minor-unit decimals with an explicit ISO 4217 code (12.50 USD, 8.00 EUR) — so no price is off by a decimal place or missing its currency.

In this sample, 3 of the cells could not be resolved with certainty, so they were flagged for the catalog owner to confirm — not guessed.


The four files you get back

THE DELIVERABLE
01

A cleaned, import-ready data file

02

A data dictionary — every column explained

03

A per-cell change-log — original → cleaned → why

04

A findings report + a short decision list

1 · The cleaned data file

The catalog above, deduplicated and standardized, in the exact column layout and format you need to import — Excel or CSV, ready to load into your PIM, webshop, or marketplace without further hand-editing.

2 · The data dictionary

Every column defined: what it holds, its expected format and unit, whether it is required, and the rule applied — for example "Size: numeric value plus standardized UoM (ml, L); source values in cl/ML normalized." So anyone on your side can read the file without guessing.

3 · The per-cell change-log

One row for every value we touched, with SKU, column, the original value, the cleaned value, and the reason — e.g. "1024 · Name · ' VALVE, brass ' → 'Valve, Brass' · trim + title-case." You can audit the whole job line by line; nothing is changed off the record.

4 · The findings report & decision list

A plain-English summary of what was wrong and what was fixed, plus a short list of the items only you can decide — here, the failed GTIN check-digit on SKU 2087, the missing GTIN on 3391, and its blank size. Reconciliation counts (rows in, rows out, rows merged, cells flagged) are included so the numbers tie out.


This sample is a demonstration — not a client's data.

  • Representative, not real. The catalog above is a constructed industrial example built to show the issue types we handle. It is not any client's data — client work is confidential and never shown.
  • The method is real. Every fix shown is exactly how we work: GS1 GTIN check-digit validation, survivorship-based deduplication, ISO 8601 dates, ISO 4217 currency.
  • Want it on your data? The free 50-SKU sample below runs this same process on your real SKUs — so you judge the quality on your own catalog, not ours.
  • Confidential by default. Anything you send is used only for your job and deleted after delivery.

Questions about the sample

FAQ
Is this before-and-after from a real client?

No. The catalog shown is a constructed, representative industrial example, built to demonstrate the issue types we routinely clean — caps and whitespace, mixed units, a failed GTIN check-digit, duplicate SKUs, and inconsistent date and currency formats. It is not any client's data. Real client work is confidential and is never displayed publicly.

Why are some cells flagged instead of fixed?

Because the correct value could not be established with certainty, and we never guess. A GTIN that fails the GS1 check-digit could be a typo in any of several digits; a blank size has no source to fill from. Rather than invent a value, we flag it in the findings report as a short decision only you can make.

What exactly is in the change-log?

One entry for every value we changed: the SKU, the column, the original value, the cleaned value, and the reason for the change. It lets you audit the entire job line by line and confirm that nothing was altered silently. Untouched values are not in the change-log because they were not changed.

How long does a real cleanup take?

A full cleanup is usually delivered in about five business days, depending on catalog size and complexity. The free 50-SKU sample is faster — it is a small subset meant to let you judge the quality quickly, with no card, no obligation, and no call required.

Will you run this on my data first?

Yes — that is the free 50-SKU sample. Send 50 of your real SKUs and we clean them with the same process shown here and send them back, so you can compare the before and after on your own catalog. We will sign your NDA, and the data is deleted after delivery.

See this on your own catalog — free.

Send 50 of your real SKUs and I'll clean them with the same process you see above and send them back, with the change-log and findings. No card, no obligation, no call.

Get 50 SKUs cleaned free