Proof · before & after
Below is a messy product catalog export turned into a clean, deduplicated, import-ready file. You can see each fix — caps and whitespace, mixed units, a bad GTIN check-digit, duplicate SKUs, and inconsistent date and currency formats — and exactly what lands in the change-log and findings report you receive. Nothing is merged, dropped, or guessed silently; anything uncertain is flagged for you to decide.
Before

| SKU | Name | Size | GTIN | Updated | Price |
|---|---|---|---|---|---|
| 1024 | VALVE, brass | 33cl | 05012345678900 | 3/4/24 | $12.5 |
| 1024 | valve brass | 330ML | 5012345678900 | 2024-03-04 | 12,50 |
| 2087 | Hose Clamp | 1,5L | 0501234567891 | 04-03-2024 | EUR 8 |
| 2087 | Hose Clamp | 1.5 l | 05012345678917 | 2024/03/04 | 8.00 |
| 3391 | Gasket Kit | — | (blank) | 2024-03-05 | 15 |

After

| SKU | Name | Size | GTIN | Updated | Price |
|---|---|---|---|---|---|
| 1024 | Valve, Brass | 330 ml | 05012345678900 | 2024-03-04 | 12.50 USD |
| 2087 | Hose Clamp | 1.5 L | ⚑ flagged | 2024-03-04 | 8.00 EUR |
| 3391 | Gasket Kit | ⚑ flagged | ⚑ flagged | 2024-03-05 | 15.00 USD |

5 rows → 3 SKUs · 2 deduped · units & dates standardized · 3 cells flagged · 0 rows dropped, 0 values invented
Every edit above is recorded in the change-log; every flag is in the findings report. Nothing is changed silently.
" VALVE, brass " and "valve brass" became one consistent "Valve, Brass" — leading/trailing spaces trimmed, casing normalized, the missing comma reconciled against the duplicate row.
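For readers who want to see the mechanics, the trim-and-case step can be sketched in a few lines of Python. This is an illustration only, not the production process; `normalize_name` is a hypothetical helper, and plain title-casing is an assumption that will not suit every catalog (brand names and acronyms usually need an exception list).

```python
import re

def normalize_name(raw: str) -> str:
    """Trim, collapse repeated whitespace, tidy commas, and title-case a name."""
    collapsed = re.sub(r"\s+", " ", raw).strip()
    # Remove any space before a comma and keep exactly one space after it.
    collapsed = re.sub(r"\s*,\s*", ", ", collapsed).strip()
    return collapsed.title()

print(normalize_name(" VALVE, brass "))  # Valve, Brass
print(normalize_name("valve brass"))     # Valve Brass
```

The second result still lacks the comma. A string function cannot know it is missing; that is recovered by comparing against the duplicate row for the same SKU.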
"33cl", "330ML", "1,5L" and "1.5 l" were resolved to a single unit-of-measure convention — 330 ml and 1.5 L — so the same size never appears two ways.
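A minimal Python sketch of this kind of unit normalization follows. It is an illustration under stated assumptions: the helper name and conversion table are hypothetical, a comma is treated as a decimal separator, and the convention chosen here (ml below one litre, L from one litre up) is only one possible house rule.

```python
import re
from decimal import Decimal

# Hypothetical conversion table: source unit (lower-cased) -> millilitres.
TO_ML = {"ml": Decimal("1"), "cl": Decimal("10"), "l": Decimal("1000")}

def normalize_size(raw: str) -> str:
    """Turn values like '33cl' or '1,5L' into '330 ml' or '1.5 L'."""
    match = re.fullmatch(r"\s*(\d+(?:[.,]\d+)?)\s*([A-Za-z]+)\s*", raw)
    if not match:
        raise ValueError(f"unparseable size: {raw!r}")  # flag it, do not guess
    number, unit = match.groups()
    unit = unit.lower()
    if unit not in TO_ML:
        raise ValueError(f"unknown unit in: {raw!r}")
    # Assumption: a comma in the number is a decimal separator.
    ml = Decimal(number.replace(",", ".")) * TO_ML[unit]
    if ml >= 1000:
        return f"{(ml / 1000).normalize():f} L"
    return f"{ml.normalize():f} ml"

for value in ["33cl", "330ML", "1,5L", "1.5 l"]:
    print(value, "->", normalize_size(value))
```

Values that do not parse raise an error rather than returning a best guess, which matches the flag-do-not-guess rule described on this page.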
Each barcode was run through the GS1 GTIN check-digit calculation. The valid GTINs passed and were zero-padded to 14 digits; one that failed the check-digit was flagged, not silently corrected; the blank one was flagged as missing.
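The GS1 check-digit calculation itself is public and short: weight the digits before the check digit by 3 and 1 alternately, starting with 3 at the rightmost one, sum them, and take the amount needed to reach the next multiple of 10. Below is a minimal Python sketch of that calculation; the function names and status strings are hypothetical, chosen for illustration.

```python
from typing import Optional, Tuple

def gtin_check_digit(body: str) -> int:
    """GS1 check digit for the digits that precede it."""
    # Weights alternate 3, 1, 3, ... starting from the rightmost body digit.
    total = sum(int(d) * (3 if i % 2 == 0 else 1)
                for i, d in enumerate(reversed(body)))
    return (10 - total % 10) % 10

def validate_gtin(raw: str) -> Tuple[str, Optional[str]]:
    """Return (status, gtin14); gtin14 is None unless status is 'ok'."""
    digits = raw.strip()
    if not digits:
        return ("missing", None)
    if not (digits.isascii() and digits.isdigit()) or len(digits) not in (8, 12, 13, 14):
        return ("bad_format", None)
    if gtin_check_digit(digits[:-1]) != int(digits[-1]):
        return ("bad_check_digit", None)  # flagged, never auto-corrected
    return ("ok", digits.zfill(14))       # zero-pad to 14 digits

print(validate_gtin("05012345678900"))  # ('ok', '05012345678900')
print(validate_gtin("5012345678900"))   # ('ok', '05012345678900')
print(validate_gtin("0501234567891"))   # ('bad_check_digit', None)
print(validate_gtin(""))                # ('missing', None)
```

A failed check only says that something is wrong, not which digit, which is why the sketch returns a status instead of attempting a repair.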
SKU 1024 and SKU 2087 each appeared twice. They were merged with survivorship rules — keeping the most complete, most recent value per field — never blindly. Five raw rows reconciled to three real products.
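One simple survivorship rule can be sketched as follows: group rows by SKU and, for each field, keep the newest non-empty value. This is a simplified illustration, not the full rule set; the function and field names are hypothetical, the example rows are made up, and it assumes dates are already in ISO 8601 so that they sort correctly as strings.

```python
from collections import defaultdict

def merge_duplicates(rows, key="sku", recency="updated"):
    """Merge rows sharing a key; per field keep the newest non-empty value."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    merged = []
    for group in groups.values():
        # Newest first. sorted() is stable, so ties keep their input order.
        ordered = sorted(group, key=lambda r: r[recency], reverse=True)
        survivor = {}
        for field in ordered[0]:
            candidates = (r.get(field) for r in ordered)
            survivor[field] = next((v for v in candidates if v not in (None, "")), None)
        merged.append(survivor)
    return merged

# Made-up rows for illustration only.
rows = [
    {"sku": "2087", "name": "Hose Clamp", "size": "", "updated": "2024-03-04"},
    {"sku": "2087", "name": "Hose Clamp", "size": "1.5 L", "updated": "2024-03-01"},
    {"sku": "3391", "name": "Gasket Kit", "size": "", "updated": "2024-03-05"},
]
for product in merge_duplicates(rows):
    print(product)
```

A field that is empty in every duplicate comes back as `None`, which is the cue to flag it. The sketch does not detect two non-empty values that disagree; a real merge needs a rule for that case as well.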
"3/4/24", "04-03-2024" and "2024/03/04" all turned out to be the same day, but the first two are ambiguous on their own: each could be read day-first or month-first. They were normalized to ISO 8601 (2024-03-04), with the day/month order confirmed against the catalog rather than assumed.
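The key design point is that the day/month order is an input to the conversion, not something the code infers. A minimal Python sketch, assuming a hypothetical `to_iso_date` helper and a small, fixed list of accepted formats:

```python
from datetime import datetime

def to_iso_date(raw: str, day_first: bool) -> str:
    """Normalize a date to ISO 8601; the caller decides the day/month order."""
    text = raw.strip()
    # Year-first forms are unambiguous, so they are tried first.
    formats = ["%Y-%m-%d", "%Y/%m/%d"]
    if day_first:
        formats += ["%d-%m-%Y", "%d/%m/%Y", "%d/%m/%y"]
    else:
        formats += ["%m-%d-%Y", "%m/%d/%Y", "%m/%d/%y"]
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")  # flag it, do not guess

print(to_iso_date("3/4/24", day_first=False))     # 2024-03-04
print(to_iso_date("04-03-2024", day_first=True))  # 2024-03-04
print(to_iso_date("2024/03/04", day_first=True))  # 2024-03-04
```

In the sample, the first value is month-first and the second is day-first, so the order has to be settled per source or per row before conversion.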
"$12.5", "12,50", "EUR 8" and bare numbers became consistent minor-unit decimals with an explicit ISO 4217 code (12.50 USD, 8.00 EUR) — so no price is off by a decimal place or missing its currency.
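A minimal Python sketch of this kind of price normalization is shown below. It rests on several labeled assumptions: "$" is taken to mean USD, a comma is read as a decimal separator, both currencies use two decimal places, and a bare number takes a default currency supplied by the caller (for example from the duplicate row). The helper name is hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

SYMBOLS = {"$": "USD", "€": "EUR"}  # assumption: "$" means USD in this catalog

def normalize_price(raw: str, default_currency: Optional[str] = None) -> str:
    """Turn '$12.5', '12,50' or 'EUR 8' into '12.50 USD' / '8.00 EUR'."""
    text = raw.strip()
    currency = default_currency
    code = re.search(r"\b[A-Z]{3}\b", text)  # explicit ISO 4217 style code
    if code:
        currency = code.group()
        text = text.replace(currency, "")
    for symbol, iso in SYMBOLS.items():
        if symbol in text:
            currency = iso
            text = text.replace(symbol, "")
    text = text.strip()
    # Accept at most two decimals; anything else is flagged, not guessed.
    if not re.fullmatch(r"\d+(?:[.,]\d{1,2})?", text):
        raise ValueError(f"unparseable amount: {raw!r}")
    if currency is None:
        raise ValueError(f"no currency for: {raw!r}")
    amount = Decimal(text.replace(",", ".")).quantize(Decimal("0.01"))
    return f"{amount} {currency}"

print(normalize_price("$12.5"))         # 12.50 USD
print(normalize_price("12,50", "USD"))  # 12.50 USD
print(normalize_price("EUR 8"))         # 8.00 EUR
```

Using `Decimal` rather than floating point keeps amounts exact, and a value such as "1,250" is rejected because it could be either a thousands separator or a typo.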
In this sample, 3 of the cells could not be resolved with certainty, so they were flagged for the catalog owner to confirm — not guessed.
A cleaned, import-ready data file
The catalog above, deduplicated and standardized, in the exact column layout and format you need to import: Excel or CSV, ready to load into your PIM, webshop, or marketplace without further hand-editing.
A data dictionary: every column explained
Every column defined: what it holds, its expected format and unit, whether it is required, and the rule applied, for example "Size: numeric value plus standardized UoM (ml, L); source values in cl/ML normalized." So anyone on your side can read the file without guessing.
A per-cell change-log: original → cleaned → why
One row for every value we touched, with SKU, column, the original value, the cleaned value, and the reason, e.g. "1024 · Name · ' VALVE, brass ' → 'Valve, Brass' · trim + title-case." You can audit the whole job line by line; nothing is changed off the record.
A findings report + a short decision list
A plain-English summary of what was wrong and what was fixed, plus a short list of the items only you can decide: here, the failed GTIN check-digit on SKU 2087, the missing GTIN on 3391, and its blank size. Reconciliation counts (rows in, rows out, rows merged, cells flagged) are included so the numbers tie out.
This sample is a demonstration — not a client's data.
The catalog shown is a constructed, representative industrial example, built to demonstrate the issue types we routinely clean: caps and whitespace, mixed units, a failed GTIN check-digit, duplicate SKUs, and inconsistent date and currency formats. It is not any client's data. Real client work is confidential and is never displayed publicly.
Some cells are flagged rather than fixed because the correct value could not be established with certainty, and we never guess. A GTIN that fails the GS1 check-digit could be a typo in any of several digits; a blank size has no source to fill from. Rather than invent a value, we flag it in the findings report as a short decision only you can make.
The change-log holds one entry for every value we changed: the SKU, the column, the original value, the cleaned value, and the reason for the change. It lets you audit the entire job line by line and confirm that nothing was altered silently. Untouched values are not in the change-log because they were not changed.
A full cleanup is usually delivered in about five business days, depending on catalog size and complexity. The free 50-SKU sample is faster: it is a small subset meant to let you judge the quality quickly, with no card, no obligation, and no call required.
You can try this on your own data: that is the free 50-SKU sample. Send 50 of your real SKUs and we clean them with the same process shown here and send them back, so you can compare the before and after on your own catalog. We will sign your NDA, and the data is deleted after delivery.
Send 50 of your real SKUs and we'll clean them with the same process you see above and send them back, with the change-log and findings. No card, no obligation, no call.
Get 50 SKUs cleaned free