What we clean · done for you
CatalogSmith cleans your product catalog data across six areas: deduplication with survivorship, unit and GS1 GTIN standardization, category mapping to your taxonomy, attribute completion, ISO date and currency normalization, and pre-PIM migration prep. You send a messy Excel or CSV export; we return a deduplicated, standardized, import-ready file — with a per-cell change-log of every edit, usually in about five business days.
Duplicate SKUs, near-duplicate variants, and the same product imported twice from two suppliers are the most common reason a catalog won't load cleanly. We identify duplicates by matching across SKU, name, GTIN, and key attributes — then resolve each group with explicit survivorship rules so the most complete, most correct record wins. Nothing is merged blindly: where two records disagree on a value, the conflict is recorded and, if it can't be resolved with confidence, flagged for your call. Every group we collapse is listed in the change-log, so you can see exactly which records merged into which, and why.
Before:

| SKU | Name | GTIN |
|---|---|---|
| 001 | widget A | 05012345678900 |
| 001 | WIDGET a | (blank) |
| 1 | Widget A | 5012345678900 |

After:

| SKU | Name | GTIN |
|---|---|---|
| 001 | Widget A | 05012345678900 |

3 rows → 1 · survivorship applied · logged
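To make the idea of matching plus survivorship concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not CatalogSmith's actual pipeline: the match key (SKU with leading zeros stripped, plus a case-folded name) and the "most non-blank fields wins" score are hypothetical simplifications, and the sketch keeps a whole record rather than resolving values field by field.

```python
from collections import defaultdict

def match_key(record):
    """Hypothetical match key: SKU without leading zeros plus a case-folded name."""
    sku = record["sku"].strip().lstrip("0")
    name = " ".join(record["name"].split()).casefold()
    return (sku, name)

def completeness(record):
    """Count non-blank fields; a deliberately simple survivorship score."""
    return sum(1 for value in record.values() if value and value.strip())

def deduplicate(records):
    """Group records by match key, keep the most complete one, log every merge."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)

    survivors, merge_log = [], []
    for group in groups.values():
        # max() returns the first record with the top score, so ties keep source order.
        winner = max(group, key=completeness)
        survivors.append(winner)
        if len(group) > 1:
            merge_log.append({
                "survivor_sku": winner["sku"],
                "merged_skus": [r["sku"] for r in group if r is not winner],
                "why": "same normalized SKU and name; most complete record kept",
            })
    return survivors, merge_log

rows = [
    {"sku": "001", "name": "widget A", "gtin": "05012345678900"},
    {"sku": "001", "name": "WIDGET a", "gtin": ""},
    {"sku": "1", "name": "Widget A", "gtin": "5012345678900"},
]
survivors, merge_log = deduplicate(rows)
```

A real job would also compare GTINs and key attributes, and would record any disagreement between records as a conflict instead of letting the winner silently override it.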
Units of measure and barcodes are where small inconsistencies block an import. We normalize every unit of measure to one consistent convention — "33cl", "33 CL" and "330ml" become a single standard value — so quantities, pack sizes, and pricing units are comparable across the whole catalog. Every GTIN is validated against the GS1 check-digit algorithm and normalized to a consistent length (GTIN-8, GTIN-12, GTIN-13, or GTIN-14), with leading zeros handled correctly. A barcode that fails the check digit, or that is shared by two different products, is flagged for your confirmation — never silently corrected, and never dropped.
Before:

| SKU | Size | GTIN |
|---|---|---|
| 001 | 33cl | 05012345678900 |
| 047 | 1,5L | 5012345678901 |

After:

| SKU | Size | GTIN |
|---|---|---|
| 001 | 330 ml | 05012345678900 |
| 047 | 1.5 L | ⚑ check digit fails — review |
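The two checks described above can be sketched in a few lines of Python. The check-digit routine follows the standard GS1 rule (weights of 3 and 1 alternating from the rightmost digit before the check digit). The unit helper is a hypothetical simplification: it handles only ml, cl, and L, treats a comma as a decimal separator, and converts everything to millilitres purely so that values can be compared; the display convention you choose may differ.

```python
import re
from decimal import Decimal

ML_PER_UNIT = {"ml": Decimal("1"), "cl": Decimal("10"), "l": Decimal("1000")}
_SIZE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*(ml|cl|l)\s*$", re.IGNORECASE)

def to_millilitres(size: str) -> Decimal:
    """Parse a volume such as '33cl' or '1,5L' into millilitres."""
    match = _SIZE.match(size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    amount = Decimal(match.group(1).replace(",", "."))
    return amount * ML_PER_UNIT[match.group(2).lower()]

def gs1_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Starting at the rightmost body digit, weights alternate 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(body)):
        total += int(char) * (3 if position % 2 == 0 else 1)
    return (10 - total % 10) % 10

def is_valid_gtin(code: str) -> bool:
    """True for an 8-, 12-, 13-, or 14-digit code whose last digit checks out."""
    if not (code.isascii() and code.isdigit()) or len(code) not in (8, 12, 13, 14):
        return False
    return gs1_check_digit(code[:-1]) == int(code[-1])

def to_gtin14(code: str) -> str:
    """Left-pad a valid GTIN to 14 digits; invalid codes are raised, not repaired."""
    if not is_valid_gtin(code):
        raise ValueError(f"GTIN fails validation, flag for review: {code!r}")
    return code.zfill(14)
```

With the sample rows above, `is_valid_gtin("5012345678900")` is true, while `is_valid_gtin("5012345678901")` is false, which is why the second row is flagged for review and left uncorrected.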
Inconsistent, free-text, or missing categories make a catalog impossible to browse or filter. You give us the target taxonomy — your own category tree, a marketplace scheme, or the structure your new PIM expects — and we map every product to it. Where source categories are spelled differently, nested differently, or simply blank, we resolve them to the right node in your tree. Products that are genuinely ambiguous, or that have no clear home in your taxonomy, are not forced into a wrong category: they go onto a short decision list for you to confirm. The full before-and-after mapping is recorded in the change-log.
Source categories are matched to your tree — never guessed. Unmappable items go on a decision list.
Your own taxonomy, a marketplace category set, or the structure your new platform requires.
The original and mapped category for every product is in the per-cell change-log.
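As a rough illustration of "matched, never guessed", the sketch below maps source categories through a table of confirmed matches and sends everything else to a decision list. The function names, the sample taxonomy node, and the shape of the `confirmed_mapping` table are all hypothetical; the point is only that an unmatched or blank category produces a question, not a default.

```python
def normalize(label):
    """Collapse whitespace and case so spelling variants share one key."""
    return " ".join((label or "").split()).casefold()

def map_categories(products, confirmed_mapping):
    """Return (mapped products, decision list); nothing is assigned by guesswork."""
    lookup = {normalize(src): target for src, target in confirmed_mapping.items()}
    mapped, decision_list = [], []
    for product in products:
        source = product.get("category") or ""
        target = lookup.get(normalize(source))
        if target is None:
            reason = ("blank source category" if not source.strip()
                      else "no confirmed match in target taxonomy")
            decision_list.append(
                {"sku": product["sku"], "source_category": source, "reason": reason}
            )
        else:
            # Keep the original category next to the mapped one for the change-log.
            mapped.append({**product, "mapped_category": target})
    return mapped, decision_list

# Hypothetical sample data
confirmed = {"Soft drinks": "Beverages > Soft Drinks"}
products = [
    {"sku": "001", "category": "soft  DRINKS"},
    {"sku": "047", "category": ""},
    {"sku": "112", "category": "Misc"},
]
mapped, decision_list = map_categories(products, confirmed)
```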
A catalog that loads but is full of blank required fields fails at the next step — the webshop won't publish, the marketplace rejects the feed. We identify the attributes your destination requires, then complete the gaps from the sources you provide: other columns in your export, supplier sheets, spec documents, or your existing master data. We never invent a value to fill a cell. Anything that cannot be completed from a reliable source is surfaced as a short, prioritized decision list — the small set of items that genuinely need someone inside your business to answer — so you spend your time only where a real decision is required.
Gaps filled from supplier sheets, spec docs, or your existing master data — not from thin air.
We complete exactly the attributes your webshop, marketplace, or PIM needs to accept the feed.
What can't be sourced reliably comes back to you as a focused list of calls only you can make.
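The fill-from-sources rule can be sketched as follows. This is a minimal illustration, assuming each source is a plain dictionary of attribute values and that sources are consulted in a priority order you agree on; the function name and field names are hypothetical. A blank that no source can fill ends up on the decision list and stays blank.

```python
def complete_attributes(product, required, sources):
    """Fill blank required attributes from sources listed in priority order.

    sources is a list of (source_name, values) pairs. Returns the completed
    record, a log of filled cells, and the attributes nobody could source.
    """
    completed = dict(product)
    filled_log, decision_list = [], []
    for attribute in required:
        if str(completed.get(attribute) or "").strip():
            continue  # already populated, leave untouched
        for source_name, values in sources:
            value = str(values.get(attribute) or "").strip()
            if value:
                completed[attribute] = value
                filled_log.append({
                    "attribute": attribute,
                    "original": "",
                    "cleaned": value,
                    "why": f"filled from {source_name}",
                })
                break
        else:
            # No source had a value: ask, never invent.
            decision_list.append(attribute)
    return completed, filled_log, decision_list

# Hypothetical sample data
product = {"sku": "047", "brand": "", "material": "", "colour": "red"}
sources = [("supplier sheet", {"brand": "Acme"}), ("master data", {"brand": "ACME Ltd"})]
completed, filled_log, decision_list = complete_attributes(
    product, ["brand", "material", "colour"], sources
)
```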
Mixed formats are silent killers: "1.5" and "1,5", a date that reads as March in one row and the third of the month in the next, prices that mix euros and cents. We normalize dates to ISO 8601 (YYYY-MM-DD), currency amounts to the decimal precision ISO 4217 defines for each currency (two places for EUR) with the correct three-letter currency code, and number formats to one consistent decimal and thousands convention. Trailing whitespace, stray characters, smart quotes, and inconsistent casing are cleaned across every text field. The result is one clean, predictable standard your import or PIM can read without surprises, and every reformatted cell is logged with its original value.
Before:

| SKU | Price | Released |
|---|---|---|
| 001 | €12,5 | 3/4/24 |
| 047 | 9.90 EUR | 04-03-2024 |

After:

| SKU | Price | Released |
|---|---|---|
| 001 | 12.50 EUR | 2024-03-04 |
| 047 | 9.90 EUR | 2024-03-04 |
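A short Python sketch shows how the two rows above could be normalized. It rests on assumptions worth stating: the date format of each source is confirmed beforehand and passed in explicitly (a value like "3/4/24" cannot be disambiguated from the cell alone), amounts contain no thousands separators, and the `MINOR_UNIT_DIGITS` table is a small hand-written excerpt of ISO 4217, not a complete list.

```python
import re
from datetime import datetime
from decimal import Decimal, ROUND_HALF_UP

# Excerpt only: decimal places per currency as defined by ISO 4217.
MINOR_UNIT_DIGITS = {"EUR": 2, "USD": 2, "JPY": 0}
SYMBOLS = {"€": "EUR"}
_AMOUNT = re.compile(r"\d+(?:[.,]\d+)?")
_CODE = re.compile(r"\b[A-Z]{3}\b")

def normalize_date(raw: str, source_format: str) -> str:
    """Parse with the confirmed source format and emit ISO 8601 (YYYY-MM-DD)."""
    return datetime.strptime(raw.strip(), source_format).date().isoformat()

def normalize_price(raw: str) -> str:
    """Return 'amount CODE' with the currency's ISO 4217 decimal places."""
    amount_match = _AMOUNT.search(raw)
    if amount_match is None:
        raise ValueError(f"no amount found: {raw!r}")
    code = next((c for symbol, c in SYMBOLS.items() if symbol in raw), None)
    if code is None:
        code_match = _CODE.search(raw)
        code = code_match.group(0) if code_match else None
    if code not in MINOR_UNIT_DIGITS:
        raise ValueError(f"currency missing or unknown, flag for review: {raw!r}")
    # Assumption: a comma is a decimal separator, never a thousands separator.
    amount = Decimal(amount_match.group(0).replace(",", "."))
    quantum = Decimal(10) ** -MINOR_UNIT_DIGITS[code]
    return f"{amount.quantize(quantum, rounding=ROUND_HALF_UP)} {code}"
```

For the sample rows, `normalize_price("€12,5")` gives `"12.50 EUR"`, and `normalize_date("3/4/24", "%m/%d/%y")` and `normalize_date("04-03-2024", "%d-%m-%Y")` both give `"2024-03-04"`, provided those two source formats are the ones actually confirmed for each feed.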
Most PIM migrations and webshop launches stall on dirty product data — not on the software. Pre-PIM data readiness is the work of staging and validating your catalog before it loads, so the migration goes in clean the first time. We consolidate your supplier files and exports into one source, apply all of the cleanup above, and validate the result against the structure and required fields your new PIM or platform expects. You receive an import-ready file mapped to the target schema, plus a findings report that calls out anything the new system will reject — caught here, on a spreadsheet, instead of mid-migration. The other five services above feed straight into this one.
Your catalog is checked against the target schema and required fields before it ever loads.
Scattered supplier files and exports consolidated into a single, deduplicated import set.
Columns mapped to the structure your PIM or platform expects, so the load matches the spec.
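Validating against the target schema before load can be pictured with a small sketch. The schema shape used here (a list of required fields plus optional sets of allowed values) is a hypothetical stand-in; a real PIM or marketplace will publish its own import specification, and the checks would follow that.

```python
def validate_rows(rows, required_fields, allowed_values=None):
    """Return a findings list of everything the target system would reject."""
    findings = []
    for row_number, row in enumerate(rows, start=1):
        for field in required_fields:
            if not str(row.get(field) or "").strip():
                findings.append(
                    {"row": row_number, "field": field, "issue": "required value missing"}
                )
        for field, allowed in (allowed_values or {}).items():
            value = row.get(field)
            # Blank values are reported by the required-field check, not here.
            if value and value not in allowed:
                findings.append(
                    {"row": row_number, "field": field,
                     "issue": f"value {value!r} not accepted by target"}
                )
    return findings

# Hypothetical sample data
rows = [
    {"sku": "001", "name": "Widget A", "unit": "ml"},
    {"sku": "047", "name": "", "unit": "litre"},
]
findings = validate_rows(rows, ["sku", "name"], {"unit": {"ml", "l"}})
```

An empty findings list is the signal that the file is ready to load; anything else goes into the findings report with its row and field.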
A cleaned, import-ready data file
A data dictionary — every column explained
A per-cell change-log — original → cleaned → why
A findings report + a short decision list
Every service above is backed by the same rule: nothing changes silently.
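One way to picture a per-cell change-log is as a diff between the export you sent and the file you get back. The sketch below is an assumption about shape only: it keys rows on a hypothetical `sku` column and takes the reason for each change from a per-column `reasons` table supplied by the cleaning step.

```python
def build_change_log(before, after, key="sku", reasons=None):
    """List every cell whose value differs: original, cleaned, and why."""
    reasons = reasons or {}
    before_by_key = {row[key]: row for row in before}
    entries = []
    for row in after:
        original_row = before_by_key.get(row[key])
        if original_row is None:
            continue  # a row with no original is a reconciliation issue, not a cell edit
        for column, cleaned in row.items():
            original = original_row.get(column, "")
            if original != cleaned:
                entries.append({
                    "key": row[key],
                    "column": column,
                    "original": original,
                    "cleaned": cleaned,
                    "why": reasons.get(column, "unspecified"),
                })
    return entries

# Hypothetical sample data
before = [{"sku": "001", "price": "€12,5", "released": "3/4/24"}]
after = [{"sku": "001", "price": "12.50 EUR", "released": "2024-03-04"}]
log = build_change_log(
    before, after,
    reasons={"price": "currency normalized", "released": "date normalized to ISO 8601"},
)
```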
CatalogSmith cleans the full spread of a messy catalog export: duplicate SKUs and variants, inconsistent units of measure, invalid or missing GTINs, mismatched categories, gaps in required attributes, and non-standard date and currency formats. Whatever the mess, you get back a deduplicated, standardized, import-ready file — with a per-cell change-log of every edit.
Yes. Every GTIN is validated against the GS1 check-digit algorithm and normalized to a consistent length (GTIN-8, 12, 13, or 14). Barcodes that fail validation or are duplicated across products are flagged for your confirmation, never silently changed or dropped.
Yes. You provide your target taxonomy — your own category tree, or a marketplace or PIM scheme — and we map every product to it. Items that are ambiguous or have no clear home are surfaced on a short decision list for you to confirm, rather than forced into a wrong category.
No. CatalogSmith is a focused, audit-grade data cleanup service, not an offshore data-entry mill. Every edit is recorded in a per-cell change-log, anything uncertain is flagged rather than guessed, and the whole job is reconciled in and out — so no row is quietly dropped and no value is invented.
Send 50 of your real SKUs and we'll clean them across every service above, then send them back with a change-log so you can judge the quality first. No card, no obligation.
Get 50 SKUs cleaned free