What we clean · done for you

Product data cleanup services — import-ready catalogs.

CatalogSmith cleans your product catalog data across six areas: deduplication with survivorship, unit and GS1 GTIN standardization, category mapping to your taxonomy, attribute completion, ISO date and currency normalization, and pre-PIM migration prep. You send a messy Excel or CSV export; we return a deduplicated, standardized, import-ready file — with a per-cell change-log of every edit, usually in about five business days.

Deduplication · Unit & GTIN standardization · Category mapping · Attribute completion · Format & currency normalization · Pre-PIM migration prep

Get 50 SKUs cleaned free · See a sample →

Deduplication

01 SURVIVORSHIP, NEVER BLIND

Duplicate SKUs, near-duplicate variants, and the same product imported twice from two suppliers are among the most common reasons a catalog won't load cleanly. We identify duplicates by matching across SKU, name, GTIN, and key attributes, then resolve each group with explicit survivorship rules so that the most complete, most correct record wins. Nothing is merged blindly: where two records disagree on a value, the conflict is recorded and, if it can't be resolved with confidence, flagged for your call. Every group we collapse is listed in the change-log, so you can see exactly which records merged into which, and why.

raw_export.csv ⚠ duplicates
SKU	Name	GTIN
001	widget A	05012345678900
001	WIDGET a	(blank)
1	Widget A	5012345678900

cleaned.csv ✓ one record
SKU	Name	GTIN
001	Widget A	05012345678900
3 rows → 1 · survivorship applied · logged
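A minimal Python sketch of one simple survivorship rule, assuming each row arrives as a dict with `sku`, `name` and `gtin` keys. The normalization choices (three-digit zero-padded SKUs, title-cased names, GTINs padded to 14 digits) and the `normalize`, `survive` and `dedupe` helpers are hypothetical; a real matcher would also compare names and key attributes instead of grouping on SKU alone. The rule shown: a value survives when every non-blank member of the group agrees on it, and any disagreement is recorded and left blank for review rather than picked.

```python
from collections import defaultdict

FIELDS = ("sku", "name", "gtin")

def normalize(row: dict) -> dict:
    """Hypothetical per-field cleanup applied before matching."""
    sku = row["sku"].strip()
    gtin = row["gtin"].strip()
    return {
        # Assumed convention: numeric SKUs are zero-padded to three digits.
        "sku": sku.zfill(3) if sku.isdigit() else sku,
        # Crude casing rule for illustration; it would mangle names like "iPhone".
        "name": " ".join(row["name"].split()).title(),
        # Numeric GTINs padded to 14 digits; blanks and non-numeric values kept as-is.
        "gtin": gtin.zfill(14) if gtin.isdigit() else gtin,
    }

def survive(members: list[dict]) -> tuple[dict, list]:
    """Resolve one duplicate group field by field."""
    record, conflicts = {}, []
    for field in FIELDS:
        values = {m[field] for m in members if m[field]}
        if len(values) <= 1:
            # Agreement (or only blanks): the non-blank value survives.
            record[field] = values.pop() if values else ""
        else:
            # Disagreement: recorded and left for a human, never guessed.
            record[field] = ""
            conflicts.append((field, sorted(values)))
    return record, conflicts

def dedupe(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Group normalized rows by SKU and collapse each group to one record."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for row in map(normalize, rows):
        groups[row["sku"]].append(row)
    cleaned, log = [], []
    for sku, members in groups.items():
        record, conflicts = survive(members)
        cleaned.append(record)
        log.append({"sku": sku, "rows_in": len(members), "conflicts": conflicts})
    return cleaned, log
```

Run on the three sample rows above (with "(blank)" as an empty string), `dedupe` returns the single record `{"sku": "001", "name": "Widget A", "gtin": "05012345678900"}` and a log entry showing three rows in and no conflicts. Two rows sharing a SKU but carrying different GTINs would instead produce a blank GTIN plus a logged conflict for review.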

Unit & GTIN standardization

02 GS1 CHECK-DIGIT

Units of measure and barcodes are where small inconsistencies block an import. We normalize every unit of measure to one consistent convention: "33cl", "33 CL" and "330ml" all become a single standard value, so quantities, pack sizes, and pricing units are comparable across the whole catalog. Every GTIN, whether GTIN-8, GTIN-12, GTIN-13, or GTIN-14, is validated against the GS1 check-digit algorithm and normalized to one consistent length, with leading zeros handled correctly. A barcode that fails the check digit, or that is shared by two different products, is flagged for your confirmation: never silently corrected, and never dropped.

raw_export.csv ⚠ units & GTINs
SKU	Size	GTIN
001	33cl	05012345678900
047	1,5L	5012345678901

cleaned.csv ✓ standardized
SKU	Size	GTIN
001	330 ml	05012345678900
047	1.5 L	⚑ check digit fails — review

Category mapping

03 TO YOUR TAXONOMY

Inconsistent, free-text, or missing categories make a catalog impossible to browse or filter. You give us the target taxonomy — your own category tree, a marketplace scheme, or the structure your new PIM expects — and we map every product to it. Where source categories are spelled differently, nested differently, or simply blank, we resolve them to the right node in your tree. Products that are genuinely ambiguous, or that have no clear home in your taxonomy, are not forced into a wrong category: they go onto a short decision list for you to confirm. The full before-and-after mapping is recorded in the change-log.

Mapped, not invented

Source categories are matched to your tree — never guessed. Unmappable items go on a decision list.

Any scheme

Your own taxonomy, a marketplace category set, or the structure your new platform requires.

Fully logged

The original and mapped category for every product is in the per-cell change-log.
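One way to picture the mapping step, as a minimal Python sketch: source categories are reduced to a comparison key so that spelling and separator variants collide, then looked up in a mapping table built from your tree. The key rules, the `mapping` dictionary and the node names are hypothetical. Anything that does not resolve, including a blank category with nothing else to go on, lands on the decision list instead of being forced into a node.

```python
import re

def category_key(raw: str) -> str:
    """Collapse case, spacing and path separators so variants compare equal."""
    return re.sub(r"[\s>/|_-]+", " ", raw).strip().lower()

def map_categories(products, mapping):
    """products: (sku, source_category) pairs.
    mapping: normalized source category -> node in your taxonomy
    (hypothetical; in practice built from the tree you supply).
    Returns (mapped, decision_list); nothing is forced into a node."""
    mapped, decisions = {}, []
    for sku, source in products:
        node = mapping.get(category_key(source)) if source.strip() else None
        if node is None:
            decisions.append((sku, source))  # ambiguous, unknown or blank
        else:
            mapped[sku] = node
    return mapped, decisions
```

For example, "Drinks > Soft drinks" and "drinks/soft-drinks" both resolve through the key `drinks soft drinks`, while "Misc" goes on the decision list.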

Attribute completion

04 FILLED OR FLAGGED

A catalog that loads but is full of blank required fields fails at the next step — the webshop won't publish, the marketplace rejects the feed. We identify the attributes your destination requires, then complete the gaps from the sources you provide: other columns in your export, supplier sheets, spec documents, or your existing master data. We never invent a value to fill a cell. Anything that cannot be completed from a reliable source is surfaced as a short, prioritized decision list — the small set of items that genuinely need someone inside your business to answer — so you spend your time only where a real decision is required.

From your sources

Gaps filled from supplier sheets, spec docs, or your existing master data — not from thin air.

Required-field aware

We complete exactly the attributes your webshop, marketplace, or PIM needs to accept the feed.

A short decision list

What can't be sourced reliably comes back to you as a focused list of calls only you can make.
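A minimal Python sketch of the fill-or-flag rule, assuming the destination's required attributes are already known and each source is a named dict of attribute values, checked in priority order. The function names, the source names and the priority order are illustrative assumptions. Each filled cell is recorded with the source it came from, and a gap that no source can fill is returned for the decision list rather than given a made-up value.

```python
def _present(value) -> bool:
    """A value counts as present if it is not None and not blank."""
    return value is not None and str(value).strip() != ""

def complete_attributes(product, required, sources):
    """product: attribute -> value for one SKU ("" or missing means blank).
    required: attributes the destination needs (assumed already known).
    sources: (name, dict) pairs in priority order, e.g. supplier sheet first.
    Returns the completed product, a log of filled cells, and open gaps."""
    completed = dict(product)
    filled, gaps = [], []
    for attr in required:
        if _present(completed.get(attr)):
            continue  # already present: left untouched
        for name, source in sources:
            value = source.get(attr)
            if _present(value):
                completed[attr] = value
                filled.append((attr, value, name))  # logged with its source
                break
        else:
            gaps.append(attr)  # no reliable source: goes on the decision list
    return completed, filled, gaps
```

Because sources are tried strictly in the order given, a supplier sheet listed first wins over a spec document listed second; changing that order is a business decision, not something the code infers.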

Format, date & currency normalization

05 ISO 8601 · ISO 4217

Mixed formats are silent killers: "1.5" and "1,5"; a "3" that means March in one row and the third of the month in the next; prices that mix euros and cents. We normalize dates to ISO 8601 (YYYY-MM-DD), currency amounts to the number of decimal places ISO 4217 defines for each currency (two for the euro) together with the correct three-letter currency code, and number formats to one consistent decimal and thousands convention. Trailing whitespace, stray characters, smart quotes, and inconsistent casing are cleaned across every text field. The result is one clean, predictable standard your import or PIM can read without surprises, and every reformatted cell is logged with its original value.

raw_export.csv ⚠ mixed formats
SKU	Price	Released
001	€12,5	3/4/24
047	9.90 EUR	04-03-2024

cleaned.csv ✓ one standard
SKU	Price	Released
001	12.50 EUR	2024-03-04
047	9.90 EUR	2024-03-04

Pre-PIM migration prep

06 READY TO LOAD

Many PIM migrations and webshop launches stall on dirty product data, not on the software. Pre-PIM data readiness is the work of staging and validating your catalog before it loads, so the migration goes in clean the first time. We consolidate your supplier files and exports into one source, apply all of the cleanup above, and validate the result against the structure and required fields your new PIM or platform expects. You receive an import-ready file mapped to the target schema, plus a findings report that calls out anything the new system will reject, caught here on a spreadsheet instead of mid-migration. The other five services above feed straight into this one.

Staged & validated

Your catalog is checked against the target schema and required fields before it ever loads.

One source of truth

Scattered supplier files and exports consolidated into a single, deduplicated import set.

Mapped to target

Columns mapped to the structure your PIM or platform expects, so the load matches the spec.
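Validation against the target schema can be sketched as a column rename plus a required-field check that runs on the spreadsheet, before any load. A minimal Python sketch; the `TARGET_SCHEMA` columns, which of them are required, and the `to_target` helper are hypothetical stand-ins for whatever your PIM or platform specifies.

```python
# Hypothetical target schema: target column -> (source column, required?).
TARGET_SCHEMA = {
    "product_id": ("SKU", True),
    "title": ("Name", True),
    "gtin": ("GTIN", True),
    "size": ("Size", False),
}

def to_target(rows, schema):
    """Rename columns to the target schema and list everything a strict
    importer would reject, before anything is loaded."""
    staged, findings = [], []
    for line_no, row in enumerate(rows, start=2):  # line 1 is the header
        out = {}
        for target, (source, required) in schema.items():
            value = (row.get(source) or "").strip()
            if required and not value:
                findings.append(f"row {line_no}: required '{target}' is empty")
            out[target] = value
        staged.append(out)
    return staged, findings
```

The findings list is the raw material for the findings report: each entry names the file row and the target field, so it can be fixed in the spreadsheet rather than discovered by the importer.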

What you get, every job

THE DELIVERABLE

A cleaned, import-ready data file

A data dictionary — every column explained

A per-cell change-log — original → cleaned → why

A findings report + a short decision list

Every service above is backed by the same rule: nothing changes silently.

Logged, not silent. Every edit — a deduped row, a fixed unit, a remapped category — is in the per-cell change-log.
Flagged, never guessed. A failed GTIN, an ambiguous category, a missing required field is surfaced for your call — not invented.
Reconciled in and out. No row quietly dropped, no value invented — the counts tie out, backed by a three-layer quality review.
Confidential by default. Your data is used only for your job and deleted after delivery.
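"The counts tie out" is simple arithmetic: every input row is either still present in the output or was merged into a survivor, so rows in = rows out + rows merged away. A minimal Python sketch of that check; the `reconcile` name and the log format (one entry per duplicate group, with the number of input rows it absorbed) are assumptions for illustration.

```python
def reconcile(rows_in: int, rows_out: int, merge_log: list[dict]) -> dict:
    """Fail loudly if any input row is unaccounted for."""
    # A group of n duplicate rows collapses to 1, so n - 1 rows merge away.
    merged_away = sum(entry["rows_in"] - 1 for entry in merge_log)
    unaccounted = rows_in - (rows_out + merged_away)
    if unaccounted:
        raise ValueError(f"counts do not tie out: {unaccounted:+d} row(s)")
    return {"rows_in": rows_in, "rows_out": rows_out, "merged_away": merged_away}
```

For the duplicate example earlier on this page, three rows in, one row out and one group of three gives 3 = 1 + 2, so the check passes; a single unexplained missing row would raise.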

Questions about what we clean

FAQ

What kinds of product data does CatalogSmith clean?

CatalogSmith cleans the full spread of a messy catalog export: duplicate SKUs and variants, inconsistent units of measure, invalid or missing GTINs, mismatched categories, gaps in required attributes, and non-standard date and currency formats. Whatever the mess, you get back a deduplicated, standardized, import-ready file — with a per-cell change-log of every edit.

Do you fix GTIN and barcode errors?

Yes. Every GTIN, whether GTIN-8, GTIN-12, GTIN-13, or GTIN-14, is validated against the GS1 check-digit algorithm and normalized to one consistent length. Barcodes that fail validation or are duplicated across products are flagged for your confirmation, never silently changed or dropped.

Can you map our products to our own category taxonomy?

Yes. You provide your target taxonomy — your own category tree, or a marketplace or PIM scheme — and we map every product to it. Items that are ambiguous or have no clear home are surfaced on a short decision list for you to confirm, rather than forced into a wrong category.

Is this data entry?

No. CatalogSmith is a focused, audit-grade data cleanup service, not an offshore data-entry mill. Every edit is recorded in a per-cell change-log, anything uncertain is flagged rather than guessed, and the whole job is reconciled in and out — so no row is quietly dropped and no value is invented.

See it on your own data — free.

Send 50 of your real SKUs and we'll clean them across every service above, then send them back with a change-log so you can judge the quality first. No card, no obligation.

Get 50 SKUs cleaned free