Product data cleansing

Product data cleansing is the process of correcting, standardizing, de-duplicating and validating a product catalog so it is accurate, consistent and ready to import into a PIM, ERP, webshop or marketplace. It fixes inconsistent units of measure, malformed identifiers, duplicate records and mixed formats, brings every field into one agreed structure, and validates the result against named standards — all without inventing facts that were not in the source. Done well, cleansing turns a fragile catalog export that a target system rejects or silently corrupts into a file that loads cleanly the first time, with a record of every change so you can trust what was done.

Updated 21 June 2026 · 9 min read

Most product catalogs accumulate problems faster than anyone fixes them. Years of manual entry, system migrations, merged supplier lists and rushed bulk edits leave a file where the same product appears three ways, weights are in three different units, and half the identifiers fail validation. The data is usually good enough to ship orders, which is exactly why nobody cleans it — until a PIM migration or a new webshop launch demands structured, validated input and the catalog stalls. This guide explains what is actually wrong in a typical catalog, what cleansing covers, how it is done well, and how it differs from enrichment and from data entry.

Common product catalog data problems

Dirty product data is rarely one big failure; it is hundreds of small, systematic inconsistencies. The most common categories are below, with concrete examples of what they look like in an export.

Inconsistent units of measure

The same attribute is recorded in different units, or with the unit baked into a free-text field. One row says 1.5 kg, another says 1500g, a third says 1,5 with the unit only in the product name. Length appears in mm, cm and inches (written as ") within the same column. A target system reading the column as a number cannot compare or filter these values, and any calculation on them is wrong. Cleansing normalizes every value to one declared unit per attribute and separates the number from the unit.
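As a rough illustration of that normalization, the sketch below converts weight strings to grams and keeps the number separate from the unit. The function name, the list of supported units and the choice to read a comma as a decimal separator are all assumptions made for this example, not a description of any particular tool.

```python
import re
from decimal import Decimal

# Conversion factors to grams. The supported units are an assumption for this sketch.
TO_GRAMS = {"kg": Decimal("1000"), "g": Decimal("1"), "mg": Decimal("0.001")}

# A number with an optional decimal part, then an optional alphabetic unit.
WEIGHT_RE = re.compile(r"^\s*(\d+(?:[.,]\d+)?)\s*([a-zA-Z]+)?\s*$")


def parse_weight(raw, default_unit=None):
    """Return (grams, note). grams is None when the value cannot be resolved."""
    match = WEIGHT_RE.match(raw)
    if not match:
        return None, "unparseable"
    number, unit = match.groups()
    unit = (unit or default_unit or "").lower()
    if unit not in TO_GRAMS:
        # Flag instead of guessing: the unit may only exist in the product name.
        return None, "unit missing or unknown"
    # Assumption: a comma here is a decimal separator, not a thousands separator.
    value = Decimal(number.replace(",", "."))
    return value * TO_GRAMS[unit], "ok"
```

With these rules, 1.5 kg and 1500g both resolve to 1500 grams, while a bare 1,5 is returned as a flagged gap until someone confirms the unit.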

Invalid or malformed GTINs

Global Trade Item Numbers are entered with the wrong length, with a leading zero stripped by a spreadsheet, or with a check digit that does not validate. A GTIN-13 stored in a cell formatted as a number silently loses its leading zero and becomes a 12-digit value that fails a GTIN-13 length check and no longer matches the code on record. Cleansing validates each identifier with the GS1 GTIN check-digit algorithm, preserves leading zeros as text, and flags any code that fails rather than guessing a replacement.
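The GS1 check-digit calculation weights the digits alternately by 3 and 1, starting from the digit next to the check digit, and picks the check digit that brings the total to a multiple of 10. A minimal sketch follows; the function names are illustrative, and the accepted lengths are the standard GTIN-8, GTIN-12, GTIN-13 and GTIN-14.

```python
def gtin_check_digit(body: str) -> int:
    """Compute the GS1 check digit for the digits that precede it."""
    total = 0
    # Walk from the rightmost body digit: weights alternate 3, 1, 3, 1, ...
    for position, char in enumerate(reversed(body)):
        weight = 3 if position % 2 == 0 else 1
        total += int(char) * weight
    return (10 - total % 10) % 10


def is_valid_gtin(code: str) -> bool:
    """Validate a GTIN held as text, so leading zeros are preserved."""
    if not (code.isascii() and code.isdigit()):
        return False
    if len(code) not in (8, 12, 13, 14):
        return False
    return gtin_check_digit(code[:-1]) == int(code[-1])
```

Because a leading zero contributes nothing to the weighted sum, a GTIN-13 that has lost its zero can still pass the check as a 12-digit code. That is why the expected length per column should be checked alongside the check digit, and why a failing code is flagged rather than repaired.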

Duplicate and near-duplicate records

The same physical product exists as several rows: one from an old import, one created by hand, one from a supplier feed. They differ by a trailing space, a different brand spelling, or a manufacturer part number entered with or without dashes. Blindly merging them risks keeping the wrong description or losing a valid attribute. Cleansing detects these clusters and resolves them with explicit survivorship rules — deciding which value wins, field by field, and recording why — never a blind merge.
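One way to picture this is a match key that ignores case, stray spaces and dashes, followed by a field-by-field survivorship pass. The sketch below is a simplified assumption: the field names (brand, mpn, source), the source priority order and the rule "first non-empty value by source priority" are all hypothetical, and an exact key like this will not catch genuinely different brand spellings, which need fuzzy matching and human review.

```python
import re
from collections import defaultdict

# Hypothetical priority: lower number wins when several sources have a value.
SOURCE_PRIORITY = {"supplier_feed": 0, "manual": 1, "old_import": 2}


def match_key(record):
    """Key that ignores case, extra whitespace and punctuation in the part number."""
    brand = " ".join(record["brand"].lower().split())
    mpn = re.sub(r"[^a-z0-9]", "", record["mpn"].lower())
    return brand, mpn


def cluster(records):
    """Group records sharing a match key; return only groups with duplicates."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record)
    return [group for group in groups.values() if len(group) > 1]


def resolve(group, fields):
    """Pick a surviving value per field and record why it won."""
    ranked = sorted(group, key=lambda r: SOURCE_PRIORITY[r["source"]])
    survivor, reasons = {}, {}
    for field in fields:
        for record in ranked:
            value = record.get(field, "").strip()
            if value:
                survivor[field] = value
                reasons[field] = f"first non-empty value by source priority ({record['source']})"
                break
        else:
            # No source had a value: leave it empty and flag it for a decision.
            survivor[field] = ""
            reasons[field] = "no source had a value; flagged"
    return survivor, reasons
```

The reasons dictionary is the important part: it is what lets a reviewer see why the supplier description won while the weight came from the old import.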

Mixed date, currency and number formats

Dates appear as 03/04/2026 (ambiguous between March and April), 2026-04-03 and 3 Apr 26 in one column. Prices mix 1,250.00 and 1.250,00, with currency sometimes in a separate column and sometimes as a $ prefix. Cleansing normalizes dates to ISO 8601 (YYYY-MM-DD), resolves currency to ISO 4217 codes with the correct minor units, and applies one consistent decimal and thousands convention across the file.
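A small sketch of these rules is shown below. It assumes the day-first or month-first convention and the decimal separator are declared per column rather than inferred per cell, that month abbreviations are English, and that the currency code is already known. The function names and the short minor-unit table are illustrative only.

```python
from datetime import datetime
from decimal import Decimal

# Minor units for a few ISO 4217 currencies; extend from the standard as needed.
MINOR_UNITS = {"USD": 2, "EUR": 2, "JPY": 0, "KWD": 3}


def normalize_date(raw, day_first=None):
    """Return an ISO 8601 date string, or None when the value is ambiguous."""
    raw = raw.strip()
    # Unambiguous layouts first (assumes English month abbreviations).
    for fmt in ("%Y-%m-%d", "%d %b %y", "%d %b %Y"):
        try:
            return datetime.strptime(raw, fmt).date().isoformat()
        except ValueError:
            pass
    if day_first is None:
        return None  # slash dates need a declared convention, so flag them
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    try:
        return datetime.strptime(raw, fmt).date().isoformat()
    except ValueError:
        return None


def normalize_amount(raw, decimal_sep, currency):
    """Parse a price using a declared decimal separator and currency code."""
    thousands_sep = "," if decimal_sep == "." else "."
    cleaned = raw.strip().replace(thousands_sep, "").replace(decimal_sep, ".")
    # 0.01 for two minor units, 1 for none, 0.001 for three.
    exponent = Decimal(1).scaleb(-MINOR_UNITS[currency])
    return Decimal(cleaned).quantize(exponent)
```

Under these assumptions 1,250.00 and 1.250,00 both come out as 1250.00, and 03/04/2026 stays unresolved until the column convention is confirmed. Note that quantize rounds any extra decimal places, so in a real cleanse a value with more precision than the currency allows should be logged rather than rounded quietly.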

Incomplete and miscategorized attributes

Required attributes are blank, filled with placeholders like N/A, tbd or 0, or scattered across inconsistent category labels that do not match the taxonomy of the target system. Cleansing maps every product to the client's agreed category structure, completes attributes where the correct value can be sourced from the existing data without invention, and clearly marks every gap that genuinely needs a decision.
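Finding those gaps can start with something as simple as the sketch below, which lists every required cell that is blank or holds a placeholder. The placeholder set, the sku key column and the function name are assumptions for illustration; in particular, 0 is a legitimate value for some attributes, so the set should be tuned per column.

```python
# Values treated as "no real data". Assumed for this sketch; tune per column.
PLACEHOLDERS = {"", "n/a", "na", "tbd", "0", "-"}


def find_gaps(rows, required, key="sku"):
    """Return (key, column, original value) for every required cell that is empty or a placeholder."""
    gaps = []
    for row in rows:
        for column in required:
            original = row.get(column, "")
            if str(original).strip().lower() in PLACEHOLDERS:
                gaps.append((row[key], column, original))
    return gaps
```

The result is a worklist, not a fix: each entry is either completed from data already in the file or passed to the client as a decision.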

What product data cleansing covers

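A complete cleanse works across the whole file, not just the obvious errors. In practice it covers every problem category described above: normalizing units of measure, validating identifiers, resolving duplicates with survivorship rules, standardizing date, currency and number formats, and mapping categories and completing attributes where the source data supports it.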

How product data cleansing is done well

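The hard part of cleansing is not making changes; it is making changes you can trust. A messy file can be made to look clean in an afternoon by overwriting everything that seems wrong, and that is exactly how silent errors enter a catalog and surface months later in a customer-facing field. Done properly, cleansing is built on a few non-negotiable principles: every change is logged with its original value, cleaned value and reason; anything uncertain is flagged for confirmation rather than guessed; every row is reconciled in and out so nothing is quietly dropped; and no fact is invented that was not in the source.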

The output of a properly run cleanse is not just a cleaner file. It is four artifacts: the cleaned, import-ready data file; a data dictionary explaining every column; the per-cell change-log; and a findings report with the short list of items only you can decide. Together they let you load the catalog with confidence and defend every value in it.
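To make the per-cell change-log and the row reconciliation concrete, here is a minimal sketch that compares the original and cleaned rows by a key column. The function name, the row-as-dictionary shape and the reasons lookup are assumptions for this example; a real change-log would usually be written out as its own file.

```python
def build_change_log(original_rows, cleaned_rows, key, reasons):
    """Compare rows by key; return (per-cell change-log, keys missing from the output)."""
    cleaned_by_key = {row[key]: row for row in cleaned_rows}
    log, missing = [], []
    for before in original_rows:
        after = cleaned_by_key.get(before[key])
        if after is None:
            # Reconciliation: every input row must be accounted for in the output.
            missing.append(before[key])
            continue
        for column, old_value in before.items():
            new_value = after.get(column)
            if new_value != old_value:
                log.append({
                    "key": before[key],
                    "column": column,
                    "original": old_value,
                    "cleaned": new_value,
                    # A change without a recorded reason is itself a finding.
                    "reason": reasons.get((before[key], column), "UNEXPLAINED"),
                })
    return log, missing
```

Any key in the missing list, or any entry marked UNEXPLAINED, is a signal that something was dropped or changed silently and belongs in the findings report.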

Cleansing vs enrichment vs data entry

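These three terms are often used interchangeably, but they solve different problems. Cleansing corrects and standardizes the values you already have. Enrichment adds attributes, descriptions or media that were not in the source. Data entry is volume keying of values, without the judgment and audit trail that cleansing depends on. The distinction matters when you scope a project, because doing them in the wrong order wastes effort.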

Cleansing is the foundation. It is the step that makes a migration succeed, a webshop launch ship on time, and any later enrichment worth doing. It is a craft built on judgment and an audit trail — not a volume keystroke service.

About the author

Written by Faraz Naqvi, founder of CatalogSmith, a focused product-catalog data cleanup and pre-PIM data readiness service. CatalogSmith returns clean, import-ready catalogs with audit-grade transparency — every change logged, every uncertainty flagged.

Frequently asked questions

What is product data cleansing?

Product data cleansing is the process of correcting, standardizing, de-duplicating and validating a product catalog so it is accurate, consistent and ready to import into a PIM, ERP, webshop or marketplace. It fixes inconsistent units, malformed identifiers, duplicate records and mixed formats without inventing missing facts.

What is the difference between data cleansing and data enrichment?

Data cleansing corrects and standardizes the data you already have so it is accurate and import-ready. Data enrichment adds new attributes, descriptions or media that were not in the source. Cleansing makes existing values trustworthy; enrichment expands coverage. A clean catalog is the prerequisite for reliable enrichment.

How long does product data cleansing take?

For a typical catalog export, a full cleanup is usually returned in about five business days, depending on size and how many fields need standardizing. A free 50-SKU sample clean is returned faster, so you can see the method and the change-log before committing to the full catalog.

Will product data cleansing change my data without telling me?

No. Done properly, nothing is changed silently. Every edit is recorded in a per-cell change-log showing the original value, the cleaned value and the reason. Anything uncertain is flagged for you to confirm rather than guessed, and every row is reconciled in and out so nothing is quietly dropped or invented.

What file format do you need for cleansing?

Any catalog export works, however messy. Excel and CSV are the most common, exported from an ERP, an old PIM, a spreadsheet, or a webshop back end. You send the file as-is; the return is a clean, standardized, import-ready file mapped to the structure your target system expects.

See it on your own catalog

The fastest way to understand product data cleansing is to see it run on your data. Send a messy export and get a free 50-SKU sample clean back — with the change-log and findings report — so you can judge the method before committing to anything. Request your free 50-SKU sample clean, or read more about pre-PIM data readiness and our cleanup services.