How to fix duplicate SKUs in a product catalog
To fix duplicate SKUs in a product catalog, group the rows that describe the same product, then resolve each group down to one record using explicit field-by-field rules — keeping the best value for each attribute and logging every change — rather than blindly deleting or merging rows. The reliable sequence is: normalize the data first so hidden duplicates surface, match products on a stable identifier such as a validated GTIN or manufacturer part number instead of the SKU text alone, resolve each duplicate cluster with survivorship rules, and flag anything ambiguous for you to decide instead of guessing. Done this way you remove the duplicates without silently losing a correct price, attribute or description.
Updated 24 June 2026 · 8 min read
The quick answer: 6 steps to fix duplicate SKUs safely
- Normalize first. Trim whitespace, standardize casing, and split units out of names so rows that differ only by formatting become recognizable as duplicates.
- Match on a stable key. Cluster products by a validated GTIN or manufacturer part number — not the raw SKU text, which is often the field that is inconsistent.
- Cluster the matches. Group every row that points to the same physical product, including near-duplicates that are not character-for-character identical.
- Resolve with survivorship rules. Decide which value wins per field (most complete, most recent, most trusted source) — not which whole row wins.
- Flag, do not guess. Where two rows genuinely conflict and no rule decides it, mark it for you to confirm — never auto-pick.
- Log every change. Record original → kept value → why, so nothing is removed silently and you can audit any decision later.
Why you can't just delete duplicate SKUs
The instinct is to sort by SKU and delete the repeats — but that is exactly how a cleanup quietly destroys good data. Two rows sharing a SKU are rarely identical: one often holds the correct weight while the other holds the correct price, an attribute filled in one is blank in the other, and the "duplicate" you delete may be the only row with a valid barcode. Deleting a whole row to remove a duplicate throws away whatever that row got right.
The second trap is that most real duplicates are not exact matches. The same product appears as ACME Bolt M6 (trailing space), acme bolt m6 and Acme Bolt M6, or with a part number stored as AB-1024 in one row and AB1024 in another. An exact-match de-duplication — including Excel's Remove Duplicates button — sees these as different products and leaves every one of them in the file. You have to normalize first, or you only catch a fraction of the duplicates.
Blind merge vs survivorship de-duplication
The difference between a risky cleanup and a safe one comes down to how each duplicate group is resolved:
| Blind delete / merge | Survivorship de-duplication | |
|---|---|---|
| How it decides | Keeps the first or last row | Picks the best value field by field |
| Data-loss risk | High — drops valid values silently | Low — best value kept, rest logged |
| Catches near-duplicates | No — exact match only | Yes — normalizes first |
| Handles conflicts | Auto-overwrites | Flags for you |
| Audit trail | None | Per-row change-log |
Step by step: how to fix duplicate SKUs
1. Normalize the data so duplicates surface
Before you match anything, clean the fields you will match on. Trim leading and trailing spaces, collapse double spaces, standardize casing, and pull units or sizes out of free-text names into their own columns. Preserve leading zeros on identifiers by treating them as text — a spreadsheet that reads a GTIN as a number silently drops the leading zero and breaks the match. After this pass, rows that were "different" only because of formatting line up as the duplicates they are.
2. Match on a stable identifier, not the SKU text
SKUs are frequently the least reliable field, because they are often re-keyed or invented per system. Where possible, cluster on a validated GTIN — a UPC or EAN barcode with its GS1 check-digit verified, so you are not matching on a typo — or a normalized manufacturer part number. Use the SKU only as a secondary signal.
3. Resolve each cluster with survivorship rules
For every group of matched rows, decide the winning value per attribute, not per row. Typical rules: keep the most complete description, the most recently updated price, the non-blank attribute, the value from the source you trust most. The surviving record is assembled from the best of each row — what data teams call a golden record — so a correct value never gets deleted just because it lived in the "loser" row.
4. Flag conflicts instead of guessing
Sometimes two rows give different, equally plausible values and no rule settles it — two different prices, two contradictory weights. Do not let the tool pick. Flag the cluster for a person to decide, because a wrong auto-choice here is exactly the kind of error that surfaces months later in a customer-facing field.
5. Keep a change-log
Record what happened to every row: which records were merged, which value won each field, and why. This is what lets you trust the result and defend it — and what separates a real de-duplication from an afternoon of overwriting.
Common mistakes when removing duplicate SKUs
- Matching on raw SKU text only — misses every near-duplicate that differs by spacing, case or punctuation.
- Using Excel's Remove Duplicates for real consolidation — it deletes the extra rows silently, keeps only the first, and catches exact matches only.
- Merging whole rows instead of per-field — keeps one row's good data and discards another row's good data.
- Letting a spreadsheet strip leading zeros from GTINs — turns a valid barcode into a broken one and a real match into a miss.
- Auto-resolving conflicts — guessing a winner where the data genuinely disagrees.
- Keeping no record — with no change-log, you cannot audit, reverse or trust the cleanup.
Doing it in Excel vs a proper cleanup
For a few dozen rows you can do this by hand: a helper column that normalizes the match key, a sort to group likely duplicates, and careful manual merging with a notes column. It works, but it is slow and error-prone, and it does not scale past a few hundred SKUs — which is exactly when duplicates matter most, during a migration or a webshop launch. A proper cleanup automates the normalization, matching, survivorship and logging across the whole file, and returns a per-cell change-log so every decision is auditable.
About the author
Written by Faraz Naqvi, founder of CatalogSmith, a focused product-catalog data cleanup and pre-PIM data readiness service. CatalogSmith returns clean, import-ready catalogs with audit-grade transparency — every change logged, every uncertainty flagged.
Frequently asked questions
What causes duplicate SKUs in a product catalog?
Duplicate SKUs usually come from system migrations, merged supplier feeds, and manual entry over time. The same physical product gets added more than once, and the rows differ by a trailing space, a different brand spelling, or a part number entered with or without dashes. Because the text is not identical, an exact-match check does not catch them, so they accumulate unnoticed until a migration or webshop import forces the issue.
Can Excel remove duplicate SKUs?
Excel's Remove Duplicates button only deletes rows that are exactly identical, keeps the first occurrence, and leaves no record of what was removed. That makes it risky for real de-duplication: it misses near-duplicates that differ by formatting and can drop a row that held the only correct price or attribute. It is fine for spotting obvious exact copies, not for safely consolidating product records.
What are survivorship rules in de-duplication?
Survivorship rules decide which value wins, field by field, when duplicate records are merged. Instead of keeping one whole row and deleting the others, you keep the best value for each attribute — the most complete description, the most recent price, the value from the most trusted source. This prevents data loss, because a correct value in a row that would otherwise be deleted is preserved in the surviving record.
How do I merge duplicate products without losing data?
Normalize the data first so hidden duplicates surface, match products on a stable identifier such as a validated GTIN or manufacturer part number rather than the SKU text, then resolve each cluster with field-by-field survivorship rules so the best value for every attribute is kept. Flag any genuine conflict for you to decide instead of guessing, and keep a per-row change-log of original to kept value and why.
Should I delete or merge duplicate SKUs?
Merge, do not blindly delete. Two rows with the same SKU often hold different valid information — one has the correct weight, the other the correct price — so deleting either loses something. The safe approach is to merge them under survivorship rules that keep the best value per field and record the decision, and to flag rows for review when the right answer is genuinely unclear.
See it on your own catalog
The fastest way to see safe SKU deduplication is to watch it run on your data. Send a messy export and get a free 50-SKU sample clean back — with the change-log showing exactly how each duplicate was resolved — so you can judge the method before committing to anything. Request your free 50-SKU sample clean, or read more about product data cleansing and pre-PIM data readiness.