NC Cardinal Support and Staff Education

11.3. Deduplicating Matching Records

Last Updated 03/09/2026


The Deduplication Process and Auto-Merging Records

Once the Waves clean up process has finished, and Catalogers have reviewed and signed off on the results, the actual deduplication process can begin. This process compares a set of MARC fields across records to determine whether they are a good match for merging. Strong matches are merged automatically (after a review period), while weaker potential matches are added to a "Needs Humans" list for review and edits.

Results Categories

In the resulting spreadsheets for the deduplication process, there are two categories present:

  • Converted
    • These are records the process is confident enough to merge: one record is selected as the lead bib and the others as sub bibs. They will be merged unless reviewers find a problem with the matching logic.
    • In other words, the “Auto” sheets show what the deduplication process would do if it were run now, as is.
    • “Auto” is listed as “Converted” once the Waves clean up has been performed and the format changes have been made.
  • Needs Humans
    • These are records the process has identified as likely matches but is not confident enough to merge automatically.

What MARC Fields are Compared During the Deduplication Process?

From the LDR field, the following data elements are taken into consideration:

  • Type of record
  • Bibliographic level

From the =008 field, the following data elements are taken into consideration:

  • Form of item
  • Date 1

From the =020 field, the following subfield is taken into consideration:

  • $a (ISBN)

From the =100, =110, and =111 fields, the first occurrence of the following subfield is taken into consideration:

  • $a (Personal name; Corporate name; Meeting name)

From the =245 field, the following subfields are taken into consideration:

  • $a (Title)
  • $b (Subtitle)
  • $h (Medium/GMD)
  • $n (Number of part/section of work)
  • $p (Name of part/section of work)

From the =250 field, the following subfield is taken into consideration:

  • $a (Edition statement)

From the =264 _1 field, the following subfield is taken into consideration:

  • $c (Date of publication)

From the =700 fields, the following subfield is taken into consideration:

  • $a (Added entry/Personal name)

In addition to the above fields and subfields, the deduplication process also checks to see if a bib record is an audio format or a video format, because those require different scoring.
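To make the list above concrete, here is a minimal sketch of pulling the compared elements out of a single bib. It is not the MOBIUS or Evergreen code: the Bib and DataField classes and the element names are hypothetical stand-ins for a parsed MARC record. The Leader and 008 positions follow MARC 21 (Type of record LDR/06, Bibliographic level LDR/07, Date 1 008/07-10, Form of item 008/29 for maps and visual materials and 008/23 otherwise). The audio/video test is an assumption, since the source does not say how those formats are detected.

```python
from dataclasses import dataclass, field

@dataclass
class DataField:
    tag: str
    ind1: str
    ind2: str
    subfields: list  # (code, value) pairs, in record order

    def values(self, code):
        return [v for c, v in self.subfields if c == code]

@dataclass
class Bib:  # hypothetical simplified MARC record
    leader: str
    f008: str
    fields: list = field(default_factory=list)

    def data_fields(self, *tags):
        return [f for f in self.fields if f.tag in tags]

def first_value(fields, code):
    """Return the first $code value across the given fields, or None."""
    for f in fields:
        vals = f.values(code)
        if vals:
            return vals[0]
    return None

def compared_elements(bib):
    rec_type = bib.leader[6]
    # Form of item sits at 008/29 for maps (e, f) and visual materials (g, k, o, r).
    form_pos = 29 if rec_type in "efgkor" else 23
    t245 = bib.data_fields("245")
    pub264 = [f for f in bib.data_fields("264") if f.ind2 == "1"]  # =264 _1 only
    return {
        "type_of_record": rec_type,
        "bib_level": bib.leader[7],
        "form_of_item": bib.f008[form_pos],
        "date1": bib.f008[7:11],
        "isbns": [v for f in bib.data_fields("020") for v in f.values("a")],
        "author": first_value(bib.data_fields("100", "110", "111"), "a"),
        **{f"245{c}": first_value(t245, c) for c in "abhnp"},
        "edition": first_value(bib.data_fields("250"), "a"),
        "pub_date": first_value(pub264, "c"),
        "added_names": [v for f in bib.data_fields("700") for v in f.values("a")],
        # Assumption: audio = LDR/06 'i' or 'j'; video = LDR/06 'g'.
        "is_audio": rec_type in "ij",
        "is_video": rec_type == "g",
    }
```

Only =264 fields with second indicator 1 (publication) contribute a date, so a copyright date in =264 _4 is ignored.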

How Candidate Bibs are Selected

Candidate bibs are selected based on Evergreen's fingerprint, which is built from the following elements:

  • Form of item
  • Date 1
  • Type of record
  • Bibliographic level
  • Title (=245$a)
  • Subtitle/Remainder of title (=245$b)
  • Medium/GMD (=245$h)
  • Number of part/section of work (=245$n)
  • Name of part/section of work (=245$p)
  • Edition statement (=250$a)
  • Author (=100$a, =110$a, =111$a)
  • Added entry/Personal name (=700$a)
  • Audio format
  • Video format
  • Date of publication
  • Normalized ISBNs
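The list above ends with "Normalized ISBNs" without saying how they are normalized. A common approach, assumed here rather than confirmed for Evergreen, is to keep only the leading ISBN in =020$a (dropping hyphens and qualifiers such as "(pbk.)") and convert ISBN-10s to ISBN-13s, so that both forms of the same ISBN compare equal. The function names are hypothetical.

```python
import re

def isbn10_to_13(isbn10):
    """Prefix 978 to the first nine digits and recompute the ISBN-13 check digit."""
    core = "978" + isbn10[:9]
    total = sum(int(d) * (3 if i % 2 else 1) for i, d in enumerate(core))
    return core + str((10 - total % 10) % 10)

def normalize_isbn(raw):
    """Hypothetical normalizer for one =020$a value; returns ISBN-13 or None."""
    match = re.match(r"\s*([0-9Xx][0-9Xx-]*)", raw)
    if not match:
        return None
    chars = match.group(1).replace("-", "").upper()
    if re.fullmatch(r"\d{9}[\dX]", chars):
        return isbn10_to_13(chars)
    if re.fullmatch(r"\d{13}", chars):
        return chars
    return None

def normalized_isbns(raw_values):
    """Distinct normalized ISBNs, sorted so they can be compared as a set."""
    return sorted({isbn for isbn in map(normalize_isbn, raw_values) if isbn})
```

For example, "0-316-76948-7 (pbk.)" and "9780316769488 (hardcover)" both normalize to 9780316769488.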

Each merge candidate is assigned a new fingerprint. The fingerprints are compared, and records with identical fingerprints are considered duplicates; these form the set of bibliographic records that will be merged. The winning bib is chosen by a complex "score": the bib with the highest score is marked as the Lead, and the other bibs are merged onto it.
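The grouping step described above can be sketched as follows. Each bib is a dictionary of compared elements (the key names are hypothetical), the fingerprint is a tuple of normalized values, and bibs with identical fingerprints form one merge set. The real scoring is not documented on this page, so score is a placeholder callable; the text normalization (lowercase, punctuation stripped) is also an assumption.

```python
from collections import defaultdict

# Hypothetical element names mirroring the fingerprint list above.
FINGERPRINT_KEYS = (
    "form_of_item", "date1", "type_of_record", "bib_level",
    "245a", "245b", "245h", "245n", "245p", "edition",
    "author", "added_names", "is_audio", "is_video",
    "pub_date", "isbns",
)

def normalize_text(value):
    """Assumed normalization: lowercase, drop punctuation, collapse spaces."""
    if value is None:
        return ""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())

def fingerprint(elements):
    parts = []
    for key in FINGERPRINT_KEYS:
        value = elements.get(key)
        if isinstance(value, bool):
            parts.append(value)
        elif isinstance(value, (list, tuple)):
            # Repeatable elements compare as sorted sets, so order does not matter.
            parts.append(tuple(sorted({normalize_text(v) for v in value})))
        else:
            parts.append(normalize_text(value))
    return tuple(parts)

def find_merge_sets(bibs, score):
    """Group bibs with identical fingerprints; the highest score becomes the Lead."""
    groups = defaultdict(list)
    for bib_id, elements in bibs.items():
        groups[fingerprint(elements)].append(bib_id)
    merge_sets = []
    for ids in groups.values():
        if len(ids) < 2:
            continue  # no duplicate shares this fingerprint
        # Ties fall back to bib ID so the result is deterministic.
        ranked = sorted(ids, key=lambda i: (-score(bibs[i]), i))
        merge_sets.append({"lead": ranked[0], "subs": ranked[1:]})
    return merge_sets
```

Because only exact fingerprint matches are grouped, two records that differ in any one element (for example Date 1) never end up in the same merge set.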

Reviewing the Deduplication Process “Auto” Sheet

After the Waves clean up process has updated the format icons, Mobius will generate a list of the records that would be merged if the deduplication process were run immediately. This list will be provided in a spreadsheet for catalogers to review. Members of the Cataloging Interest Group will review the matched records in the “Auto” sheet and flag anything that looks like a problem.

At this stage, the reviewers' role is to check the "Auto" sheet and confirm that the records marked Lead Bib and Sub Bib are a good match and should in fact be merged onto the Lead Bib. If they are not, it is important to work out what differentiates the records as non-duplicates, so that the criteria used to identify duplicates can be adjusted. In short, catalogers must confirm that the process is making good decisions about which records should be matched and merged.

What Happens When Two Bibs are Merged?

=020 (ISBN) is merged onto the final bib.  Any unique =020 is “merged/melted” onto the winning bib, such that the final bib may have multiple =020 fields.

=035 (OCLC number) is merged onto the final bib.  Any unique =035 is “merged/melted” onto the winning bib, such that the final bib may have multiple =035 fields.

=037 (source of acquisition) is merged onto the final bib.  Any unique =037 is “merged/melted” onto the winning bib, such that the final bib may have multiple =037 fields.

=086 (government document classification number) is merged onto the final bib.  Any unique =086 is “merged/melted” onto the winning bib, such that the final bib may have multiple =086 fields.

=856 (URL to electronic resource) is merged onto the final bib. Any unique =856$u, along with any accompanying $9, is "merged/melted" onto the winning bib, such that the final bib may have multiple =856 fields.
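The "merge/melt" rules above can be sketched as a function that starts from the winning bib's fields and appends the sub bib's unique =020, =035, =037, and =086 fields and any =856$u values not already present. This is an illustration, not the actual merge code: fields are hypothetical (tag, subfields) tuples, "unique" is assumed to mean an exact match on the whole field, and because the page only mentions $u and $9 for =856, the sketch carries over just those two subfields.

```python
MELT_TAGS = ("020", "035", "037", "086")

def melt(lead_fields, sub_fields):
    """Return the lead's fields plus unique melted fields from the sub bib.

    Each field is (tag, ((code, value), ...)); subfields are tuples so they hash.
    """
    merged = list(lead_fields)
    seen = {(tag, subs) for tag, subs in merged if tag in MELT_TAGS}
    lead_urls = {v for tag, subs in merged if tag == "856" for c, v in subs if c == "u"}
    for tag, subs in sub_fields:
        if tag in MELT_TAGS and (tag, subs) not in seen:
            merged.append((tag, subs))
            seen.add((tag, subs))
        elif tag == "856":
            new_urls = [v for c, v in subs if c == "u" and v not in lead_urls]
            if new_urls:
                nines = tuple((c, v) for c, v in subs if c == "9")
                merged.append(("856", tuple(("u", u) for u in new_urls) + nines))
                lead_urls.update(new_urls)
        # Every other sub-bib field is dropped; the lead's version is kept.
    return merged
```

The result can therefore hold several =020, =035, =037, =086, and =856 fields, matching the description above.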

What Is Not Merged?

Bibs with any of the following OPAC icons will NOT be automatically merged:

  • Serial
  • DVD
  • VHS
  • Blu-Ray
  • Microform
  • Software

These formats too often produce false positives, so they must be handled by hand and are included in the "Needs Humans" results. All other matched bibs are merged. Internally, the merge process is fairly involved: it takes holds and metarecords into account, and these are reconciled and merged as well.
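The exclusion rule above amounts to routing each match group by OPAC icon. In this sketch, which assumes that one excluded icon anywhere in the group is enough to send the whole group to "Needs Humans", the icon strings and the icon_of lookup are hypothetical stand-ins for whatever the process actually reads.

```python
# Hypothetical icon labels for the formats listed above.
EXCLUDED_ICONS = {"serial", "dvd", "vhs", "blu-ray", "microform", "software"}

def route(match_groups, icon_of):
    """Split match groups into (converted, needs_humans) by OPAC icon.

    match_groups: list of {"lead": bib_id, "subs": [bib_id, ...]}
    icon_of: mapping of bib_id -> icon label
    """
    converted, needs_humans = [], []
    for group in match_groups:
        icons = {icon_of[b].lower() for b in [group["lead"], *group["subs"]]}
        if icons & EXCLUDED_ICONS:
            needs_humans.append(group)  # too prone to false positives
        else:
            converted.append(group)
    return converted, needs_humans
```

Icon labels are lowercased before comparison, so "DVD" and "dvd" are treated the same.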

Knowledge Tags
dedupe  /  deduplication  /  deduplication process  /  merge records  /  merging  /  bibliographic record  / 


NC Cardinal is supported by the Institute of Museum and Library Services under the provisions of the federal Library Services and Technology Act (LSTA), as administered by the Library of North Carolina, a division of the Department of Natural and Cultural Resources.