Deduplicating Matching Records
Last Updated 03/09/2026
Once the Waves clean up process has finished, and Catalogers have reviewed and signed off on the results, the actual deduplication process can begin. This process compares a number of MARC fields between records to determine whether they are a good match for merging. Strong matches will be merged automatically (after a review period), while weaker potential matches will be added to a "Needs human" list for review and edits.
In the resulting spreadsheets for the deduplication process, there are two categories present:
From the LDR field, the following data elements are taken into consideration:
From the =008 field, the following data elements are taken into consideration:
From the =020 field, the following subfield is taken into consideration:
From the =100, =110, =111 fields, the following subfield, specifically the first occurrence, is taken into consideration:
From the =245 field, the following subfields are taken into consideration:
From the =250 field, the following subfield is taken into consideration:
From the =264 _1 field, the following subfield is taken into consideration:
From the =700 field/s, the following subfield is taken into consideration:
In addition to the above fields and subfields, the deduplication process also checks to see if a bib record is an audio format or a video format, because those require different scoring.
Candidate bibs are selected based upon Evergreen's fingerprint, containing the following fields:
Each candidate for merging gets a new fingerprint. The fingerprints are compared, and records with the exact same fingerprint are considered to be duplicates. This is the set of bibliographic records that will be merged. The winning bib is decided based upon a complex "score." The bib with the highest score is marked as the Lead and the other bibs are merged onto that one.
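The grouping and Lead selection described above can be sketched roughly as follows. This is only an illustration, not the actual deduplication code: the `Bib` class, the `plan_merges` function, the precomputed `fingerprint` and `score` values, and the tie-breaking rule (lowest bib ID wins a tie) are all assumptions made for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Bib:
    bib_id: int
    fingerprint: str  # assumed to be precomputed from the matching fields
    score: int        # assumed to be precomputed by the scoring step


def plan_merges(bibs):
    """Group bibs by identical fingerprint and pick the highest-scoring Lead."""
    groups = defaultdict(list)
    for bib in bibs:
        groups[bib.fingerprint].append(bib)

    plan = []
    for members in groups.values():
        if len(members) < 2:
            continue  # a fingerprint seen only once has nothing to merge
        # Highest score first; ties go to the lowest bib_id (an assumption).
        ranked = sorted(members, key=lambda b: (-b.score, b.bib_id))
        lead, subs = ranked[0], ranked[1:]
        plan.append((lead.bib_id, [s.bib_id for s in subs]))
    return plan


# Made-up records: three share a fingerprint, one stands alone.
bibs = [
    Bib(101, "fp-a", 40),
    Bib(102, "fp-a", 55),
    Bib(103, "fp-b", 10),
    Bib(104, "fp-a", 55),
]
print(plan_merges(bibs))
```

Each entry in the returned plan pairs one Lead bib ID with the Sub bib IDs that would be merged onto it, which mirrors the Lead Bib and Sub Bib columns catalogers see during review.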
After the format icons have been updated by the Waves clean up process, Mobius will generate a list of records that would be merged if the deduplication process were to be run immediately. This information will be provided in a spreadsheet for catalogers to review. Members of the Cataloging Interest Group will review the matched records in the “Auto” sheet and flag anything that looks like a problem.
The reviewers’ role at this stage is to check whether the records marked Lead Bib and Sub Bib in the "Auto" sheet are a good match, so that the Sub Bib should in fact be merged onto the Lead Bib. If they are not, the next step is to work out what distinguishes the records as non-duplicates, so that the criteria used to consider two records duplicates can be adjusted. In short, catalogers must confirm that the process is making good determinations about which records should be matched and merged.
=020 (ISBN) is merged onto the final bib. Any unique =020 is “merged/melted” onto the winning bib, such that the final bib may have multiple =020 fields.
=035 (OCLC number) is merged onto the final bib. Any unique =035 is “merged/melted” onto the winning bib, such that the final bib may have multiple =035 fields.
=037 (source of acquisition) is merged onto the final bib. Any unique =037 is “merged/melted” onto the winning bib, such that the final bib may have multiple =037 fields.
=086 (government document classification number) is merged onto the final bib. Any unique =086 is “merged/melted” onto the winning bib, such that the final bib may have multiple =086 fields.
=856 (URL to electronic resource) is merged onto the final bib. Any unique =856$u, along with any accompanying $9, is "merged/melted" onto the winning bib, such that the final bib may have multiple =856 fields.
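As a rough illustration of the "merged/melted" behavior described above, the sketch below copies onto the winning bib any field from a losing bib that the winner does not already have. The record layout (a dict mapping each tag to a list of subfield dicts), the function names, and the sample values are assumptions made for the example. It also assumes that =856 uniqueness is judged on $u alone and that the whole field is carried over; neither detail is confirmed by the process itself.

```python
MELTED_TAGS = ("020", "035", "037", "086", "856")


def field_key(tag, subfields):
    """Return the value used to decide whether a field is already present."""
    if tag == "856":
        # Assumption: only $u is compared; any $9 simply travels with it.
        return subfields.get("u")
    return tuple(sorted(subfields.items()))


def melt_fields(lead, sub):
    """Return a copy of lead with the unique melted-tag fields of sub added."""
    merged = {tag: list(fields) for tag, fields in lead.items()}
    for tag in MELTED_TAGS:
        seen = {field_key(tag, f) for f in merged.get(tag, [])}
        for field in sub.get(tag, []):
            key = field_key(tag, field)
            if key not in seen:
                merged.setdefault(tag, []).append(dict(field))
                seen.add(key)
    return merged


# Made-up sample data; the identifiers and URLs are placeholders.
lead = {
    "020": [{"a": "9780000000001"}],
    "245": [{"a": "Example title"}],
    "856": [{"u": "http://example.org/a", "9": "LIB1"}],
}
sub = {
    "020": [{"a": "9780000000001"}, {"a": "9780000000002"}],
    "035": [{"a": "(OCoLC)12345"}],
    "245": [{"a": "Example title /"}],
    "856": [{"u": "http://example.org/b", "9": "LIB2"}],
}
merged = melt_fields(lead, sub)
for tag in sorted(merged):
    print(tag, merged[tag])
```

In this sketch the duplicate =020 is skipped, the new =020, =035, and =856 are appended, and the =245 of the losing bib is ignored because it is not one of the melted tags.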
Any of the bibs listed here that have an OPAC Icon of:
Will NOT be automatically merged, because they are too often false positives. They need to be handled by hand, so they are part of the "Needs Humans" results. The rest of the bibs are merged. The merging process is fairly involved internally: it takes into account all of the holds and metarecords, which are reconciled and merged as well.
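The routing between automatic merging and hand review can be pictured with a small sketch. Everything here is hypothetical: the `route_merge_groups` function, the `opac_icon` key, and the icon names are placeholders rather than the real excluded icons, and the sketch assumes that a whole match group goes to review if any bib in it carries an excluded icon.

```python
def route_merge_groups(groups, excluded_icons):
    """Split match groups into auto-merge and needs-humans lists."""
    auto, needs_humans = [], []
    for group in groups:
        icons = {bib["opac_icon"] for bib in group}
        # Assumption: one excluded icon sends the whole group to review.
        if icons & excluded_icons:
            needs_humans.append(group)
        else:
            auto.append(group)
    return auto, needs_humans


# Placeholder icon names; the real list of excluded icons is not given here.
excluded = {"placeholder-icon-x"}
groups = [
    [{"id": 1, "opac_icon": "placeholder-icon-y"},
     {"id": 2, "opac_icon": "placeholder-icon-y"}],
    [{"id": 3, "opac_icon": "placeholder-icon-x"},
     {"id": 4, "opac_icon": "placeholder-icon-x"}],
]
auto, needs_humans = route_merge_groups(groups, excluded)
print(len(auto), "group(s) to merge automatically")
print(len(needs_humans), "group(s) for the Needs Humans list")
```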