QA catalogue for analysing library data

KBR (Koninklijke Bibliotheek van België/Bibliothèque royale de Belgique)
    number of records: 4,840,846     timestamp of analysis: 2024-12-20 00:20:35 (00:10:15)

Thompson—Traill completeness

These scores implement the scoring algorithm described in the following paper:

Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828

Their approach calculates the quality of ebook records coming from different data sources.

Histogram

  • y axis: number of records
  • x axis: total score of a record
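The relation between the two axes can be illustrated with a minimal Python sketch. It assumes the total scores are already available as a list of integers; the function name and the sample values are hypothetical and are not taken from the QA catalogue code base.

```python
from collections import Counter


def score_histogram(total_scores):
    """Map each total score (x axis) to the number of records having it (y axis)."""
    counts = Counter(total_scores)
    # Sort by score so the result reads left to right like the chart.
    return dict(sorted(counts.items()))


# Hypothetical total scores of six records (not real KBR data).
example_scores = [12, 9, 12, 15, 9, 12]
histogram = score_histogram(example_scores)
print(histogram)
```

Each key of the resulting dictionary is a bar position on the x axis, and each value is the height of that bar.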

Each record gets a score based on a number of criteria. Each criterion contributes zero or more points. The final score is the sum of these criterion scores.

Record Element | MARC field/position/subfield | How counted
--- | --- | ---
1. ISBN | 020 | 1 point for each occurrence of field
2. Authors | 100, 110, 111 | 1 point for each occurrence of field(s)
3. Alternative Titles | 246 | 1 point for each occurrence of field
4. Edition | 250 | 1 point for each occurrence of field
5. Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s)
6. Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s)
7. Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists
8. Date (MARC 008) | 008/07 | 1 point if valid coded date exists
9. Date (MARC 26X) | 260$c, 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date
10. LC/NLM Classification | 050, 060, 090 | 1 point if any field exists
11. Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points
12. Subject Headings: MeSH | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points
13. Subject Headings: FAST | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points
14. Subject Headings: GND (this was not part of the original algorithm) | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points
15. Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 5 total points
16. Description | 008/23, 300$a | 2 points if both elements exist; 1 point if either exists
17. Language of Resource | 008/35 | 1 point if likely language code exists
18. Country of Publication Code | 008/15 | 1 point if likely country code exists
19. Language of Cataloging | | 1 point if either no language is specified, or if English is specified
20. Descriptive cataloging standard | | 1 point if value is “rda”
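The three counting patterns used in the table (one point per occurrence, a capped count, and "2 points if both, 1 if either") can be sketched in Python as follows. This is a simplified illustration, not the QA catalogue implementation: a record is assumed to be reduced to a dictionary of field tag to occurrence count, all function names are hypothetical, and the number of Library of Congress subject headings is passed in directly because telling vocabularies apart is not modelled here. Only some of the criteria are covered, so the result is a partial score.

```python
def count_fields(field_counts, tags):
    """1 point for each occurrence of any of the given fields."""
    return sum(field_counts.get(tag, 0) for tag in tags)


def both_or_either(field_counts, tag_a, tag_b):
    """2 points if both fields exist; 1 point if either field exists."""
    return int(field_counts.get(tag_a, 0) > 0) + int(field_counts.get(tag_b, 0) > 0)


def capped(points, cap):
    """1 point for each field, up to `cap` total points."""
    return min(points, cap)


def partial_score(field_counts, lcsh_count=0):
    """Score a subset of the criteria and return (components, total)."""
    components = {
        "isbn": count_fields(field_counts, ["020"]),
        "authors": count_fields(field_counts, ["100", "110", "111"]),
        "alternative_titles": count_fields(field_counts, ["246"]),
        "edition": count_fields(field_counts, ["250"]),
        "contributors": count_fields(field_counts, ["700", "710", "711", "720"]),
        "series": count_fields(field_counts, ["440", "490", "800", "810", "830"]),
        "toc_and_abstract": both_or_either(field_counts, "505", "520"),
        "lc_subject_headings": capped(lcsh_count, 10),
    }
    # The final score is the sum of the individual criterion scores.
    return components, sum(components.values())


# Hypothetical record: tag -> number of occurrences (not real KBR data).
record = {"020": 2, "100": 1, "245": 1, "700": 3, "490": 1, "520": 1}
components, total = partial_score(record, lcsh_count=12)
print(components, total)
```

In this hypothetical record the abstract (520) is present without a table of contents (505), so that criterion gives 1 point, and the 12 subject headings are capped at 10 points.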

Components

The histograms of the individual components:

1. ISBN
2. Authors
3. Alternative Titles
4. Edition
5. Contributors
6. Series
7. Table of Contents and Abstract
8. Date 008
9. Date 26X
10. LC/NLM Classification
11. Subject Headings: Library of Congress
12. Subject Headings: MeSH
13. Subject Headings: FAST
14. Subject Headings: GND
15. Subject Headings: Other
16. Online
17. Language of Resource
18. Country of Publication
19. Language of Cataloging
20. Descriptive cataloging standard is RDA

Analysis parameters
files kbr-0.xml.gz, kbr-1.xml.gz, kbr-2.xml.gz, kbr-3.xml.gz, kbr-4.xml.gz
marcVersion KBR
marcFormat XML
dataSource FILE
limit -1
offset -1
id
defaultRecordType BOOKS
alephseq false
marcxml true
lineSeparated false
trimId true
recordIgnorator {conditions: —, empty: true }
recordFilter {conditions: —, empty: true } json: {"conditions":null,"empty":true}
ignorableFields {fields: [590, 591, 592, 593, 594, 595, 596, 659, 900, 911, 912, 916, 940, 941, 942, 944, 945, 946, 948, 949, 950, 951, 952, 953, 954, 970, 971, 972, 973, 975, 977, 988, 989 ], empty: false }
stream
defaultEncoding
alephseqLineType
picaIdField 003@$0
picaSubfieldSeparator $
picaSchemaFile
picaRecordTypeField 002@$0
schemaType MARC21
groupBy
groupListFile
solrForScoresUrl
fileName tt-completeness.csv
replacementInControlFields #
marc21 true
unimarc false
pica false
mqaf.version 0.9.3
qa-catalogue.version 0.8.0-SNAPSHOT
numberOfprocessedRecords 4840846
duration 00:10:15
analysisTimestamp 2024-12-20 00:20:35