These scores implement the algorithm described in the following paper:

Kelly Thompson and Stacie Traill (2017). Leveraging Python to improve ebook metadata selection, ingest, and management. Code4Lib Journal, Issue 38, 2017-10-18. http://journal.code4lib.org/articles/12828

Their approach calculates the quality of ebook records coming from different data sources. Each record gets a score based on a number of criteria, and each criterion contributes a non-negative score. The final score is the sum of these criterion scores.
# | Record Element | MARC field/position/subfield | How counted |
---|---|---|---|
1. | ISBN | 020 | 1 point for each occurrence of field |
2. | Authors | 100, 110, 111 | 1 point for each occurrence of field(s) |
3. | Alternative Titles | 246 | 1 point for each occurrence of field |
4. | Edition | 250 | 1 point for each occurrence of field |
5. | Contributors | 700, 710, 711, 720 | 1 point for each occurrence of field(s) |
6. | Series | 440, 490, 800, 810, 830 | 1 point for each occurrence of field(s) |
7. | Table of Contents and Abstract | 505, 520 | 2 points if both fields exist; 1 point if either field exists |
8. | Date (MARC 008) | 008/07 | 1 point if valid coded date exists |
9. | Date (MARC 26X) | 260$c, 264$c | 1 point if 4-digit date exists; 1 point if matches 008 date. |
10. | LC/NLM Classification | 600, 610, 611, 630, 650, 651, 653 | 1 point if any field exists |
11. | Subject Headings: Library of Congress | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
12. | Subject Headings: MeSH | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
13. | Subject Headings: FAST | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
14. | Subject Headings: GND (not part of the original algorithm) | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 10 total points |
15. | Subject Headings: Other | 600, 610, 611, 630, 650, 651, 653 | 1 point for each field up to 5 total points |
16. | Description | 008/23, 300$a | 2 points if both elements exist; 1 point if either exists |
17. | Language of Resource | 008/35 | 1 point if likely language code exists |
18. | Country of Publication Code | 008/15 | 1 point if likely country code exists |
19. | Language of Cataloging | 040$b | 1 point if either no language is specified, or if English is specified |
20. | Descriptive cataloging standard | 040$e | 1 point if the value is "rda" |
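The summing logic described above can be sketched as follows. This is an illustration only, not the tool's actual code: the record is modeled as a plain dict mapping MARC tags to lists of field occurrences, and only a few of the twenty components are shown. The real algorithm also distinguishes subject-heading vocabularies via indicators and $2, which is omitted here.

```python
# Sketch of Thompson-Traill-style scoring over a simplified record
# representation: a dict mapping MARC tag -> list of field occurrences.
# Illustrative only; not the qa-catalogue implementation.

def count_fields(record, tags, cap=None):
    """1 point per occurrence of any listed tag, optionally capped."""
    n = sum(len(record.get(tag, [])) for tag in tags)
    return min(n, cap) if cap is not None else n

def score(record):
    points = 0
    points += count_fields(record, ["020"])                       # 1. ISBN
    points += count_fields(record, ["100", "110", "111"])         # 2. Authors
    points += count_fields(record, ["246"])                       # 3. Alternative titles
    points += count_fields(record, ["250"])                       # 4. Edition
    points += count_fields(record, ["700", "710", "711", "720"])  # 5. Contributors
    points += count_fields(record, ["440", "490", "800", "810", "830"])  # 6. Series
    # 7. TOC and abstract: 2 points if both 505 and 520 exist, 1 if either
    points += int("505" in record) + int("520" in record)
    # ... remaining components (dates, classification, capped subject
    # headings, description, language/country codes) follow the same pattern.
    return points

sample = {"020": ["isbn-a", "isbn-b"], "100": ["author"], "505": ["toc"]}
print(score(sample))  # 2 (ISBN) + 1 (author) + 1 (TOC only) = 4
```

The capped components (rows 11-15) would pass `cap=10` (or `cap=5` for "Other") to `count_fields`.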
The histograms of the individual components:

1. ISBN
2. Authors
3. Alternative Titles
4. Edition
5. Contributors
6. Series
7. Table of Contents and Abstract
8. Date 008
9. Date 26X
10. LC/NLM Classification
11. Subject Headings: Library of Congress
12. Subject Headings: MeSH
13. Subject Headings: FAST
14. Subject Headings: GND
15. Subject Headings: Other
16. Online
17. Language of Resource
18. Country of Publication
19. Language of Cataloging
20. Descriptive cataloging standard is RDA
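A per-component histogram like the ones above can be built from the tt-completeness CSV output by counting how many records received each score. The sketch below uses only the standard library; the column names (`id`, `ISBN`) and the inline sample data are illustrative assumptions, as the actual header of `tt-completeness.csv` may differ.

```python
# Sketch: score histogram for one component from a tt-completeness-style CSV.
# Column names and sample rows are hypothetical, not the tool's real header.
import csv
import io
from collections import Counter

sample_csv = io.StringIO(
    "id,ISBN\n"
    "rec1,0\n"
    "rec2,2\n"
    "rec3,2\n"
)

histogram = Counter()
for row in csv.DictReader(sample_csv):
    histogram[int(row["ISBN"])] += 1  # count records per score value

print(sorted(histogram.items()))  # [(0, 1), (2, 2)]
```

For the real file, replace the `StringIO` object with `open("tt-completeness.csv")` and pick the column of interest.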
Parameter | Value |
---|---|
files | kbr-0.xml.gz kbr-1.xml.gz kbr-2.xml.gz kbr-3.xml.gz kbr-4.xml.gz |
marcVersion | KBR |
marcFormat | XML |
dataSource | FILE |
limit | -1 |
offset | -1 |
id | — |
defaultRecordType | BOOKS |
alephseq | false |
marcxml | true |
lineSeparated | false |
trimId | true |
recordIgnorator | {conditions: —, empty: true } |
recordFilter | {conditions: —, empty: true } |
ignorableFields | {fields: [590, 591, 592, 593, 594, 595, 596, 659, 900, 911, 912, 916, 940, 941, 942, 944, 945, 946, 948, 949, 950, 951, 952, 953, 954, 970, 971, 972, 973, 975, 977, 988, 989 ], empty: false } |
stream | — |
defaultEncoding | — |
alephseqLineType | — |
picaIdField | 003@$0 |
picaSubfieldSeparator | $ |
picaSchemaFile | — |
picaRecordTypeField | 002@$0 |
schemaType | MARC21 |
groupBy | — |
groupListFile | — |
solrForScoresUrl | — |
fileName | tt-completeness.csv |
replacementInControlFields | # |
marc21 | true |
unimarc | false |
pica | false |
mqaf.version | 0.9.3 |
qa-catalogue.version | 0.8.0-SNAPSHOT |
numberOfprocessedRecords | 4840846 |
duration | 00:10:15 |
analysisTimestamp | 2024-12-20 00:20:35 |