Introduction to components
Arkkiivi and the components it offers were developed in the project "Improving the quality and usability of digital materials by means of artificial intelligence" (2021–2023).
Is your material full of post-it notes that cover the text of documents? Do your scans too often contain folded or torn corners, or so many blank pages that they get in the way of using the digital material? Do you need a typeface identifier, or would you like an automatic metadata and text language identifier to enrich your metadata?
You can try these components, which detect scanning errors and identify content, in the Arkkiivi interface. Read on to learn about the components!
- Blank pages detection: Classifies pages as blank or having content.
- Identification of post-it notes: Identifies post-it notes on document pages and indicates their page numbers.
- Folded corner detection: Detects folded or torn corners and indicates their page numbers.
- Metadata identification: Identifies named entities, generates index terms, and recognises language. N.B. For the time being, named entity recognition is possible with typed materials in Finnish and English, and automated subject indexing also works for Swedish texts. The components are unlikely to work correctly with materials in other languages or with handwritten materials.
- Typeface recognition: Classifies pages by typeface: handwritten, typed, or a combination of these.
You can test the components by clicking the “Try” button.
N.B. Arkkiivi.fi is a demo/trial platform and is not suitable for production use. The component code and trained models can be found on GitHub and are freely available and customisable (released under the MIT licence). Also note that some components rely on typewritten text recognition, so they do not work with handwritten material. Read more in the component descriptions!