Data can be imported into Arkkiivi in the following file formats: png, tif, tiff, pdf, xml and txt. Metadata identification supports typed (machine-printed) text only.
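Because the list of accepted formats is fixed, it can help to filter a batch of files before uploading it. The following Python sketch is a minimal illustration, not part of Arkkiivi: the helper names `is_supported` and `split_by_support` are hypothetical. It checks only the file extension, so it cannot verify that a file is valid or that it contains typed text.

```python
from pathlib import Path

# Formats listed above as accepted by Arkkiivi.
# Only the extension is checked; file contents are not inspected.
SUPPORTED_EXTENSIONS = {".png", ".tif", ".tiff", ".pdf", ".xml", ".txt"}


def is_supported(path: str) -> bool:
    """Hypothetical helper: True if the (case-insensitive) extension is supported."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def split_by_support(paths: list[str]) -> tuple[list[str], list[str]]:
    """Split file paths into (supported, unsupported), preserving order."""
    supported = [p for p in paths if is_supported(p)]
    unsupported = [p for p in paths if not is_supported(p)]
    return supported, unsupported


if __name__ == "__main__":
    ok, rejected = split_by_support(["scan_001.TIF", "letter.pdf", "photo.jpg", "notes.txt"])
    print(ok)        # ['scan_001.TIF', 'letter.pdf', 'notes.txt']
    print(rejected)  # ['photo.jpg']
```

Files such as `photo.jpg` would need converting to one of the accepted formats, for example png, before upload.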
The user can download the results in csv format.
The current test environment runs over an unencrypted HTTP connection. Therefore, the security policies of some organisations may prohibit transferring data to it. We aim to move the test environment behind a secure HTTPS connection during the project.
No, anyone with access to the server can view the files. However, Arkkiivi does not store the uploaded files; it deletes them after each run.
Yes. The Arkkiivi development environment is a demo environment that shows what can be done with the artificial intelligence components behind Arkkiivi.
The trained models and their source code can be found on GitHub.
The models underlying the components have been trained on specific training material, so it is quite possible that the components will perform less well on the material you are using. To analyse the text content of the material, the components use the Tesseract application, which is intended for recognising typed text. Handwritten text is therefore usually misrecognised, although errors can of course also occur with typed text. As a result, the components that identify metadata from text do not work with handwritten material. By contrast, the component that detects blank pages or scanning errors also works with handwritten material and photos. For born-digital material, the limiting factor is the precision of the named entity recognition (NER) and keywording components.
The models will be published on GitHub and can be freely installed from there. If your organisation lacks the necessary skills, help can be obtained from companies in the industry.
The results can be edited afterwards in the csv file.
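For one-off corrections a spreadsheet application is usually enough. For bulk corrections, the edit can also be scripted. The Python sketch below uses only the standard `csv` module and replaces exact matches of a value in one column. The function names and the column names in the example (`file`, `keywords`) are hypothetical; the actual column layout and encoding of Arkkiivi's CSV output are not documented here, and UTF-8 is assumed.

```python
import csv
import io


def edit_results(csv_text: str, column: str, old: str, new: str) -> str:
    """Replace exact matches of `old` with `new` in `column`; return the edited CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = reader.fieldnames  # reads the header row
    if fieldnames is None or column not in fieldnames:
        raise KeyError(f"column {column!r} not found")
    rows = list(reader)
    for row in rows:
        if row[column] == old:
            row[column] = new
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def edit_results_file(path: str, column: str, old: str, new: str) -> None:
    """Apply edit_results to a CSV file in place (UTF-8 encoding is an assumption)."""
    with open(path, newline="", encoding="utf-8") as f:
        edited = edit_results(f.read(), column, old, new)
    with open(path, "w", newline="", encoding="utf-8") as f:
        f.write(edited)


if __name__ == "__main__":
    sample = "file,keywords\nscan_001.tif,war\nscan_002.tif,leter\n"
    print(edit_results(sample, "keywords", "leter", "letter"), end="")
```

Keeping a copy of the original downloaded file before running an in-place edit makes it easy to compare or revert changes.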