Research Data

The long-term archiving of research data is a central aspect of good scholarly practice. It is essential for ensuring the traceability and verifiability of research results that are based on the evaluation of such data. In addition, archived data can be reused in new lines of research.

Research Data and E-Publishing with HASP

In addition to its e-publishing services for articles, books and journals, FID4SA offers scholars of Asian Studies worldwide the opportunity to have their associated research data permanently archived. These data can be linked directly to the online publications at Heidelberg Asian Studies Publishing (HASP). All research data (i.e. images, videos, audio files, tables and graphics) are assigned a DOI (Digital Object Identifier), making them permanently citable, visible and explicitly linkable as independent academic achievements.

Images, audio and video data as well as other multimedia objects are either stored on the heidICON platform operated by Heidelberg University Library or integrated into the Heidelberg digitisation system DWork, which is also sustainably hosted by Heidelberg University Library. Additional data publications are available in HASP@heiDATA and are dynamically integrated into the online publication. In the future, not only the publications themselves but also the media objects they use will be archived in heiARCHIVE, an OAIS-compatible long-term archiving system. heiARCHIVE is currently being set up by the University Computing Centre and the University Library as a joint development within the Competence Centre for Research Data (KFD). Software code used in the context of publications can also be published and sustainably archived on heiDATA.

HASP@heiDATA

Research Data and Ground Truth Transcriptions

FID4SA uses the Transkribus platform developed within the READ project for text recognition of South Asian scripts.

Several text recognition models for the Devanāgarī script have been trained with Transkribus and deliver strong recognition results, with a character error rate (CER) of approx. 2.3%. These models are based on so-called ground truth transcriptions, i.e. 1:1 transcriptions of the text on the document facsimile.
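To make the CER figure more concrete: the character error rate is commonly defined as the character-level edit distance (substitutions, deletions and insertions) between the recognised text and the ground truth transcription, divided by the number of characters in the ground truth. A CER of 2.3% therefore corresponds to roughly 23 erroneous characters per 1,000 characters of ground truth. The Python sketch below illustrates this common definition only; it is not Transkribus's own implementation. It counts Unicode code points, and since Devanāgarī conjuncts and vowel signs are made up of several code points, tools that count grapheme clusters or normalise text differently may report different values. The function names are hypothetical.

```python
import unicodedata


def levenshtein(reference: str, hypothesis: str) -> int:
    """Minimum number of character substitutions, deletions and insertions
    needed to turn `hypothesis` into `reference` (counted in code points)."""
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(prev[j] + 1,          # reference char missing in hypothesis
                            curr[j - 1] + 1,      # extra char in hypothesis
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """CER = edit distance / number of characters in the ground truth."""
    # NFC normalisation so that canonically equivalent strings compare equal
    ref = unicodedata.normalize("NFC", reference)
    hyp = unicodedata.normalize("NFC", hypothesis)
    if not ref:
        raise ValueError("ground truth transcription must not be empty")
    return levenshtein(ref, hyp) / len(ref)


if __name__ == "__main__":
    ground_truth = "नमस्ते"  # 6 code points, including the virama (्)
    recognised = "नमसते"     # the virama was not recognised
    print(f"{character_error_rate(ground_truth, recognised):.3f}")  # prints 0.167
```

In this toy example a single missing virama already yields a CER of about 16.7%, because the ground truth is only six code points long; reported CER values are computed over much larger test sets.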

With FID4SA@heiDATA, FID4SA has established a dataverse for archiving ground truth data for South Asian scripts. Interested researchers can download the archived data and use it as training data for their own text recognition models. At the same time, researchers working on text recognition for South Asian scripts are invited to use this archive to make their own ground truth data available and thus help build a central archive of ground truth data.

FID4SA@heiDATA - Ground Truth Data for HTR on South Asian Scripts