Agrotag - III

Agrotags was envisaged as a collection of terms for tagging digital information objects (DIOs) in the agriculture domain. The main aim is to normalize the tagging process so that searching becomes simpler and more efficient and users are directed to the most useful resources.

Agrotags derives from Agrovoc, the agricultural thesaurus from FAO. The ongoing effort to enrich Agrovoc into an ontology is widely known (AIMS website). Agrovoc is also being mapped onto leading thesauri such as NAL, CABI and MeSH; this gives documents tagged with Agrotags rich interconnections with documents tagged using other thesauri. Agrovoc's ability to render a term in 19 languages is an added advantage: applications built using Agrotags as an assisting knowledge layer would have greater reach.

The advantages offered by a semantically tagged knowledge repository for agriculture were already demonstrated by efforts such as agropedia. Agrovoc has provided the glue for semantic inference in this endeavor.

Ontogenesis of Agrotags

The development of Agrotags began with an analysis of the tagging options available for research documents, especially in the agriculture domain. One inherent drawback became apparent: documents tagged in other languages were not ‘retrievable’ using the tags supplied. An immediate solution lay in the use of terms from Agrovoc.

Agrovoc contains almost 40,000 terms in the English language alone, which was a huge candidate set for generating tags. The subject matter experts from ICRISAT and IITK therefore decided that tags for agriculture-related documents would be drawn from a hand-picked collection of terms.

Initially, the top terms were based on popular thesauri such as NAL, CAB and MeSH, but it was later decided to create a hierarchy rooted in the concepts from the knowledge models used in AGROPEDIA. After the top terms were finalized, the team set about creating the hierarchy tree, taking care to retain the intended purpose of Agrotags. Terms were also sourced from outside Agrovoc to arrive at a comprehensive collection of tags.

Navigating through the 25 top terms of Agrovoc, the team selected the terms that were useful for tagging. For example, outbreeding, cultivar selection, mass selection and control methods are all narrower terms of the Agrovoc top term ‘methods’, at different depth levels. In Agrotags, however, outbreeding and mass selection are placed under the top term crop improvement, cultivar selection under plant production, and control methods under plant protection.
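The regrouping in this example can be pictured as two parent assignments for the same terms. The sketch below uses only the terms named above; the dictionary layout and the helper name group_by_top_term are illustrative assumptions, not the format in which Agrotags is actually stored.

```python
from collections import defaultdict

# Top term of each example term in Agrovoc (all four sit under "methods").
AGROVOC_TOP_TERM = {
    "outbreeding": "methods",
    "mass selection": "methods",
    "cultivar selection": "methods",
    "control methods": "methods",
}

# Top term of the same terms in Agrotags, as described in the text.
AGROTAGS_TOP_TERM = {
    "outbreeding": "crop improvement",
    "mass selection": "crop improvement",
    "cultivar selection": "plant production",
    "control methods": "plant protection",
}


def group_by_top_term(top_term_of):
    """Invert a term -> top term mapping into top term -> list of terms."""
    groups = defaultdict(list)
    for term, top in top_term_of.items():
        groups[top].append(term)
    return dict(groups)


# One group in Agrovoc, three groups in Agrotags.
print(group_by_top_term(AGROVOC_TOP_TERM))
print(group_by_top_term(AGROTAGS_TOP_TERM))
```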

Terms outside Agrovoc were also included in Agrotags and unique codes were assigned to these terms. In the first phase of Agrotags, almost 15 top terms were created.

Plant production, plant protection and crop improvement formed some of the top-level terms in phase I. For each of these top terms, a hierarchy was created from the remaining 1500 terms, which can be viewed graphically at

Agrotags are currently available in English, Hindi and French. Telugu and Kannada versions are in progress.

Criteria of selection

Only descriptors and the more popular terms were selected from Agrovoc to create Agrotags. Non-descriptors, scientific/taxonomic names, fishery-related terms and geographical terms were not included in the selection process. Some simple examples illustrate this:

‘Rice’ is a term in Agrovoc (termcode-6599) and has the non-descriptor ‘paddy’. ‘Rice’ is present in Agrotags but ‘paddy’ is not, so if a document contains the keyword ‘paddy’, it is mapped to the Agrotags term ‘Rice’.
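The ‘paddy’ to ‘Rice’ mapping amounts to a lookup from non-descriptor to descriptor. The following is a minimal sketch under that reading; the table names and the function to_agrotag are hypothetical, and only the Rice/paddy pair is taken from the text.

```python
# Hypothetical lookup tables; only the rice/paddy pair comes from the text.
AGROTAGS = {"rice"}                                # descriptors kept in Agrotags
NON_DESCRIPTOR_TO_DESCRIPTOR = {"paddy": "rice"}   # non-descriptor -> descriptor


def to_agrotag(keyword):
    """Return the Agrotags term for a keyword, or None if there is none."""
    term = keyword.strip().lower()
    # A non-descriptor is first replaced by its descriptor.
    term = NON_DESCRIPTOR_TO_DESCRIPTOR.get(term, term)
    return term if term in AGROTAGS else None


print(to_agrotag("paddy"))    # rice
print(to_agrotag("Rice"))     # rice
print(to_agrotag("tractor"))  # None
```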

Similarly, ‘Organic Wastes’ (termcode-35237) is a term in both Agrovoc and Agrotags. ‘Garden Wastes’ (termcode-35242) is a narrower term (NT) of ‘Organic Wastes’ in Agrovoc but is not present in Agrotags. If a document contains ‘Garden Wastes’ as a candidate term, it is mapped to its broader term, ‘Organic Wastes’. To summarize, the following equation describes the relationship between Agrotags and Agrovoc:

Agrotags = Agrovoc – (Non-Descriptors + Scientific Terms + Geopolitical Terms + Fisheries Terms)
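Read literally, the equation is a set difference: the excluded categories are united and then removed from Agrovoc. The sketch below shows this on a toy vocabulary. The six terms and their category assignments are assumptions chosen for illustration, and the result is only the candidate pool, since the text also describes hand-picking and terms added from outside Agrovoc.

```python
# Toy vocabulary; terms and category assignments are illustrative assumptions.
agrovoc = {"rice", "paddy", "organic wastes", "oryza sativa", "india", "fishing gear"}

non_descriptors = {"paddy"}
scientific_terms = {"oryza sativa"}
geopolitical_terms = {"india"}
fisheries_terms = {"fishing gear"}

# The "+" in the equation is a set union, the "–" a set difference.
excluded = non_descriptors | scientific_terms | geopolitical_terms | fisheries_terms
agrotags_candidates = agrovoc - excluded

print(sorted(agrotags_candidates))  # ['organic wastes', 'rice']
```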

Top level terms of Agrotags

Agrovoc has 25 top-level terms, whereas Agrotags has 15. The Agrotags top-level terms are not a subset of the Agrovoc top-level terms but a subset of Agrovoc as a whole.

Use of Agrotags in OpenAgri

OpenAgri, an open-source repository for agricultural documents developed by IIT-K, is accessible through [7]. The repository provides rich semantic interlinking between documents using Agrotags. Documents are also tagged automatically using the Agrotagger algorithm.

Role of Agrotags in Agrotagger

Agrotagger uses Agrotags as candidate keywords for documents. As explained earlier, Agrotags are a proper subset of Agrovoc: Agrovoc has about 40,000 agricultural concepts and Agrotags has around 4159. The concepts in Agrotags are hand-picked based on their utility in a tagging scheme as well as their popularity. Agrotagger identifies the occurrences of Agrovoc terms in a document, replaces each with an equivalent Agrotags term and then chooses the candidate keywords from among them.
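The three steps (identify, replace, choose) can be sketched roughly as follows. This is not the actual Agrotagger implementation: the text does not say how the final keywords are chosen, so ranking by frequency is an assumption here, as are the names USE, BROADER, to_agrotag and suggest_tags. The toy data reuses the rice/paddy and organic wastes/garden wastes examples from earlier in the document.

```python
import re
from collections import Counter

# Toy data built from the two examples given earlier in the document.
AGROTAGS = {"rice", "organic wastes"}
USE = {"paddy": "rice"}                         # non-descriptor -> descriptor
BROADER = {"garden wastes": "organic wastes"}   # narrower term -> broader term


def to_agrotag(term):
    """Map an Agrovoc term to an Agrotags term, walking up broader terms."""
    term = USE.get(term, term)
    seen = set()
    while term not in AGROTAGS:
        if term in seen or term not in BROADER:
            return None  # no equivalent Agrotags term, or a cycle in the data
        seen.add(term)
        term = BROADER[term]
    return term


def suggest_tags(text, top_n=3):
    """Count Agrovoc term occurrences, map them to Agrotags, rank by count."""
    text = text.lower()
    counts = Counter()
    for term in set(USE) | set(BROADER) | AGROTAGS:
        pattern = r"\b" + re.escape(term) + r"\b"
        hits = len(re.findall(pattern, text))
        tag = to_agrotag(term)
        if hits and tag:
            counts[tag] += hits
    # Assumed selection rule: the most frequent tags win.
    return [tag for tag, _ in counts.most_common(top_n)]


doc = "Paddy fields yield rice. Garden wastes can be composted near paddy plots."
print(suggest_tags(doc))  # ['rice', 'organic wastes']
```

A real tagger would also need to handle plurals, multi-word overlaps and the multilingual labels, none of which this sketch attempts.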

