
Agrotagger Version II

Agrotagger

Machines outperform humans in most domains, but in natural language understanding, machine-generated results still cannot match human analysis. A narrower task, extracting a handful of keywords from a document, is nonetheless feasible for a machine. With that in mind, a pluggable module called Agrotagger is being developed in collaboration with FAO. The module can be used as an add-on to leading repositories such as DSpace and to content management systems such as Drupal and Joomla, automatically tagging documents with terms from a controlled vocabulary such as Agrotags. User-generated tags, together with those generated by Agrotagger, would help link agriculture-related documents more effectively, giving faster retrieval and better visibility on the web.

Need for Agrotagger

With the huge number of digital documents on the internet growing every day, keyphrases are an important piece of metadata. Although keyphrases can be assigned by a document's author at the time of its creation, manual tagging is not only labor-intensive and time-consuming but also yields poor indexing consistency across a document collection.

Indexing documents is not a new concept. The first systematic approach, true alphabetical indexing, emerged as far back as the fourteenth century. As technology developed and fresh ideas kept coming, the alphabetical index became the catalogue, the catalogue became the taxonomy, and the taxonomy became the thesaurus. Agrotagger now uses such a vocabulary to generate keywords automatically.

A document's metadata consists of fields such as author, title and keywords, and the most reliable of these is the keywords. For example, "Options for adaption, though limited do exist" is the title of an article about marine fisheries from the magazine "The Hindu - Survey of Indian Agriculture 2009". The title alone gives no clue about the article's actual topic. This is where keywords are crucial.

Role of Agrotags in Agrotagger

Agrotagger uses Agrotags as candidate keywords for documents. Agrotags is a proper subset of Agrovoc: Agrovoc has about 40,000 agricultural concepts, while Agrotags has around 3,057. The concepts in Agrotags are hand-picked for their utility in a tagging scheme as well as their popularity. Agrotagger identifies the occurrences of Agrovoc terms in the document, replaces each with an equivalent Agrotags term, and then chooses the keywords from among these candidates.
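The replacement step can be pictured as a many-to-one lookup from Agrovoc terms to Agrotags terms. The sketch below is only an illustration: the five-entry table, the name AGROVOC_TO_AGROTAGS and the function to_agrotags are hypothetical, and the real vocabularies and Agrotagger's own data structures are not shown in this article.

```python
# Hypothetical toy mapping; the real Agrovoc and Agrotags vocabularies are far larger.
# Several Agrovoc terms may collapse onto one Agrotags term.
AGROVOC_TO_AGROTAGS = {
    "paddy": "rice",
    "oryza sativa": "rice",
    "rice": "rice",
    "marine fisheries": "fisheries",
    "fisheries": "fisheries",
}


def to_agrotags(agrovoc_terms):
    """Replace each Agrovoc term with its Agrotags equivalent.

    Terms without an entry in the (assumed) mapping are dropped.
    """
    return [
        AGROVOC_TO_AGROTAGS[term]
        for term in agrovoc_terms
        if term in AGROVOC_TO_AGROTAGS
    ]


agrovoc_vocab = set(AGROVOC_TO_AGROTAGS)            # all Agrovoc terms in the toy table
agrotags_vocab = set(AGROVOC_TO_AGROTAGS.values())  # the Agrotags terms they map to

# In this toy data, Agrotags is a proper subset of Agrovoc, as described above.
assert agrotags_vocab < agrovoc_vocab

candidates = to_agrotags(["paddy", "marine fisheries", "tractor"])
```

Here "tractor" is dropped because the toy table has no entry for it; how Agrotagger itself treats unmapped terms is not stated in this article.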

Workflow in Agrotagger

At the top level, Agrotagger works in three main stages:

Stage 1: Identify all Agrovoc terms in the document; the document is now a bag of Agrovoc terms.
Stage 2: For each of these Agrovoc terms, identify an Agrotags term; this reduces the document to a bag of Agrotags terms.
Stage 3: Use statistical techniques to calculate the suitability of these terms as keywords.
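The three stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Agrotagger's implementation: the vocabulary table is a hypothetical toy, the function names are invented for the example, and plain relative frequency stands in for the unspecified "statistical techniques" of Stage 3.

```python
import re
from collections import Counter

# Hypothetical toy mapping from Agrovoc terms to Agrotags terms.
AGROVOC_TO_AGROTAGS = {
    "paddy": "rice",
    "oryza sativa": "rice",
    "rice": "rice",
    "marine fisheries": "fisheries",
    "fisheries": "fisheries",
}


def find_agrovoc_terms(text, vocabulary):
    """Stage 1: return the bag of vocabulary terms occurring in the text.

    Longer terms are matched first and blanked out, so "marine fisheries"
    is not counted a second time as "fisheries".
    """
    remaining = text.lower()
    found = []
    for term in sorted(vocabulary, key=len, reverse=True):
        pattern = r"\b" + re.escape(term) + r"\b"
        remaining, count = re.subn(pattern, " ", remaining)
        found.extend([term] * count)
    return found


def rank_keywords(text, mapping, top_n=3):
    """Run all three stages and return (tag, score) pairs, best first."""
    agrovoc_bag = find_agrovoc_terms(text, mapping)      # Stage 1
    agrotags_bag = [mapping[t] for t in agrovoc_bag]     # Stage 2
    counts = Counter(agrotags_bag)                       # Stage 3 (assumed scoring)
    total = sum(counts.values())
    scored = [(tag, n / total) for tag, n in counts.items()]
    # Highest score first; ties broken alphabetically for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


sample = (
    "Marine fisheries are under stress. "
    "Fisheries policy must adapt. Paddy is unaffected."
)
keywords = rank_keywords(sample, AGROVOC_TO_AGROTAGS)
```

On the sample text, "fisheries" is found twice and "rice" once (via "paddy"), so "fisheries" ranks first. A real scorer would likely also weigh how common each term is across the whole collection, which this sketch ignores.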

Usage of Agrotagger

It is currently used by openagri, an open access agricultural research repository running on DSpace at ICRISAT. The repository is an open platform for submitting any kind of published agricultural material under one roof. All a user needs is a username and password, which can be obtained by registering on the site. Once a user registers and submits a document, Agrotagger, running in the background, automatically generates keywords for it.

Agrotagger is also available as a web service. To get keywords automatically for your agricultural document (currently only PDF and DOC files are supported), go to: http://agropedialabs.iitk.ac.in:8080/agroTagger/


Agrotagger Version II is coming soon, with more refined search.



Please note that this is the opinion of the author and is Not Certified by ICAR or any of its authorised agents.