Taxonomy is the organization of a particular set of information for a particular purpose. The term comes from biology, where it is used to define the single location for a species within a complex hierarchy. Biologists argue about where various species belong, although DNA analysis can resolve most of the questions. In informational taxonomies, by contrast, an item can fit into several taxonomic categories.

Categorization is the process of associating a document with one or more subject categories. So the entry for a page on cross-trainer shoes could go into Running, Manufacturing, Sports Medicine, or Rushkoff, Douglas! All of these are legitimate, depending on the context.
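As a minimal sketch of the idea, the Python below stores one document under several categories and then builds a reverse lookup from category to documents. The document IDs, the second document, and the function name `build_category_index` are hypothetical, chosen only for illustration; the category names come from the example above.

```python
from collections import defaultdict

# Hypothetical document IDs and category assignments, for illustration only.
assignments = {
    "cross-trainer-shoes": ["Running", "Manufacturing", "Sports Medicine"],
    "marathon-training-plan": ["Running", "Sports Medicine"],
}

def build_category_index(assignments):
    """Invert a document -> categories mapping into category -> documents."""
    index = defaultdict(list)
    for doc_id, categories in assignments.items():
        for category in categories:
            index[category].append(doc_id)
    return dict(index)

category_index = build_category_index(assignments)
print(category_index["Running"])
```

Because each document carries a list of categories rather than a single one, the same page can be found from any of its legitimate contexts.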

Ontology is the study of the categories of things within a domain. It comes from philosophy and provides a logical framework for academic research on knowledge representation. Work on ontologies involves schemas and diagrams that show relationships as Venn diagrams, trees, lattices and so on.

Cataloging and Classification come from libraries, where specialists enter the metadata (such as author, date, title and edition) for a document, apply subject categories to it, and place it into a class (such as a call number) for later retrieval. These terms tend to be used interchangeably with Categorization.

Clustering is the process of grouping documents based on the similarity of their words, or of the concepts in the documents as interpreted by an analytical engine. These engines use complex algorithms, including Natural Language Processing, Latent Semantic Analysis, Bayesian statistical analysis, and so on.
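To make word-similarity grouping concrete, here is a deliberately simple Python sketch. It is not one of the algorithms named above: it assumes a plain bag-of-words model, cosine similarity, and a greedy single-pass grouping with an arbitrary threshold of 0.3. The sample documents and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def word_counts(text):
    """Lowercase the text and count its words (a simple bag-of-words model)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def cluster_documents(docs, threshold=0.3):
    """Greedy single-pass clustering.

    Each document joins the first cluster whose first member is at least
    `threshold` similar to it; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices.
    """
    vectors = [word_counts(d) for d in docs]
    clusters = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine_similarity(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Hypothetical sample documents.
docs = [
    "running shoes for marathon running",
    "marathon training and running shoes",
    "shoe manufacturing in factories",
    "factories and manufacturing plants",
]
print(cluster_documents(docs))
```

Note that this sketch treats "shoe" and "shoes" as unrelated words. Recognizing that they express the same concept is the kind of work the more sophisticated engines take on.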

A Thesaurus is a set of related terms describing a set of documents. This is not hierarchical: it describes the standard terms for concepts in a controlled vocabulary. Thesauri include synonyms and more complex relationships, such as broader or narrower terms, related terms and other forms of words.
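One way to picture a thesaurus entry is as a record holding a standard (preferred) term plus its relationships. The Python sketch below assumes that shape; the class `ThesaurusEntry`, the helper functions, and the sample terms are all hypothetical and do not follow any particular thesaurus standard.

```python
from dataclasses import dataclass, field

@dataclass
class ThesaurusEntry:
    """One controlled-vocabulary term and its relationships (illustrative only)."""
    preferred: str
    synonyms: set = field(default_factory=set)
    broader: set = field(default_factory=set)
    narrower: set = field(default_factory=set)
    related: set = field(default_factory=set)

def build_index(entries):
    """Map every preferred term and synonym (lowercased) to its preferred term."""
    index = {}
    for entry in entries:
        index[entry.preferred.lower()] = entry.preferred
        for synonym in entry.synonyms:
            index[synonym.lower()] = entry.preferred
    return index

def preferred_term(index, term):
    """Return the standard term for `term`, or None if it is not in the vocabulary."""
    return index.get(term.lower())

# Hypothetical sample entry.
entries = [
    ThesaurusEntry(
        preferred="Athletic shoes",
        synonyms={"sneakers", "trainers"},
        broader={"Footwear"},
        narrower={"Running shoes", "Cross-trainer shoes"},
        related={"Sports equipment"},
    ),
]

index = build_index(entries)
print(preferred_term(index, "Sneakers"))
```

Resolving synonyms to one standard term is what lets documents described with different words be indexed and retrieved under the same heading.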

Please note that this is the opinion of the author and is Not Certified by ICAR or any of its authorised agents.