9 2 Controlled Terminology, Vocabularies, and Ontologies

Saleh Altaher Group 1
Bian Alhemidi Group 1


The purpose of this article is to give a brief overview of controlled vocabularies, terminologies, and ontologies. It also explains why these resources are essential for knowledge preservation and data mining, as well as how they are created.

1. The Definition of Terminologies, Vocabularies, and Ontologies

The need for a controlled vocabulary frequently emerges when textual data must be validated for operational purposes. The primary initial motivation for harmonizing data entry is to improve query recall. In its most basic form, indexing can be done with keywords. However, when relying solely on free-text user input, the likelihood of typographical errors rises as the user base grows. Over time, these inevitable errors degrade the accuracy of search results, which is why sets of predefined values are provided: they reduce this noise.
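The recall problem can be illustrated with a minimal Python sketch, assuming a toy in-memory index that matches keywords exactly; the records, the `ALLOWED_VALUES` set, and the helper functions are hypothetical. A mistyped keyword makes a record invisible to exact-match search, whereas checking entries against a predefined list rejects the typo at data entry.

```python
# Toy records entered as free text; one keyword contains a typo.
free_text_records = {
    "sample-1": "breast cancer",
    "sample-2": "braest cancer",  # typo made at data entry
    "sample-3": "breast cancer",
}


def search(records, keyword):
    """Return the IDs of records whose keyword exactly matches the query."""
    return sorted(rid for rid, kw in records.items() if kw == keyword)


# Hypothetical set of predefined values offered to users instead of free text.
ALLOWED_VALUES = {"breast cancer", "lung cancer", "melanoma"}


def validate_entry(value):
    """Accept a value only if it belongs to the predefined list."""
    if value not in ALLOWED_VALUES:
        raise ValueError(f"{value!r} is not an allowed value")
    return value


print(search(free_text_records, "breast cancer"))  # sample-2 is missed
try:
    validate_entry("braest cancer")
except ValueError as err:
    print(err)
```

Validation removes the typo, but it cannot stop a user from choosing the wrong allowed value.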

This may, however, come at the expense of accuracy, because the predefined terms may not fit every situation. In addition, users can still select the wrong term, which introduces a different kind of error.

A controlled terminology is a prescriptive set of terms with defined spellings; it may also include definitions, synonyms, editors, versions, and licenses that define usage restrictions.

Term metadata is the set of details attached to a particular term in a controlled terminology. Because the terms of a controlled terminology form a flat list, the relationships between the entities they represent are not formally described. This is the primary flaw and limitation of controlled terminologies, which are frequently created to serve a single data model or application.
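As a rough sketch, a controlled terminology and its term metadata might be modelled with Python dataclasses; the class names, fields, and example terms below are hypothetical. Note that nothing in the structure records that ductal carcinoma in situ is a kind of breast carcinoma: the terms simply sit in a flat list.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """One term of a controlled terminology, with its term metadata."""
    label: str
    definition: str = ""
    synonyms: list[str] = field(default_factory=list)


@dataclass
class ControlledTerminology:
    """A flat, versioned list of terms; no relationships between terms are stored."""
    name: str
    version: str
    license: str
    editors: list[str]
    terms: list[Term]

    def lookup(self, text):
        """Return the term whose label or synonym matches text (case-insensitive)."""
        needle = text.casefold()
        for term in self.terms:
            names = [term.label, *term.synonyms]
            if any(needle == name.casefold() for name in names):
                return term
        return None


tumour_types = ControlledTerminology(
    name="Example tumour types",
    version="1.0",
    license="CC BY 4.0",
    editors=["Example Editor"],
    terms=[
        Term("breast carcinoma", "A carcinoma arising in breast tissue.", ["breast cancer"]),
        Term("ductal carcinoma in situ", "A non-invasive breast carcinoma.", ["DCIS"]),
    ],
)

print(tumour_types.lookup("DCIS").label)  # ductal carcinoma in situ
```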

An ontology, on the other hand, is a formal representation of domain knowledge in which concepts are organized hierarchically. The qualifier "formal" refers to a set of logic-based axioms and rules (e.g. first-order logic) used to structure and organize the term hierarchy and to check its consistency. As one can sense right away, ontologies are often more sophisticated artefacts, supported by more advanced theoretical frameworks and by dedicated development tools (e.g. Protégé, TopBraid Composer, OBO Foundry INCATools, or ROBOT).
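Real ontology tools rely on logical reasoners for consistency checking. The sketch below checks only one simple structural property of a term hierarchy, namely that every declared parent exists and that the `is_a` links contain no cycles; the class names are hypothetical and the check is a stand-in, not a description-logic reasoner.

```python
# Hypothetical is_a hierarchy: each class maps to its list of parent classes.
IS_A = {
    "disease": [],
    "cancer": ["disease"],
    "breast cancer": ["cancer"],
    "ductal carcinoma in situ": ["breast cancer"],
}


def find_problems(is_a):
    """Report undeclared parents and is_a cycles (an empty list means none found)."""
    problems = [
        f"{cls!r} has undeclared parent {parent!r}"
        for cls, parents in is_a.items()
        for parent in parents
        if parent not in is_a
    ]
    # Depth-first search: reaching a class that is still on the current path
    # ("in_progress") means the is_a links loop back on themselves.
    state = {}  # class -> "in_progress" or "done"

    def visit(cls):
        state[cls] = "in_progress"
        for parent in is_a[cls]:
            if parent not in is_a:
                continue  # already reported above
            if state.get(parent) == "in_progress":
                problems.append(f"cycle through {cls!r} -> {parent!r}")
            elif parent not in state:
                visit(parent)
        state[cls] = "done"

    for cls in is_a:
        if cls not in state:
            visit(cls)
    return problems


print(find_problems(IS_A))  # []
broken = dict(IS_A, disease=["ductal carcinoma in situ"])
print(find_problems(broken))
```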

Taxonomies and ontologies are comparable in that both characterize a particular area of knowledge. An ontology formally represents the concepts that describe things and their relationships, using a controlled vocabulary or concept identifiers.

2. What makes them helpful?

As mentioned at the beginning, the most immediate use of a controlled terminology is to keep data entry uniform, which makes controlled terminologies crucial tools for data indexing and for improving query recall. Ontologies and controlled vocabularies are, however, useful beyond this initial purpose: the primary goal of biomedical ontologies is to organize knowledge so that software agents can manipulate it.

It is important to recognize that knowledge discovery and knowledge capture coexist and run concurrently. As additional experiments are conducted, new findings are made. This new knowledge has to be recorded in the domain ontology so that the new concepts can be used to annotate the results of earlier experiments during retrospective analysis.

The Gene Ontology (GO) is a widely used resource for describing molecular functions, biological processes, and cellular components. The Gene Ontology Consortium, which maintains this controlled vocabulary, also provides genome-wide Gene Ontology annotations: resources that link GO terms to the genes and genomic features of specific genomes. These annotations are particularly valuable for genome-wide analyses such as transcriptomic profiling studies.

Enrichment analysis, a specific type of analysis, depends on the availability of such annotations: it identifies which biological processes are most affected under given conditions by detecting deviations from the expected probability distribution in an expression profile.
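One common form of enrichment analysis is over-representation analysis, which uses a one-sided hypergeometric test to ask whether a term is annotated to more genes in a study set than chance would predict. The sketch below assumes the gene-to-term annotations are already available (and already propagated to parent terms); the gene IDs and term names are placeholders, not real GO data.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated, n drawn)."""
    upper = min(K, n)
    total = comb(N, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def enrichment(study_genes, population_genes, annotations):
    """Return {term: p-value} for every term annotated to at least one study gene.

    annotations maps each gene to the set of term IDs annotated to it.
    """
    N, n = len(population_genes), len(study_genes)
    terms = set().union(*(annotations.get(g, set()) for g in study_genes))
    results = {}
    for term in terms:
        K = sum(term in annotations.get(g, set()) for g in population_genes)
        k = sum(term in annotations.get(g, set()) for g in study_genes)
        results[term] = hypergeom_sf(k, N, K, n)
    return results


# Placeholder data: 10 genes, term_A on g1-g3, term_B on g4-g9.
population = [f"g{i}" for i in range(1, 11)]
annotations = {g: {"term_A"} for g in ["g1", "g2", "g3"]}
annotations.update({g: {"term_B"} for g in ["g4", "g5", "g6", "g7", "g8", "g9"]})
study = ["g1", "g2", "g3", "g4"]

for term, p in sorted(enrichment(study, population, annotations).items()):
    print(term, round(p, 4))
```

A real analysis would also correct these p-values for multiple testing across all terms examined.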

Ontologies have many other uses. The demand for datasets suitable for machine learning and the need to speed up dataset preparation will only increase the importance of ontologies for organizing information. Making data usable by machines in this way is a central goal of the FAIR principles.

Therefore, ontologies are especially useful for the following tasks:

Enhance search recall: When given a search string, a search index that draws on a resource storing synonyms can retrieve material that has been tagged with a synonym of that string.

Facilitate query exploration: Because ontologies are organized hierarchically (parent/child), a search index that takes advantage of this structure can also return datasets annotated with child terms of the terms matching the search phrase. For instance, searching for "breast cancer" in an ontology-aware search index may return results annotated with Paget's disease of the breast or ductal carcinoma in situ (DCIS), both of which are mammary gland cancers.

Create knowledge graphs: Ontology languages can be used to describe individual datasets as nodes in a graph and to connect them to other resources; the same technologies are also well suited to representing domain knowledge and building reference terminologies.
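The three tasks above can be illustrated together in a small sketch, assuming a hypothetical mini-ontology (synonyms plus `is_a` links) and dataset annotations stored as subject-predicate-object triples, the basic shape of a knowledge graph. The search maps the query to its preferred term, expands it to all descendant terms, and then matches annotations that are themselves normalized through the synonym table.

```python
from collections import defaultdict

# Hypothetical mini-ontology: preferred term -> synonyms, and child -> parents.
SYNONYMS = {
    "breast cancer": {"breast carcinoma", "mammary cancer"},
    "ductal carcinoma in situ": {"DCIS"},
    "Paget's disease of the breast": set(),
}
IS_A = {
    "ductal carcinoma in situ": {"breast cancer"},
    "Paget's disease of the breast": {"breast cancer"},
}

# Hypothetical dataset annotations as (subject, predicate, object) triples.
TRIPLES = [
    ("dataset-1", "annotated_with", "breast carcinoma"),
    ("dataset-2", "annotated_with", "DCIS"),
    ("dataset-3", "annotated_with", "Paget's disease of the breast"),
    ("dataset-4", "annotated_with", "melanoma"),
]


def canonical(label):
    """Map a label or synonym to its preferred term (or itself if unknown)."""
    for term, syns in SYNONYMS.items():
        if label == term or label in syns:
            return term
    return label


def descendants(term):
    """Return the term plus every term below it in the is_a hierarchy."""
    children = defaultdict(set)
    for child, parents in IS_A.items():
        for parent in parents:
            children[parent].add(child)
    found, stack = {term}, [term]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


def ontology_aware_search(query):
    """Find datasets annotated with the query, a synonym of it, or a descendant."""
    targets = descendants(canonical(query))
    return sorted(s for s, p, o in TRIPLES
                  if p == "annotated_with" and canonical(o) in targets)


print(ontology_aware_search("breast cancer"))
```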

3. Types of Ontology

Domain Ontology:
A domain ontology represents concepts that belong to a particular domain, such as biology or politics. Typically, each domain ontology models the domain-specific meanings of terms. The term "card", for instance, can mean several distinct things: "playing card" would be modelled in an ontology of the poker domain, whereas "punched card" and "video card" would be modelled in an ontology of the computer hardware domain.
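One way to keep such meanings apart is to give each concept a namespaced identifier within its own domain ontology, as in this hypothetical sketch; the prefixes `poker:` and `hw:` and the identifiers are invented for illustration.

```python
# Hypothetical domain ontologies: concept ID -> (preferred label, alternative labels).
DOMAIN_ONTOLOGIES = {
    "poker": {
        "poker:0001": ("playing card", {"card"}),
    },
    "hardware": {
        "hw:0001": ("punched card", {"card"}),
        "hw:0002": ("video card", {"card", "graphics card"}),
    },
}


def resolve(label, domain):
    """Return the IDs of concepts in one domain ontology that a label can denote."""
    return sorted(
        cid
        for cid, (preferred, alternatives) in DOMAIN_ONTOLOGIES[domain].items()
        if label == preferred or label in alternatives
    )


print(resolve("card", "poker"))     # ['poker:0001']
print(resolve("card", "hardware"))  # ['hw:0001', 'hw:0002']
```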

Because domain ontologies are created by different authors, they represent concepts in very specific and idiosyncratic ways and are often incompatible with one another, even within the same project. As systems that rely on domain ontologies grow, they often need to merge those ontologies, either by manually fine-tuning each entity or by combining automated merging with manual hand-tuning. This is a challenge for ontology designers. Different ontologies of the same domain arise from differences in language, intended use, and perception of the domain (shaped by cultural background, education, ideology, etc.).
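A first automated merging pass is often lexical: classes whose labels or synonyms match after normalization are proposed as mappings, and everything else is left for manual hand-tuning. The sketch below assumes that approach; the ontology contents are hypothetical, and real alignment tools use considerably richer matching.

```python
def normalise(text):
    """Lower-case a label and treat hyphens and underscores as spaces."""
    return " ".join(text.casefold().replace("_", " ").replace("-", " ").split())


def propose_mappings(source, target):
    """Match classes from two ontologies whose labels or synonyms coincide.

    source and target map class IDs to lists of labels. Returns
    (mappings, unmatched): candidate (source_id, target_id) pairs, and the
    source IDs left over for manual hand-tuning.
    """
    index = {}
    for tid, labels in target.items():
        for label in labels:
            index.setdefault(normalise(label), set()).add(tid)
    mappings, unmatched = [], []
    for sid, labels in source.items():
        hits = set()
        for label in labels:
            hits |= index.get(normalise(label), set())
        if hits:
            mappings.extend((sid, tid) for tid in sorted(hits))
        else:
            unmatched.append(sid)
    return mappings, unmatched


source = {"A:1": ["Breast cancer", "mammary carcinoma"], "A:2": ["Heart attack"]}
target = {"B:10": ["breast-cancer"], "B:20": ["myocardial infarction"]}
print(propose_mappings(source, target))
```

"Heart attack" and "myocardial infarction" share no recorded label, so the lexical pass leaves `A:2` for a human curator, which is exactly where the manual hand-tuning comes in.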

Upper Ontology:
An upper ontology (or foundation ontology) is a model of the common relations and objects that are generally applicable across a wide range of domain ontologies. It often provides a core glossary of terms, together with their associated object descriptions, as they are used across the relevant domain ontologies.

Examples of standardized upper ontologies available for use include BFO, the BORO approach, Dublin Core, GFO, Cyc, SUMO, UMBEL, the Unified Foundational Ontology (UFO), and DOLCE. WordNet has also been used by some as an upper ontology and as a linguistic tool for learning domain ontologies.
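Anchoring domain classes under shared upper-ontology categories gives otherwise unrelated domain ontologies a common frame of reference. The sketch below uses a heavily simplified fragment of BFO's top-level classes (the real BFO has more classes and formal axioms); the domain classes and their placement are hypothetical.

```python
# Simplified fragment of BFO's top level (child -> parent); illustrative only.
UPPER = {
    "entity": None,
    "continuant": "entity",
    "independent continuant": "continuant",
    "material entity": "independent continuant",
    "occurrent": "entity",
    "process": "occurrent",
}

# Hypothetical classes from a biology and a politics ontology, anchored to UPPER.
DOMAIN = {
    "bio:cell": "material entity",
    "bio:cell division": "process",
    "pol:voter": "material entity",
    "pol:election": "process",
}


def upper_chain(cls):
    """Return the upper-ontology classes above a domain class, most specific first."""
    chain = []
    node = DOMAIN[cls]
    while node is not None:
        chain.append(node)
        node = UPPER[node]
    return chain


def shared_category(a, b):
    """Most specific upper-ontology class shared by two domain classes."""
    above_b = set(upper_chain(b))
    for node in upper_chain(a):
        if node in above_b:
            return node
    return None


print(shared_category("bio:cell division", "pol:election"))  # process
```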

Hybrid ontology:
The Gellish ontology is an example of an upper ontology and a domain ontology combined in one.

4. Components of Ontologies

Contemporary ontologies share many structural similarities, regardless of the language in which they are expressed. Most ontologies describe individuals (instances), classes (concepts), attributes, and relations. In this section, each of these components is discussed in turn.

Common components of ontologies include:

Individuals: instances or objects (the basic or "ground level" objects)
Classes: sets, collections, concepts, classes in programming, types of objects or kinds of things
Attributes: aspects, properties, features, characteristics or parameters that objects (and classes) can have
Relations: ways in which classes and individuals can be related to one another
Function terms: complex structures formed from certain relations that can be used in place of an individual term in a statement
Restrictions: formally stated descriptions of what must be true in order for some assertion to be accepted as input
Rules: statements in the form of an if-then (antecedent-consequent) sentence that describe the logical inferences that can be drawn from an assertion in a particular form
Axioms: assertions (including rules) in a logical form that together comprise the overall theory that the ontology describes in its domain of application. This definition differs from that of "axioms" in generative grammar and formal logic, where axioms include only statements asserted as a priori knowledge. As used here, "axioms" also include the theory derived from axiomatic statements.
Events: the changing of attributes or relations

Ontologies are commonly encoded using ontology languages.
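Several of these components can be mimicked with plain Python data structures, as in the sketch below; the classes, individuals, and relation names are hypothetical, and in practice they would be expressed in an ontology language such as OWL and processed by a reasoner. It shows classes, individuals with attributes, a relation, a restriction checked before an assertion is accepted, an if-then type-inference rule, and an event that changes an attribute.

```python
# Classes: each class maps to its is_a parents.
CLASSES = {"vehicle": set(), "car": {"vehicle"}, "wheel": set()}

# Individuals with their asserted classes and attributes.
individuals = {
    "my_car": {"types": {"car"}, "attributes": {"colour": "red"}},
    "wheel_1": {"types": {"wheel"}, "attributes": {}},
}
relations = set()  # (subject, relation, object) triples
events = []        # log of attribute changes


def assert_relation(subject, relation, obj):
    """Restriction: both ends of a relation must be declared individuals."""
    if subject not in individuals or obj not in individuals:
        raise ValueError(f"unknown individual in ({subject}, {relation}, {obj})")
    relations.add((subject, relation, obj))


def apply_type_rule():
    """Rule: IF x has type C AND C is_a D THEN x has type D (run to a fixpoint)."""
    changed = True
    while changed:
        changed = False
        for data in individuals.values():
            inferred = set().union(*(CLASSES[c] for c in data["types"])) - data["types"]
            if inferred:
                data["types"] |= inferred
                changed = True


def change_attribute(name, key, value):
    """Event: change an attribute value and record the old and new values."""
    old = individuals[name]["attributes"].get(key)
    individuals[name]["attributes"][key] = value
    events.append((name, key, old, value))


assert_relation("my_car", "has_part", "wheel_1")
apply_type_rule()
change_attribute("my_car", "colour", "blue")
print(sorted(individuals["my_car"]["types"]))  # ['car', 'vehicle']
print(events)  # [('my_car', 'colour', 'red', 'blue')]
```

Function terms and full axiom sets are left out here; they are where a dedicated ontology language and reasoner become necessary.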

5. Conclusion

Selecting ontologies and other semantic resources is a difficult decision that must be well thought out, taking into account the research context of the data-generation workflow and any applicable regulatory constraints. These choices affect both a dataset's integrative potential and its level of interoperability. Disclosing the semantic resources used to annotate a dataset improves its findability and reusability, and it is good practice because it lets potential users estimate how much mapping effort might be needed to merge two datasets.
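As a rough illustration of that estimate, assuming each dataset discloses its annotations as prefixed identifiers (CURIEs) whose prefix names the ontology used, the prefixes of two datasets can be compared to see how much of one dataset relies on resources the other does not use. The prefixes `ONTA`, `ONTB`, and `ONTC` are hypothetical, and the resulting fraction is only a crude proxy for mapping effort.

```python
def ontology_prefixes(annotations):
    """Collect the ontology prefixes (the part before ':') of CURIE-style IDs."""
    return {curie.split(":", 1)[0] for curie in annotations}


def mapping_effort(annotations_a, annotations_b):
    """Summarise which semantic resources two datasets share.

    Returns the shared prefixes and the fraction of dataset B's annotations
    that come from resources dataset A does not use.
    """
    prefixes_a = ontology_prefixes(annotations_a)
    shared = prefixes_a & ontology_prefixes(annotations_b)
    needs_mapping = [c for c in annotations_b if c.split(":", 1)[0] not in prefixes_a]
    fraction = len(needs_mapping) / len(annotations_b) if annotations_b else 0.0
    return shared, fraction


dataset_a = ["ONTA:1", "ONTA:2", "ONTB:5"]
dataset_b = ["ONTA:3", "ONTC:1", "ONTC:2", "ONTB:9"]
print(mapping_effort(dataset_a, dataset_b))
```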


Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License