Knowledge Exploration Technologies Laboratory

Relational databases and ontologies

Information can be stored in a variety of formats. Traditional paper-based media employ visual cues (typeface, indentation, numbering) to structure information in repositories such as encyclopedias, dictionaries or thesauri. The information is then retrieved via several access points, such as tables of contents, page numbers, sections and the indices in the back of the book.

Modern computerised storage prefers a simple system of syntactic markers (such as XML tags or comma-separated lists) over visual cues. The markers can be human-readable (as in XML) or machine-readable only (as in databases and hash files), in which case the information must be entered and retrieved through specialised interfaces. The system of syntactic markers is usually combined with a conceptual model that supports structuring and retrieval of the information. Examples are entity relationship models for relational databases and ontology languages for the Semantic Web.
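As a small illustration of syntactic markers, the following Python sketch reads the same record once from XML and once from a comma-separated list. The record, its field names and the two helper functions are invented for this example; only the standard library is used.

```python
import csv
import io
import xml.etree.ElementTree as ET

# The same hypothetical record, once with XML markers and once comma-separated.
xml_text = "<dog><name>Rex</name><breed>Poodle</breed></dog>"
csv_text = "name,breed\nRex,Poodle\n"


def record_from_xml(text):
    """Read the child elements of the root into a dict keyed by tag name."""
    root = ET.fromstring(text)
    return {child.tag: child.text for child in root}


def record_from_csv(text):
    """Read the first data row into a dict keyed by the header row."""
    reader = csv.DictReader(io.StringIO(text))
    return dict(next(reader))


xml_record = record_from_xml(xml_text)
csv_record = record_from_csv(csv_text)
print(xml_record == csv_record)
```

Both formats carry the same information; they differ only in which markers (tags or commas and line breaks) delimit the fields.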

Currently the most popular open-source relational database management systems are MySQL and PostgreSQL.

Ontologies provide a formal description of knowledge structures for the purposes of knowledge use, sharing and analysis as required, for example, by the Semantic Web. Ontologies consist of

  • classes, concepts or types
  • relations or relationships (which are special, n-ary classes)
  • attributes, features, properties, slots or roles
  • instances, individuals, objects or entities

Classes provide a container for their instances and for the attributes that apply to these instances. They form a hierarchy in which subclasses inherit the attributes of their superclasses but generally have fewer instances than their superclasses. For example, if Poodle is a subclass of Dog, then all instances of Poodle must be Dogs and have all Dog attributes. Poodles can, however, have additional attributes that Dogs in general do not have, and there can be instances of Dog that are not Poodles. Such class hierarchies are mathematically modelled in Formal Concept Analysis. Apart from the class hierarchy relationship, classes can also have other relationships with each other. These types of relationships can be modelled in entity relationship models, semantic networks or Conceptual Graphs.
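The Poodle and Dog example can be sketched in an object-oriented language. The following Python fragment is only an illustration; the attribute names (name, clip_style) and the sample instances are assumptions made up for this sketch.

```python
class Dog:
    """Superclass: every Dog instance has a name."""

    def __init__(self, name):
        self.name = name


class Poodle(Dog):
    """Subclass: inherits 'name' from Dog and adds an attribute of its own."""

    def __init__(self, name, clip_style):
        super().__init__(name)
        self.clip_style = clip_style


# Hypothetical instances: one plain Dog and one Poodle.
animals = [Dog("Rex"), Poodle("Fifi", "continental")]

# Every Poodle is counted among the Dogs, but not every Dog is a Poodle.
dogs = [a for a in animals if isinstance(a, Dog)]
poodles = [a for a in animals if isinstance(a, Poodle)]

print(len(dogs), len(poodles))
```

The subclass has more attributes and, in this sample, fewer instances than its superclass, which mirrors the relationship between attributes and instances described above.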

The notion of class hierarchies and conceptual graphs is the basis for a variety of tools and techniques that focus on different applications but are essentially similar in nature:

  • relational databases -- optimised for storage and retrieval of large amounts of information;
  • object-oriented programming -- focusing on methods and user interfaces (often supported by UML and similar languages);
  • ontologies -- focusing on knowledge sharing, use and analysis;
  • library thesauri -- from the era of paper-based information, focusing on retrieval.

There are numerous tools for ontology editing, as evidenced by a 2002 overview of ontology editors. One of the more popular current ontology tools is Protege. A paper by Noy and McGuinness explains Protege's functionality. Modern commercial relational databases incorporate object-relational features, which render them fairly similar to ontologies, as evidenced by a tutorial for Oracle 9i.

Information visualisation

In contrast to traditional paper-based information displays (maps, graphics, lists) that are often carefully designed and elaborated, but completely static, computerised information is potentially infinitely flexible. For example, the relational core of the database language SQL is essentially equivalent in expressive power to First Order Logic, which means that, in principle, any query that is expressible in First Order Logic can be formed. Unfortunately, human natural language and cognition can be quite different from formal logic, and many users find it difficult to form complex queries in a logical format. Graphical queries and visual information representations are often preferable to pure logical interfaces. Such interfaces are usually more flexible than paper-based displays (because many different displays can be generated at the click of a button). But it is still a challenge to translate any logically possible query into a graphical representation.
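To make the correspondence between SQL and logic concrete, the following Python sketch uses the standard sqlite3 module to run a query that mirrors the first-order expression { x | dog(x) AND NOT EXISTS y. owns(y, x) }, i.e. "all dogs without an owner". The tables, column names and rows are hypothetical and exist only for this illustration.

```python
import sqlite3

# In-memory database with two hypothetical relations: dog(name), owns(owner, dog).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dog (name TEXT PRIMARY KEY);
    CREATE TABLE owns (owner TEXT, dog TEXT);
    INSERT INTO dog VALUES ('Rex'), ('Fifi'), ('Bello');
    INSERT INTO owns VALUES ('Anna', 'Rex'), ('Ben', 'Fifi');
""")

# Logical form: { x | dog(x) AND NOT EXISTS y. owns(y, x) }
unowned = [row[0] for row in conn.execute("""
    SELECT d.name
    FROM dog AS d
    WHERE NOT EXISTS (SELECT 1 FROM owns AS o WHERE o.dog = d.name)
    ORDER BY d.name
""")]
conn.close()

print(unowned)
```

Even this short query needs a negated, nested subquery, which hints at why many users find the logical format hard to work with.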

Edward Tufte wrote several books about "envisioning information", mostly for paper-based formats. He argues that graphical displays facilitate "visual reasoning" as long as information is displayed in a manner facilitating "local comparisons within eye-span". Although designers select and structure the information to be displayed, a good display ensures that viewers have maximum control of the information. (Tufte's view on "slideware" illustrates some of these points.)

Presumably, computer environments should also aim at giving users maximum control of the information while at the same time preventing misleading displays (due to faulty queries or misunderstandings on the user's side), without requiring users to undergo extensive training. Again, presumably, this requires some underlying conceptual modelling of the information, in the form of ontologies or similar, that is known to users and implemented in the visualisation software.

Toolkits that provide maximum flexibility, such as Graphviz, Infovis, Piccolo, Katy Börner's Infovis Cyberinfrastructure and similar tools, usually exist in the form of software libraries that require a substantial amount of programming and experience. These tools need to be customised for each application.

If there were ever a tool that could supply instant visualisations at the click of a button for any data in a relational database, it would require a detailed understanding of information at a semantic (or semiotic) level. A very first step in such a direction consists of classifying types of visualisation tasks and techniques so that they can be mapped to each other. The OLIVE (On-line Library of Information Visualization Environments) taxonomy is one example of such a classification. Another example of such an approach is a paper by Priss and Old that connects types of conceptual structures with cardinalities of relations and visualisation tasks (in this case in the area of lexical database applications).

In many cases, collections of examples, such as Cybergeography, might serve as a starting point for a visualisation task. Once the type of visualisation task is established, a secondary challenge consists of determining how to fit the selected information onto an eye-span-sized screen. For example, tree hierarchies are a common type of structure to visualise. They can be represented using file hierarchy displays (as used by computer operating systems for navigating files and directories/folders), Fisheye expansion techniques or Hyperbolic Space. General network visualisations can often be optimised for display using a Touchgraph algorithm.
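The simplest of these tree displays, the indented file hierarchy, can be sketched in a few lines of Python. The function render_tree and the sample hierarchy are hypothetical names chosen for this illustration; Fisheye and hyperbolic layouts would require considerably more code.

```python
def render_tree(tree, depth=0):
    """Return indented lines for a nested dict, one line per node."""
    lines = []
    for name, children in tree.items():
        lines.append("  " * depth + name)  # two spaces per hierarchy level
        lines.extend(render_tree(children, depth + 1))
    return lines


# Hypothetical class hierarchy; an empty dict marks a leaf node.
hierarchy = {"Animal": {"Dog": {"Poodle": {}}, "Cat": {}}}

lines = render_tree(hierarchy)
print("\n".join(lines))
```

Such a display grows by one line per node, so large hierarchies quickly exceed the screen; this is the problem that collapsing, Fisheye expansion and hyperbolic layouts try to address.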

Copyright Uta Priss 2006