About LepTree's Technology

To support community coordination and data-sharing, our site combines a full-featured content management system, Drupal, with technologies enabling us to use, create, and share information on the Semantic Web.

Drupal content management system

Drupal is a PHP-based, highly configurable, open-source content management system. We take advantage of its built-in user management and content creation tools, which make it easy for non-programmers to create pages and to upload and organize images. We also use Drupal’s discussion forum and comment features, because we want active participation throughout the site. Role management (who has the power to do what on the site) is easily handled by the simple_access module. Other Drupal modules we find particularly useful include img_assist, image_gallery, latest_news, pathauto, pathfilter, site_map, statistics, statistics_trends, and xstatistics.

We currently run Drupal 4.7.3 using a MySQL database.

Semantic web technologies

We want to maintain our information in machine-harvestable, semantically rich formats. The Semantic Web vision involves publishing documents online not only in HTML, which humans can read, but also in RDF (Resource Description Framework) or OWL (Web Ontology Language). These are XML-based W3C standard languages for representing concepts and relationships in a way that lets computers interpret the meaning of a statement and reason about it. Offering documents in these formats makes it possible to automate difficult processes such as retrieving and meaningfully integrating data from multiple, dynamic sources.

For example, a computer does not know what to do with a text string like “Lepidoptera have scaled wings,” but OWL can place Lepidoptera in semantic context (a kind of animal, in a particular biological hierarchy) and specify that “scaled wings” is a characteristic related to the appendages used for flying. Much of the semantic data currently available on the web uses the simple “Friend of a Friend” (FOAF) vocabulary, which describes people and their social networks. In biology, much of the information currently available in ontological form falls within the biomedical and molecular biology domains, where researchers use semantics to make new discoveries and to make information easier to find, retrieve, and integrate.
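As an illustration of what “semantic context” means, the sketch below encodes the scaled-wings example as subject-predicate-object triples and follows subclass links to conclude that Lepidoptera are a kind of animal. It is a toy in plain Python, not RDF or OWL tooling: the `lep:` names, the `lep:hasCharacteristic` and `lep:pertainsTo` predicates, and the intermediate taxa are hypothetical, and only `rdfs:subClassOf` borrows its name from RDF Schema.

```python
# Toy sketch: facts as (subject, predicate, object) triples.
# All "lep:" names are hypothetical; "rdfs:subClassOf" mirrors the RDF Schema
# property of the same name, but no RDF library is used here.
TRIPLES = {
    ("lep:Lepidoptera", "rdfs:subClassOf", "lep:Insecta"),
    ("lep:Insecta", "rdfs:subClassOf", "lep:Arthropoda"),
    ("lep:Arthropoda", "rdfs:subClassOf", "lep:Animalia"),
    ("lep:Lepidoptera", "lep:hasCharacteristic", "lep:ScaledWings"),
    ("lep:ScaledWings", "lep:pertainsTo", "lep:FlightAppendage"),
}


def superclasses(cls, triples):
    """Return every class reachable from cls via rdfs:subClassOf links."""
    found = set()
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def is_a(cls, ancestor, triples):
    """True if cls is, directly or transitively, a subclass of ancestor."""
    return ancestor in superclasses(cls, triples)


def objects(subject, predicate, triples):
    """All objects o such that (subject, predicate, o) is a stated triple."""
    return sorted(o for s, p, o in triples if s == subject and p == predicate)


print(is_a("lep:Lepidoptera", "lep:Animalia", TRIPLES))  # True
print(is_a("lep:Animalia", "lep:Lepidoptera", TRIPLES))  # False
print(objects("lep:Lepidoptera", "lep:hasCharacteristic", TRIPLES))  # ['lep:ScaledWings']
```

The statement “Lepidoptera are animals” is never written down; it is inferred by chaining subclass links, which is the kind of step an OWL reasoner performs over a much richer ontology.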

We use customized forms that populate a Sesame triple store (version 1.2.4) through Phesame (a PHP5 interface to Sesame), using the ontologies listed below. PHP-driven web pages dynamically retrieve information from the triple store and display it to visitors. The references and community directory are handled entirely semantically. We modified Drupal’s profile module to populate the person ontologies, so that information about people can be associated with the projects (and eventually the references) in which they are involved. Provenance for these objects is also tracked semantically. Finally, we manage our biological names and evolutionary trees using ontologies we created (see below for details).
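The form-to-triple-store flow described above can be sketched as follows. We do not show Phesame or Sesame calls here; this is a hypothetical, in-memory stand-in in which a submitted form (a dictionary of field names to values) becomes a set of triples, provenance is recorded as ordinary triples next to the content, and a pattern match with wildcards plays the role of a triple-store query. All `ex:` names and the record identifier are illustrative assumptions, not terms from the Bib or RGB ontologies.

```python
from datetime import date


def form_to_triples(ref_id, fields, submitted_by, created_on):
    """Map one submitted form to triples about the record ref_id.

    fields maps form field names to values; empty fields are skipped.
    Provenance (who submitted it, and when) is stored as ordinary triples.
    All "ex:" terms are hypothetical placeholders.
    """
    triples = [(ref_id, "rdf:type", "ex:Reference")]
    for field, value in fields.items():
        if value:
            triples.append((ref_id, "ex:" + field, value))
    triples.append((ref_id, "ex:createdBy", submitted_by))
    triples.append((ref_id, "ex:createdOn", created_on.isoformat()))
    return triples


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


store = []
store.extend(form_to_triples(
    "ref:1",
    {"title": "Example title", "year": "2006", "journal": ""},
    "person:example",
    date(2006, 8, 1),
))
print(match(store, p="ex:title"))  # [('ref:1', 'ex:title', 'Example title')]
print(match(store, s="ref:1", p="ex:createdBy"))  # [('ref:1', 'ex:createdBy', 'person:example')]
```

Keeping provenance in the same store as the content means a single query mechanism answers both “what is this reference?” and “who entered it, and when?”.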

We extended the following ontologies for our own purposes:

  • Bib, by Richard Newman (Holygoat), for bibliographic references.
  • RGB ontologies, by Filip Perich (Ebiquity), for tracking people and projects.
  • ETHAN, by the Spire team, for managing information about biological names and evolutionary trees.

We are also developing new ontologies, and forms to populate them, that will build a knowledge base of Lepidoptera characteristics. This effort will produce OWL-formatted descriptions that will be available on the Semantic Web for anyone to use. These descriptions will also feed a process that shares them with the Arizona Tree of Life site and with other online data archives. Finally, we will build morphological glossaries that flesh out the ontologies of physical characteristics, with the goal of understanding homology across Lepidoptera. For more information on how we plan to use these ontologies, see our proposal.

In the future we plan to manage metadata about our images and other files semantically, so that we can track the provenance of those objects and display them using LepTaxonTree.

Exposing information in our triple store on the web is an important next step.

LepTaxonTree

LepTaxonTree displays our current understanding of Lepidoptera relationships by querying the Sesame triple store. Written in Java, LepTaxonTree is a customized version of SpaceTree and is related to TaxonTree. We currently use information on names and tree structure from two sources: ITIS and the Arizona Tree of Life. These were imported manually in early 2006, but eventually there will be processes to keep our version of the names and trees synchronized with those and other sources. Just as TaxonTree provides a graphical searching and browsing interface to information at the Animal Diversity Web, LepTaxonTree will allow visitors to navigate our site by interacting directly with the Lepidopteran evolutionary tree.
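A minimal sketch of how a tree viewer might turn parent-child statements from a triple store into a browsable hierarchy, assuming each taxon is linked to its parent by a single predicate. The `ex:parentTaxon` predicate is hypothetical, the taxa are a small illustrative sample rather than our imported ITIS or Tree of Life data, and the code is Python rather than the Java used by LepTaxonTree.

```python
from collections import defaultdict


def build_tree(triples, predicate="ex:parentTaxon"):
    """Index (child, predicate, parent) triples as a parent -> children map.

    Returns the root taxa (those with no parent) and the children map,
    with every child list sorted alphabetically for stable display.
    """
    children = defaultdict(list)
    has_parent = set()
    nodes = set()
    for child, p, parent in triples:
        if p != predicate:
            continue
        children[parent].append(child)
        has_parent.add(child)
        nodes.update((child, parent))
    for kids in children.values():
        kids.sort()
    return sorted(nodes - has_parent), children


def render(name, children, depth=0):
    """Return an indented outline of the subtree rooted at name."""
    lines = ["  " * depth + name]
    for kid in children.get(name, []):
        lines.extend(render(kid, children, depth + 1))
    return lines


sample = [
    ("Papilionoidea", "ex:parentTaxon", "Lepidoptera"),
    ("Bombycoidea", "ex:parentTaxon", "Lepidoptera"),
    ("Papilionidae", "ex:parentTaxon", "Papilionoidea"),
    ("Nymphalidae", "ex:parentTaxon", "Papilionoidea"),
    ("Bombycidae", "ex:parentTaxon", "Bombycoidea"),
]
roots, children = build_tree(sample)
for root in roots:
    print("\n".join(render(root, children)))
```

The interactive part (expanding and collapsing branches) is what SpaceTree provides; this sketch covers only assembling the hierarchy from stored statements.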

Status matrix

The Status Matrix is an interactive presentation of the current status of our molecular project. It is implemented in JavaScript. The data are stored in MySQL and are periodically refreshed by a Java program that transforms a spreadsheet maintained by our project personnel.
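The spreadsheet-to-database refresh can be sketched in a few lines. This is a hypothetical stand-in, not our Java program: it assumes the spreadsheet is exported as CSV with `taxon`, `gene`, and `status` columns, and it uses Python's built-in `sqlite3` module in place of MySQL so the example runs on its own.

```python
import csv
import io
import sqlite3


def _cell(row, key):
    """Return a stripped cell value, treating missing cells as empty."""
    return (row.get(key) or "").strip()


def refresh_status(conn, csv_text):
    """Replace the status table with the rows of a CSV spreadsheet export.

    Column names (taxon, gene, status) are hypothetical. The DELETE and the
    INSERTs run in one transaction, so a failed load leaves the old data intact.
    """
    rows = [
        (_cell(r, "taxon"), _cell(r, "gene"), _cell(r, "status"))
        for r in csv.DictReader(io.StringIO(csv_text))
        if _cell(r, "taxon")  # skip blank spreadsheet rows
    ]
    conn.execute(
        "CREATE TABLE IF NOT EXISTS status (taxon TEXT, gene TEXT, status TEXT)"
    )
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("DELETE FROM status")
        conn.executemany("INSERT INTO status VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
export = "taxon,gene,status\ntaxon_a,gene_1,done\ntaxon_b,gene_2,pending\n,,\n"
print(refresh_status(conn, export))  # 2
print(conn.execute("SELECT * FROM status ORDER BY taxon").fetchall())
# [('taxon_a', 'gene_1', 'done'), ('taxon_b', 'gene_2', 'pending')]
```

Clearing and reloading the whole table in one transaction is one simple way to keep the displayed matrix consistent with the spreadsheet; an incremental update would mainly pay off for much larger data.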