Data Transformation

In general terms, if you have some data that you wish to make available, but it is held in a format that is not used or understood by the community at large, or is incompatible with other formats or dialects, then there are two alternatives.

  1. The community can adopt the format that this data is published in as their shared dialect.
  2. You can develop some collection of mapping rules which convert the private dialect to a shared dialect.

The choice you make will depend upon your circumstances. If there is no standard community dialect, or no common understanding of the subject, then your community may want to adopt your standard. However, many political issues go along with this, and whole communities rarely adopt the standard of a single group without an extremely good reason to do so.

On the other hand, if there is already a shared understanding of the domain, and you are trying to work with the same 'type' of data which has been published from different legacy datasources that use different models, then it is worth investing the time to develop a shared dialect to describe this understanding, and "mapping" these legacy datasources into this shared dialect. The Fluxion architecture is designed to make this as easy to achieve as possible.

What you need

  • An OWL ontology describing the data model of the original data.
  • An OWL ontology describing the shared data model that the community uses.

The OWL ontology describing the data model of your original data can be acquired automatically from a datasource, assuming you produced one as described by the Exposing a Datasource guide.

The OWL ontology that describes the domain - or the shared data model of your community - must be separately developed, if it does not already exist.
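To make the two inputs concrete, here is a hypothetical sketch, in Turtle syntax, of what fragments of each ontology might look like. The namespaces and the comment are invented for illustration; the class names echo those used in the mapping example later in this guide.

```turtle
# Hypothetical fragments only: namespaces invented for illustration.
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dm:   <http://example.org/datasource-model#> .
@prefix dom:  <http://example.org/shared-domain#> .

# From the data-model ontology (generated automatically from the datasource):
dm:Gene a owl:Class ;
    rdfs:comment "Mirrors the structure of the legacy datasource." .

# From the shared domain ontology (developed by the community):
dom:Physical_Genetic_Part a owl:Class .
dom:DNA_Region_Representation a owl:Class .
dom:Chromosome_Representation a owl:Class .
```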

Creating a Transformation Engine

First, you will need to develop a "Mapping Rules" document. This is an XML file that can be produced using the Runcible tool in Protégé. This tool is a simple drag-and-drop interface, which allows you to describe how entities from the data-model ontology "map" into the domain ontology.

The Runcible GUI for describing mapping rules:


The image above shows a small subset of a mapping rules document. Concepts from the data-model ontology appear in the top half, and concepts from the domain ontology in the bottom half. Each node represents a collection of entities - either individuals or data values - in OWL. Hovering over one of these nodes in the GUI displays the OWL description of that set of items (as shown by the visible pop-up menu). By selecting a node in the top half of the screen and dragging it down onto a node in the bottom half, you can describe a new mapping.

In this example, the screenshot shows us that each entity from ?g, which is a collection of individuals from the class "Gene", maps to three separate entities in the domain ontology - ?pgp, ?drr and ?cr, which are collections of some individuals from the classes "Physical_Genetic_Part", "DNA_Region_Representation" and "Chromosome_Representation" respectively.

This tool allows rules to be described in a structured way, so that links are preserved whilst "moving through" the datasource. This makes it easy to build fairly complex rules, capturing links by means of "nesting" of mapping rules.
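The exact schema of the mapping rules file is not covered in this guide, but conceptually a rules document describing the mapping above might look something like the following hypothetical fragment. All element and attribute names here are invented for illustration; the real Fluxion schema may differ.

```xml
<!-- Hypothetical mapping-rules fragment; element names are illustrative only. -->
<mappingRules>
  <!-- Each entity in ?g (individuals of class "Gene") maps to three
       separate entities in the domain ontology. -->
  <rule source="?g" sourceClass="Gene">
    <target variable="?pgp" targetClass="Physical_Genetic_Part"/>
    <target variable="?drr" targetClass="DNA_Region_Representation"/>
    <target variable="?cr"  targetClass="Chromosome_Representation"/>
    <!-- Nesting a rule inside another captures links between entities
         while "moving through" the datasource. -->
  </rule>
</mappingRules>
```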

More information on this tool will be available as it develops.

Once the mapping rules for a datasource have been produced, you can build a transformation engine datasource. This datasource performs the transformations described by the rules document on any incoming data, and returns data in the transformed format. This means that you can pass the engine some data from the underlying datasource and have it returned in the domain ontology format, or you can perform a query in the domain ontology and have that query transformed into the format of the underlying datasource.
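The real Fluxion engine operates on OWL entities, but the core idea - applying the same rule set in both directions, to incoming data and to outgoing queries - can be sketched in a few lines of plain Python. The rule table and the "BRCA2" entity below are purely illustrative.

```python
# Illustrative sketch only: a bidirectional term map standing in for a
# full OWL-aware transformation engine.

# Hypothetical mapping rules: data-model class -> domain-ontology classes.
RULES = {
    "Gene": [
        "Physical_Genetic_Part",
        "DNA_Region_Representation",
        "Chromosome_Representation",
    ],
}

def transform_data(entities):
    """Map (name, class) pairs typed in the data-model ontology into
    the domain ontology. One source entity may yield several targets."""
    out = []
    for name, cls in entities:
        for domain_cls in RULES.get(cls, [cls]):
            out.append((name, domain_cls))
    return out

def transform_query(domain_cls):
    """Rewrite a domain-ontology query term back into the data-model
    terms that the underlying datasource understands."""
    return [src for src, targets in RULES.items() if domain_cls in targets]
```

For example, `transform_data([("BRCA2", "Gene")])` yields the entity typed against all three domain classes, while `transform_query("Chromosome_Representation")` recovers the single data-model class `"Gene"` to query against the legacy datasource.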

What next?

Once you have a transformation engine datasource, it is likely that you will wish to expose this as a Fluxion webservice. The process is the same as for creating a publisher webservice (see Exposing a Datasource): use the skeleton webapp-component module, which wraps your transformation engine datasource in a new webservice and bootstraps itself to use this new datasource.

This data can then be consumed directly by semantic-web applications built upon the Fluxion data integration stack. Typically, multiple data publishers will be integrated by a single Integrator to answer an end-user query. Once you deploy your Data Transformer, it becomes part of the pool of resources which the integrators can work over.

© 2010