OAI (Open Archives Initiative) support

The OAI protocol, based on HTTP and XML, is a metadata harvesting protocol. It is intended to make the normally invisible contents of internet databases accessible to search engines, so that those contents can be found through a submitted search term.
The oai.ashx API (which is a standard part of the Axiell WebAPI package from version 3.0.21154.1) supports OAI protocol 2.0. Through an OAI protocol request to a repository (a server on which, for example, the WebAPI has been configured for the OAI protocol), (meta)data can be extracted from your database and made available on the internet, where it can be indexed by service providers such as search engines. A full description of the OAI protocol can be found at https://www.openarchives.org/.
The difference between data and metadata in an Axiell Collections database is a matter of agreement. Normally, we regard the information in the database as data, but the description of an object or book can just as easily be seen as metadata, because the content of the book itself is not contained in the database. Collections databases store information about other data or objects: call it data or metadata.
The metadata resulting from an OAI query may comply with one of a number of specified standards. The Dublin Core metadata standard is the standard that is provided with the Axiell implementation of OAI; its XML schema is http://www.openarchives.org/OAI/2.0/oai_dc.xsd.
The standard consists of 15 elements (comparable to fields in a Collections application) in which the metadata is passed on. Since Collections databases contain many more fields than Dublin Core has elements, a limited amount of information is selected from a retrieved Collections record. But you can still define more than 15 elements. Dublin Core is really a narrow base that can be supported by anyone; this makes it easier to exchange information between differently structured data.
In principle, oai.ashx returns all fields from a Collections record (in AdlibXML format). For OAI, however, oai.ashx uses an XSLT stylesheet (with a Collections-field-to-Dublin-Core-element mapping), set in the web configuration file, to transform that search result to the proper metadata format (also XML) before sending it to the harvester. This transforms each retrieved Collections record into a so-called OAI record (one OAI record per Collections record). Such a record primarily consists of a header and the selected metadata. The header consists of a unique identifier for the retrieved record and a date stamp that indicates when the record was last modified. The metadata is the retrieved data after transformation.
So the output format of an OAI search result is determined by an XSLT stylesheet on the server. Our standard Dublin Core stylesheet for this purpose is oai_dc.xslt. Other output formats can be based on this stylesheet.
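The header-plus-metadata layout of an OAI record can be illustrated with a short Python sketch. The sample record below is hand-written for illustration (its identifier, date and Dublin Core values are assumptions, not output of a real oai.ashx server); only the namespaces are taken from the OAI-PMH 2.0 and Dublin Core specifications.

```python
import xml.etree.ElementTree as ET

# Namespaces defined by the OAI-PMH 2.0 and Dublin Core specifications.
NS = {
    "oai": "http://www.openarchives.org/OAI/2.0/",
    "oai_dc": "http://www.openarchives.org/OAI/2.0/oai_dc/",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical record: real identifiers, dates and elements will differ.
SAMPLE_RECORD = """\
<record xmlns="http://www.openarchives.org/OAI/2.0/">
  <header>
    <identifier>3</identifier>
    <datestamp>2021-06-03</datestamp>
  </header>
  <metadata>
    <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
               xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:title>Example object</dc:title>
      <dc:creator>Example maker</dc:creator>
    </oai_dc:dc>
  </metadata>
</record>
"""


def summarize_record(xml_text):
    """Return the header fields and the Dublin Core title of one OAI record."""
    record = ET.fromstring(xml_text)
    header = record.find("oai:header", NS)
    dublin_core = record.find("oai:metadata/oai_dc:dc", NS)
    return {
        "identifier": header.findtext("oai:identifier", namespaces=NS),
        "datestamp": header.findtext("oai:datestamp", namespaces=NS),
        "title": dublin_core.findtext("dc:title", namespaces=NS),
    }


print(summarize_record(SAMPLE_RECORD))
```

The header fields are what a harvester uses to decide whether it already holds the current version of a record; the metadata part is whatever the configured stylesheet produced.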

Settings for the adlibweb.xml file

The use of OAI must be configured in the adlibweb.xml file that is also used by the WebAPI. By default, there is no OAI configuration in adlibweb.xml. Adjust the example below (of a complete OAI configuration) to your own situation and add it to adlibweb.xml:

<OAIConfiguration>
  <OAI_REPOSITORY_NAME>My Museum</OAI_REPOSITORY_NAME>
  <OAI_ADMIN_EMAIL>oai@mymuseum.uk</OAI_ADMIN_EMAIL>
  <OAI_ADMIN_EMAIL>admin@mymuseum.uk</OAI_ADMIN_EMAIL>
  <DM>dm</DM>
  <OAISETS>
    <OAISET>
      <SetSpec>collect</SetSpec>
      <Name>Default</Name>
      <Database>collect</Database>
      <SearchStatement>all</SearchStatement>
      <OAI_METADATAPREFIXES>
        <OAI_METADATAPREFIX>
          <Name>oai_dc</Name>
          <StyleSheet>oai_dc.xslt</StyleSheet>
<!--      <Schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</Schema>
          <MetadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</MetadataNamespace>-->
        </OAI_METADATAPREFIX>
        <OAI_METADATAPREFIX>
          <Name>oai_adlib</Name>
          <!-- use oai_adlib.xslt as stylesheet if you want adlibXML as output -->
          <StyleSheet>oai_adlib.xslt</StyleSheet>
<!--      <Schema>http://www.adlibsoft.com/adlibXML.xsd</Schema>
          <MetadataNamespace>http://www.adlibsoft.com/adlibXML</MetadataNamespace>-->
        </OAI_METADATAPREFIX>
      </OAI_METADATAPREFIXES>
    </OAISET>
  </OAISETS>
</OAIConfiguration>

Explanation:
  • OAI_REPOSITORY_NAME: your repository name (choose a sensible one).
  • OAI_ADMIN_EMAIL: administrator e-mail address.
  • DM: specify the Collections field tag containing the modification date of the record; usually this is dm. Many databases store the date on which a record was last edited in a date-of-modification field. When you make your database available on the internet by means of the Open Archives Initiative, you can use the DM variable in the adlibweb.xml file to provide the tag of that field, so that search engines or other clients can retrieve (or index) only the records that have been changed after a certain date, for instance after their previous visit to your OAI server. An OAI request with a date selection for a database in which no DM is specified returns all records.
  • OAISETS: you may specify one or more OAI sets, each in its own OAISET, to address different OAI harvest requests. If the harvest request names no set, the first set specified here in the adlibweb.xml file will be used.
  • OAISET: specify the details for a single harvest request, including its allowed metadata prefixes.
  • SetSpec: an identifying name without spaces for this OAI set (choose a sensible name), optionally to be used in harvest requests.
  • Name: (underneath OAISET) a descriptive name for this OAI set, only returned with verb=ListSets.
  • Database: the name of a <databaseConfiguration> specified elsewhere in the adlibweb.xml file (so not the name of an .inf file).
  • SearchStatement: a WebAPI style search statement (using the same character escaping) which will be executed when the current set is addressed in the harvest request. See the WebAPI search command topic for a full description, including how to request the records from a saved search (also known as a pointer file).
  • OAI_METADATAPREFIXES: specify the allowed metadata prefixes for this OAI set.
  • OAI_METADATAPREFIX: the specification of a single metadata prefix.
  • Name: (underneath OAI_METADATAPREFIX) set a name with which to address the current metadata prefix in a harvest call.
  • StyleSheet: specify the stylesheet to be used to format the result from the harvest request. Some ready-made stylesheets are present in the WebAPI folder containing the adlibweb.xml file as well.
  • Schema: optionally, the URL to the .xsd specifying the intended XML schema of the harvest result, for validation purposes. The Axiell OAI server does nothing with this information, though, except pass it to the client if requested with the ListMetadataFormats verb.
  • MetadataNamespace: optionally, the URI of the metadata namespace for the XML schema. As with the <Schema> element, the Axiell OAI-PMH implementation does nothing with this element, except pass it to the client in the ListMetadataFormats response. To set the namespace of the <metadata><record> nodes in the returned XML, use the default namespace (xmlns) in the XSLT like so: <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns="http://www.adlibsoft.com/2002/adlibXML">. Note also that using <xsl:copy> in the XSLT to copy complete elements adds an empty namespace attribute to the returned elements (<priref xmlns="">), which may cause issues with harvesting software.
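
Because a single unbalanced tag makes the configuration unreadable, it can help to check the fragment before adding it to adlibweb.xml. The following Python sketch is one way to do that, under stated assumptions: it parses only the <OAIConfiguration> fragment (not a complete adlibweb.xml), the function name check_oai_config is hypothetical, and the rules it applies (a SetSpec without spaces, at least one metadata prefix per set) are taken from the explanation above rather than from the WebAPI itself.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the example configuration, for illustration only.
CONFIG = """\
<OAIConfiguration>
  <OAI_REPOSITORY_NAME>My Museum</OAI_REPOSITORY_NAME>
  <OAI_ADMIN_EMAIL>oai@mymuseum.uk</OAI_ADMIN_EMAIL>
  <DM>dm</DM>
  <OAISETS>
    <OAISET>
      <SetSpec>collect</SetSpec>
      <Name>Default</Name>
      <Database>collect</Database>
      <SearchStatement>all</SearchStatement>
      <OAI_METADATAPREFIXES>
        <OAI_METADATAPREFIX>
          <Name>oai_dc</Name>
          <StyleSheet>oai_dc.xslt</StyleSheet>
        </OAI_METADATAPREFIX>
      </OAI_METADATAPREFIXES>
    </OAISET>
  </OAISETS>
</OAIConfiguration>
"""


def check_oai_config(xml_text):
    """Return (sets, problems); sets is a list of (SetSpec, [prefix names])."""
    # ET.fromstring raises ET.ParseError when tags are not properly closed.
    root = ET.fromstring(xml_text)
    sets, problems = [], []
    for oai_set in root.findall("OAISETS/OAISET"):
        spec = (oai_set.findtext("SetSpec") or "").strip()
        if not spec or " " in spec:
            problems.append(f"SetSpec must be a name without spaces: {spec!r}")
        prefixes = [
            (prefix.findtext("Name") or "").strip()
            for prefix in oai_set.findall("OAI_METADATAPREFIXES/OAI_METADATAPREFIX")
        ]
        if not prefixes:
            problems.append(f"set {spec!r} allows no metadata prefixes")
        sets.append((spec, prefixes))
    if not sets:
        problems.append("no OAISET defined")
    return sets, problems


print(check_oai_config(CONFIG))
```

The first set in the returned list is the one that, according to the OAISETS description, is used when a harvest request names no set.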

Function call

When the keyword verb is entered in the function call of oai.ashx, the server will process the search query as an OAI-request.
In principle, the search result of an OAI request is returned as an XML file, so the client has to convert it to an HTML page first in order to show the result as a web page. Nevertheless, browsers can also display the raw XML, which is sufficient for testing purposes. The syntax of a search query and the ways you can enter it are the same as for a normal WebAPI search query, except that a number of special variables must follow the question mark in the query string, as required by the OAI protocol. To make an Identify request, for instance, use the following syntax:

…oai.ashx?verb=Identify

The Identify value provides information about the relevant repository which has been made available through the OAI protocol, such as the repository name (e.g. the name of your collection), the base URL to address OAI search queries to, the protocol version that is supported, the e-mail address of the repository system administrator, and any additional information.
A standard OAI protocol request has at least one name/value pair to specify the request by the client. The above Identify request is an example of this, but instead of Identify, each of the standard OAI protocol requests can be used. The number and the nature of extra name/value pairs depend on the arguments for the specific protocol request, e.g.:

…oai.ashx?verb=GetRecord&identifier=3&metadataPrefix=oai_dc

for a GetRecord call of the record with local identifier (Collections record number/priref) 3. The metadataPrefix oai_dc specifies the use of the Dublin Core <OAI_METADATAPREFIX> output format, as specified in the relevant OAISET.
Likewise, when you submit a ListIdentifiers or ListRecords request (see further in this topic), the metadataPrefix parameter specifies (indirectly) which stylesheet should be used.
Some ready-made stylesheets are present in your main WebAPI folder, including oai_dc.xslt for the XML transformation to Dublin Core. On the basis of this stylesheet you could make your own stylesheet with another name, if you desire. Save it in the same folder.
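A client usually assembles these name/value pairs programmatically. The Python sketch below shows one way to do so with the standard library; the helper name build_oai_url and the base URL are hypothetical placeholders, so substitute the address of your own oai.ashx.

```python
from urllib.parse import urlencode

# Hypothetical base URL; replace it with the address of your own oai.ashx.
BASE_URL = "https://example.org/webapi/oai.ashx"


def build_oai_url(base_url, verb, arguments=None):
    """Return an OAI request URL: the verb first, then any extra arguments."""
    query = {"verb": verb}
    query.update(arguments or {})
    # urlencode escapes reserved characters in names and values.
    return f"{base_url}?{urlencode(query)}"


print(build_oai_url(BASE_URL, "Identify"))
print(build_oai_url(BASE_URL, "GetRecord",
                    {"identifier": 3, "metadataPrefix": "oai_dc"}))
```

Passing the arguments as a dictionary rather than as keyword arguments is a deliberate choice here: some OAI argument names, such as from, are reserved words in Python.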

Protocol requests

A client can submit various requests to a repository. Every request is the value of a verb, namely: verb=<request> and follows directly behind the question mark in a search query.
The most important requests are listed below. Detailed information about their syntax and the responses they trigger can be found in the OAI protocol description on https://www.openarchives.org/. (Requests are case-sensitive.)

  • GetRecord: This verb is used to extract an individual metadata record from a repository. Mandatory arguments specify the identifier of the requested record, and the output format of the metadata (e.g. oai_dc).
  • Identify: Identify gathers information about the repository. No arguments.
  • ListIdentifiers: This verb is used to gather the local identifiers (prirefs) of all records that are available in the repository. Optional arguments enable selective gathering, for instance based on the date they were last modified.
  • ListRecords: This verb is used to gather all* the records available in the repository. Optional arguments enable selective gathering, for instance based on the date they were last modified.
    * A maximum of 501 records is retrieved per request. Subsequent partial lists of the same search result may be retrieved via so-called resumption tokens (see further down).
  • ListSets: lists the setSpec and setName of the OAI set(s) specified in the OAI configuration of the repository. No arguments.
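
The argument rules in the list above can be expressed as a small lookup table that a client checks before sending a request. This is a sketch under assumptions: the table covers only the five verbs listed here, the optional arguments from, until and set come from the OAI-PMH 2.0 specification rather than from this topic, and check_request is a hypothetical helper name.

```python
# Required and optional arguments per verb, for the five verbs listed above.
VERB_ARGUMENTS = {
    "GetRecord": ({"identifier", "metadataPrefix"}, set()),
    "Identify": (set(), set()),
    "ListIdentifiers": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListRecords": ({"metadataPrefix"}, {"from", "until", "set"}),
    "ListSets": (set(), set()),
}

# Verbs whose follow-up requests carry only a resumptionToken.
LIST_VERBS = {"ListIdentifiers", "ListRecords"}


def check_request(verb, arguments):
    """Return a list of problems; an empty list means the request looks valid."""
    if verb not in VERB_ARGUMENTS:
        return [f"unknown verb: {verb!r} (verbs are case-sensitive)"]
    required, optional = VERB_ARGUMENTS[verb]
    names = set(arguments)
    if verb in LIST_VERBS and "resumptionToken" in names:
        extra = sorted(names - {"resumptionToken"})
        if extra:
            return [f"resumptionToken must be the only argument, also got: {extra}"]
        return []
    problems = [f"missing argument: {name}" for name in sorted(required - names)]
    problems += [
        f"unexpected argument: {name}"
        for name in sorted(names - required - optional)
    ]
    return problems


# A date-selective harvest: only records modified since 1 January 2024.
print(check_request("ListRecords", {"metadataPrefix": "oai_dc", "from": "2024-01-01"}))
# The verb is misspelled in lower case, so it is rejected.
print(check_request("getrecord", {"identifier": "3", "metadataPrefix": "oai_dc"}))
```

Checking on the client side is optional, since the server reports errors too, but it catches case mistakes in verb names before a request is sent.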

Resumption tokens

When the harvest search result is a very long list, only a limited portion of it will be returned by default (501 records) and accompanied by a resumptionToken (at the bottom of the XML result). Such a resumption token is necessary to open the next part of the list, as an argument in a subsequent OAI request.
Note that if in a request you use a resumptionToken, you must not provide a metadataPrefix, nor any other argument: the last used arguments will automatically be applied again.
Suppose you’ve entered a first ListRecords request, which resulted in a list with for instance 10 records, and a resumption token “97EODx6A3phYxIY”:

../oai.ashx?verb=ListRecords&metadataPrefix=oai_dc

The next 10 records of the search result are now retrieved with:

../oai.ashx?verb=ListRecords&resumptionToken=97EODx6A3phYxIY

In the new list you’ll find another resumptionToken if more records are available. A token expires after 24 hours.
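The request-and-resume cycle described above can be sketched as a loop in Python. This is a minimal illustration under assumptions, not the behaviour of any particular harvester: harvest_identifiers and fetch_over_http are hypothetical names, the response is assumed to use the standard OAI-PMH 2.0 namespace, and error handling (including expired tokens) is left out.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode
from urllib.request import urlopen

# Namespace of OAI-PMH 2.0 responses.
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def fetch_over_http(url):
    """Download one response; replaceable by a stub for offline testing."""
    with urlopen(url) as response:
        return response.read()


def harvest_identifiers(base_url, metadata_prefix, fetch=fetch_over_http):
    """Yield the identifier of every record, following resumption tokens."""
    arguments = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    while True:
        root = ET.fromstring(fetch(f"{base_url}?{urlencode(arguments)}"))
        for identifier in root.iterfind(
            ".//oai:record/oai:header/oai:identifier", OAI_NS
        ):
            yield identifier.text
        token = root.findtext(".//oai:resumptionToken", default="", namespaces=OAI_NS)
        token = (token or "").strip()
        if not token:
            break  # no token, or an empty one: the list is complete
        # A follow-up request carries only the verb and the token.
        arguments = {"verb": "ListRecords", "resumptionToken": token}
```

Calling list(harvest_identifiers(base_url, "oai_dc")) with the address of your own oai.ashx would then collect the identifiers from all partial lists in order.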

OAI server validation

Use an online OAI-PMH validator to test and validate your oai.ashx server.

Excluding certain records from OAI search results

With the oai.ashx server you open up a database for OAI searches from the internet. However, it may be that not all records in that database should be publicly accessible. Luckily it’s possible to exclude specific records from OAI search results.

We typically use a separate Active Directory account named cmuiisuser to represent anonymous internet users. This account name must also be registered as a user in the Axiell Collections application, which must use the record authorisation mechanism to exclude certain users. The user name must then be entered in each record that is to be excluded from OAI search results, prior to any harvesting. See the Use the authorisation functionality paragraph in the User authentication and access rights topic in the Axiell Designer Help for an explanation of how to set up this type of access restriction.