LOTERRE - Linked Open TERminology REsources

What can be done with the “Transform” Service?

The “Transform” service makes it possible to produce a terminology in SKOS-XML format, or to convert a terminology initially in SKOS-XML format into another format.
The modules offered by this service can be grouped into three types: correction, enrichment and conversion.

Correction

These modules are intended in particular to correct anomalies previously detected by the “Control” service.

Remove term duplicates in a SKOS/RDF-XML file

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

At the level of each concept, this service operates as follows:

  • All the preferred labels are kept;
  • Alternative labels are compared with each other and with the preferred label of the same language:
    • if the same term appears several times as an alternative label of the same language, only one occurrence is kept;
    • if the same term appears both as an alternative label and the preferred label of the same language, only the preferred label is kept.
  • Hidden labels are compared with each other, with the preferred label, and with alternative labels of the same language:
    • if the same term appears several times as a hidden label of the same language, only one occurrence is kept;
    • if the same term appears both as a hidden label and the preferred label of the same language, only the preferred label is kept;
    • if the same term appears both as a hidden label and an alternative label of the same language, only the alternative label is kept.
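The comparison rules above can be sketched in Python with the standard xml.etree.ElementTree module. This is a minimal illustration, not Loterre's implementation: it assumes concepts written as skos:Concept elements, compares terms by exact text after trimming whitespace (no case folding), and treats a missing xml:lang as a language of its own. The function name dedupe_labels is hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def dedupe_labels(concept: ET.Element) -> None:
    """Remove redundant altLabel/hiddenLabel children of one concept, in place."""
    # Terms already claimed, per language; preferred labels are always kept.
    seen: dict[str, set[str]] = {}
    for pref in concept.findall(SKOS + "prefLabel"):
        seen.setdefault(pref.get(XML_LANG, ""), set()).add((pref.text or "").strip())

    # altLabels are processed before hiddenLabels, so an alternative label
    # wins over a hidden label carrying the same term.
    for tag in ("altLabel", "hiddenLabel"):
        for label in concept.findall(SKOS + tag):  # findall returns a list: safe to remove
            terms = seen.setdefault(label.get(XML_LANG, ""), set())
            term = (label.text or "").strip()
            if term in terms:
                concept.remove(label)  # duplicate of a pref, alt or earlier label
            else:
                terms.add(term)
```

Labels in different languages never interfere, because each language has its own set of claimed terms.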

At the end of this process, check the file again with the “Controlling a SKOS/RDF-XML file at the concept level” service to make sure that no duplicates remain.

Correction of symmetry anomalies of related concepts in a SKOS/RDF-XML file

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

If a concept A is associated with a concept B through the skos:related property, concept B must also be associated with concept A, because the relation is symmetric (cf. SKOS Reference, Axiom S23).

If this condition is not met, this service inserts the missing “skos:related” property.

Note that this treatment does not apply to sub-properties of skos:related.
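A minimal sketch of this symmetry repair, assuming concepts are written as skos:Concept elements with rdf:about and rdf:resource attributes (the rdf:Description form is not handled here). The function add_missing_related is hypothetical; as in the service, only skos:related itself is examined, not its sub-properties.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"


def add_missing_related(root: ET.Element) -> int:
    """Insert the inverse skos:related link wherever only one direction exists."""
    concepts = {
        c.get(RDF + "about"): c
        for c in root.iter(SKOS + "Concept")
        if c.get(RDF + "about")
    }
    # Existing (source, target) pairs; sub-properties of skos:related are ignored.
    pairs = {
        (uri, rel.get(RDF + "resource"))
        for uri, concept in concepts.items()
        for rel in concept.findall(SKOS + "related")
        if rel.get(RDF + "resource")
    }
    added = 0
    for source, target in sorted(pairs):
        # Only concepts defined in this file can receive the missing link.
        if (target, source) not in pairs and target in concepts:
            link = ET.SubElement(concepts[target], SKOS + "related")
            link.set(RDF + "resource", source)
            added += 1
    return added
```

Running the function a second time adds nothing, which is a convenient way to check the result.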

Insertion of specific concepts

Files containing “skos:Collection” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Collection’]]” or “rdf:Description[rdf:type[@rdf:resource=’http://purl.org/iso25964/skos-thes#ConceptGroup’]]” are processed by this service.

When a collection A is a subgroup of a collection B, the hierarchical relationship is expressed at the level of collection A with the “isothes:superGroup” property. The presence of an “isothes:subGroup” property (the inverse relationship) at the level of collection B is not mandatory, because it can be inferred from the “isothes:superGroup” property.

However, some applications (such as Skosmos) require both relationships in order to function properly. This service therefore inserts, at the level of each broader collection, one “isothes:subGroup” property for each of its narrower collections.

Enrichment

Insertion of a ‘ConceptScheme’ class and a licence in a SKOS/RDF-XML file

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

This service inserts two classes at the beginning of a SKOS/RDF-XML file:

– a “cc:License” class with the default CC-BY 4.0 Creative Commons license that should be changed if the resource is released under a different license.

– a “skos:ConceptScheme” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#ConceptScheme’]]” class with:

  • a URI derived from concept identifiers;
  • properties for metadata, to be completed or modified by the user in the output file:
    • English, French and Spanish titles (dc:title),
    • English, French and Spanish descriptions (dc:description),
    • English, French and Spanish subjects (dc:subject),
    • creator name (dc:creator),
    • license name (cc:license),
    • English, French and Spanish names of the organization / institution to which the resource must be attributed (cc:attributionName),
    • website of the organization / institution to which the resource must be attributed (cc:attributionURL),
    • top concepts (skos:hasTopConcept) if the resource is highly structured,
    • resource languages, as calculated from the language tags of the concepts’ preferred labels (dcterms:language with a Lexvo/ISO 639-3 code attribute),
    • creation date (dcterms:created),
    • last modification date (dcterms:modified),
    • version (owl:versionInfo).

After the fields have been generated, their textual content must be completed and validated by the user.
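The language calculation listed above (dcterms:language derived from the language tags of preferred labels) can be sketched as follows. The two-letter to three-letter table is a partial, hypothetical excerpt for illustration; the Lexvo URI pattern http://lexvo.org/id/iso639-3/xxx is assumed to be the form expected in the output, and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Partial ISO 639-1 -> ISO 639-3 table, for illustration only.
ISO639_1_TO_3 = {"en": "eng", "fr": "fra", "es": "spa", "de": "deu", "it": "ita"}


def resource_languages(root: ET.Element) -> list[str]:
    """Lexvo URIs for every language tag found on skos:prefLabel elements."""
    tags = {
        label.get(XML_LANG).split("-")[0].lower()  # "en-GB" -> "en"
        for label in root.iter(SKOS + "prefLabel")
        if label.get(XML_LANG)
    }
    # Tags missing from the table are skipped rather than guessed.
    return [
        "http://lexvo.org/id/iso639-3/" + ISO639_1_TO_3[tag]
        for tag in sorted(tags)
        if tag in ISO639_1_TO_3
    ]
```

Only preferred labels are inspected, so a language that appears solely in alternative labels does not become a resource language.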

Insertion of a property ‘hasTopConcept’

Files containing “skos:ConceptScheme” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#ConceptScheme’]]” are processed by this service.

This service inserts a “skos:hasTopConcept” property into the “skos:ConceptScheme” block for each concept that does not have a “skos:broader” property.

Do not use this service for unstructured or loosely structured resources.
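A minimal sketch of this rule, assuming a single skos:ConceptScheme element directly under rdf:RDF and concepts written as skos:Concept elements. The function name is hypothetical; concepts already listed as top concepts are skipped so that the function can safely be run twice.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"


def add_has_top_concepts(root: ET.Element) -> list[str]:
    """Add skos:hasTopConcept to the scheme for each concept without skos:broader."""
    scheme = root.find(SKOS + "ConceptScheme")
    if scheme is None:
        raise ValueError("no skos:ConceptScheme element under rdf:RDF")
    listed = {t.get(RDF + "resource") for t in scheme.findall(SKOS + "hasTopConcept")}
    added = []
    # Materialize the concept list before modifying the tree.
    for concept in list(root.iter(SKOS + "Concept")):
        uri = concept.get(RDF + "about")
        if uri and uri not in listed and concept.find(SKOS + "broader") is None:
            ET.SubElement(scheme, SKOS + "hasTopConcept").set(RDF + "resource", uri)
            listed.add(uri)
            added.append(uri)
    return added
```

In an unstructured resource, where no concept has a broader concept, every concept would be listed, which is why the service should not be used in that case.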

Insertion of a property ‘topConceptOf’

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

This service inserts a “skos:topConceptOf” property into each concept that does not have a “skos:broader” property.

Do not use this service for unstructured or loosely structured resources.
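The same rule applied at the concept level can be sketched as follows. The scheme URI is passed in explicitly because the files processed by this service need not contain a concept scheme; the function name and that parameter are assumptions.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"


def add_top_concept_of(root: ET.Element, scheme_uri: str) -> list[str]:
    """Add skos:topConceptOf to every concept that has no skos:broader."""
    marked = []
    for concept in list(root.iter(SKOS + "Concept")):
        if concept.find(SKOS + "broader") is not None:
            continue  # has a broader concept: not a top concept
        if concept.find(SKOS + "topConceptOf") is None:  # avoid duplicates on re-runs
            ET.SubElement(concept, SKOS + "topConceptOf").set(RDF + "resource", scheme_uri)
            marked.append(concept.get(RDF + "about"))
    return marked
```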

Insertion of narrower collections

Files containing “skos:Collection” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Collection’]]” or “rdf:Description[rdf:type[@rdf:resource=’http://purl.org/iso25964/skos-thes#ConceptGroup’]]” are processed by this service.

When a collection A is a subgroup of a collection B, the hierarchical relationship is expressed at the level of collection A with the “isothes:superGroup” property. The presence of an “isothes:subGroup” property (the inverse relationship) at the level of collection B is not mandatory, because it can be inferred from the “isothes:superGroup” property.

However, some applications (such as Skosmos) require both relationships in order to function properly. This service therefore inserts, at the level of each broader collection, one “isothes:subGroup” property for each of its narrower collections.
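A minimal sketch of this insertion, assuming collections are written as skos:Collection elements (the rdf:Description and isothes:ConceptGroup forms are not handled here) and that the isothes prefix is bound to http://purl.org/iso25964/skos-thes#. The function add_subgroups is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"
ISOTHES = "{http://purl.org/iso25964/skos-thes#}"


def add_subgroups(root: ET.Element) -> int:
    """For each isothes:superGroup link, add the inverse isothes:subGroup link."""
    collections = {
        c.get(RDF + "about"): c
        for c in root.iter(SKOS + "Collection")
        if c.get(RDF + "about")
    }
    added = 0
    for narrower_uri, narrower in collections.items():
        for sup in narrower.findall(ISOTHES + "superGroup"):
            broader = collections.get(sup.get(RDF + "resource"))
            if broader is None:
                continue  # broader collection not defined in this file
            existing = {s.get(RDF + "resource") for s in broader.findall(ISOTHES + "subGroup")}
            if narrower_uri not in existing:
                ET.SubElement(broader, ISOTHES + "subGroup").set(RDF + "resource", narrower_uri)
                added += 1
    return added
```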

Assignment of ARK identifiers

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

This service replaces the identifiers (URIs) of a SKOS/RDF-XML file with ARK identifiers built according to the recommendations of the California Digital Library (CDL).

An ARK identifier has the following structure:

  • The NMA (Name Mapping Authority) hostname, whose role is to make the identifier resolvable (clickable) in a web browser,
  • The actual ARK identifier, which consists of:
    • the “ark:/” label,
    • a NAAN (Name Assigning Authority Number), assigned on request by the CDL, which identifies the naming organization,
    • the name assigned to the object, which is unique within that NAAN.

The transformation is performed in two stages:

1- Replacement of the resource URI (at the level of the concept scheme) with the following generic URI: http://my_site.fr/ark:/NAAN/ABC. The old URI is kept in a “dc:identifier” field.

At the concept level, a dash, an 8-character alphanumeric sequence, a second dash and a check character are appended to this prefix, forming a unique ARK identifier for each concept of the resource.

  Prefix                             Unique identifier
  http://my_site.fr/ark:/NAAN/ABC    -CGT6ZZBQ-F
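This documentation does not say how the 8-character sequence or the check character are computed, so the sketch below rests on an assumption: it borrows the NOID Check Digit Algorithm (NCDA) and its 29-symbol "betanumeric" alphabet, which come from the CDL's NOID tooling for ARKs, and adapts them to upper case to resemble the example above. Loterre may use a different method, so identifiers minted this way are not expected to reproduce “CGT6ZZBQ-F”. The function names are hypothetical.

```python
import random

# NOID "betanumeric" alphabet: digits plus consonants except "l" (29 symbols).
BETANUMERIC = "0123456789bcdfghjkmnpqrstvwxz"


def ncda_check_char(s: str) -> str:
    """NOID Check Digit Algorithm: position-weighted sum of symbol ordinals, modulo 29."""
    total = 0
    for position, char in enumerate(s, start=1):
        ordinal = BETANUMERIC.find(char)  # -1 for characters outside the alphabet
        total += position * max(ordinal, 0)  # such characters count as 0
    return BETANUMERIC[total % len(BETANUMERIC)]


def mint_concept_ark(prefix: str, rng: random.Random) -> str:
    """Append "-XXXXXXXX-C" to a prefix such as http://my_site.fr/ark:/NAAN/ABC.

    Assumed adaptation: the sequence is upper-cased for display, and the check
    character is computed on the lower-cased "NAAN/ABC-XXXXXXXX" part.
    """
    sequence = "".join(rng.choice(BETANUMERIC) for _ in range(8)).upper()
    name = prefix.split("/ark:/", 1)[1] + "-" + sequence
    return f"{prefix}-{sequence}-{ncda_check_char(name.lower()).upper()}"


print(ncda_check_char("13030/xf93gt2"))  # prints q
```

A check character lets a resolver detect the most common transcription errors (a single wrong character, or two adjacent characters swapped) without consulting the terminology itself.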

2- URI recalculation for:

  • each of the “skos:broader”, “skos:narrower” and “skos:related” relations,
  • any “skosxl:prefLabel”, “skosxl:altLabel” and “skosxl:hiddenLabel” properties,
  • the members of any collections,
  • any “skosxl:Label” elements.
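Because all of these references are carried by rdf:about or rdf:resource attributes, step 2 can be sketched as a single pass over the tree with an old-to-new URI table built during step 1. The function name and the table are hypothetical; text content, such as the “dc:identifier” holding the old URI, is left untouched since only attributes are rewritten.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def remap_uris(root: ET.Element, old_to_new: dict[str, str]) -> int:
    """Rewrite rdf:about and rdf:resource values using an old -> new URI table."""
    changed = 0
    for element in root.iter():
        for attr in (RDF + "about", RDF + "resource"):
            old = element.get(attr)
            if old in old_to_new:  # None (attribute absent) is never a key
                element.set(attr, old_to_new[old])
                changed += 1
    return changed
```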

To generate ARK identifiers that comply with the CDL recommendations, the generic URI must be modified as follows:

  • Replace “http://my_site.fr” (the Name Mapping Authority) with the organization’s actual URL,
  • Keep the “/ark:/” label,
  • Replace “NAAN” (Name Assigning Authority Number) with the organization’s NAAN,
  • Replace “ABC” with a short alphanumeric code identifying the resource itself.

Here is a real example: http://data.loterre.fr/ark:/67375/1WB

Note that without a NAAN, the URI cannot be considered an ARK identifier. It can nevertheless be used without the “ark:/NAAN/” part, since the last part is still a unique identifier.

Conversion

Loterre offers various conversion modules.

Transform a CSV file into a SKOS/RDF-XML file

This transformation generates a SKOS file from a spreadsheet (Excel, OpenOffice, etc.) saved in CSV format.

Loterre offers two variants of this service, depending on whether the field separator in the CSV file is a semicolon or a comma:

  • Transform a CSV file whose separator is a semicolon into a SKOS/RDF-XML file
  • Transform a CSV file whose separator is a comma into a SKOS/RDF-XML file

N.B.: in a CSV file whose separator is a semicolon, use double quotation marks (") as the text delimiter for fields that contain semicolons as punctuation. Enclosing such fields in quotation marks prevents the text from being split at each semicolon. If the text itself contains quotation marks, they must be doubled.
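Python's standard csv module follows the same quoting convention, which gives a quick way to check that a semicolon-separated file will be split as intended before submitting it. A minimal sketch; the function name and the sample row are invented for illustration.

```python
import csv
import io


def read_rows(text: str, delimiter: str = ";") -> list[list[str]]:
    """Split CSV text into rows; '"' delimits a field and '""' stands for one literal quote."""
    return list(csv.reader(io.StringIO(text), delimiter=delimiter, quotechar='"'))


# A definition containing a semicolon and quoted words stays in a single field.
sample = 'prefLabel_en;definition_en\nhormone;"Chemical messenger; see ""endocrine"""\n'
rows = read_rows(sample)
```

Without the surrounding quotation marks, the same definition would be split into two fields at the semicolon.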

The input file must:

  • use this separator «§§» for multi-valued fields (example: hormone§§drug),
  • use the following labels for the different fields:
    Terminological data   Label to use (*)    Comment
    Preferred label       prefLabel_xx        A “prefLabel_en” is expected
    Alternative label     altLabel_xx
    Hidden label          hiddenLabel_xx
    Definition            definition_xx
    Note                  note_xx
    Scope note            scopeNote_xx
    Editorial note        editorialNote_xx
    History note          historyNote_xx
    Change note           changeNote_xx
    Example               example_xx
    Broader term          broader_xx          A “broader_en” is expected
    Related term          related_xx          A “related_en” is expected
    Group (collection)    group_xx            A “group_en” is expected
    Exact match           exactMatch
    Close match           closeMatch
    Broad match           broadMatch
    Narrow match          narrowMatch
    Related match         relatedMatch

(*) Replace “xx” with the two-letter ISO 639-1 language code; for example, “prefLabel_en” for the English preferred label. See the list of ISO 639-1 codes.

The data is transformed as follows:

  • A SKOS/RDF-XML file is created to hold the entire terminological resource.
  • Each line except the first becomes a “skos:Concept”. If an identifier is present, it is assigned to the concept; otherwise, a temporary URI is assigned to it in the “rdf:about” attribute.
  • The labels in the first line are converted to their SKOS counterparts; for example, prefLabel_en becomes “skos:prefLabel” with the attribute xml:lang="en".
  • The content of each cell is put into the appropriate SKOS property. If the content is multi-valued, it is split at each “§§” separator into one property per value.
  • The related and broader relationships are processed in two stages: first, a “skos:related” or “skos:broader” property is generated for each related or broader term; then, the URI of the concept corresponding to that term is put in the “rdf:resource” attribute.
  • If the file has groups, a “skos:Collection” is created for each group.
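The steps above can be sketched with the standard csv and xml.etree.ElementTree modules. This is a simplified illustration under several assumptions, not the Loterre converter: the identifier column is assumed to be named “ID”, temporary URIs are built as the default resource URI followed by "/" and the row number, links are resolved through English preferred labels, unresolved terms are left as-is, and the cc:License and skos:ConceptScheme blocks are omitted. All function names are hypothetical.

```python
import csv
import io
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
BASE = "http://www.mysite/vocabs/ABC"  # default resource URI given in this documentation
LINKS = {"broader", "related"}  # resolved to concept URIs in a second pass
MATCHES = {"exactMatch", "closeMatch", "broadMatch", "narrowMatch", "relatedMatch"}


def rdf(local: str) -> str:
    return f"{{{RDF_NS}}}{local}"


def skos(local: str) -> str:
    return f"{{{SKOS_NS}}}{local}"


def csv_to_skos(text: str, delimiter: str = ";") -> ET.Element:
    root = ET.Element(rdf("RDF"))
    uri_by_label = {}  # English preferred label -> concept URI
    groups = {}  # group name -> member URIs
    pending = []  # (link element, term) pairs completed in the second pass

    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    for number, row in enumerate(reader, start=1):
        # Assumed "ID" column; otherwise a temporary URI (base + "/" + row number).
        uri = row.get("ID") or f"{BASE}/{number}"
        concept = ET.SubElement(root, skos("Concept"), {rdf("about"): uri})
        for header, cell in row.items():
            if not header or header == "ID" or not cell:
                continue
            name, _, lang = header.partition("_")  # "prefLabel_en" -> "prefLabel", "en"
            for value in (v.strip() for v in cell.split("§§")):
                if not value:
                    continue
                if name == "group":
                    groups.setdefault(value, []).append(uri)
                elif name in LINKS:
                    pending.append((ET.SubElement(concept, skos(name)), value))
                elif name in MATCHES:
                    ET.SubElement(concept, skos(name), {rdf("resource"): value})
                else:
                    ET.SubElement(concept, skos(name), {XML_LANG: lang} if lang else {}).text = value
                    if name == "prefLabel" and lang == "en":
                        uri_by_label[value] = uri

    # Second pass: each related/broader term is replaced by the URI of its concept.
    for element, term in pending:
        element.set(rdf("resource"), uri_by_label.get(term, term))

    # One skos:Collection per group; spaces in the name become "_".
    for name, members in groups.items():
        coll_uri = f"{BASE}/{name.replace(' ', '_')}"
        collection = ET.SubElement(root, skos("Collection"), {rdf("about"): coll_uri})
        ET.SubElement(collection, skos("prefLabel"), {XML_LANG: "en"}).text = name
        for member in members:
            ET.SubElement(collection, skos("member"), {rdf("resource"): member})
    return root
```

The two-pass design is what allows a broader or related term to refer to a concept defined on a later line of the spreadsheet.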

In addition, the transformation also inserts two blocks at the beginning of the SKOS/RDF-XML file:

– a “cc:License” block with the default Creative Commons CC-BY 4.0 license that should be changed if the resource is released under a different license.

– a “skos:ConceptScheme” block with:

  • a URI derived from concept identifiers;
  • properties for metadata, to be completed or modified by the user in the output file:
    • English, French and Spanish titles (dc:title),
    • English, French and Spanish descriptions (dc:description),
    • English, French and Spanish subjects (dc:subject),
    • creator name (dc:creator),
    • license name (cc:license),
    • English, French and Spanish names of the organization / institution to which the resource must be attributed (cc:attributionName),
    • website of the organization / institution to which the resource must be attributed (cc:attributionURL),
    • top concepts (skos:hasTopConcept) if the resource is highly structured,
    • resource languages, as calculated from the language tags of the concepts’ preferred labels (dcterms:language with a Lexvo/ISO 639-3 code attribute),
    • creation date (dcterms:created),
    • last modification date (dcterms:modified),
    • version (owl:versionInfo).

If the concepts do not have identifiers, the default URI of the resource is “http://www.mysite/vocabs/ABC”. It is also the root of the URIs of concepts, relationships and any collections. It must be modified as follows:

  • Replace “http://www.mysite/” with the correct URL.
  • Keep “/vocabs/”.
  • Replace “ABC” with a short alphanumeric code that will identify the resource.

At the concept level, the URI is the concatenation of the resource’s URI with a unique identifier; at the collection level, it is the concatenation of the resource’s URI with the group name, in which spaces are replaced by “_”.

To switch to ARK identifiers, use the transformation “Assign ARK identifiers to a valid SKOS/RDF-XML file”.

Transform a SKOS/RDF-XML file to a CSV file

Files containing “skos:Concept” or “rdf:Description[rdf:type[@rdf:resource=’http://www.w3.org/2004/02/skos/core#Concept’]]” are processed by this service.

Loterre offers two variants of this service, depending on the field separator desired in the resulting CSV file:

  • Transform a SKOS/RDF-XML file to a semicolon-separated CSV file
  • Transform a SKOS/RDF-XML file to a comma-separated CSV file

The output file can be imported into a spreadsheet (Excel, LibreOffice, etc.) for editing (see the import procedure in Excel below).

The data are transformed as follows:

A first row of column headers is created from the elements (SKOS or other properties) used to describe the different concepts of the SKOS/RDF-XML file:

  • An “ID” header is created for concept identifiers.
  • Properties with an “xml:lang” attribute are listed by concatenating the element name (without namespace) with the language code (for example, “skos:prefLabel” with xml:lang="en" gives the label “prefLabel_en”).
  • For properties that have an attribute other than “xml:lang”:
    • those corresponding to the semantic relations (“skos:broader”, “skos:narrower” and “skos:related”) are translated into “broader_en”, “narrower_en” and “related_en”,
    • the others (mapping properties, etc.) are output with the element name only (without namespace; for example, “exactMatch” for “skos:exactMatch”).
  • Properties that have no attributes are output with the element name only (without namespace).
  • If the file contains collections, a “group_en” label is created. This label can be redundant if the concepts contain properties reflecting their belonging to groups (domain, microthesaurus, etc.).
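The header rules above can be sketched as follows, assuming skos:Concept elements and columns listed in the order they are first encountered (the actual column order produced by the service is not specified here). The function names are hypothetical.

```python
import xml.etree.ElementTree as ET

SKOS = "{http://www.w3.org/2004/02/skos/core#}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
RELATIONS = {"broader", "narrower", "related"}


def column_label(prop: ET.Element) -> str:
    """Column label for one property element of a concept."""
    local = prop.tag.rsplit("}", 1)[-1]  # drop the "{namespace}" part
    lang = prop.get(XML_LANG)
    if lang:
        return f"{local}_{lang}"  # e.g. prefLabel_en
    if local in RELATIONS:
        return f"{local}_en"  # links are written as English preferred terms
    return local  # mapping properties and attribute-less properties


def header_row(root: ET.Element) -> list[str]:
    labels = ["ID"]
    for concept in root.iter(SKOS + "Concept"):
        for prop in concept:
            label = column_label(prop)
            if label not in labels:
                labels.append(label)
    if root.find(SKOS + "Collection") is not None:
        labels.append("group_en")
    return labels
```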

Then, a line is generated for each concept of the file:

  • the value of the “rdf:about” attribute is put in the “ID” column,
  • the content of the textual elements (terms, definitions, notes, etc.) is put in the column corresponding to that element and to the language code of that element,
  • hierarchical and associative relations (links) are replaced by the corresponding English preferred terms,
  • the content of the other elements is output as is,
  • if the concept belongs to a collection, the English name of the collection is put in the “group_en” column.

It should be noted that:

  • the content of each field is enclosed in double quotation marks, so that semicolons used as punctuation do not split it,
  • double quotation marks within a field are doubled to escape them,
  • the values of multiple-occurrence fields (for example, “skos:altLabel”) are placed in the same cell, separated by “§§”.
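These three conventions correspond to csv.QUOTE_ALL in Python's csv module (which doubles embedded quotation marks by default) combined with a “§§” join. A minimal sketch; the row format, a dictionary mapping each column label to a list of values, is an assumption, and the function name is hypothetical.

```python
import csv
import io


def write_csv(header: list[str], rows: list[dict[str, list[str]]], delimiter: str = ";") -> str:
    """Serialize concept rows: every field quoted, multiple values joined with "§§"."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, quoting=csv.QUOTE_ALL, lineterminator="\n")
    writer.writerow(header)
    for row in rows:
        # Missing columns become empty strings; embedded '"' are doubled by the csv module.
        writer.writerow(["§§".join(row.get(column, [])) for column in header])
    return buffer.getvalue()
```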

To import a CSV file to Excel:

  • Create a new file in Excel (“File” / “New”).
  • Click the “Data” menu, choose “From Text”, and then select the file to import.
  • Import the file (“Import” button).
  • In the Text Import Wizard:
    • choose “Delimited”,
    • in the “File origin” menu, choose “65001 : Unicode (UTF-8)”.
  • Click the “Next” button:
    • in the “Delimiters” section, select “Semicolon” (or “Comma” for a comma-separated file),
    • keep the double quotation mark (") as the “Text qualifier”,
    • check the imported data in the “Data preview”.
  • Click the “Finish” button.

The file modified in Excel and saved as a CSV file can be transformed into SKOS using the service “Transform a semicolon-separated CSV file into a SKOS/RDF-XML file” or “Transform a comma-separated CSV file into a SKOS/RDF-XML file”, depending on the separator used when saving the CSV file.

Transform a SKOS-XML file into an HTML file

This transformation generates an HTML file from a valid SKOS file. It processes files containing “skos:Concept” or “rdf:Description” of type “Concept”.
Loterre offers two variants, depending on the language version chosen:

  • Convert a valid SKOS/RDF-XML file into an html file – French version
  • Convert a valid SKOS/RDF-XML file into an html file – English version

The terminology entries are presented in alphabetical order of their preferred terms in the chosen language (French or English). Each entry shows:

  • the terms (preferred term and synonyms) in the chosen language
  • definitions and notes in the chosen language
  • relationships (broader, related and narrower terms)
  • preferred terms in other languages
  • the groups the concept belongs to, if any
  • alignments
  • any bibliographical references
  • the source(s) of the concept

The richness of the information displayed will depend on the content and structuring of the original SKOS file.

Transform a SKOS-XML file into a PDF file

This transformation generates a PDF file from a valid SKOS file.

Two variants are proposed by Loterre, depending on the language version chosen for the resource:

  • Transform a SKOS/RDF-XML file into a PDF file corresponding to the French version of the resource
  • Transform a SKOS/RDF-XML file into a PDF file corresponding to the English version of the resource

Several sections are produced depending on the content and structure of the file:

  • Alphabetical index
  • Detailed terminology entries (in French or English) with:
    • terms, definitions, notes
    • relationships (broader, related and narrower terms)
    • preferred terms in other languages
    • the groups the concept belongs to, if any
    • alignments
    • any bibliographical references
    • the source(s) of the concept
  • The list of entries with:
    • the French preferred term
    • the English preferred term
    • the page number
  • The tree structure if the resource is structured.
  • Collections if the resource contains groups.

Additional pages are inserted:

  • simple cover page with the title of the resource (French or English)
  • cover page with:
    • title (French or English) of the resource
    • version
    • last update date
    • description (French or English)
    • legend for detailed entries
    • CC-BY 4.0 license plus logo
  • back cover with:
    • title (French or English) of the resource
    • description (French or English)

Note that the cover pages can be replaced by editing the final file with a PDF editor such as PDFsam Basic.