Issue 363: Form and persistence of RDF identifiers

ID: 
363
Starting Date: 
2018-01-24
Working Group: 
2
Status: 
Done
Closing Date: 
2019-10-22
Background: 

Following the dialog for issue 361, Richard Light posted on 18/1/2018

Gordon,

Looking at the RDF XML for F10, I see (a) that you make F10 equivalent to the full F10_Person, as the core CRM does in its RDFS Schema and (b) when subclassing from the CRM core, you use the full form E21_Person:
<rdf:Description rdf:about="http://iflastandards.info/ns/fr/frbr/frbroo/F10">
...
<rdfs:subClassOf rdf:resource="http://www.cidoc-crm.org/cidoc-crm/E21_Person"/>
<owl:sameAs rdf:resource="http://iflastandards.info/ns/fr/frbr/frbroo/F10_Person"/>
</rdf:Description>

So I think there are still issues to resolve in this area for FRBRoo.

posted by Richard on 18/1/2018

This is alarming.  I have always assumed that a superseded class or property would simply be flagged as "deprecated" and a new one minted to replace it. There is absolutely no need to re-use numbers, and I am hoping someone will come forward to say that this was a mistake, and not a change which accords with CRM-SIG policy.  Otherwise, as you say, we can have no confidence in the CRM as a persistent RDF framework, whether or not the class and property identifiers include a textual component.  Is this an isolated case, or does anyone know of other cases where domain and range (and indeed meaning) of a class or property has been changed after its initial publication?

(The textual component is, in any case, only meant to be guidance and is explicitly stated not to be unique: 'is identified by' below is a good example of this.)
 

posted by Gordon Dunsire on 18/1/2018

Richard

I guess we were waiting for this discussion; we can only use what is documented in the CRM itself.

posted by Gordon Dunsire on 18/1/2018

[My first response was blocked because the thread was “too long”; here it is again]

I agree with Philip [and Richard]

If the domain or range of a FRBRoo property is changed, or there was a significant change in definition, we would deprecate the old version and declare a new URI. This hasn’t happened yet, but would beg the question of what to use as a new URI – perhaps add a version number to the alphanumeric part. For that reason we would advise the FRBR Review Group to mint a new alphanumeric designation.

posted by Martin on 19/1/2018

We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
designation means that the change is not significant.

The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
continuity concerns when there was parallel work, we had some times the bitter choice between our continuity and not to create a different branch from ISO for
typical reasons. Probably should have been explicitly justified.

If you spot any other reuse of an alphanumeric code, please inform us.

Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
Therefore labels change when they render better the concept and serious misunderstandings can be reduced following explicit community requests.
The fact that the alphanumeric code is continued, marks absolutely clear that this is a change of name and not meaning.
Labels are also translated, and work as mnemonics of the respective language.
Therefore labels are not part of the definition.

The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

F10 was deliberately declared as "F" in FRBRoo to be an FRBRoo concept "same as" E21, for didactic reasons. There is no continuity break.

Please let me know if there is anything wrong with this.
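
A minimal sketch of the mapping task described in this post, assuming identifiers follow the shape seen in this thread: an alphanumeric code optionally followed by an underscore and a label (E21_Person, F10). The function names and the two P999 labels are hypothetical and are not taken from any CRM release.

```python
import re

# Assumed shape of a CRM-style local name: capital letters, digits, an
# optional lowercase suffix, then an optional "_Label" part.
_LOCAL_NAME = re.compile(r"^([A-Z]+[0-9]+[a-z]?)(?:_.*)?$")

def designation(uri: str) -> str:
    """Return the alphanumeric code of an identifier, ignoring its label."""
    local = uri.rstrip("/").rsplit("/", 1)[-1].rsplit("#", 1)[-1]
    match = _LOCAL_NAME.match(local)
    if match is None:
        raise ValueError(f"not a CRM-style identifier: {uri!r}")
    return match.group(1)

def same_concept(uri_a: str, uri_b: str) -> bool:
    """Treat two identifiers as the same concept if their codes agree.
    This sketch assumes both URIs come from the same namespace."""
    return designation(uri_a) == designation(uri_b)

CRM = "http://www.cidoc-crm.org/cidoc-crm/"
print(designation(CRM + "E21_Person"))  # E21
# Hypothetical label change: the code is kept, only the label differs.
print(same_concept(CRM + "P999_old_label", CRM + "P999_new_label"))  # True
```

Comparing on the code alone is what makes a label change harmless here; it only works as long as codes are never reused for a different meaning, which is the policy stated above.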

Posted by Richard Light on 22/1/2018

On 19/01/2018 13:36, Martin Doerr wrote:
> Dear All,
>
> We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
> designation means that the change is not significant.
>
> The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
> The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
> continuity concerns when there was parallel work, we had some times the bitter choice between our continuity and not to create a different branch from ISO for
> typical reasons. Probably should have been explicitly justified.

OK, thanks for the explanation.  Though I don't understand why 'ISO' (who, exactly?) was doing active development work on the CRM.  I thought that they simply took the SIG's work through the ISO formalization process.

> Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
> Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
> Therefore labels change when they render better the concept and serious misunderstandings can be reduced following explicit community requests.
> The fact that the alphanumeric code is continued, marks absolutely clear that this is a change of name and not meaning.
> Labels are also translated, and work as mnemonics of the respective language.
> Therefore labels are not part of the definition.
>
> The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
> Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
> if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.

This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.  This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?

Posted by Robert Sanderson on 22/1/2018

An interesting investigation would be to try and reuse existing terms from well-known ontologies, rather than creating yet another one.

To Martin’s point about just renaming all the things … that sounds easy in theory, but in the distributed real world of implementations and datasets, in practice it means that everyone needs to support all of the different permutations as there’s always some product or some piece of data that hasn’t updated to the most recent version.

One small benefit would be that new serializations like JSON-LD would have more liberty to assert their own mappings over top of the alphanumeric designations, rather than feeling beholden to the labels.  Of course for every other serialization it’s going to be completely unintelligible and thereby unusable.

Posted by Martin on 22/1/2018

Dear Robert,

On 1/22/2018 7:12 PM, Robert Sanderson wrote:
>
> An interesting investigation would be to try and reuse existing terms from well-known ontologies, rather than creating yet another one.
>
> To Martin’s point about just renaming all the things … that sounds easy in theory, but in the distributed real world of implementations and datasets, in practice it means that everyone needs to support all of the different permutations as there’s always some product or some piece of data that hasn’t updated to the most recent version.

It is not about renaming all things, it is about not excluding renaming while preserving the identification code.
An unchanged standard is an illusion, a fiction not worth sticking to. I usually present it this way:

Making Standards

The good with standards is there are so many!

When you have a standard,

You need to transform to the standard

You need to renew and adapt the standard

You need to transform to the renewed standards

Why not just transform data?

There are too many transformations, you need a standard

> One small benefit would be that new serializations like JSON-LD would have more liberty to assert their own mappings over top of the alphanumeric designations, rather than feeling beholden to the labels.  Of course for every other serialization it’s going to be completely unintelligible and thereby unusable.
The challenge is a) to understand that mapping, not the standard, is the ultimate solution,
b) how to standardize the mapping,
c) to minimize the need to map.

Posted by Martin on 23/1/2018

Dear Richard,

On 1/22/2018 4:37 PM, Richard Light wrote:
>
> On 19/01/2018 13:36, Martin Doerr wrote:
>> Dear All,
>>
>> We never continue an alphanumeric designation when there is a significant change in definition. You can take for granted that continuing the
>> designation means that the change is not significant.
>>
>> The case below (P148) should be due to an internal processing problem, and will never reoccur. It is characteristically the last property of this edition.
>> The reason, if I am not wrong, was that we got out of sync with the ISO version with the latest changes. Since the ISO team does in general not respect our
>> continuity concerns when there was parallel work, we had some times the bitter choice between our continuity and not to create a different branch from ISO for
>> typical reasons. Probably should have been explicitly justified.

> OK, thanks for the explanation.  Though I don't understand why 'ISO' (who, exactly?) was doing active development work on the CRM.  I thought that they simply took the SIG's work through the ISO formalization process.
ISO working group decisions supersede ours. They will listen to arguments of our liaison people, but often it is better to accept than to risk another year of discussions about a label.
>

>> Since we have discussed for years the issues with changing labels, I repeat quickly the reasons:
>> Labels are taken for mnemonics, and people associate, even if they shouldn't, semantics with them.
>> Therefore labels change when they render better the concept and serious misunderstandings can be reduced following explicit community requests.
>> The fact that the alphanumeric code is continued, marks absolutely clear that this is a change of name and not meaning.
>> Labels are also translated, and work as mnemonics of the respective language.
>> Therefore labels are not part of the definition.

>>
>> The rest are considerations of use, and a question of utilities, which cannot dictate our practice.
>> Anyone working in an IT environment should have access to someone doing the trivial task of mapping label changes in his S/W,
>> if he has chosen to include labels in the URIs without "same_as" statements. Please consider in your requirements that continuity of meaning is as important as comprehensibility. We cannot follow advice which considers only one side of the coin.

> I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.
The issue was decided in the 27th meeting, as documented in the agenda. We had produced label-free definitions with language labels, as you propose, which caused an outcry from implementers who saw only numbers and had no tools showing the display labels. Since there is no new evidence on the issue, I'd propose to stay as we are, and I'll try to make the respective discussion thread accessible, so that all the old arguments can be read again.

The current RDFS reads, e.g.:
<rdfs:Class rdf:about="E2_Temporal_Entity">
<rdfs:label xml:lang="fr">Entité temporelle</rdfs:label>
<rdfs:label xml:lang="en">Temporal Entity</rdfs:label>
<rdfs:label xml:lang="ru">Временная Сущность</rdfs:label>
<rdfs:label xml:lang="el">Έγχρονη  Οντότητα</rdfs:label>
<rdfs:label xml:lang="de">Geschehendes</rdfs:label>
<rdfs:label xml:lang="pt">Entidade Temporal</rdfs:label>
<rdfs:label xml:lang="zh">时间实体</rdfs:label>
<rdfs:comment>

as outcome of a long-standing discussion...........
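
A minimal sketch of how a tool could show display labels for such identifiers, using only the Python standard library. The fragment quoted above is cut off at rdfs:comment, so the sketch wraps a shortened copy of it in an rdf:RDF element and closes the tags so that it parses; the wrapper and the function name display_labels are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Shortened copy of the fragment above; the rdf:RDF wrapper and the
# closing tags are added here only so that the snippet is well-formed.
SCHEMA = f"""
<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:about="E2_Temporal_Entity">
    <rdfs:label xml:lang="fr">Entité temporelle</rdfs:label>
    <rdfs:label xml:lang="en">Temporal Entity</rdfs:label>
    <rdfs:label xml:lang="de">Geschehendes</rdfs:label>
  </rdfs:Class>
</rdf:RDF>
""".strip()

def display_labels(schema_xml: str) -> dict:
    """Map each class identifier to a {language: label} dictionary."""
    labels = {}
    for cls in ET.fromstring(schema_xml).iter(f"{{{RDFS}}}Class"):
        about = cls.get(f"{{{RDF}}}about")
        labels[about] = {
            lab.get(XML_LANG): lab.text
            for lab in cls.findall(f"{{{RDFS}}}label")
        }
    return labels

labels = display_labels(SCHEMA)
print(labels["E2_Temporal_Entity"]["de"])  # Geschehendes
```

The lookup works the same way whether the rdf:about value carries a label or is a bare code, since the display text comes from rdfs:label and not from the identifier.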
>
> This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.
Your suggestion is well taken, but I do not see what this would offer in contrast to the current international display labeling as shown above.

>   This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?
CRM classes are not terms. The CRM is an ontology of relationships. Classes are only auxiliary for relationships. Therefore we delete classes without relationships. The classes belong to a completely artificial language. Therefore I'd argue there is nothing like a "preferred label". People must understand the scope notes, nothing else.  The purpose for which the CRM was and is created is to mediate data structures, i.e. between relations connecting "fields", not between "terms". If this is not clear enough from the CRM introduction, please let us know how to improve the text.

Therefore, in my opinion, it is impossible in SKOS to represent the logic of the CRM. A pure class-class-mapping is usually misleading. However, the X3ML mapping language can map relationships in a declarative way.
 

Posted by Martin on 23/1/2018

Dear All,

Thank you very much for your engagement in these issues!
Let me remark, for all those who find our practices alarming, that none of us is paid for the maintenance of the CRM.
It is exclusively an engagement of volunteers and engagement of organizations for a common good.
What is really alarming for me is the lack of users offering active work beyond criticism.

We are now in the 22nd year of development. If you want to have a CRM in which you can find some practices alarming in the future, better engage now and support us by coming to the meetings, learning to understand the methods, and doing editing work, tools development, didactic material etc. ;-)

Besides inviting people to our meetings and learning in the discussions, we'll be very glad to offer intensive training in our methods
and principles to anybody interested. Without the one or the other, some e-mail discussions may repeat old arguments in a fragmented way,
never convincing, because the overall logic is not exposed. The art is balancing all practical requirements and a crystal-clear separation
between the intellectual and technological levels.

Interested people may be domain professionals with a long-term data modeling and standards mission, consultants, but in particular also
post-graduate students that can combine their subjects with methodological research and become trainers themselves.

So I hope some of you are alarmed enough to join us actively:-D! 

 

Posted by Richard on 23/1/2018

On 23/01/2018 16:07, Martin Doerr wrote:

>> I think that this argument is perfectly valid for the 'Definition of CRM' document.  However, by publishing an RDFS expression of the CRM we are moving, whether we like it or not, into the realm of 'utilities'.  People are picking up and using our RDFS definitions in a variety of ways.  In this particular implementation context, I would argue that we should ensure that there is a label-free version of each CRM class and property.  Also, our guidance on the use of our RDFS implementation should recommend the use of this label-free version, on the grounds that we cannot guarantee the stability of the version which includes a label.
>>

> The issue was decided in the 27th meeting, as documented in the agenda. We had produced label-free definitions with language labels, as you propose, which caused an outcry from implementers who saw only numbers and had no tools showing the display labels. Since there is no new evidence on the issue, I'd propose to stay as we are, and I'll try to make the respective discussion thread accessible, so that all the old arguments can be read again.
I would be interested to (re-)read the arguments.  However, I repeat my assertion above that there should also be a label-free version of each CRM class and property identifier.  I am perfectly happy that this should be flagged as being the same concept as a variant whose identifier includes a label (i.e. "E2 owl:sameAs E2_Temporal_Entity").  I'm even happy for the label-free identifier not to be the "preferred" form.

Implementers who want their CRM RDF to be valid into the long-term future must surely realise that the convenience of a "human-readable" identifier is negated by the possibility that the CRM SIG might change that identifier at some point in the future - which you are claiming the right to do - and so invalidate their RDF.  (Maybe they just need better tools.  Linked Data resources with opaque URIs are hardly unusual: take Geonames and the Getty vocabularies as examples.)  Has this specific argument been made before, and, if so, how was it rebutted?
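
A minimal sketch of what such label-free aliases could look like, written out as N-Triples lines. It follows the "E2 owl:sameAs E2_Temporal_Entity" wording above; whether owl:sameAs is the right property for equating classes is left open here. The base namespace is the one used in the F10 example earlier in this thread, and the function names are hypothetical.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def label_free_alias(labelled: str) -> str:
    """'E2_Temporal_Entity' -> 'E2' (everything before the first underscore)."""
    return labelled.split("_", 1)[0]

def same_as_triples(labelled_names, base=CRM):
    """Yield one N-Triples line per identifier, linking the label-free
    form to the form that includes a label."""
    for name in labelled_names:
        alias = label_free_alias(name)
        yield f"<{base}{alias}> <{OWL_SAME_AS}> <{base}{name}> ."

for line in same_as_triples(["E2_Temporal_Entity", "E21_Person"]):
    print(line)
```

If a label later changed, only the object of the corresponding statement would need a new entry; instance data written against the label-free form would be unaffected.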

> The current RDFS reads, e.g.:
> <rdfs:Class rdf:about="E2_Temporal_Entity"><rdfs:label xml:lang="fr">Entité temporelle</rdfs:label><rdfs:label xml:lang="en">Temporal Entity</rdfs:label><rdfs:label xml:lang="ru">Временная Сущность</rdfs:label><rdfs:label xml:lang="el">Έγχρονη  Οντότητα</rdfs:label><rdfs:label xml:lang="de">Geschehendes</rdfs:label><rdfs:label xml:lang="pt">Entidade Temporal</rdfs:label><rdfs:label xml:lang="zh">时间实体</rdfs:label><rdfs:comment>
>
> as outcome of a long-standing discussion...........

>>
>> This talk of preferred labels and your mention of the labels in other languages leads me to wonder whether anyone has produced a SKOS version of the CRM.
> Your suggestion is well taken, but I do not see what this would offer in contrast to the current international display labeling as shown above.
>
>> This might be a useful exposition of the logic of the CRM, expressed in a format which is widely used and supported.  We could have 'preferred labels' for each concept in as many languages as we like.  A SKOS version would be no use for instance data, because each SKOS concept is itself an instance, in OWL terms, but it might be a powerful tool for expressing relationships between concepts in different schemes, i.e. exactly the purpose for which the CRM was originally created.  Thoughts, anyone?

> CRM classes are not terms. The CRM is an ontology of relationships. Classes are only auxiliary for relationships. Therefore we delete classes without relationships. The classes belong to a completely artificial language. Therefore I'd argue there is nothing like a "preferred label". People must understand the scope notes, nothing else.  The purpose for which the CRM was and is created is to mediate data structures, i.e. between relations connecting "fields", not between "terms". If this is not clear enough from the CRM introduction, please let us know how to improve the text.
In that case it might be argued that the multilingual labels are not helpful.  And, in fact, we could not meet the uniqueness requirement for SKOS preferred labels, since our labels are not guaranteed to be unique.  What we should have are translations of the full scope notes for each class and property.

> Therefore, in my opinion, it is impossible in SKOS to represent the logic of the CRM. A pure class-class-mapping is usually misleading. However, the X3ML mapping language can map relationships in a declarative way.
Having thought about this some more, and started to write an RDF-to-SKOS transformation, I have come to the same conclusion.  Also, I didn't know about X3ML, which I agree is exactly the right sort of approach for expressing mappings declaratively.
 

Posted by Robert Sanderson on 23/1/2018

Hi Martin,

Could you lay out how, beyond costly and lengthy in-person meetings, someone might best participate actively? In our experience of bringing issues that have arisen in the linked.art work to the list, there has been some discussion, but no useful resolutions that might become editorial work that could be taken on.  The text of the specification, by being managed in a Word document (it would appear, from the PDF), is not able to be edited by a distributed team of volunteers.

Modern specification efforts, including the W3C and IIIF, with linked.art following their lead, use github to manage issues, changes and publication of the content. A modernization of the specification management practices might enable volunteers to be more active, with their efforts recognized and tracked.  If this was decided to be a good way forwards, I would be happy to volunteer the time to set up the repository and publishing platform.

Posted by Richard on 23/1/2018

On 23/01/2018 16:39, Martin Doerr wrote:
> Dear All,
>
> Thank you very much for your engagement in these issues!
> Let me remark, for all those who find our practices alarming, that none of us is paid for the maintenance of the CRM.
> It is exclusively an engagement of volunteers and engagement of organizations for a common good.
> What is really alarming for me is the lack of users offering active work beyond criticism.

I think the recent discussions about RDF and the CRM go well beyond just offering criticism.  You'll find the document which I started on Google Docs:

https://docs.google.com/document/d/1zCGZ4iBzekcEYo4Dy0hI8CrZ7dTkMD2rJaxa...

which is an attempt at a self-contained 'how to' guide for CRM RDF implementers, and which reflects the recent discussions on this topic. I'm happy to develop this document further, and would welcome input from others on this list.  Conversely, if this document doesn't meet a real need, I would be equally happy to be told why this is the case.

Posted  by George on 29/1/2018

Dear all,

I think that an official RDF implementation recommendation guide, as has been started, would be a very useful document. I think the Google Doc format is a good place for formulating it. We should aim to consider the resulting doc at the next SIG, hopefully with as much input from across the community as possible beforehand. Having such a document should be a big help in eliminating unnecessary variance in implementation. A nice addition to the document would be accompanying example RDF.

Posted by Thanasis Velios on 29/1/2018

I think that offering examples in the form of 3M mappings would also be helpful.

Current Proposal: 

In the 41st joined meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the sig decided to merge this issue with 361, since both concern the encoding of CIDOC CRM classes and properties in RDF. The sig then reviewed the text initiated by Richard Light and agreed on particular challenges that should be explained and clarified in this text. These are:

1. Identifiers - their role and value, labels and reconciliation in RDF
2. RDF and the problem of primitive values (strings, names, dates, space, spacetime) (the ontology as it exists must deviate from what a machine can represent)
3. How to do properties of properties
4. Identifiers - their role and value in RDF (questions of reconciliation and instance managing)
5. Types: SKOS recommendation and other possible ontological extensions
6. Identifiers of CRM classes and relations themselves (update processing)
7. Recording strings
8. Statement expressing the translation of the ontology into RDFS using IsA and other mechanisms

The sig also discussed the creation of manuals like this one for applying other formats.
In addition, the sig took the following decisions and recommendations about the CRMbase:
(a)  create a new issue ‘has content’ that will work from E90 and allow the semantic capture of the actual content of a symbolic object. To be modelled on the R33 property of FRBRoo. HW to MD for formulation of this property (issue 383)
(b) it is recommended that all nodes in rdf should have labels. If someone needs to track an appellation, they can capture the content through the new property of E90.
(c) Should create a general section recording symbolic objects (to talk about the content question) and then the name recording section can reference this.
(d) should create a list of recommended data types for the primitive types

RL, MD, GB, OE, TV should review the text about encoding CRM in rdf.

Lyon, May 2018

Posted by Martin on 21/11/2018

Dear All,

Here is my new version of the document, now in a coherent form, all previous comments taken into account.

Still missing, a guideline for the limits of Dimension values:
https://docs.google.com/document/d/1NdrWpzo7EFChryh4Qg-Ue8WLvnwejHx20eiwdJuZEck/edit#

Posted by Martin on 25/11/2018

Dear All,

Here is the new, completely reworked version of the document about implementing CRM in RDF; I hope I have taken all comments into account. A guideline for value limits of dimensions is still missing, as is a guideline about implementing multiple instantiation.

See https://docs.google.com/document/d/1NdrWpzo7EFChryh4Qg-Ue8WLvnwejHx20eiw...

Comments most welcome!

In the 42nd joined meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, following the decision to put together a document that offers guidance on recommended techniques for using the CIDOC-CRM in an RDF implementation, the crm-sig reviewed the text drafted by MD and RL (Encoding the CIDOC Conceptual Reference Model in RDF).

The sig accepted the content (i.e. scope, table of contents, guidelines given) of the document as a draft and decided that the document be uploaded under Best Practices, with its reworked title (Implementing the CIDOC Conceptual Reference Model in RDF), credited to M.Doerr and R.Light.
The scope of this document was explicitly stated: it serves as an implementation guideline; it is not part of the crm as such.
HW: RL is assigned with collecting comments and doing editorial work on the document.
It was decided to edit the guideline “For Appellations being described indirectly via a URI, we recommend the use of E41 Appellation> P72 has language> E56 Language.”, found under the section Language of an Appellation, to “…E33 Linguistic object > P72 has language> E56 Language” instead.

The proposal to document a subclass of E41 Appellation, namely “Linguistic Appellation”, in the rdf representation of the crm was accepted.
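
A minimal sketch of why the guideline was reworded, assuming only what is stated in this issue: P72 has language is reached from E33 Linguistic Object, so an appellation needs to be typed as E33 as well before P72 can be asserted on it. The instance data and the function name can_assert are hypothetical, and subclass reasoning is deliberately left out.

```python
# Domain as given in the guideline above; the rest of the schema is omitted.
PROPERTY_DOMAIN = {"P72_has_language": "E33_Linguistic_Object"}

def can_assert(assigned_classes: set, prop: str) -> bool:
    """A property may be asserted on an item if one of the classes
    assigned to the item is the property's domain."""
    return PROPERTY_DOMAIN[prop] in assigned_classes

# Hypothetical appellation node, typed only as E41 Appellation.
name_classes = {"E41_Appellation"}
print(can_assert(name_classes, "P72_has_language"))  # False

# Assigning a second class to the same node (multiple instantiation).
name_classes.add("E33_Linguistic_Object")
print(can_assert(name_classes, "P72_has_language"))  # True
```

A "Linguistic Appellation" subclass in the RDF representation would give the same result with a single class, provided the check also followed subclass links, which this sketch does not do.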

Also the sig decided that after this meeting we should have an rdf expression of the 6.2.5 CRM version following these guidelines and we should also define the mappings for the deprecated classes and properties.

Berlin, November 2018

Posted by Martin on 5/12/2018

Dear All,

I propose this paragraph to be added to the implementation guidelines for RDFS:

"About implementing multiple Instantiation

Knowledge representation models and, more generally, semantic networks differ fundamentally in one aspect from data structures, such as XML, relational database schemata and data structures in all programming languages, including the object-oriented ones:

·       Knowledge representation starts with an item in the real world regardless of its nature, assigns an identifier to it in order to be able to make assertions about it, and then accumulates statements (assertions, propositions) about it.

·       Data structures start with a set of templates, a set of foreseen kinds of statements dedicated to a particular category each (class, entity), to be filled in by a user.

Consequently, knowledge representation may assign multiple classes to a given identifier without any problem. The associated processing software will then allow for asserting for this identifier all properties applicable to each assigned class. This process is called “multiple instantiation”. For instance, the “weapon” with all its characteristics may also be a “ceremonial object”.

A system based on data structures must create a different instance of the respective templates for each class an item belongs to. It may later link the different instances describing aspects of the same thing, in order to simulate the mechanism. In particular the very successful “encapsulation principle” of object-oriented programming languages requires dedicated data structures and constitutes a fundamental mismatch with the Open-World modeling of semantic relationships (see, for instance, Schnase 1993). Fundamental to semantic data integration are also superproperties, which are not provided by data structures either.

The CRM as an ontology relies heavily on multiple instantiation: Classes that tend to co-occur on things simultaneously “incidentally”, without being associated with properties only applicable to the combination of such classes, are not modelled individually as subclasses of multiple parent classes. The latter would be called “multiple IsA”. To avoid multiple IsA in such cases is an important normalization principle to keep the ontology very compact and unambiguous.

Most implementations on top of RDF still use RDF as if it were a fixed schema and repeat in the UI code all the schema. Therefore, the promise of RDF and other semantic models to be able to accommodate dynamically new properties often does not work. It is still as if they were using Relational systems. Generic XML editors do adapt already to the schema, but usually the rendering paradigms they employ, without additional parameters, are too poor for good UI code. One can however write code that reads the RDF schema used at run-time and that extends data entry and display by the actual properties found. This functionality is foreseen by SPARQL, but most programmers still do not appreciate the utility of querying the schema. Even if fixed templates are used, the data entry system should foresee the same thing to be described by multiple templates, relatively freely selectable by the user.

In the specification modules of mapping software used to transform data into a CRM-compatible form, care must be taken to foresee and allow the user to combine RDF classes systematically. It may be useful to develop tools for specific guidance that show users how a valid path from a given domain class to a certain range class can be created by using multiple instantiation (and, by the way, also by using subclasses of the domain class), such as combining E41 Appellation with E33 Linguistic Object in order to reach E56 Language via P72 has language.
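
The kind of guidance tool mentioned above could, under simplifying assumptions, look like the following Python sketch. Given the classes already assigned to an instance and a desired range class, it proposes which additional class would make a suitable property available. The schema excerpt, the tuple representation and the function names are hypothetical and only serve the example.

```python
# Simplified, hand-written schema excerpt as (subject, predicate, object) tuples.
SCHEMA = {
    ("E41_Appellation", "rdfs:subClassOf", "E90_Symbolic_Object"),
    ("E33_Linguistic_Object", "rdfs:subClassOf", "E73_Information_Object"),
    ("E73_Information_Object", "rdfs:subClassOf", "E90_Symbolic_Object"),
    ("P72_has_language", "rdfs:domain", "E33_Linguistic_Object"),
    ("P72_has_language", "rdfs:range", "E56_Language"),
}


def ancestors(cls, schema):
    """Return cls together with all its direct and indirect superclasses."""
    seen, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current in seen:
            continue
        seen.add(current)
        todo.extend(o for s, p, o in schema
                    if s == current and p == "rdfs:subClassOf")
    return seen


def suggest_additional_classes(current_classes, target_range, schema):
    """Suggest (class_to_add, property) pairs that reach target_range.

    A pair is suggested only if the property's domain is not already
    covered by the instance's current classes or their superclasses.
    """
    covered = set()
    for cls in current_classes:
        covered |= ancestors(cls, schema)
    suggestions = []
    for prop, pred, rng in schema:
        if pred != "rdfs:range" or rng != target_range:
            continue
        for s, p, domain in schema:
            if s == prop and p == "rdfs:domain" and domain not in covered:
                suggestions.append((domain, prop))
    return sorted(suggestions)


print(suggest_additional_classes(["E41_Appellation"], "E56_Language", SCHEMA))
```

An empty result means that no additional class is needed, because a property reaching the target range is already applicable to the instance.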

In a local system, another workaround for multiple instantiation can be the creation of classes that replace all candidate cases for multiple instantiation by subclasses using multiple IsA. For good reasons, the compatibility with the CRM is defined at the import/export/query level and not at the system internals. Therefore, such internal workarounds do not affect the interoperability: Whereas the query compatibility of this solution with the standard is immediate, the respective import/export system simply needs to make the trivial replacements of the respective class combinations with their multiple IsA counterparts and vice-versa.

So, in part, problems with multiple instantiation are a question of programming practice. On the other hand, they are also a question of user training and extended good practice. Users may provide feedback about frequent cases where multiple instantiation is used, in order to guide other users to these modelling cases. These could systematically be entered into the CRM RDF implementation, without requiring the CRM standard itself to repeat them."

John L. Schnase (1993). "Semantic Data Modelling of Hypermedia Associations", in: ACM Transactions on Information Systems, Vol. 11, No. 1, January 1993, p. 45.

Comments welcome!

Posted by Richard Light on 5/12/2018

Martin,

Please explain why you think that this text is needed in the RDF implementation guidelines. To me, it seems quite generic, and doesn't offer specific guidance as to what implementors should do about the issue that their existing systems may be incapable of expressing certain RDF features. I think it would actually detract from the usefulness of the document, because it would confuse and puzzle the typical reader.  [Maybe we need to stop and think about who the 'typical reader' would be, and what they would want from this document.]

 

Posted by Martin on 6/12/2018

This was a proposal by Robert. It may be useful for implementers not used to semantic technologies.

What do other people think?

Posted by Martijn van Leusen on 6/12/2018

Hi Martin,
Not sure if you would regard me as a typical reader, but I find this text very hard to read and understand without having at least one good worked example to guide me through it. It presupposes so much specialised knowledge about the various types of data management and knowledge organisation systems that, in its current state, only a small group of specialists might find it useful...

 

Posted by Martin on 6/12/2018

Right. It is very dense. I tried to justify multiple instantiation and give practical advice in the same text. I am not sure who finds it an issue. In the principles of the CRM we describe it again, but maybe here it would be useful just to make people aware of it, and give an example in the Annex. Or omit it altogether.

Opinions?

Posted by Florian Kräutli on 7/12/2018

Hi Martin,

I agree with the previous comments that the text is a bit dense and assumes a quite specific prior knowledge, i.e. it might be confusing to include it as a general guideline.

When I started with the CRM however, I somehow refrained from doing multiple instantiation. I don't think it is actively discouraged anywhere, but I was under the impression that I should rather find a more fitting entity than 'overloading' one with several classes. So an indication that multiple instantiation can be ok, and a set of examples of where it makes sense might be useful to include somewhere.

The example of using E33 to reach P72 is a good one I think. I also use it together with F22.

Posted by Robert Sanderson on 10/12/2018

I agree that it’s very dense, but also very informative!

A shorter version might check both boxes – informative to folks that are more used to programming or traditional data management platforms, and instructive on how to work around their limitations?

For example, something as short as the following might be sufficient:

The CRM can be implemented in RDF using a technique called “multiple instantiation”.  This means that instances would participate in the IsA relationship (rdf:type) multiple times, thereby instantiating multiple classes at the same time. In the abstract model this is very appropriate, as an instance can very legitimately be thought of as an Appellation and a Linguistic Object at the same time in the case of a name that is in a human language. However, many implementations at their core are not natively RDF or even graph-based and would run into difficulties trying to create relationship representations or classes in an object oriented programming language that instantiated multiple ontological classes.  Instead, the projection from the abstract CRM into RDF includes artificial “merge” classes such as E41_E33_Linguistic_Appellation when use cases are sufficient to demonstrate the value of these constructions.  Use of these artificial classes is intended for situations where the implementation is a challenge, rather than being an ontologically rigorous pattern.

 

Posted by Steve Stead on 11/12/2018

I like Robert’s text as it gives enough info to point people in the right direction. However, it is brief. On the other hand, Martin’s text is longer and needs some editorial input to make it read less “Martinish”. I would be happy to do that over Xmas but, if Robert’s text is sufficient, then I will not expend the time.

Posted by Richard Light on 11/12/2018

Hi,

Unless I have misunderstood, both versions came from Robert. I still think that we need to consider what actually needs to be in the RDF document. In my view it should be the absolute minimum to 'do the job': the only question is what 'the job' should be. 

Posted by Detlev Balzer on 11/12/2018

I'm also wondering if we actually need such explanation. If the concern is that

> many implementations at their core are not natively RDF or even graph-based and would run into difficulties trying to create relationship representations or classes in an object oriented programming language that instantiated multiple ontological classes.

then this is certainly true for "classical" relational databases without any level of object-relational mapping. However, anyone embarking on a certain degree of object-oriented design will be (or soon become) aware of these limitations, and of the various solutions discussed at length in the developer community.

 

Posted by Christian Emil Ore on 11/12/2018

I like Robert's text. I can see some problems with the use of "merged classes" since there is a large number of possible classes. A "merged" class is simply a way to reformulate same as at the class level.

In a relational database one would need a common series of identifiers for the primary key in all involved tables, which is uncommon but OK, since one in principle needs only one sequence giving unique identifiers for an entire database (or all databases in the world).
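
A minimal sketch of this idea, using Python's built-in sqlite3 module, is given below. The table and column names and the sample values are invented for illustration. One table issues the identifiers, and each class table reuses them as its primary key, so that the same thing can appear in several class tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One sequence of identifiers shared by all class tables.
    CREATE TABLE entity (id INTEGER PRIMARY KEY AUTOINCREMENT);
    -- Each class table uses the shared identifier as its primary key.
    CREATE TABLE appellation (
        id INTEGER PRIMARY KEY REFERENCES entity(id),
        content TEXT
    );
    CREATE TABLE linguistic_object (
        id INTEGER PRIMARY KEY REFERENCES entity(id),
        language TEXT
    );
""")


def new_identifier(connection):
    """Draw the next identifier from the shared sequence."""
    cursor = connection.execute("INSERT INTO entity DEFAULT VALUES")
    return cursor.lastrowid


# The same identifier is entered into both class tables,
# which simulates multiple instantiation.
name_id = new_identifier(conn)
conn.execute("INSERT INTO appellation VALUES (?, ?)", (name_id, "La Gioconda"))
conn.execute("INSERT INTO linguistic_object VALUES (?, ?)", (name_id, "Italian"))

# Joining on the shared identifier brings both aspects together.
row = conn.execute("""
    SELECT a.content, l.language
    FROM appellation AS a
    JOIN linguistic_object AS l ON a.id = l.id
""").fetchone()
print(row)
```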

Posted by Martin on 11/12/2018

I think Robert's text needs a bit more background. Would someone try?

Multiple instantiation is fundamental to the CRM and semantic models. Not just a technique.

We also observe that programmers writing code for selecting applicable properties for data entry into RDF/OWL based systems or mapping systems tend to forget multiple instantiation, and also forget to alert users that a suitable subclass may have the property needed.
It's not only a problem with other kinds of data models.

posted by Martin on 22/3/2019

Dear All,

here is my heavy reworking and reduction of the debated paragraph about multiple instantiation. I hope this has settled all concerns, linguistic improvement notwithstanding. Please comment!

-------------------------------------------------------------------------------------------------------------

About implementing multiple Instantiation

Knowledge Representation models can assign multiple classes to a given instance identifier. After that, all properties of each assigned class are applicable to this identifier. This construct is called “multiple instantiation”. For instance, a calligraphy is an “image” and a “linguistic object”, having a language and a painting style. This is not possible with Relational data structures, because instance identification is limited to the entity (class), or with XML-like data structures, because instance identification is by structural position (additional identifiers can be used for linking).

 

Therefore many users are not aware of this feature, and even KR tools do not systematically guide users to use it: once an instance is classified by one class, the tool should not allow a property of another class to be used, but it most likely will not advise the user that she could add the additional class to the instance. Nevertheless, it is a key feature of KR models, one that facilitates the modularization of ontologies and the often advertised ability to combine different ontologies.

The CRM as an ontology relies heavily on multiple instantiation: combinations of classes that are applicable to some instances only incidentally and have no properties specific to the combination are not modelled in the CRM individually as subclasses of multiple parent classes. The latter would be called “multiple IsA”. Avoiding multiple IsA in such cases is an important normalization principle that keeps the ontology very compact and unambiguous.

In the specification modules of mapping software used to transform data into a CRM-compatible form, care must be taken to foresee and allow the user to combine RDF classes systematically.

Some combinations of classes may occur more frequently, such as combining E41 Appellation with E33 Linguistic Object in order to reach E56 Language via P72 has language. In a local system that does not easily support multiple instantiation, the candidate cases for multiple instantiation may be combined in subclasses using multiple IsA. For their labels, we recommend aggregating the class identifier codes as in: “E41_E33_Linguistic Appellation”. Such a replacement is query compatible with the standard. To achieve import/export compatibility, the respective import/export system simply needs to make the trivial replacements of the respective class combinations with their multiple IsA counterparts and vice-versa.
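
The "trivial replacements" at import and export could be sketched as follows in Python. This is only an illustration under assumptions: the mapping table contains a single hand-written entry, the merged class is spelled with underscores as in Robert's earlier post, and the function names are hypothetical.

```python
# Hypothetical mapping from a merged ("multiple IsA") class to the
# combination of CRM classes it stands for.
MERGED = {
    "E41_E33_Linguistic_Appellation": frozenset(
        {"E41_Appellation", "E33_Linguistic_Object"}
    ),
}


def to_merged(types):
    """Import into the local system: replace each known class
    combination by its merged class."""
    result = set(types)
    for merged, parts in MERGED.items():
        if parts <= result:
            result = (result - parts) | {merged}
    return result


def to_multiple_instantiation(types):
    """Export from the local system: expand each merged class
    back into its component classes."""
    result = set()
    for cls in types:
        result |= MERGED.get(cls, {cls})
    return result


standard = {"E41_Appellation", "E33_Linguistic_Object"}
local = to_merged(standard)
print(local)
print(to_multiple_instantiation(local) == standard)
```

Classes that are not part of a known combination pass through both functions unchanged, so the replacement only affects the candidate cases listed in the mapping.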

Users may provide feedback about frequent cases where multiple instantiation is used, in order to guide users to these modelling cases. These could systematically be entered into the CRM RDF implementation, without requiring the CRM standard itself to repeat them.

Outcome: 

In the 45th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 38th FRBR – CIDOC CRM Harmonization meeting, the SIG decided to close this issue, as its title is now irrelevant to its content. Discussions on the content of “Implementing the CIDOC Conceptual Reference Model in RDF” should be carried out in a separate issue.

As part of this issue 443, the SIG approved the changes by GH to the document “Implementing the CIDOC Conceptual Reference Model in RDF” and the addition of the paragraph on Multiple Instantiation (by MD). The document is to be given a version number, and every time there is a change in its content, it should be assigned a new version status. The current version is 1.0.

GH is to appear in the list of authors (together with RL and MD).

The issue closed

Heraklion, October 2019

 
 

Reference to Issues: