Issue 266: Reified association vs sub-event

Starting Date: 2014-10-14
Working Group: 2
Status: Done
Closing Date: 2015-02-12
Background: 

Posted by Vladimir on 29/9/2014

Hi everyone! 
(This is particularly for Martin and Dominic, but comments from everyone are welcome)
 
The BM mapping uses two patterns to express the relation of an entity (typically person) to an event:
 
1. use reification over the relation (bmo:EX_Association is a subclass of CRM Attribute Assignment):
 
2. make sub-event (e.g. Production part) and put the relation type there:
  This is not well illustrated in the CRMPrimer:
  p17 shows a sole event part, and p18 shows two parts but without P2_has_type.
  But you get the idea
 
2 is used more often in the mapping (see the page above).
1 is used less often: for Influenced/Motivated relations (not for P14 carried out by), and to express uncertainty.
Specifically: Acquired Through (contributor), Probably/Unlikely Produced By, (production) Influenced By, Production Motivated By, Probably Produced At, Made For Place
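
To make the difference between the two patterns concrete, here is a minimal sketch that writes each one as plain subject-predicate-object tuples and asks the same question of both ("who was the engraver?"). The ex: identifiers, the role term ex:engraver, and the exact use of P140/P141 on the association node are illustrative assumptions, not copied from the BM mapping.

```python
# Pattern 1: reified association. A separate node stands for the relation.
# All ex: names are hypothetical.
reified = {
    ("ex:production1", "rdf:type", "crm:E12_Production"),
    ("ex:assoc1", "rdf:type", "bmo:EX_Association"),
    ("ex:assoc1", "crm:P140_assigned_attribute_to", "ex:production1"),
    ("ex:assoc1", "bmo:PX_property", "crm:P14_carried_out_by"),
    ("ex:assoc1", "crm:P141_assigned", "ex:smith"),
    ("ex:assoc1", "crm:P2_has_type", "ex:engraver"),
}

# Pattern 2: sub-event. The relation type sits on a part of the production.
sub_event = {
    ("ex:production1", "rdf:type", "crm:E12_Production"),
    ("ex:production1", "crm:P9_consists_of", "ex:production1_part1"),
    ("ex:production1_part1", "crm:P14_carried_out_by", "ex:smith"),
    ("ex:production1_part1", "crm:P2_has_type", "ex:engraver"),
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}


def subjects(graph, predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the graph."""
    return {s for (s, p, o) in graph if p == predicate and o == obj}


def engravers_reified(graph, event):
    found = set()
    for node in subjects(graph, "crm:P140_assigned_attribute_to", event):
        if "ex:engraver" in objects(graph, node, "crm:P2_has_type"):
            found |= objects(graph, node, "crm:P141_assigned")
    return found


def engravers_sub_event(graph, event):
    found = set()
    for part in objects(graph, event, "crm:P9_consists_of"):
        if "ex:engraver" in objects(graph, part, "crm:P2_has_type"):
            found |= objects(graph, part, "crm:P14_carried_out_by")
    return found
```

Both queries return the same person, but only the sub-event graph contains an actual P14_carried_out_by triple; in the reified graph the property appears only as a value on the association node.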
 
Martin and Dominic have said that 2 is more open-world while 1 is more closed-world.
Could you please explain this to me?
It's very important for me as I move closer to Getty ULAN and CONA modeling.

Posted by Richard Light on 15/10/2014

Vladimir,

I can't answer your question on the openness or closed-ness of the two approaches.  However, that won't stop me from commenting, since no-one else has.

This is an example of the famous "property of a property" issue, which has proved to be a challenge for the CRM in an RDF context.  In the original [abstract] object-oriented CRM data model, we cheerfully allow properties to have properties [1], and this is an accepted way, for example, to qualify the role which a person plays in relation to an activity.  However, the way in which this more precise role is normally specified in practice (which the CRM document goes on to give examples of) is by declaring the more specific property to be a subproperty of the original property.

If you do this, the subproperty simply takes the place of the original more generic property in an RDF expression of the statement, and the result is a meaningful RDF triple.  If, instead, you try to express "property of a property" as RDF, you find that you are trying to construct a triple with a predicate as its object; something RDF does not allow.
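
The subproperty route can be sketched in a few lines: a role-specific property is declared under the generic one, and the generic triple follows by the usual rdfs:subPropertyOf entailment. The property names ex:engraved_by and ex:designed_by are hypothetical, chosen only for illustration.

```python
# Hypothetical role-specific properties declared under the generic CRM one.
SUBPROPERTY_OF = {
    "ex:engraved_by": "crm:P14_carried_out_by",
    "ex:designed_by": "crm:P14_carried_out_by",
}


def super_properties(prop):
    """Walk rdfs:subPropertyOf upwards, guarding against cycles."""
    chain = []
    while prop in SUBPROPERTY_OF and SUBPROPERTY_OF[prop] not in chain:
        prop = SUBPROPERTY_OF[prop]
        chain.append(prop)
    return chain


def entail(triples):
    """Add (s, super, o) for every asserted (s, p, o) and super-property of p."""
    inferred = set(triples)
    for s, p, o in triples:
        for sup in super_properties(p):
            inferred.add((s, sup, o))
    return inferred


data = {("ex:production1", "ex:engraved_by", "ex:smith")}
closure = entail(data)
```

Here closure holds both the specific triple and the generic crm:P14_carried_out_by triple, which is why the subproperty approach stays visible to anything that only understands the core CRM.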

As I understand it, the BM tried the "subproperty" strategy first, and found that it led to an explosion in the size of their data model, and didn't sit well with their actual data, e.g. their roles termlist.  So they investigated an alternative approach and came up with the "reified association" strategy.  It took me a while to get my head around this, not least because of diagrams like the one at the top of p.15 of the Primer [2], where the arc in red is clearly nonsense in a simplistic RDF-modelling sense.  However, I now believe.

This particular problem has exercised me for some years.  In our XML-based Modes system (which harks back to the original MDA Data Standard in its structuring approach), we routinely record multiple people as being associated with an Activity, each playing distinct role(s).  We don't quite get it right there, from a strictly logical PoV:

<Production>
<Person><Role>designer</Role><Name>Light, R.B.</Name></Person>
<Person><Role>engraver</Role><Name>Smith, J.</Name></Person>
...

The point is that each person will play many roles in their life, and the role that is recorded here is only meaningful in the context of this particular activity.  So the role isn't a property of the person: it's a property of the person-playing-a-role.  So, my suggestion is that we create a new class RolePlayer, which could be defined as "one or more Actors playing one or more specified Roles in relation to an Activity".  Then we could model what we are trying to say, elegantly and precisely.
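
As a rough sketch of what such a RolePlayer class might look like (the class layout and method names below are my own assumptions, not part of any CRM release), the role hangs off the person-in-this-activity rather than off the person:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RolePlayer:
    """One or more actors playing one or more roles in a single activity."""
    actors: frozenset
    roles: frozenset


@dataclass
class Activity:
    label: str
    role_players: list = field(default_factory=list)

    def roles_of(self, actor):
        """Roles the actor plays in this activity only."""
        roles = set()
        for player in self.role_players:
            if actor in player.actors:
                roles |= player.roles
        return roles


production = Activity("Production", [
    RolePlayer(frozenset({"Light, R.B."}), frozenset({"designer"})),
    RolePlayer(frozenset({"Smith, J."}), frozenset({"engraver"})),
])
other_activity = Activity("Acquisition")
```

Asking production.roles_of("Light, R.B.") gives the designer role, while the same person has no recorded role in other_activity, which is the context-dependence described above.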

The trouble with the "sub-event" strategy is in my view two-fold: it is creating sub-events where there are none, simply to address a modelling problem with people having multiple roles; and it is falsely associating the role with the sub-event when that role is actually a property of the person involved in the sub-event.

Apologies if this has all been discussed before.  It does seem like rather a basic point, and I do vaguely remember the concept of "RolePlayer" from the CIDOC Relational Data Model days.

Richard

[1] "Properties may themselves have properties that relate to other classes", CRM Reference v5.1.2, p.ix
[2] http://www.cidoc-crm.org/docs/CRMPrimer_v1.1.pdf

Posted by Simon Spero on 15/10/2014

On Oct 15, 2014 11:45 AM, "Richard Light" <richard@light.demon.co.uk> wrote:

I. Properties of properties.

> If you do this, the subproperty simply takes the place of the original more generic property in an RDF expression of the statement, and the result is a meaningful RDF triple.  If, instead, you try to express "property of a property" as RDF, you find that you are trying to construct a triple with a predicate as its object; something RDF does not allow.

This paragraph may confuse some people so I would add some clarifications.

1. It's perfectly ok to make property assertions whose subjects are properties, in both RDF and in OWL 2. These assertions are about the property itself, rather than any particular use of the property.

2. It is possible to make property assertions whose value is a property, in both RDF and OWL. For example one could state that a class has subclasses that are partitioned based on the value of the specified property.

3. In OWL 2 it is possible to add annotations to a property assertion axiom. These annotations are only about the particular act of assertion, rather than what is being asserted.

4. In RDF it is possible to make assertions about an RDF statement by using the RDF reification mechanism. RDF reification is generally considered to be pretty bad (a reified statement does not even entail the original statement).

II. Subproperties vs. Reified associations

1. Using subproperties instead of reified entities makes it easier to use off-the-shelf reasoners.  For example, if there are constraints that apply to a particular role, it may require creating one or more new subclasses to which the constraints may be applied (these can, of course, be anonymous, but that may not make things easier to use).

Additionally there may be optimizations for retrieval of subproperties that are not otherwise available.

2. Using reified associations labelled with concepts from a version of SKOS supporting hierarchical relationships does not automatically entail that hierarchy for the associations.
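
A small sketch of point II.2, under the assumption of a hypothetical role vocabulary (ex:engraver, ex:printmaker, ex:maker) linked by skos:broader: a query for the broad role misses associations typed with a narrower concept unless the application walks the hierarchy itself.

```python
# skos:broader links between hypothetical role concepts.
BROADER = {"ex:engraver": "ex:printmaker", "ex:printmaker": "ex:maker"}

# (association node, role concept, actor); all identifiers are hypothetical.
ASSOCIATIONS = [
    ("ex:assoc1", "ex:engraver", "ex:smith"),
    ("ex:assoc2", "ex:maker", "ex:jones"),
]


def broader_closure(concept):
    """The concept plus everything reachable by following skos:broader."""
    result = {concept}
    while concept in BROADER and BROADER[concept] not in result:
        concept = BROADER[concept]
        result.add(concept)
    return result


def actors_with_role(role, expand):
    """Actors whose association carries the role; optionally expand the hierarchy."""
    found = set()
    for _node, node_role, actor in ASSOCIATIONS:
        roles = broader_closure(node_role) if expand else {node_role}
        if role in roles:
            found.add(actor)
    return found


plain = actors_with_role("ex:maker", expand=False)
expanded = actors_with_role("ex:maker", expand=True)
```

Without expansion only ex:jones is found; with the explicit walk ex:smith is found too. With subproperties, a standard RDFS reasoner would do the equivalent step without application code.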

III.  Roles and subevents.

It is possible to treat subevents as a subclass of roles, but the typical motivation would be if the sub-event was an event in its own right. See e.g. http://www.cyc.com/tutorials/roles-and-event-predicates

IV. Other meanings of "Role" in applied ontology.

Some schools of thought use the term Role to refer to things like being a Producer. Because some person may not always be or have been a Producer, they do not consider it appropriate for that individual to be an instance of Producer.
DOLCE and related work tend to follow this approach.

An alternative is to treat the person-as-producer as a subpart of the person, or to treat class membership as holding in an interval.

Posted by Martin on 15/10/2014

Dear All,

The issue has been discussed in CRM-SIG in Heraklion. If we need 3ary relations, because
the vocabularies for these roles are not fixed at schema definition time, the only solution
is to introduce an RDF class for the relationship. It's no problem in ER, ooER and other metamodels.

Then we can play around with solutions, in which we regard this class as being an E13, a reification, a subevent or whatever, and in no case will RDF recognize the semantics.
The utility of reusing or abusing classes like E13 is questionable, in particular if we want to use
E13 to describe epistemological situations distinct from the default authors of the knowledge base.

The cleanest way appears to be, following the last discussion/proposal in CRM-SIG,
to introduce classes for all 3ary properties by a standard naming convention,
such as "R14Node_carried_out_by" , and declare by OWL rules the inferences based on
that, in particular, if roles form strict IsA hierarchies. Then, R14 is inferred from R14Node by rule,
etc.
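
A rough sketch of the kind of rule meant here, written as plain Python rather than OWL. It assumes the node class corresponds to P14 carried out by, and the linking properties ex:has_domain, ex:has_range and ex:in_the_role_of are hypothetical names invented for this illustration.

```python
# Map each node class to the direct property it stands for (assumed mapping).
NODE_CLASSES = {"ex:R14Node_carried_out_by": "crm:P14_carried_out_by"}


def infer_direct(triples):
    """Derive the direct binary triple from every typed node instance."""
    by_subject = {}
    for s, p, o in triples:
        by_subject.setdefault(s, {}).setdefault(p, set()).add(o)

    inferred = set()
    for props in by_subject.values():
        for cls in props.get("rdf:type", ()):
            direct = NODE_CLASSES.get(cls)
            if direct is None:
                continue  # not a node for a 3ary property
            for domain in props.get("ex:has_domain", ()):
                for rng in props.get("ex:has_range", ()):
                    inferred.add((domain, direct, rng))
    return inferred


data = {
    ("ex:node1", "rdf:type", "ex:R14Node_carried_out_by"),
    ("ex:node1", "ex:has_domain", "ex:production1"),
    ("ex:node1", "ex:has_range", "ex:smith"),
    ("ex:node1", "ex:in_the_role_of", "ex:engraver"),
}
direct_triples = infer_direct(data)
```

The role stays on the node, while the inferred direct triple keeps the data readable by anything that only knows the plain property.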

Opinions?

Best,

Martin

Posted by Martin on 15/10/2014

Dear Simon,

Yes I agree.

I think in general, "role" can be 4 kinds of things, (or even more):
1) A permanent property of a person = E55 Type.
2) An office with distinct identity and unity from the individual fulfilling the role = E74  Group
3) An incidental relationship between an Actor and an Activity  = P14.1 and all the discussion here.
4) A functional specification of the default interaction between members of a structured group,
currently in the CRM expressed via membership P107.1. One could argue that "membership" is a state, and as such a kind of Temporal Entity in its own right. Whether a type of such a state is equivalent to P107.1 is ontologically debatable.

3) and 4) and even other things boil down to the need to represent 3ary relations. I think we should stop discussing half-hearted work-around as if the problem would not exist.

Cheers,

Martin

Posted by Carlo Meghini on 16/10/2014

> 4. In RDF it is possible to make assertions about an RDF statement by using the RDF reification mechanism. RDF reification is generally considered to be pretty bad (a reified statement does not even entail the original statement).
>
Well, I perceive this general sense of distaste for RDF reification, but I must confess I do not understand it.

By reifying a triple (s p o) and assigning a URI u to it, one merely says that in the world there exists a linguistic resource that is a statement and is identified by u. Why should this imply that the statement is true? As we all know, the world is full of statements that are not necessarily true (like for instance this message), and we talk about these statements using names for them. Reification just supports this.

If one wants to say that the triple, in addition to exist as a resource in its own right, also happens to be true, then one just asserts the triple (s p o) by adding it to the graph. But this is a separate operation. It would be in my opinion wrong to assert the triple just because it exists.
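
The distinction between describing a statement and asserting it can be shown with a toy graph held as a Python set of tuples. The rdf: terms are the standard reification vocabulary; ex:stmt1, ex:asserted_by and ex:curator1 are hypothetical.

```python
def reify(uri, s, p, o):
    """The four triples that describe statement (s, p, o) under the name uri."""
    return {
        (uri, "rdf:type", "rdf:Statement"),
        (uri, "rdf:subject", s),
        (uri, "rdf:predicate", p),
        (uri, "rdf:object", o),
    }


statement = ("ex:production1", "crm:P14_carried_out_by", "ex:smith")

# Talk about the statement without claiming it is true.
graph = reify("ex:stmt1", *statement)
graph.add(("ex:stmt1", "ex:asserted_by", "ex:curator1"))
described_only = statement in graph

# Asserting it is a separate, explicit step.
graph.add(statement)
asserted = statement in graph
```

described_only is False and asserted is True: the reification quad alone never puts the original triple into the graph.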

Carlo

Posted by Maria Theodoridou on 16/10/2014

Dear All,

In http://www.ics.forth.gr/isl/CRMext/Roles.pdf
I tried to present schematically three options for dealing with the "property of a property" issue in rdf.

What Martin calls "R14Node_carried_out_by" is "PC14 carried out by" in my figures.

Best,
Maria

Posted by Vladimir on 16/10/2014

Richard> the famous "property of a property" issue, which has proved to be a challenge for the CRM in an RDF context.

This and similar issues are of course not unique to CRM, and various approaches have been adopted in the RDF community.
See http://www.w3.org/TR/swbp-n-aryRelations/.
Approaches include:
- domain-specific mechanisms, e.g.
-- skosxl:Label reifies a label in its own node
-- PROV has both direct roles, and "qualified" (i.e. reified) roles:
  http://www.w3.org/TR/prov-o/#description-qualified-terms
  http://www.w3.org/TR/prov-o/#qualified-terms-figure
- statement reification (rdf:Statement),
- property reification vocabulary (that allows one to express any domain-specific reification in a machine-accessible way)
  http://smiy.sourceforge.net/prv/spec/propertyreification.html
- or named graphs.
These methods allow you to add any extra data (attributes, provenance, etc) to relations.
I would suggest that before we adopt any modeling decision, we should study these existing practices.
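
As one example of these existing practices, the PROV-O pattern keeps the direct relation and a qualified node side by side. The sketch below uses the PROV-O terms prov:wasAssociatedWith, prov:qualifiedAssociation, prov:Association, prov:agent and prov:hadRole; the ex: resources are hypothetical.

```python
graph = {
    # Direct (unqualified) relation.
    ("ex:engraving1", "prov:wasAssociatedWith", "ex:smith"),
    # Qualified relation: a node that can carry the role and other details.
    ("ex:engraving1", "prov:qualifiedAssociation", "ex:assoc1"),
    ("ex:assoc1", "rdf:type", "prov:Association"),
    ("ex:assoc1", "prov:agent", "ex:smith"),
    ("ex:assoc1", "prov:hadRole", "ex:engraver"),
}


def roles_of(graph, activity, agent):
    """Roles recorded on the qualified associations linking activity and agent."""
    roles = set()
    for s, p, node in graph:
        if s == activity and p == "prov:qualifiedAssociation":
            if (node, "prov:agent", agent) in graph:
                roles |= {o for (s2, p2, o) in graph
                          if s2 == node and p2 == "prov:hadRole"}
    return roles


smith_roles = roles_of(graph, "ex:engraving1", "ex:smith")
```

A consumer that ignores the qualified node still sees who was associated with the activity; one that follows it also gets the role.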

See this paper "Types and Annotations for CIDOC CRM Properties" at http://www.ontotext.com/research-publications/?yr=2012
where I analyze some of the approaches in relation to CRM.

The already quoted https://confluence.ontotext.com/display/ResearchSpace/BM+Association+Map...
shows the actual patterns used in the BM mapping.

E13_Attribute_Assignment is rather fundamental to CRM because it's the mother of all long-paths (e.g. has_dimension is a shortcut, Measurement is a long-path).
It almost matches RDF reification (rdf:Statement with rdf:subject, rdf:object and rdf:predicate),
but it doesn’t have a way to point to the property (rdf:predicate).
Ergo the need for its extension bmo:EX_Association and bmo:PX_property.

Martin once made what I think is a productive remark, but it hasn't been taken seriously and investigated:
use E13_Attribute_Assignment.P2_has_type to hold the property.
Of course, this will make properties be E55_Type.

BTW, MARC Relators are defined in this way: they are both skos:Concept and owl:ObjectProperty.
E.g. see http://id.loc.gov/vocabulary/relators/abr.nt:
  <http://id.loc.gov/vocabulary/relators/abr> a owl:ObjectProperty;
    rdfs:subPropertyOf <http://id.loc.gov/vocabulary/relators/role>, <http://purl.org/dc/elements/1.1/contributor>.

I still can't make up my mind whether that’s an awful transgression or a neat trick.
Simon, what do you think?

Simon> Using reified associations labelled with concepts from a version of SKOS supporting hierarchical relationships
> does not automatically entail that hierarchy for the associations

Yes of course, the consequent facts would have to be asserted separately.
But do you see more troubles with such approach than that?

> declaring the more specific property to be a subproperty of the original property. 
> BM tried the "subproperty" strategy first, and found that it led to an explosion in the size of their data model,
> and didn't sit well with their actual data, e.g. their roles termlist

Yes, this has the following problems:
- huge ontology compared to the core it uses
- volatile ontology: every time role terms are modified, so needs to be the ontology
- you can't say more about the role instance (e.g. who asserted it and when; probability/uncertainty, etc) without turning again to Reification
- while role instances are reflected in CRM (through subproperty inference), Reified statements *are not*.
  This means that a model using Reification over extension properties *is not* (fully) a CRM Extension as defined in the standard.

Richard> The trouble with the "sub-event" strategy is in my view two-fold:
> it is creating sub-events where there are none, simply to address a modelling problem with people having multiple roles;
> and it is falsely associating the role with the sub-event when that role is actually a property of the person involved in the sub-event.

That's not quite correct.
- The ability of CRM to always break down an event into subevents is quite powerful.
  E.g. you may consider "engraver", "printer" and "publisher" as 3 roles of the same production event,
  but I may very naturally consider them as 3 subevents of the production: "engraving", "printing" and "publishing".
  Then the standard carried_out_by is enough: the person who carried out the "engraving" is clearly the "engraver" of that CHO (object).
  No falsehood here.
- On the other hand, the BM has gone too far into breaking up events,
  since unfortunately in their database there's no info correlating the various fields of events (e.g. facts about production).
  So the BM mapping emits every fact in its own subevent: something that I don't think other museums should follow.
- And in some cases I agree the breaking up into subevents goes too far.
  E.g. a Change of Ownership where one owner got money (sold) but the other did not (donated).
  If that's modeled as subevents, shouldn't we also model fake (ideal) parts of the object that the two people owned before the transaction?

Martin> I think we should stop discussing half-hearted work-around as if the problem would not exist.

I agree, it's an important problem that has connections to one of the most important CRM constructs:
shortcuts vs long-paths.

Martin> introduce classes for all 3ary properties by a standard naming convention,
> such as "R14Node_carried_out_by"

Specific classes have both advantages and disadvantages compared to a general class like E13.
These should be studied carefully...
I tend more towards a generic solution (currently that is).

Maria> http://www.ics.forth.gr/isl/CRMext/Roles.pdf

In "Solution 2", an important question is where do we put extra info: "PC14 carried out by" or "E7 Activity"

Carlo> I perceive this general sense of distaste for RDF reification, but I must confess I do not understand it.

Me neither. I use it e.g. in Getty to represent historic info on relations:
http://vocab.getty.edu/doc/#Applying_to_Relations_and_Place_Types
Maybe because it's non-economical:
If one already has a domain-specific class that reifies the relation, one should use that.
But if there's no such class, I think rdf:Statement is ok.

Simon> In OWL 2 it is possible to add annotations to a property assertion axiom.
> These annotations are only about the particular act of assertion, rather than what is being asserted.

These are isomorphic to rdf:Statement and I still don't quite grok the difference.
Guess it's the same difference as between data/object properties, and annotation properties.
But rdfs:label is an annotation property, yet the whole world uses it for labels of objects.

Cheers!

Posted by Richard Light on 16/10/2014

On 16/10/2014 12:08, martin wrote:
>
> I'd like to ask you to be focussed in your messages.
While we're being focused, could I point out that Vladimir hasn't yet received any guidance on his original question?

This relates (IIUC) to a suggestion made by Martin and Dominic that, as an approach, sub-events are more "open world", while reification is more "closed world".

Richard

Posted by Karl Grossner on 17/10/2014

This thread spurred me to finally revisit some work I did in 2010 that departed from both CIDOC and DOLCE by reifying a participated relation to get at roles among other things. Just wrote a blog post about it, with links and figures, and plan to convert the model soon from FOL and an object-relational schema to something more appropriate for 2014, like OWL2.  (http://kgeographer.com/wp/stuff1a/)

My (probably naive) view is that reification enables sensical open world systems, by permitting attribution of individual statements. Or if open world is strictly AAA, without identifying who Anyone is, what use would it be?
 

 

Posted by Carlo Meghini

Without sounding polemic, I’d like to comment on the “something more appropriate for 2014”. Please note that I am a peaceful guy and, on top of that, a great fan of description logics, which I have been using for twenty years (alas).

I think the appropriateness of logic is not time-related, but rather purpose-related. In setting off for a logical analysis of the CRM, my purpose is not implementation but rather understanding. My first understanding from the yet incomplete exercise, is that no OWL implementation is going to be equivalent to the CRM. So, if one is interested in understanding the CRM, he should NOT look at an OWL implementation. He may look at the current specs, but, if in need of some formal account, I would not know where to look.

The exercise proves difficult. Just to mention one source of difficulty, in a CRM KB one finds (1) facts encoded in triples, (2) (possibly) the same facts encoded in propositional objects about which other facts are expressed in triples, and (3) knowledge about the process that produced (1) and (2). Trying to linearise these representations, for instance to do sound reasoning on them, is in my opinion a challenging exercise.

And then there are extensions: the CRM is being extended in a number of ways and I believe it is better to analyse these extensions in the neutral language of logic, entirely free of any expressive limitations. Two interesting extensions are:

(1) handling different propositional attitudes in the same KB, i.e.: regarding some triples as categorical knowledge and other triples as beliefs, or hypotheses, and being able to reason about both.

(2) handling uncertainty. 

Posted by Karl Grossner on 18/10/2014

> Without sounding polemic, I’d like to comment on the “something more
> appropriate for 2014”. Please note that I am a peaceful guy

thank goodness, me too. ;^)

> and, on top of that, a great fan of description logics,
> which I have been using for twenty years (alas).
> I think the appropriateness of logic is not time-related, but rather
> purpose-related.

“something more appropriate for 2014” was an ironic dig at myself for publishing my modeling patterns written in FOL, a form few can read, and not directly usable in systems -- as opposed to RDFS and OWL, which may be sufficient, and are accessible to more colleagues in the DH projects I work on. I love logic and the promise of inference and enthusiastically support analysis of CRM and its potentially more formal expression! I was 'not up to DL' at the time -- not critical of it.

> In setting off for a logical analysis of the CRM, my
> purpose is not implementation but rather understanding. My first
> understanding from the yet incomplete exercise, is that no OWL
> implementation is going to be equivalent to the CRM. So, if one is
> interested in understanding the CRM, he should NOT look at an OWL
> implementation. He may look at the current specs, but, if in need of some
> formal account, I would not know where to look.
>

I see now that you and Martin had a paper on CRM in FOL. I had missed this, and wasn't commenting on the worth of the effort at all! The representation of my own ontology design patterns in FOL gave me understanding (and clarified still unanswered dilemmas). Ultimately I'm very attuned to implementation right now.

> And then there are extensions: the CRM is being extended in a number of ways
> and I believe it is better to analyse these extensions in the neutral
> language of logic, entirely free of any expressive limitations.

I guess I agree, since that's what I did for my own. They were/are more extensions of DOLCE, but CRM was always in the mix.

I will read the paper and follow this work with great interest!

Posted by Christian-Emil on 19/10/2014

Dear all,
I have to admit that I was somewhat skeptical when I first heard about the work on defining CIDOC-CRM in first order logic. The museum sector may find the CRM definition hard enough as it is. When I read the draft paper before the CRM-SIG meeting earlier this month, it struck me that the formulation of CRM in first order logic was compact and, to me, very clarifying (I have to add that I worked with logic and type theory before converting to Digital Humanities). I don't demand that everybody should read statements in first order logic and I don't demand that everybody should understand the long and complex formulation in OWL (in fact much more difficult to understand than the plain notation of FOL). A nice effect of the FOL formulation of CRM is that it will serve as a concise specification for the groups developing OWL (and DL) implementations of the standard, e.g. in Erlangen.

Current Proposal: 

In the 32nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 25th FRBR - CIDOC CRM Harmonization meeting, the SIG accepted the model proposed by MD for properties of properties and decided that it should be implemented in RDFS as a separate module of CIDOC CRM. The same should be done for FRBRoo. It was also decided that there should be an official version of CRM in OWL. The issue is closed.

Oxford, February 2015