Issue 257: Shortcut semantics

Starting Date: 
2014-08-31
Working Group: 
2
Status: 
Done
Closing Date: 
2015-02-12
Background: 

Posted by Martin on 31/8/2014 

Dear All, 

I work together with Carlo Meghini on a formalization of CIDOC CRM in first order logic, in order to have an encoding neutral, compact form for OWL implementations and other reasoning services. We'll present this work for discussion in the next CRM-SIG meeting. 

One issue that occurred is the meaning of shortcuts. 
In all cases, the extended path implies the shortcut, but only in a few cases the shortcut implies a particular path so that the existence of the intermediates can be inferred. (such as "rights held by"). 

Who would volunteer to scan the latest version for that? 

Posted by Mark Fichtner 1/9/2014 

Dear Martin, 

Some time ago we also had a discussion about this topic on the Erlangen CRM mailing list. Perhaps some of this work is already done, have a look here: 
https://groups.google.com/forum/#!msg/erlangen-crm/ojF1l8EhoPc/2bzmgIbW3sQJ - List of Shortcuts 
https://groups.google.com/forum/?hl=en#!topic/erlangen-crm/tJ4JDk8tcQI - Discussion about Shortcuts 

Posted by Vladimir 11/09/2014

a few cases the shortcut implies a particular path so that the existence of the intermediates can be inferred. (such as "rights held by").

Hi Martin! 
In my opinion, a shortcut should never infer a long-path. 
– IMHO the purpose of a long-path is to provide additional info (e.g. a date), but the shortcut cannot infer that 
– the purpose of a shortcut is to allow simpler representation, so why also infer a more complex but incomplete representation (no additional details)? 
– in cases when both shortcut and long-path are provided explicitly, an inferred long-path would be superfluous (duplicate), or a system needs to go through extra effort to somehow correlate the inferred to an existing long-path 

 

Posted by Martin 11/09/2014 

a few cases the shortcut implies a particular path so that the existence of the intermediates can be inferred. (such as "rights held by"). 
Hi Martin! 
In my opinion, a shortcut should never infer a long-path.

Hi Vladimir, 

My question was about logic, and not about a knowledge base. We intend to separate these two strictly. 

In case a longpath can be inferred, it logically exists. If one or more of the intermediate nodes can be inferred to exist, they are potentially present in other information we are interested in integrating with it. Then, we can find new links the shortpath did not provide. 

That is the idea. 

Posted by Detlev Balzer 11/9/2014 
Dear Vladimir, 

I'd second Martin's view from a very practical perspective. 

You've probably seen dozens of databases containing statements such as artifact A was created by agent B. If we assume that all of these statements imply the existence of a creation event, then we have a clear migration path in cases where additional information needs to find a suitable representation. 

Another use case is integrating "low-res" and "high-res" knowledge bases, where "low-res" statements have to be translated into a more complex representation even if no information is added during the process. 

Posted by Christian Emil on 12/09/2014 
I also support the view of Martin. Detlev has a very good and clear example. If we assume that a property in CRM is a predicate in the first order logic sense, then a shortcut definition, "S is a shortcut via P1 and P2", is something like 

S(x,y) => Exist z( P1(x,z) & P2(z,y) )

Does the implication also go in the opposite direction? Intuitively it should, but maybe I am wrong here. 
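
For reference, the two directions under discussion can be written out separately. This is only a sketch using the generic symbols S, P1 and P2 from the formula above; it does not claim that either direction holds for any particular CRM shortcut.

```latex
% Direction stated above: the shortcut implies that some intermediate z exists.
\forall x\,\forall y\;\bigl( S(x,y) \rightarrow \exists z\,( P_1(x,z) \land P_2(z,y) ) \bigr)

% Opposite direction: every complete path implies the shortcut.
\forall x\,\forall y\,\forall z\;\bigl( P_1(x,z) \land P_2(z,y) \rightarrow S(x,y) \bigr)
```

Only the first direction introduces an individual that was not already mentioned; the second derives a new statement between individuals that are already known.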

Posted by Vladimir on 19/09/2014 

Martin> My question was about logic, and not about a knowledge base. We intend to separate these two strictly.

If you encode your ideas in OWL, certain inferences will follow in a very practical sense. 
I've read the paper
I'll wait for a concrete OWL axiomatization before making detailed comments. 

But here are some very brief comments: 
– There is no class rdf:Literals (the spelling is rdf:Literal) 
– E41 should not be made Literal!! E62 String should be 
– "E50 Date finds a natural correspondent in xsd:date. Since datatypes lie outside OWL proper"... I think that's false, see http://www.w3.org/TR/owl2-syntax/#Time_Instants. And in other places you do use datatypes 
– "SubClassOf(owl:real crm:E60)": I don't think you can make supertype assertions over datatypes. 
Here are the datatype constructs allowed: http://www.w3.org/TR/owl2-syntax/#Data_Ranges 
- http://www.w3.org/TR/owl2-syntax/#Real_Numbers.2C_Decimal_Numbers.2C_and...
"The owl:real datatype does not directly provide any lexical forms." 
Which means it can't be used in practice. 
Instead, use xsd:decimal which is an infinite-precision number. 

Christian> S(x,y) => Exist z( P1(x,z) & P2(z,y) )

In practical terms, this will infer an unknown resource (blank node) z. 
Trying later to locate this blank node so you can attach more info to it when integrating richer databases... is a lost proposition. It's 

OWL2 itself infers many such "Skolem variables". E.g. 
– if you say Father is someone male who has a child, Peter is a father, and there is no child instance of Peter, then OWL semantics will infer an unknown child. 
– if you say Legal Objects must have at least one Right (through a restriction axiom), OWL will infer an unknown Right for each Legal object 

The utility of such Skolem variables was strongly criticized 3-4 years ago at the "OWL 2 Much?" seminar.
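
A minimal sketch of what materializing the implication could look like, and why matching the inferred node later is the hard part. Everything here is an assumption made for illustration: triples are plain Python tuples, `S`, `P1` and `P2` are the generic names from the formula, and the `_:z` placeholders only stand in for blank nodes. It is not how any OWL reasoner is implemented.

```python
from itertools import count

def has_long_path(triples, x, y, first="P1", second="P2"):
    """True if some z satisfies P1(x, z) and P2(z, y)."""
    return any(
        s == x and p == first and (z, second, y) in triples
        for (s, p, z) in triples
    )

def expand_shortcuts(triples, shortcut="S", first="P1", second="P2"):
    """Materialize S(x, y) => Exist z (P1(x, z) & P2(z, y)).

    A fresh placeholder is minted only when no witness z is present yet.
    """
    fresh = count(1)
    result = set(triples)
    for (x, p, y) in sorted(triples):
        if p == shortcut and not has_long_path(result, x, y, first, second):
            z = f"_:z{next(fresh)}"  # placeholder, analogous to a blank node
            result.add((x, first, z))
            result.add((z, second, y))
    return result

# "Low-res" source: only the shortcut is stated.
low_res = {("a", "S", "b")}
expanded = expand_shortcuts(low_res)

# "High-res" source: the intermediate is named explicitly.
high_res = {("a", "P1", "e1"), ("e1", "P2", "b")}

# Merging after expansion leaves two candidate intermediates for the same pair.
merged = expanded | high_res
intermediates = {z for (s, p, z) in merged if s == "a" and p == "P1"}
```

After the merge, `intermediates` holds both the placeholder and `e1`, and nothing in the data says whether they denote the same thing. If the two sources are merged before expansion, the existing path already serves as a witness and no placeholder is created.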

Does the implication also go in the opposite direction? Intuitively it should, but maybe I am wrong here.

I think it should, using owl:propertyChainAxiom.
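
In OWL 2 this direction would be declared on the shortcut property with owl:propertyChainAxiom, listing the chain P1, P2. The effect can be sketched over plain tuples as below; the names `S`, `P1`, `P2` and the function `infer_shortcuts` are hypothetical and only illustrate the inference, not a reasoner.

```python
def infer_shortcuts(triples, first="P1", second="P2", shortcut="S"):
    """Add S(x, y) wherever P1(x, z) and P2(z, y) both hold for some z."""
    inferred = {
        (x, shortcut, y)
        for (x, p, z) in triples if p == first
        for (z2, q, y) in triples if q == second and z2 == z
    }
    return set(triples) | inferred

long_path = {("a", "P1", "e1"), ("e1", "P2", "b")}
closed = infer_shortcuts(long_path)

# Collect every node mentioned before and after the inference.
nodes_before = {n for (s, _, o) in long_path for n in (s, o)}
nodes_after = {n for (s, _, o) in closed for n in (s, o)}
```

Unlike the other direction, this inference only adds a statement between nodes that are already present, so `nodes_before` and `nodes_after` are equal.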

Detlev> artifact A was created by agent B. If we assume that all of these statements imply the existence of a creation event, then we have a clear migration path in cases where additional information needs to find a suitable representation.

When you have extra info, you can make the longpath and extra node. 
Before you have that info, why make the node? 
Are you familiar with the "You are not gonna need it" (YAGNI) principle? A modern view on Occam's Razor

Another use case is integrating "low-res" and "high-res" knowledge bases, where "low-res" statements have to be translated into a more complex representation

CRM already forces everyone to make "high res" statements in many cases. 
E.g. you don't say "birth date" but make a Birth event, same for Production, etc. 
But when CRM provides a more economical representation, why should LOGIC force everyone to also use a more long-winded representation that adds no value because it has no extra data? 

Integrating a "high-res" database to a forcefully-Skolemized "low-res" database is not going to be any fun. 
How do you locate the unknown nodes (blank nodes), and how can you be sure you're talking of the same event, so it's legitimate to merge them? 

The whole system of shortcuts is more complex than most people think. 
– *all* properties are shortcuts of E13 Attribute Assignment. 
** Does that mean we should infer an unknown E13 for *every* statement? 
** How about E13's own properties P140 and P141: should we reify those with E13, ad infinitum? 
– There are longcut chains that cross over, e.g.: 
P8 is shortcut of P7-E53-P87-E46-P58i 
P59 is shortcut of P58-E46-P87i 
So P87 is a common link in both: I have not analyzed what would be a consequence of that (if any)... 
– there are some unstated shortcuts. E.g. P57 has number of parts is clearly the result of some measurement... 
But what is the Type (items, volumes, pages, paragraphs)? 
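
The "ad infinitum" concern about E13 can be made concrete with a small sketch. It assumes, purely for illustration, that every statement is expanded into one E13 node with a P140 and a P141 statement, and that those two statements are then expanded in the same way. The sketch does not record which property was assigned, and the node names are hypothetical.

```python
from itertools import count

def reify_round(statements, fresh):
    """Replace each statement by an E13 placeholder with P140 and P141 links."""
    out = []
    for (s, p, o) in statements:
        node = f"_:e13_{next(fresh)}"      # hypothetical E13 instance
        out.append((node, "P140", s))      # assigned attribute to
        out.append((node, "P141", o))      # assigned
    return out

def frontier_sizes(statement, rounds):
    """Count the statements produced at each round of repeated expansion."""
    fresh = count(1)
    frontier = [statement]
    sizes = [len(frontier)]
    for _ in range(rounds):
        frontier = reify_round(frontier, fresh)
        sizes.append(len(frontier))
    return sizes

sizes = frontier_sizes(("object1", "P2", "type1"), rounds=4)
```

Under this assumption the number of new statements doubles with every round and there is no round at which the expansion stops by itself, so a cut-off would have to come from outside the rule.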

Posted by Martin 19/09/2014 
Hi Alexiev,

On 19/9/2014 12:19 PM, Vladimir Alexiev wrote: 
Martin> > My question was about logic, and not about a knowledge base. We intend to separate these two strictly. 
If you encode your ideas in OWL, certain inferences will follow in a very practical sense.

To be more explicit about what I mean. The CIDOC CRM is an ontology, a conceptualization of possible states of affairs in this (real) world. 
The exercise we do here is to define the real world logic, and then to check to which degree OWL or whatever can describe it. We are not interested in "how nice is what OWL can do for me", but in what the limits are with respect to the logic we assume to hold for the real world. The second step will be to define what knowing means. Standard example: 
We know everybody has exactly one father (categorical), but our knowledge (factual) is about zero, one or many. We will discuss in the next meeting with Carlo Meghini a quite successful theory to do that. That's a philosophical problem, not a question of encodings. 
From that, we should be able to provide more objectively justified implementation guidelines for certain scholarly tasks. 

 

I've read the paper (http://www.cidoc-crm.org/docs/OWL_formalisation_of_the_CRM.pdf)
I'll wait for a concrete OWL axiomatization before making detailed comments. 
But here are some very brief comments: 
– There is no class rdf:Literals (the spelling is rdf:Literal)

How important

- E41 should not be made Literal!! E62 String should be

To be discussed in the meeting.

– "E50 Date finds a natural correspondent in xsd:date. Since datatypes lie outside OWL proper"... 
I think that's false, see http://www.w3.org/TR/owl2-syntax/#Time_Instants. 
And in other places you do use datatypes 
– "SubClassOf(owl:real crm:E60)": I don't think you can make supertype assertions over datatypes. 
Here are the datatype constructs allowed: http://www.w3.org/TR/owl2-syntax/#Data_Ranges 
http://www.w3.org/TR/owl2-syntax/#Real_Numbers.2C_Decimal_Numbers.2C_and...
"The owl:real datatype does not directly provide any lexical forms." 
Which means it can't be used in practice. 
Instead, use xsd:decimal which is an infinite-precision number.

For all datatypes with explicit lexical forms, we have the problem that they are not complete with respect to what we want to cover with the CRM. They represent useful subsets of the logical forms, such as with real numbers. Here ontology in the true sense and database schema diverge. No simple answers possible. 

Christian> S(x,y) => Exist z( P1(x,z) & P2(z,y) )
In practical terms, this will infer an unknown resource (blank node) z. 
Trying later to locate this blank node so you can attach more info to it when integrating richer databases... is a lost proposition. It's

How that? (Blank nodes are an implementation choice)

OWL2 itself infers many such "Skolem variables". E.g. 
– if you say Father is someone male who has a child, Peter is a father, and there is no child instance of Peter, then OWL semantics will infer an unknown child. 
– if you say Legal Objects must have at least one Right (through a restriction axiom), OWL will infer an unknown Right for each Legal object 
The utility of such Skolem variables was strongly criticized 3-4 years ago at the "OWL 2 Much?" seminar.

Well, such criticisms may be made under certain assumptions about implementation environments, capacities of database engines and user functionality, or in the confusion of real world semantics with the latter. Not helpful without the underlying assumptions. Again, we discuss real world semantics here. 

Does the implication also go in the opposite direction? Intuitively it should, but maybe I am wrong here. 
I think it should, using owl:propertyChainAxiom. 

Detlev> artifact A was created by agent B. If we assume that all of these statements imply the existence of a creation event, then we have a clear migration path in cases where additional information needs to find a suitable representation. 
When you have extra info, you can make the longpath and extra node. Before you have that info, why make the node?

You have not understood: the question was, if the intermediate exists, and if it is unique or not. Not, if you materialize it. 
 

Are you familiar with the "You are not gonna need it" (YAGNI) principle? A modern view on Occam's Razor 
Another use case is integrating "low-res" and "high-res" knowledge bases, where "low-res" statements have to be translated into a more complex representation 
CRM already forces everyone to make "high res" statements in many cases. 
E.g. you don't say "birth date" but make a Birth event, same for Production, etc. 
But when CRM provides a more economical representation, why should LOGIC force everyone to also use a more long-winded representation that adds no value because it has no extra data?

Logic doesn't force any implementation. Only utility does.

Integrating a "high-res" database to a forcefully-Skolemized "low-res" database is not going to be any fun. How do you locate the unknown nodes (blank nodes),

Instance matching S/W.

and how can you be sure you're talking of the same event, so it's legitimate to merge them?

That makes the difference between a scholar and a techie.

The whole system of shortcuts is more complex than most people think. 
*all* properties are shortcuts of E13 Attribute Assignment. 
** Does that mean we should infer an unknown E13 for *every* statement? 
** How about E13's own properties P140 and P141: should we reify those with E13, ad infinitum?

Well, that's why we need a theory of knowing, and who knows what's in a database. That's why a database does not contain knowledge, only information. 
[…]

Posted by Vladimir on 29/9/2014

 

>> Before you have [extra] info, why make the node?
> the question was, if the intermediate exists, and if it is unique or not. Not, if you materialize it.
 
Of course it's not unique. 
Two different long-paths may lead to the same shortcut.
E.g. consider two people making independent measurements, and coming up with the same result; 
or making independent statements that happen to agree about the value.
 
This is in fact present in ULAN:
there's a table BIOGRAPHY that has birth/death place/date, and Contributor (i.e. who said it).
That corresponds to several E13 Attribute Assignments, which may well be over the same triple.
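
The non-uniqueness can be illustrated with a small sketch, again assuming plain tuples and the generic names `S`, `P1`, `P2`. The nodes `m1` and `m2` are hypothetical and stand for two independent assignments or measurements that agree on the value.

```python
def witnesses(triples, x, y, first="P1", second="P2"):
    """Return every intermediate z with P1(x, z) and P2(z, y)."""
    return {
        z for (s, p, z) in triples
        if s == x and p == first and (z, second, y) in triples
    }

triples = {
    ("a", "P1", "m1"), ("m1", "P2", "b"),   # first independent statement
    ("a", "P1", "m2"), ("m2", "P2", "b"),   # second independent statement
    ("a", "S", "b"),                        # the single shortcut both imply
}
found = witnesses(triples, "a", "b")
```

Here one shortcut statement is backed by two distinct intermediates, so the shortcut alone cannot tell which long-path, or how many, lies behind it.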
 
>> - *all* properties are shortcuts of E13 Attribute Assignment.
>>    ** Does that mean we should infer an unknown E13 for *every* statement?
>>    ** How about E13's own properties P140 and P141: should we reify those with E13, ad infinitum?
 
You have classified shortcuts into "soft" and "strict". 
Can you please explain the underpinnings of that classification?
 
I really hope the E13 Attribute Assignment long-path is considered "soft" and doesn't lead to expansion ad-infinitum.
Outcome: 

In the 32nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 25th FRBR - CIDOC CRM Harmonization meeting, the CRM-SIG considered this issue obsolete. The SIG assigned Christian Emil to prepare a list of shortcuts to be expressed in FOL. This issue is closed. The discussion will continue in issue 276.

Oxford, February 2015

Reference to Issues: