Issue 197: How to represent imprecision (E60 Number and E61 Time Primitive)

Starting Date: 
2011-10-05
Working Group: 
3
Status: 
Done
Closing Date: 
2011-11-17
Background: 

Posted by Vladimir on 5/10/2011

Very often in the museum domain measurements are imprecise, so dimensions must be expressed as an interval.

1. Imprecise Dimension
E54 Dimension says "The properties of the class E54 Dimension allow for expressing the numerical approximation of the values of an instance of E54 Dimension".
My understanding is that this can only happen through:
E54 Dimension. P90 has value: E60 Number
E60 Number says "... including *intervals* of these values to express *limited precision*".

Regarding time spans, CIDOC CRM allows imprecision to be expressed in two ways:

2. Imprecise Duration
E52 Time-Span. P83 had at least duration. E54 Dimension
E52 Time-Span. P84 had at most duration. E54 Dimension

IMHO this pair of properties is unnecessary, since:
- E54 Dimension already accommodates (or should accommodate) imprecision, see 1
- If we have this pair, then shouldn't we also split P43 has dimension in two (has minimum dimension, has maximum dimension)?
- The pair allows "P91 has unit" of the two Dimensions to differ, which I think is unnecessary 
("between 1 and 2 cm" is used often, but who'd say "between 1 cm and 1 meter"?)
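The unit concern in the last point can be sketched in Python. This is only an illustration: the Dimension class, the TO_METRES table and the helper name are hypothetical and not part of CIDOC CRM. It shows that once the two bounds may carry different units, they have to be normalised before they can even be compared.

```python
from dataclasses import dataclass

# Assumed, deliberately tiny conversion table (factor to metres).
TO_METRES = {"mm": 0.001, "cm": 0.01, "m": 1.0}


@dataclass(frozen=True)
class Dimension:
    """Hypothetical stand-in for an E54 Dimension with P90 value and P91 unit."""
    value: float
    unit: str

    def in_metres(self) -> float:
        return self.value * TO_METRES[self.unit]


def bounds_are_ordered(at_least: Dimension, at_most: Dimension) -> bool:
    """Compare the two bounds only after converting both to a common unit."""
    return at_least.in_metres() <= at_most.in_metres()


# "between 1 and 2 cm": both bounds share a unit, raw numbers compare directly.
same_unit = bounds_are_ordered(Dimension(1, "cm"), Dimension(2, "cm"))

# "between 50 cm and 1 m": valid, but comparing the raw numbers would fail.
mixed_unit = bounds_are_ordered(Dimension(50, "cm"), Dimension(1, "m"))
naive_comparison = 50 <= 1

print(same_unit, mixed_unit, naive_comparison)
```

With a single interval-valued Dimension there would be one unit and no such conversion step.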

3. Imprecise Start/End
As depicted in the CRM Tutorial (online at http://personal.sirma.bg/vladimir/crm-tutorial/#slide27), two properties allow expressing the Outer & Inner bounds of a Time-Span:
E52 Time-Span. P81 ongoing throughout: E61 Time Primitive (inner bound)
E52 Time-Span. P82 at some time within: E61 Time Primitive (outer bound)
Each of the bounds has start/end. This is confirmed by the spec:
E61 Time Primitive says "... interval logic to express *date ranges*"
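
A minimal Python sketch of this structure, assuming each E61 Time Primitive is a plain start/end date range (the class names, field names and dates are hypothetical). Read literally, a Time-Span that is "ongoing throughout" one range and falls "at some time within" another requires the first range to lie inside the second, which the sketch checks:

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TimePrimitive:
    """A date range standing in for an E61 Time Primitive."""
    start: date
    end: date

    def __post_init__(self):
        if self.start > self.end:
            raise ValueError("start must not be after end")

    def contains(self, other: "TimePrimitive") -> bool:
        return self.start <= other.start and other.end <= self.end


@dataclass(frozen=True)
class TimeSpan:
    """An E52 Time-Span described only by its two bounding ranges."""
    ongoing_throughout: TimePrimitive   # P81
    at_some_time_within: TimePrimitive  # P82

    def is_consistent(self) -> bool:
        # The range covered throughout must fit inside the enclosing range.
        return self.at_some_time_within.contains(self.ongoing_throughout)


# Invented example dates, for illustration only.
known_activity = TimePrimitive(date(1850, 6, 1), date(1850, 8, 31))
possible_extent = TimePrimitive(date(1850, 1, 1), date(1851, 12, 31))

span = TimeSpan(known_activity, possible_extent)
swapped = TimeSpan(possible_extent, known_activity)

print(span.is_consistent())     # True
print(swapped.is_consistent())  # False
```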

Let's see what the current RDFS/OWL implementations of CIDOC CRM offer (neither one allows E54 Dimension to express a numerical approximation, i.e. item 1):

4. OWL2 DL proposal
http://bloody-byte.net/rdf/cidoc-crm/core_5.0.1.rdf
Property: http://purl.org/NET/cidoc-crm/core#P90_has_value
Domain: http://purl.org/NET/cidoc-crm/core#E54_Dimension
Range: http://www.w3.org/2000/01/rdf-schema#Literal
Comment: "This property allows an E54 Dimension to be approximated by an E60 Number primitive."

5. OWL DL http://erlangen-crm.org/current
P90_has_value is a Data Property

6. RDFS http://old.cidoc-crm.org/rdfs/cidoc_crm_v5.0.2.rdfs
"The primitive values "E60 Number"... are interpreted as rdf:literal."

Seme4 defined a CRM extension for the British Museum (called BMX), see http://crm.rkbexplorer.com. It defines several extension properties (prefix PX):

7. PX.min_value, PX.max_value as subPropertyOf P90F.has_value.
- If you assert e.g. min_value=35 and max_value=45, that would infer *both* has_value=35 and has_value=45, which I think is strange. Instead I'd leave has_value independent, and set it to the average of min_value and max_value using some calculation
- This implements requirement 1, but is it faithful to CIDOC CRM?
CIDOC CRM says the imprecision should be captured in the domain of P90.has_value, not through parallel properties
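
The inference in the first point can be reproduced with a few lines of Python. This is a sketch over plain tuples rather than a real RDF store: the ex: and bmx: prefixes and the identifier ex:dim1 are made up for illustration, and only a single subPropertyOf step is applied (RDFS rule rdfs7), not a full closure.

```python
# Declared sub-property relations: PX.min_value and PX.max_value
# are both sub-properties of P90F.has_value.
SUBPROPERTY_OF = {
    "bmx:PX.min_value": "crm:P90F.has_value",
    "bmx:PX.max_value": "crm:P90F.has_value",
}


def infer_superproperties(triples):
    """Rule rdfs7: (s p o) and (p subPropertyOf q) entail (s q o)."""
    inferred = set(triples)
    for s, p, o in triples:
        q = SUBPROPERTY_OF.get(p)
        if q is not None:
            inferred.add((s, q, o))
    return inferred


asserted = {
    ("ex:dim1", "bmx:PX.min_value", 35),
    ("ex:dim1", "bmx:PX.max_value", 45),
}

closure = infer_superproperties(asserted)
has_values = sorted(o for s, p, o in closure if p == "crm:P90F.has_value")
print(has_values)  # prints [35, 45]
```

The dimension ends up with two has_value statements, neither of which is "the" value.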

8. PX.time-span_earliest, PX.time-span_latest as properties of E52.Time-Span.
- (Actually these are defined merely as rdf:Property and don't specify the domain and range).
- these properties are superfluous, given P81 ongoing throughout and P82 at some time within
- they don't allow capturing the outer & inner bounds, as per 3
- they are unrelated to CIDOC CRM properties, so the extension is not CRM Compatible.
A compatibility condition from the CRM Intro is:
"all properties of the extension are either subsumed by CRM properties, or are part of a path for which a CRM property is a shortcut"
See online here:
http://personal.sirma.bg/vladimir/crm/introduction.html#extensions 
CIDOC CRM leaves an important question (imprecise dimensions) unspecified, hidden in the scope notes of primitives E60 Number and E61 Time Primitive.
This shouldn't be dismissed as a "mere RDF implementation issue" since it is important for practical CRM interoperability.

What would be the best way to represent imprecision?

9. If we define E60 Number and E61 Time Primitive as RDF classes, that would imply minimal changes to CIDOC CRM.
- E60.Number with dataProperties crm:min_value, crm:max_value, and rdf:value (average or expected)
- E61.Time_Primitive with dataProperties crm:min_date, crm:max_date, and maybe rdf:value
- (see 2) The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged to one property has_duration
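
To make the first point concrete, here is a rough Python sketch of what E60 Number as a class could look like. Everything in it is an assumption for illustration: the NumberNode class, the ex: identifiers, and the choice to default rdf:value to the midpoint are not defined by CIDOC CRM or by any existing RDF implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NumberNode:
    """Hypothetical E60 Number modelled as a resource instead of a literal."""
    min_value: float
    max_value: float
    value: Optional[float] = None  # expected value; midpoint if not given

    def __post_init__(self):
        if self.min_value > self.max_value:
            raise ValueError("min_value must not exceed max_value")
        if self.value is None:
            # Frozen dataclass, so the default is set via object.__setattr__.
            midpoint = (self.min_value + self.max_value) / 2
            object.__setattr__(self, "value", midpoint)
        if not self.min_value <= self.value <= self.max_value:
            raise ValueError("value must lie between min_value and max_value")

    def triples(self, node_id: str):
        """Return the statements describing this number as plain tuples."""
        return [
            (node_id, "rdf:type", "crm:E60.Number"),
            (node_id, "crm:min_value", self.min_value),
            (node_id, "crm:max_value", self.max_value),
            (node_id, "rdf:value", self.value),
        ]


width = NumberNode(35, 45)
statements = [("ex:dim1", "crm:P90F.has_value", "ex:num1")] + width.triples("ex:num1")
for statement in statements:
    print(statement)
```

Here the Dimension has exactly one has_value statement, pointing at a node that carries the interval, so the imprecision stays in the range of P90 rather than in parallel properties.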

10. I'm sure that people who expect P57 has number of parts to be a simple xsd:integer will be very unhappy to suddenly find a class E60.Number (and rightly so!). But E60.Number also gives examples of complex numbers, 3D coordinates, etc. So it really is not a literal; it needs to be a class.


 

Posted by Martin on 6/10/2011
Dear Vladimir, 

Thank you very much for your important questions. As a general remark I'd like to remind you that the CIDOC CRM as a standard is an ontology in the narrower sense, a formal model approximating a human conceptualization, and not a standard database schema. Any implementation, in particular any RDF Schema, is again an approximation of this conceptualization. The CRM has a much wider scope and longer life-cycle than RDF. In Relational Databases, quite different issues occur. 
The Definition of the CIDOC CRM makes very clear that "Primitive Values" are dependent on the capabilities of the respective IT infrastructure. 

These details cannot be standardized in the same way as the CRM, because they change over shorter periods of time than those for which we want interoperability, and what we want is conceptual interoperability, not bitwise interoperability. Therefore the CRM refers loosely to concepts of time and number in a mathematical sense. So far, no database implementation is compatible with all mathematical numerical systems.
Rather, we can make mathematical models of the database implementations and by that devise algorithms to mediate between different implementations.


Posted by Christian Emil Ore on 6/10/2011
Dear all,
I think it will be very unwise to remove the 
E52 Time-Span. P83 had at least duration. E54 Dimension 
E52 Time-Span. P84 had at most duration. E54 Dimension 

As Jon Holmen and I have shown in a system for time reasoning in connection with archaeology (see the paper at http://www.edd.uio.no/artiklar/arkeologi/holmen_ore_caa2009.pdf), these properties are quite useful. In fact, they model the basic way historians work, or field archaeologists for that matter.

The idea of dating as a measurement + dimension to express imprecision is fine for scientific dating methods such as C14. However, for dating based on reading of written sources and historical calendars it is not sufficient. We need both. To take away P83 and P84 will reduce the expressive power of CRM. It is a little like removing wave models of light because light can be seen as particles.

 

Current Proposal: 

The pair P83.had_at_least_duration and P84.had_at_most_duration should be merged to one property has_duration

Outcome: 

Rejected: The at least or at most duration is a state of knowledge, not the imprecision of measuring the true duration.
Use recommendation for RDF implementation of CRM TIME