Issue 397: Dimension Intervals

Starting Date: 2018-11-14
Working Group: 3
Status: Open
Background: 

Posted by Martin on 7/11/2018

 

Continuing issue 363,

I propose the following:

"Whereas the CRM regards that intervals of primitive values are primitive values by themselves, there is currently no corresponding practice in RDF. Therefore, in analogy to the properties of E52 Time-Span, we define in CRM RDFS two more subproperties of P90 has value: “P90a_has_lower_value_limit” and “P90b_has_upper_value_limit”. The precise guidelines for using these properties are to be given."

Sensor arrays, more and more in use, pose the issue of a single measurement resulting in an array of numbers which altogether form one quantitative statement about the observed. We can describe such structures easily as one complex type of unit (and define an IRI for it), and then regard the value as a matrix of numbers, in which each position obeys the subunits defined in the complex unit type.

Even if we regard complex matrices of numbers as one value for an instance of E54 Dimension, such as RGB image, we can argue that minimal and maximal values exist as two separate matrices of the same structure.

Consequently I propose to deprecate P83 and P84, because they compete with an interval interpretation of P90, and:

Introduce instead Pxxx had duration, Domain:  E52 Time-Span, Range: E54 Dimension
and use the P90, P90a, P90b as adequate

or introduce  an Exxx Temporal Duration , subclass of E54 Dimension, and define subproperties in RDFS ending in xsd:duration.

See:

P83 had at least duration (was minimum duration of)

 

Domain:              E52 Time-Span

Range:                E54 Dimension

Quantification:    one to one (1,1:1,1)

 

Scope note:         This property describes the minimum length of time covered by an E52 Time-Span.

 

It allows an E52 Time-Span to be associated with an E54 Dimension representing its minimum duration (i.e. its inner boundary) independent from the actual beginning and end.

Examples:        

§  the time span of the Battle of Issos 333 B.C.E. (E52) had at least duration Battle of Issos minimum duration (E54) has unit (P91) day (E58) has value (P90) 1 (E60)

 

In First Order Logic:

                           P83(x,y) ⊃ E52(x)

                           P83(x,y) ⊃ E54(y)

P84 had at most duration (was maximum duration of)

Domain:              E52 Time-Span

Range:                E54 Dimension

Quantification:   one to one (1,1:1,1)

Scope note:         This property describes the maximum length of time covered by an E52 Time-Span.

It allows an E52 Time-Span to be associated with an E54 Dimension representing its maximum duration (i.e. its outer boundary) independent from the actual beginning and end.

Examples:        

§  the time span of the Battle of Issos 333 B.C.E. (E52) had at most duration Battle of Issos maximum duration (E54) has unit (P91) day (E58) has value (P90) 2 (E60)

In First Order Logic:

                           P84(x,y) ⊃ E52(x)

                           P84(x,y) ⊃ E54(y)

Current Proposal: 

Posted by Richard Light on 8/11/2018

While we're looking at this area, I would be grateful if we could also look at Value and Unit.

I have never understood how P90 and P91 are actually meant to be used together. I can see how a single E54 can be represented by a single P90 and a single P91, but how do we represent anything more complex?  An example would be "3 ft 6 inches".  Can that be an E54 Dimension, and if so how do you know which unit applies to which value?

Posted by Robert Sanderson on 8/11/2018

+1 to this issue.

This also happens a lot with MonetaryAmounts.  4 shillings and 6 pence is not 4.6 of any one currency.

Posted by Martin on 8/11/2018

Dear Richard,

It requires a sort of datatype or encoding.

Assume unit = "ft&inches"
               value = <3,6>

would that make sense?

In the xsd datatypes everything is in the value already.
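
A minimal sketch of the encoding suggested above, assuming a compound unit is described by an ordered list of sub-units with conversion factors. The registry `COMPOUND_UNITS`, the class `Dimension` and the method name are hypothetical illustrations, not CRM definitions:

```python
from dataclasses import dataclass
from fractions import Fraction

# Hypothetical registry: each compound unit lists its sub-units in order,
# with the factor that converts each sub-unit to the smallest one.
COMPOUND_UNITS = {
    "ft&inches": (("ft", 12), ("inch", 1)),               # 1 ft = 12 inches
    "shillings&pence": (("shilling", 12), ("penny", 1)),  # 1 shilling = 12 pence
}


@dataclass(frozen=True)
class Dimension:
    unit: str     # plays the role of P91 has unit
    value: tuple  # plays the role of P90 has value, e.g. (3, 6)

    def in_smallest_subunit(self) -> Fraction:
        """Collapse the tuple into one number, using the unit's structure."""
        subunits = COMPOUND_UNITS[self.unit]
        if len(subunits) != len(self.value):
            raise ValueError("value does not match the structure of the unit")
        total = Fraction(0)
        for component, (_name, factor) in zip(self.value, subunits):
            total += Fraction(component) * factor
        return total


length = Dimension("ft&inches", (3, 6))
amount = Dimension("shillings&pence", (4, 6))
print(length.in_smallest_subunit())  # 42 (inches)
print(amount.in_smallest_subunit())  # 54 (pence)
```

The unit alone tells which position of the value belongs to which sub-unit, so "4 shillings and 6 pence" never has to be read as 4.6 of anything.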

Posted by Franco on 9/11/2018

Martin,

I agree with you, E60 Number is a jack-of-all-trades and can be a couple, a triple, whatever numeric value or set of values as long as it is clear what is what.

So for ancient/nonstandard/local units such as ft & inches or Roman cubitus I would add:

E58 Measurement Unit “ft&inches” P70 is documented in E31 Document “F.W Clarke, Weights Measures and Money of all Nations. Appleton & C. New York 1888”.

Incidentally, Prof. Clarke (from the U. of Cincinnati) wrote in the introduction “Our three sets of weights, our three different gallons, and our two dissimilar bushels, all unrelated to each other, or to the units of length, must soon give way before the simplicity and elegance of the metric system. That this event may soon happen [...] is the sincere wish and hope of the writer.” 130 years have passed since then, to no avail.

Thus, I would at least regard any such unit (system) as local or historical, and therefore needing a reference description: otherwise for me - and for any scientist - that value of 3 ft 6 inches could equally well be the distance of Alpha Centauri from the Earth, or the size of a bacterium.

Posted by Richard Light on 9/11/2018

On 08/11/2018 20:00, Martin Doerr wrote:

> Dear Richard,
>
> It requires a sort of datatype or encoding.
>
> Assume unit = "ft&inches"
>                value = <3,6>
>
> would that make sense?
>
> In the xsd datatypes everything is in the value already.

The XSD datatypes all resolve to single values, so I don't get a clear steer from them as to how to deal with the 'multiple units' issue.

I can see what you're saying as regards a 'complex' datatype, but I can't find examples on the Web of how the value would actually be encoded as an RDF value which software agents could do anything useful with.

The best I have come up with is this document from 2002:

http://infolab.stanford.edu/~melnik/rdf/datatyping/

which has some heavy hitters associated with it.  Is this the sort of approach you are proposing?

A slightly more complex example would be a geographical coordinate expressed as latitude and longitude (both expressed as degrees, minutes and seconds).

Posted by Martin on 10/11/2018

Dear Richard, All,

I think we need an expert in the respective kinds of syntax. I hope there is someone on this list who works more at the programming level; I no longer do.
From a general point of view, I believe it is a non-issue. I regard this not as a question of feasibility, but of getting an IT person trained in this. I hope someone on this list knows, or knows someone who knows.

The examples "feet/inches" and "degrees, minutes and seconds" are mathematically exactly the same as a date composed of "year/month/day", and even simpler, because there are no leap years etc. So, since the one works, the others must work as well, by analogy.

The geometric primitives, for instance WKT strings, describe points and volumes in 3- or even 4-dimensional spaces. Since this works, any n-dimensional value can be represented in the same way.

A simple way is this:
In a literal, we can store any XML or JSON chunk, and represent the schema as "unit".

Of course, we need to spell this out.
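
As a rough illustration of this idea, the sketch below stores a latitude/longitude pair in degrees, minutes and seconds as a JSON string and names its schema in place of the unit. The schema label, the key names and the sample coordinate are assumptions made for the example only:

```python
import json

# Hypothetical schema identifier that would stand in for the "unit".
SCHEMA = "lat&long/deg-min-sec"


def encode_literal(value) -> str:
    """Serialise a structured value into one compact JSON string literal."""
    return json.dumps(value, separators=(",", ":"))


def dms_to_decimal(degrees: int, minutes: int, seconds: float) -> float:
    """Convert non-negative degrees, minutes, seconds to decimal degrees."""
    if degrees < 0 or minutes < 0 or seconds < 0:
        raise ValueError("this sketch handles non-negative components only")
    return degrees + minutes / 60 + seconds / 3600


# Sample coordinate, invented for illustration.
literal = encode_literal({"lat": [37, 58, 30], "long": [23, 43, 0]})
print(SCHEMA, literal)

decoded = json.loads(literal)
latitude = dms_to_decimal(*decoded["lat"])
longitude = dms_to_decimal(*decoded["long"])
print(round(latitude, 6), round(longitude, 6))
```

Software that knows the schema can recover every component; software that does not still sees one opaque but well-formed literal.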

In the 42nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, after discussing MD's proposal, the crm-sig decided in favor of introducing a new property Pxx had duration linking an instance of E52 Time-Span to an instance of E54 Dimension. MD was assigned to write the scope note for it.

The sig also decided to accept the proposed subproperties of P90 has value in CRM-RDFS. These are: P90a_has_lower_value_limit and P90b_has_upper_value_limit. Consequently, the sig decided to deprecate P83 had at least duration (was minimum duration of) and P84 had at most duration (was maximum duration of), because they compete with an interval interpretation of P90.

The domain of new Pxx had duration should be the E52 Time-Span and its range should be E54 Dimension. Migration paths from the deprecated properties are to be made explicit.

The alternative proposal of introducing a class Exx Temporal Duration, such that it is a subclass of E54 Dimension, and define properties it will participate in, was rejected by the crm-sig.
The idea is that duration, defined by time intervals, can be treated as a kind of dimension, and thus be defined by its inner and outer limits -corresponding to the lower and upper value, respectively. Deprecating the specific properties for temporal duration (P83/P84) in favor of a more generically applied set of properties will help increase the consistency of the model.
The alternative proposal by RL, that the said property be called 'has dimension' was not further discussed.
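
One possible migration path from P83/P84 can be sketched as follows. This is only an illustration under stated assumptions: statements are plain (subject, property, object) tuples, both deprecated properties are present for a time-span, both dimensions use the same unit, and the identifier of the new duration dimension is invented. The function name and short property labels are hypothetical.

```python
def migrate_duration(triples):
    """Merge the P83/P84 statements of each time-span into one duration
    dimension that carries a lower and an upper value limit."""
    # Assumes at most one object per (subject, property) pair.
    objects = {(s, p): o for s, p, o in triples}
    spans = sorted({s for s, p, _ in triples if p in ("P83", "P84")})
    migrated = []
    for span in spans:
        min_dim = objects.get((span, "P83"))
        max_dim = objects.get((span, "P84"))
        if min_dim is None or max_dim is None:
            raise ValueError(f"{span}: both P83 and P84 are needed")
        unit = objects[(min_dim, "P91")]
        if objects[(max_dim, "P91")] != unit:
            raise ValueError(f"{span}: units differ, convert before migrating")
        duration = f"{span}/duration"  # hypothetical identifier for the new E54
        migrated += [
            (span, "Pxx_had_duration", duration),
            (duration, "P91", unit),
            (duration, "P90a_has_lower_value_limit", objects[(min_dim, "P90")]),
            (duration, "P90b_has_upper_value_limit", objects[(max_dim, "P90")]),
        ]
    return migrated


# The Battle of Issos example from the P83 and P84 definitions.
issos = [
    ("issos_time_span", "P83", "issos_min_duration"),
    ("issos_min_duration", "P91", "day"),
    ("issos_min_duration", "P90", 1),
    ("issos_time_span", "P84", "issos_max_duration"),
    ("issos_max_duration", "P91", "day"),
    ("issos_max_duration", "P90", 2),
]
for triple in migrate_duration(issos):
    print(triple)
```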

Berlin, November 2018

Posted by Martin 15/2/2019

Dear All

As discussed in Berlin, I proposed to deprecate P83 and P84, because they compete with an interval interpretation of P90, and:

Introduce instead Pxxx had duration, Domain:  E52 Time-Span, Range: E54 Dimension
and use the P90, P90a, P90b as adequate or introduce  an Exxx Temporal Duration , subclass of E54 Dimension, and define subproperties in RDFS ending in xsd:duration.

Here my definition:

Pxxx had duration (was duration of)

Domain:              E52 Time-Span

Range:                E54 Dimension

Quantification:    one to one (1,1:1,1)

Scope note:         This property describes the length of time covered by an E52 Time-Span. It allows an E52 Time-Span to be associated with an E54 Dimension representing duration (i.e. its inner boundary) independent from the actual beginning and end. Indeterminacy of the duration value can be expressed by assigning a numerical interval to the property P90 has value of E54 Dimension.

Examples:       

§  the time span of the Battle of Issos 333 B.C.E. (E52) had duration Battle of Issos minimum duration (E54) has unit (P91) day (E58) has value (P90) (E60)

In First Order Logic:

                           Pxxx(x,y) ⊃ E52(x)

                           Pxxx(x,y) ⊃ E54(y)

Comments?

------------------------------------------------------------------------------------------------------

See:

P83 had at least duration (was minimum duration of)

Domain:              E52 Time-Span

Range:                E54 Dimension

Quantification:    one to one (1,1:1,1)

Scope note:         This property describes the minimum length of time covered by an E52 Time-Span.

It allows an E52 Time-Span to be associated with an E54 Dimension representing its minimum duration (i.e. its inner boundary) independent from the actual beginning and end.

Examples:       

§  the time span of the Battle of Issos 333 B.C.E. (E52) had at least duration Battle of Issos minimum duration (E54) has unit (P91) day (E58) has value (P90) 1 (E60)

In First Order Logic:

                           P83(x,y) ⊃ E52(x)

                           P83(x,y) ⊃ E54(y)

P84 had at most duration (was maximum duration of)

Domain:              E52 Time-Span

Range:                E54 Dimension

Quantification:   one to one (1,1:1,1)

Scope note:         This property describes the maximum length of time covered by an E52 Time-Span.

It allows an E52 Time-Span to be associated with an E54 Dimension representing its maximum duration (i.e. its outer boundary) independent from the actual beginning and end.

Examples:       

§  the time span of the Battle of Issos 333 B.C.E. (E52) had at most duration Battle of Issos maximum duration (E54) has unit (P91) day (E58) has value (P90) 2 (E60)

In First Order Logic:

                           P84(x,y) ⊃ E52(x)

                           P84(x,y) ⊃ E54(y)

 

Posted by Robert Sanderson on 23/2/2019

This becomes problematic, unfortunately, in RDF which does not have a way to natively express a Number that is actually an interval.  The resolution would be to do the same as P81a/b … which would have the same effect as maintaining P83 and P84, just not in the model directly.

While I appreciate the theoretical consistency that this change would add, from an implementation perspective, this would bring more complexity than value.

Overall, I’m not in favor of the deprecation, but am not averse to adding had_duration separately, with the potential to deprecate 83 and 84 if a holistic approach to date and number intervals can be devised.

 

Thanks!

 

Posted by Martin on 23/2/2019

Dear Robert,

On 2/23/2019 1:09 AM, Robert Sanderson wrote:
> This becomes problematic, unfortunately, in RDF which does not have a way to natively express a Number that is actually an interval.  The resolution would be to do the same as P81a/b … which would have the same effect as maintaining P83 and P84, just not in the model directly.
>
> While I appreciate the theoretical consistency that this change would add, from an implementation perspective, this would bring more complexity than value.

I do not understand what increases the complexity: if I have in RDFS two paths P83-E54-P90 AND P84-E54-P90, together with the ambiguity of how to use P90a, P90b with these paths, OR I have a single path Pxxx-E54 that splits into P90a, P90b, then in the end I again have two paths, Pxxx-E54-P90a AND Pxxx-E54-P90b, and no ambiguity about whether to use P83 or P90a.

So where is the added complexity? I see it only reduced, but I may be wrong!

My second question was if, since we have bound the Dimension already to temporal durations in the definition of Pxxx, we should express that by a subclass of E54.

Best,

In the 43rd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 36th FRBR - CIDOC CRM Harmonization meeting, the sig reviewed MD’s HW on coming up with a property Pxxx had duration (was duration of) [D: E52 Time-Span, R: E54 Dimension] to be used as the equivalent of P90 has value (and P90a, P90b), and the subsequent deprecation of P83 & P84 that compete with an interval interpretation of P90.
 

DECISION: the sig accepted MD's proposal (the insertion of a new property and the deprecation of P83 & P84), as well as the definition for Pxxx had duration (was duration of), with minor modifications.
 

DECISION: the duration example should appear under E54 Dimension.  The definition of the new property reads: 
 

Pxxx had duration (was duration of)
Domain:     E52 Time-Span
Range:        E54 Dimension
Quantification:    one to one (1,1:1,1)
Scope note:     This property describes the length of time covered by an E52 Time-Span. It allows an E52 Time-Span to be associated with an E54 Dimension representing duration independent from the actual beginning and end. Indeterminacy of the duration value can be expressed by assigning a numerical interval to the property P90 has value of E54 Dimension.
Examples:        
§  the time span of the Battle of Issos 333 B.C.E. (E52) had duration Battle of Issos duration (E54)
In First Order Logic:
                           Pxxx(x,y) ⊃ E52(x)
                           Pxxx(x,y) ⊃ E54(y)

PROPOSAL: maybe other relevant examples could be used (the Battle of Varus or the WW1 or the WW2 –especially in view of the fact that neither its beginning nor its end occurred at the same time at different parts of the world, like for instance the state of war between Greece and Albania that lasted until 1987).
 

Posted by Robert Sanderson on 11/6/2019

Apologies for missing this back in February …

Before the deprecation of P83 and P84 in favor of P191, it was possible to say that a TimeSpan had a minimum duration of 2 days and a maximum duration of 4 days by using P83 and P84.

Now there is only a single Dimension related via P191, with the intent that the value can be an interval.

Given that in the RDF projection of CRM, the value of a Dimension is a single number (and similarly, the dates are single dates), it is not possible to express the above without some additional constructions in that projection.

Thus it seems like we need at least to define P90a_has_minimum_value and P90b_has_maximum_value as properties of Dimension to be able to express the interval value. This would be more consistent, and provide access to the construction for other uses of Dimension, so I’m happy with the deprecation decided at the last SIG meeting … but we need to follow through with the corresponding RDF definitions.

I propose the following properties, which could be defined in the same document as P81a/b and P82a/b:

P90a_has_minimum_value

This property allows the lowest possible value of an E54 Dimension to be approximated by an E60 Number primitive.

P90b_has_maximum_value

This property allows the greatest possible value of an E54 Dimension to be approximated by an E60 Number primitive.
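
To make the case above concrete, a small sketch that writes out "minimum 2 days, maximum 4 days" as Turtle, using P191 and the two property names proposed in this post. The `ex:` namespace, the node names and the function name are assumptions, and the two P90a/P90b properties are proposals here, not published CRM RDFS terms:

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
EX = "http://example.org/"  # hypothetical namespace for the sample data


def interval_dimension_turtle(span, dimension, unit, minimum, maximum):
    """Return Turtle text for a time-span whose duration is an interval."""
    if minimum > maximum:
        raise ValueError("minimum value must not exceed maximum value")
    return "\n".join([
        f"@prefix crm: <{CRM}> .",
        f"@prefix ex: <{EX}> .",
        "",
        f"ex:{span} crm:P191_had_duration ex:{dimension} .",
        f"ex:{dimension} a crm:E54_Dimension ;",
        f"    crm:P91_has_unit ex:{unit} ;",
        f"    crm:P90a_has_minimum_value {minimum} ;",  # proposed property
        f"    crm:P90b_has_maximum_value {maximum} .",  # proposed property
    ])


ttl = interval_dimension_turtle("time_span_1", "duration_1", "day", 2, 4)
print(ttl)
```

Each limit stays a single number, which is what the RDF projection can express; the interval arises only from the pair.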
 

Posted by Martin on 11/6/2019

Dear Robert,

I may have lost track. I had published my final version of the guidelines before January. A final approval may be pending, but I have elaborated these properties in much detail. Yes, of course, they have to be defined; that was the idea of the deprecation. I thought it was accepted already...

See:

"Whereas the CRM regards that intervals of primitive values are primitive values by themselves, there is currently no corresponding practice in RDF. Therefore, in analogy to the properties of E52 Time-Span, we define in CRM RDFS two more subproperties of P90 has value: “P90a_has_lower_value_limit” and “P90b_has_upper_value_limit”. Even if we regard complex matrices of numbers as one value for an instance of E54 Dimension, such as RGB images, we can argue that minimal and maximal values exist as two separate matrices of the same structure. The precise guidelines for using these properties are given in the section “Guidelines for using P90a, P90, P90b” below."

"

Guidelines for using P90a, P90, P90b

The CRM recommends to approximate numerical values of Dimensions with intervals. The range of the respective property "P90 has value" is defined in the CRM as E60 Number. Whereas the CRM regards that intervals of primitive values are primitive values by themselves, there is currently no corresponding practice in RDF. Therefore, in analogy to the properties of E52 Time-Span, we define in CRM RDFS two more subproperties of P90 has value: “P90a_has_lower_value_limit” and “P90b_has_upper_value_limit”.

The reasons for recommending this approximation are the following: All scientific measurements of non-discrete values are imprecise because of the tolerances of the measurement devices, shortcomings in applying the procedures and the indeterminacy of the measured effect itself. In natural sciences, important results of measurements are associated with possibly complex probabilistic distributions for the true value of the measured effect.

The most complex case relevant for cultural-historical data are the so-called “battleship curves” for calibrated C14 dating data. Many of these distribution models actually extend to infinity with non-zero probability, which is neither practical nor always justified. In the case of C14 however, the actual width of the distribution is often underestimated. Nevertheless, even data with a given probabilistic uncertainty to infinity are typically associated by scientists with narrower “confidence intervals” at one to three “standard deviations”, i.e., with a probability of some 68% – 99.7% for the value to be in the given range (https://en.wikipedia.org/wiki/Standard_deviation).

Whereas querying globally a very large aggregation of cultural-historical data by time intervals is highly relevant, querying and reasoning with different approximations of dimensions is normally restricted to quite narrow questions. For many cases, a medium value without explicit limits is sufficient for the application, such as the length of a museum object in millimeters for packaging it in a box. Nevertheless, querying explicit representation of actual outer limits or at least reasonably wide confidence intervals is computationally highly effective, and therefore a good way to ensure recall at query time, i.e., that the relevant results are contained in the answer to the query, even if it also contains irrelevant ones.

We therefore recommend using P90_has_value for documenting a medium value or a value without error estimates, when the precision appears to be self-evident or irrelevant.

We recommend using P90a_has_lower_value_limit for documenting the highest explicit lower limit available for the respective value, even if it provides very wide margins. It is an error to omit the lower limit even if it appears to be overly pessimistic.

We recommend using P90b_has_upper_value_limit for documenting the lowest explicit upper limit available for the respective value, even if it provides very wide margins. It is an error to omit the upper limit even if it appears to be overly pessimistic.

When approximating probabilistic distributions, we recommend keeping the lower and upper limits at two standard deviations, or such that they enclose the true value with 95% probability.

P90a_has_lower_value_limit should always be used together with P90b_has_upper_value_limit. If they are used, the property P90_has_value may be used as well or be omitted."
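
The guidelines above can be sketched in a few lines. This is a minimal illustration, assuming a plain in-memory record instead of RDF; the class and function names are hypothetical, and the sample numbers are invented:

```python
from typing import NamedTuple, Optional


class DimensionValue(NamedTuple):
    value: Optional[float] = None  # P90_has_value (medium value, optional)
    lower: Optional[float] = None  # P90a_has_lower_value_limit
    upper: Optional[float] = None  # P90b_has_upper_value_limit


def validate(d: DimensionValue) -> DimensionValue:
    """Enforce that P90a and P90b are only ever used together."""
    if (d.lower is None) != (d.upper is None):
        raise ValueError("P90a and P90b must be used together")
    if d.lower is None and d.value is None:
        raise ValueError("no value given at all")
    if d.lower is not None and d.lower > d.upper:
        raise ValueError("lower limit exceeds upper limit")
    return d


def from_distribution(mean: float, std_dev: float, k: float = 2.0) -> DimensionValue:
    """Limits at k standard deviations around the mean (k = 2 by default)."""
    return validate(DimensionValue(mean, mean - k * std_dev, mean + k * std_dev))


def may_fall_within(d: DimensionValue, query_lower: float, query_upper: float) -> bool:
    """Recall-oriented test: true if the recorded interval overlaps the query."""
    lo = d.lower if d.lower is not None else d.value
    hi = d.upper if d.upper is not None else d.value
    return lo <= query_upper and query_lower <= hi


measured = from_distribution(mean=100.0, std_dev=5.0)  # invented sample numbers
print(measured)                             # limits 90.0 and 110.0
print(may_fall_within(measured, 108, 120))  # True: the intervals overlap
print(may_fall_within(measured, 111, 120))  # False: entirely above the upper limit
```

The overlap test needs only two comparisons per record, which is why explicit limits are cheap to query while still returning every record that could be relevant.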
 

Reference to Issues:

Meetings discussed: