Guidelines for using P90a, P90, P90b

Detailed question: 

Dimension intervals

The CRM recommends approximating numerical values of Dimensions with intervals. The range of the respective property "P90 has value" is defined in the CRM as E60 Number. Whereas the CRM regards intervals of primitive values as primitive values in their own right, there is currently no corresponding practice in RDF. Therefore, in analogy to the properties of E52 Time-Span, we define in CRM RDFS two more subproperties of P90_has_value: “P90a_has_lower_value_limit” and “P90b_has_upper_value_limit”.

The reasons for recommending this approximation are the following: All scientific measurements of non-discrete values are imprecise because of the tolerances of the measurement devices, shortcomings in applying the procedures and the indeterminacy of the measured effect itself. In natural sciences, important results of measurements are associated with possibly complex probabilistic distributions for the true value of the measured effect.

The most complex case relevant for cultural-historical data is that of the so-called “battleship curves” for calibrated C14 dating data. Many of these distribution models actually extend to infinity with non-zero probability, which is neither practical nor always justified. In the case of C14, however, the actual width of the distribution is often underestimated. Nevertheless, even data whose probabilistic uncertainty extends to infinity are typically associated by scientists with narrower “confidence intervals” at one to three “standard deviations”, i.e., with a probability of some 68% to 99.7% for the value to be in the given range (https://en.wikipedia.org/wiki/Standard_deviation).

Whereas querying a very large aggregation of cultural-historical data globally by time intervals is highly relevant, querying and reasoning with different approximations of dimensions is normally restricted to quite narrow questions. In many cases, a medium value without explicit limits is sufficient for the application, such as the length of a museum object in millimeters for packaging it in a box. Nevertheless, querying an explicit representation of actual outer limits, or at least of reasonably wide confidence intervals, is computationally highly effective. It is therefore a good way to ensure recall at query time, i.e., that the relevant results are contained in the answer to the query, even if the answer also contains irrelevant ones.
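The recall argument can be illustrated with a minimal Python sketch. It is not part of the CRM: the class Dimension, the function may_lie_in and the sample records are hypothetical and chosen only for illustration. A record is returned whenever its documented interval overlaps the queried range, so no relevant record is missed, although some irrelevant ones may be included.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Dimension:
    """Hypothetical in-memory stand-in for an E54 Dimension."""
    label: str
    lower: float                    # P90a_has_lower_value_limit
    upper: float                    # P90b_has_upper_value_limit
    value: Optional[float] = None   # P90_has_value (optional medium value)


def may_lie_in(dim: Dimension, query_min: float, query_max: float) -> bool:
    """True if the documented interval overlaps [query_min, query_max]."""
    return dim.lower <= query_max and dim.upper >= query_min


# Invented sample data, all in millimeters.
records = [
    Dimension("vase height", 298.0, 302.0, 300.0),
    Dimension("bowl diameter", 240.0, 260.0),
    Dimension("sword length", 790.0, 810.0, 800.0),
]

# Everything whose true value could lie between 255 mm and 299 mm.
hits = [d.label for d in records if may_lie_in(d, 255.0, 299.0)]
print(hits)
```

The overlap test needs only two comparisons per record, which is why explicit limits are cheap to query.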

We therefore recommend using P90_has_value for documenting a medium value or a value without error estimates, when the precision appears to be self-evident or irrelevant.

We recommend using P90a_has_lower_value_limit for documenting the highest explicit lower limit available for the respective value, even if it provides very wide margins. It is an error to omit the lower limit even if it appears to be overly pessimistic.

We recommend using P90b_has_upper_value_limit for documenting the lowest explicit upper limit available for the respective value, even if it provides very wide margins. It is an error to omit the upper limit even if it appears to be overly pessimistic.
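When several explicit limits are available for the same value, the two recommendations above amount to taking the maximum of the known lower limits and the minimum of the known upper limits. The following Python sketch illustrates this under that reading. The function name tightest_limits and the sample numbers are hypothetical.

```python
from typing import Iterable, Tuple


def tightest_limits(lower_limits: Iterable[float],
                    upper_limits: Iterable[float]) -> Tuple[float, float]:
    """Pick the highest known lower limit and the lowest known upper limit."""
    lower = max(lower_limits)
    upper = min(upper_limits)
    if lower > upper:
        # The sources contradict each other; do not document such an interval.
        raise ValueError("lower limit exceeds upper limit")
    return lower, upper


# Two sources give different explicit limits for the same length (in mm).
lower, upper = tightest_limits([1100.0, 1150.0], [1300.0, 1250.0])
print(lower, upper)  # 1150.0 1250.0
```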

When approximating probabilistic distributions, we recommend placing the lower and upper limits at two standard deviations from the central value, or such that they enclose the true value with 95% probability.
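For a measurement reported as a mean with a standard deviation, this recommendation can be sketched in Python as follows. The sketch assumes a normal (Gaussian) distribution, which does not hold for calibrated C14 data, where the limits would have to be read from the calibrated distribution itself. The function name two_sigma_limits and the sample numbers are hypothetical.

```python
from statistics import NormalDist
from typing import Tuple


def two_sigma_limits(mean: float, sigma: float, k: float = 2.0) -> Tuple[float, float]:
    """Return (lower, upper) at k standard deviations around the mean."""
    if sigma < 0:
        raise ValueError("standard deviation must not be negative")
    return mean - k * sigma, mean + k * sigma


# Invented example: a value reported as 1200 with a standard deviation of 35.
mean, sigma = 1200.0, 35.0
lower, upper = two_sigma_limits(mean, sigma)
print(lower, upper)  # 1130.0 1270.0

# Under the normal assumption, the interval covers roughly 95% of the probability.
dist = NormalDist(mean, sigma)
coverage = dist.cdf(upper) - dist.cdf(lower)
print(round(coverage, 3))
```

Here lower would be documented with P90a_has_lower_value_limit, upper with P90b_has_upper_value_limit, and the mean optionally with P90_has_value.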

P90a_has_lower_value_limit should always be used together with P90b_has_upper_value_limit. If they are used, the property P90_has_value may be used as well or be omitted.
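The pairing rule can be checked before data are written out. The following Python sketch builds a small Turtle fragment with plain string formatting and rejects a lower limit without an upper limit, or the reverse. Several points are assumptions made only for illustration: the namespace URI bound to crm:, the choice of xsd:double as literal datatype, the example.org instance URI, and the function name dimension_to_turtle.

```python
from typing import Optional

CRM_NS = "http://www.cidoc-crm.org/cidoc-crm/"   # assumed CRM RDFS namespace
XSD_NS = "http://www.w3.org/2001/XMLSchema#"


def dimension_to_turtle(uri: str,
                        lower: Optional[float] = None,
                        upper: Optional[float] = None,
                        value: Optional[float] = None) -> str:
    """Serialize one E54 Dimension, enforcing that P90a and P90b come as a pair."""
    if (lower is None) != (upper is None):
        raise ValueError("P90a and P90b should always be used together")
    if lower is None and value is None:
        raise ValueError("nothing to document")
    if lower is not None and lower > upper:
        raise ValueError("lower limit exceeds upper limit")

    statements = ["a crm:E54_Dimension"]
    if value is not None:  # P90_has_value is optional when limits are given
        statements.append(f'crm:P90_has_value "{value}"^^xsd:double')
    if lower is not None:
        statements.append(f'crm:P90a_has_lower_value_limit "{lower}"^^xsd:double')
        statements.append(f'crm:P90b_has_upper_value_limit "{upper}"^^xsd:double')

    body = " ;\n    ".join(statements)
    return (f"@prefix crm: <{CRM_NS}> .\n"
            f"@prefix xsd: <{XSD_NS}> .\n\n"
            f"<{uri}> {body} .\n")


turtle = dimension_to_turtle("http://example.org/dimension/1",
                             lower=298.0, upper=302.0, value=300.0)
print(turtle)
```

Calling the function with only lower or only upper raises a ValueError, which reflects the rule that the two limits belong together.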

Entities-Properties per Version: 
Reference to Cidoc Version: 
Post date : 
2019-10-15
Authors: 
Martin Doerr
Type: 
Best Practices