Issue 349: Belief Values
Posted by Martin on 3/10/2017
Dear All,
Following a request from Dominic on how to deal with uncertain associations, such as "probably author of", I'd like to discuss a solution: expanding properties with the "Property Class" (PC) construct and adding a "certainty value" as a ".2" property for all those cases in which the belief is that of the maintainers of the knowledge base, in contrast to an explicit inference by a particular actor.
Posted by Robert on 3/10/2017
We have dealt with this situation by using AttributeAssignment, as in RDF the .1 (and .2) properties would require reification anyway.
It can also cover “workshop of” or “style of” attributions, which often express uncertainty about the individual.
We resisted trying to quantify uncertainty, as from an interoperability viewpoint there’s very little to be gained from saying that one person is 5/10 sure of an assertion whereas someone else is 4/10 certain… The temptation is to use the strength of belief as an indicator of likelihood of truth, rather than of the state of mind of the asserting agent. The first would be useful but impossible; the second we consider not to be useful for interoperability between public systems.
(Which is not to say it’s not valuable, just not in our scope of work)
Posted by Martin on 3/10/2017
Dear Robert,
In the discussion about co-reference statements and deductions from shortcuts we have understood that reification via Attribute Assignment is the wrong method to extend properties, because it confuses the agency of belief when the maintainer of the database describes himself as making an attribute assignment in some cases, and not in others.
Therefore, I propose the PC construct.
Secondly, I absolutely do agree and do not propose a quantification of belief. There are non-quantitative forms of logic dealing with belief values other than true-false, such as "possible". We can think of other measures of supporting evidence.
With respect to interoperability, there is no problem as long as a recall-precision hierarchy is preserved. As long as "true" implies "possible", we get what we need (i.e., querying for possible also returns true; only querying for possible and not true would return only possible).
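A minimal sketch of the recall-precision hierarchy described above, assuming just two belief values, "possible" and "true". The statement tuples, the `BELIEF_RANK` table and both query functions are hypothetical illustrations, not CRM constructs.

```python
# Hypothetical belief values, ranked so that a higher rank implies every lower one.
# Only "true" and "possible" come from the discussion; the ranking table is an assumption.
BELIEF_RANK = {"possible": 0, "true": 1}

# Toy statements: (subject, property, object, belief value). All names are made up.
STATEMENTS = [
    ("actor_a", "author_of", "book_1", "true"),
    ("actor_b", "author_of", "book_2", "possible"),
]


def query_at_least(statements, belief):
    """Return statements whose belief value implies `belief` (high recall)."""
    threshold = BELIEF_RANK[belief]
    return [s for s in statements if BELIEF_RANK[s[3]] >= threshold]


def query_exactly(statements, belief):
    """Return only statements recorded with exactly `belief` (high precision)."""
    return [s for s in statements if s[3] == belief]


if __name__ == "__main__":
    print(query_at_least(STATEMENTS, "possible"))  # both statements
    print(query_exactly(STATEMENTS, "possible"))   # only the uncertain one
```

Because "true" outranks "possible", a query for "possible" never loses statements that a stricter source recorded as "true", which is the property that matters for interoperability here.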
Posted by Franco on 4/10/2017
Dear all,
The issue is extensively discussed in this paper:
Niccolucci, F. & Hermon, S. Expressing reliability with CIDOC CRM, Int J Digit Libr (2016). https://doi.org/10.1007/s00799-016-0195-1
I can send a draft copy to those interested - but not broadcast it for copyright reasons.
In short, the idea is to consider the assessment of the assignment as an E16 Measurement, which measures a dimension: the uncertainty, or better the reliability, of this assignment. The outcome of this measurement, an E60 Number, can be anything: a number, a function, an ordinal value. It is linked to the dimension by P90 has value. We were actually proposing a numeric approach, and that’s why we end up with a number.
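A rough sketch of this measurement pattern as plain subject-predicate-object tuples, assuming the CRM measurement class (E16 Measurement), E54 Dimension, P40 observed dimension and P90 has value. The `ex:assesses` link, the identifiers and both helper functions are hypothetical and are not taken from the paper.

```python
def reliability_assessment(statement_id, value, assessment_id="assessment_1"):
    """Build triples recording a reliability value for one statement."""
    dimension_id = f"{assessment_id}_reliability"
    return [
        (assessment_id, "rdf:type", "E16_Measurement"),
        # Hypothetical property linking the assessment to the assessed statement.
        (assessment_id, "ex:assesses", statement_id),
        (assessment_id, "P40_observed_dimension", dimension_id),
        (dimension_id, "rdf:type", "E54_Dimension"),
        # The value may be a number, but an ordinal label would fit equally well.
        (dimension_id, "P90_has_value", value),
    ]


def reliability_of(triples, statement_id):
    """Collect every reliability value recorded for `statement_id`."""
    assessments = {s for s, p, o in triples if p == "ex:assesses" and o == statement_id}
    dimensions = {o for s, p, o in triples
                  if p == "P40_observed_dimension" and s in assessments}
    return [o for s, p, o in triples if p == "P90_has_value" and s in dimensions]


if __name__ == "__main__":
    triples = reliability_assessment("attribution_1", 0.8)
    print(reliability_of(triples, "attribution_1"))
```

Keeping the value on the dimension rather than on the statement itself leaves room for several assessments of the same statement by different people.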
I tend to disagree with Robert’s statement that quantification is in this case useless for public systems. In my opinion it is instead paramount for data reuse, as the stars in Booking.com reviews are paramount for choosing a hotel. It doesn’t matter if the statement “Martin Doerr is an alien from Saturn” has reliability 0.000001 for you and 0.1 for me; people who know you and me can draw conclusions exactly because they know you and me. This holds regardless of the truth of the statement, which every SIG member knows to be true.
Perhaps the explanation of the “subjective” approach to this quantification may provide additional insight. References 7 and 8 in the paper explain this approach in a quite difficult and complicated way, which is why I cite them.
The paper also addresses how this compares to the CRMinf approach and I6 Belief value. If something changed in CRMinf in this regard after early 2016, it is of course not taken into account.
Finally, there are provisions to document who said that, why, and where it is documented.
In the 44th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 37th FRBR - CIDOC CRM Harmonization meeting, TV mentioned that the degree of confidence in the data is an issue that comes up a lot in conservation studies; hence, he’d like to try summarizing the major discussion points on certainty values, to see whether a guidelines document could be drafted on the basis of these discussion points. This effort could be part of the work in view of the Linked Conservation Data Workshop, September 2019.
DECISION: The sig assigned TV to see how the discussion in issue 349 carries over to the conservation domain and try to put together a guidelines document. FB has volunteered to collaborate in this effort.
Paris, June 2019
Posted by Thanasis Velios on 19/10/2019
Hello,
I was asked to summarise the discussion about uncertainty with the aim of producing some guidelines on how to deal with it. Francesco has not had a chance to review this, as I am sending it at the last minute, but Nicola (who has also been considering the problem) has already had a quick look at it.
Please find the document here:
https://docs.google.com/document/d/17-VzM8RKtrKJappL3KzTPWz_T_rU7T05Qx1W...
The arguments for or against numerical values of confidence are yet to be included.
Posted by Martin on 19/10/2019
Dear Thanasis, All,
This is a good summary, but I would like to point out that there is a fundamental distinction between ontology and uncertainty. Uncertainty is a question of knowing (epistemology), not of being (ontology), and pertains to all facts. Therefore a solution must be generic and independent from the ontology. Another approach is to have properties of uncertainty from CRM Entity and Primitive Value, and to create combinations of these with the ontological ones. This is equivalent to a .2 property, but computationally efficient, and standard RDF. It only blows up the RDFS, but that can be generated automatically.
E13 introduces an additional agency, and is computationally heavy, as are all reification-equivalent constructs.
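One possible reading of the "generated automatically" remark, sketched under strong assumptions: for each ontological property, a script emits a weaker "possibly" variant and declares the asserted property a subproperty of it, so that asserted statements also match queries on the variant. The `_possibly` naming scheme and the function are hypothetical; only `rdfs:subPropertyOf` and `rdf:Property` are standard RDFS/RDF terms.

```python
def generate_uncertainty_properties(properties, suffix="possibly"):
    """Emit RDFS-style triples declaring one weaker variant per property.

    The asserted property becomes a subproperty of its variant, so that
    "true" implies "possible" under ordinary RDFS inference.
    """
    triples = []
    for prop in properties:
        variant = f"{prop}_{suffix}"  # hypothetical naming scheme
        triples.append((variant, "rdf:type", "rdf:Property"))
        triples.append((prop, "rdfs:subPropertyOf", variant))
    return triples


if __name__ == "__main__":
    for triple in generate_uncertainty_properties(
        ["P14_carried_out_by", "P108_has_produced"]
    ):
        print(triple)
```

The schema doubles in size, but each uncertain statement stays a single plain triple, with no reification and no extra E13 instance.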
In the 45th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 38th FRBR – CIDOC CRM Harmonization meeting, the sig reviewed the HW by TV on recording uncertainty (i.e. degree of confidence in data), where 4 alternatives were presented:
- modelling uncertainty as a .2 property;
- through the use of a property of assigning types to attribute assignments, which can in their turn have some ordinal value;
- treating Reliability as a subclass of E16 Measurement and assigning it a confidence measurement through P40_has_value –the activity of assessing the reliability of a statement is treated as a subclass of E13 Attribute Assignment.
- use of properties/classes from CRMinf, namely J4_that and J5_holds_to_be together with I4 Proposition set to assign belief values (I6) to statements obtained through observations
It was suggested that instead of adding property types, maybe the scope note of E54 Dimension should be updated to reflect that the values recovered by means of measurements are, in fact, approximations of the (observable) entities being measured.
DECISION: a new issue is to be formed regarding the scope note of E54 Dimension. Changes must include removing the phrase "An instance of E54 Dimension represents the true quantity, independent from its numerical approximation, e.g. in inches or in cm." A clause stating that error margins on dimensions may sometimes not be relevant can be added to the definition. The issue is closed without additional properties/property types.
The issue closed
Heraklion, October 2019
In the 46th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 39th FRBR - CIDOC CRM Harmonization meeting, the sig resolved to reopen the issue, as proposed by TV. It had been closed by mistake: the discussion had drifted to the approximation of dimensions, whereas the issue was originally about uncertainty of statements. As the sig still hasn’t provided guidelines on how to express uncertainty of statements, the issue will remain open.
In the 58th CIDOC CRM SIG & 51st FRBR/LRMoo SIG Meeting, PF gave a summary of the discussions that concern the issue, see here for the slide deck of his presentation.
Decision --How to proceed:
- Determine the extent to which the solutions that make use of E13 Attribute Assignment, R1 Reliability Assessment, or CRMinf constructs are fit to represent the use cases identified.
- To this end, SdS, DO, AG & MA will share use cases with PF, specifically:
  - SdS to look for a former National Monuments’ record, which has free-text data (statements assigned to particular authors, including statements about the plausibility of the original statements, for instance: "X said Y, but I don’t believe him/it"), and a Swedish dataset that involves ongoing work
  - DO to provide conservation data from the National Archives about inconclusive analyses, which can be of use, although the probability statements are not for individual assertions
  - AG & MA to confer with PF in looking for contingencies with Risk Assessment
For the details of the discussion see here.