Issue 367: E13 Attribute Assignment

Starting Date: 
2018-02-13
Working Group: 
3
Status: 
Open
Background: 

posted by Martin on 13/2/2018

Dear All,

The scope note of E13 must be updated:

A) the property type it refers to should be described by P2 has type of the E13 instance. Then it is isomorphic with an RDF reification statement.

B) The epistemology should be described more precisely: It describes that the maintainers of the knowledge base are not directly responsible for the validity of the statement.

Current Proposal: 

posted by Robert Sanderson on 13/2/2018

Dear Martin, all,

 

If the scope note of E13 will assert that P2 should reference (somehow) the property, how will this interact with subclasses of E13?

Or will those subclasses be deprecated in favor of this more consistent, broader pattern?

 

 

Posted by Martin on 13/2/2018

On 2/13/2018 7:58 PM, Robert Sanderson wrote:
> Dear Martin, all,
>
> If the scope note of E13 will assert that P2 should reference (somehow) the property, how will this interact with subclasses of E13?
>
> Or will those subclasses be deprecated in favor of this more consistent, broader pattern?

The 4 subclasses are all associated with shortcuts that fix the implied property. There is no reason to deprecate them, because they elaborate additional features. We could think of "default types" for these subclasses, and/or of a subproperty of P2 to denote the reified property type in E13.

Posted by Martin on 16/3/2018

Dear All,
 
Here is the old scope note:
E13 Attribute Assignment
 
Subclass of:        E7 Activity
 
Superclass of:      E14 Condition Assessment
                    E15 Identifier Assignment
                    E16 Measurement
                    E17 Type Assignment
 
Scope note:         This class comprises the actions of making assertions about properties of an object or any relation between two items or concepts.
 
 This class allows the documentation of how the respective assignment came about, and whose opinion it was. All the attributes or properties assigned in such an action can also be seen as directly attached to the respective item or concept, possibly as a collection of contradictory values. All cases of properties in this model that are also described indirectly through an action are characterised as "short cuts" of this action. This redundant modelling of two alternative views is preferred because many implementations may have good reasons to model either the action or the short cut, and the relation between both alternatives can be captured by simple rules.
 
In particular, the class describes the actions of people making propositions and statements during certain museum procedures, e.g. the person and date when a condition statement was made, an identifier was assigned, the museum object was measured, etc. Which kinds of such assignments and statements need to be documented explicitly in structures of a schema rather than free text, depends on if this information should be accessible by structured queries.
=====================================================================
Here is my new proposed scope note:
 
E13 Attribute Assignment
 
Subclass of:        E7 Activity
 
Superclass of:      E14 Condition Assessment
                    E15 Identifier Assignment
                    E16 Measurement
                    E17 Type Assignment
 
Scope note:         This class comprises the actions of making assertions about properties of an object or any relation between two items or concepts. The type of the property asserted to hold between two items or concepts can be described by the property P2 has type.
 
This class allows for the documentation of how the respective assignment came about, and whose opinion it was. Note that all instances of properties described in a knowledge base are the opinion of someone. Per default, they are the opinion of the team maintaining the knowledge base. This fact must not individually be registered for all instances of properties provided by the maintaining team, because it would result in an endless recursion of whose opinion was the description of an opinion. Therefore the use of E13 Attribute Assignment marks the fact, that the maintaining team is in general neutral to the validity of the respective assertion, but registers another one's opinion and how it came about.
 
All properties assigned in such an action can also be seen as directly relating the respective pair of items or concepts. Multiple use of E13 Attribute Assignment may possibly lead to a collection of contradictory values. All cases of properties in this model that are also described indirectly through a subclass of E13 Attribute Assignment are characterised as "short cuts" of a path via this subclass. This redundant modelling of two alternative views is preferred because many implementations may have good reasons to model either the action of assertion or the short cut, and the relation between both alternatives can be captured by simple rules.
 
 In particular, the class describes the actions of people making propositions and statements during certain museum procedures, e.g. the person and date when a condition statement was made, an identifier was assigned, the museum object was measured, etc. Which kinds of such assignments and statements need to be documented explicitly in structures of a schema rather than free text, depends on if this information should be accessible by structured queries. 

Posted by Robert Sanderson on 19/3/2018

Thank you Martin for the addition to the scope note regarding P2.
Just to clarify, the easiest way to refer to a relationship defined by the CRM is via the URI of that relationship.
Thus I assume it is okay to do this:
 
_:aa a E13_Attribute_Assignment ;
  P2_has_type <crm:P14_carried_out_by> ;
  P141_assigned <ulan:Rembrandt> ;
  P140_assigned_attribute_to _:production_of_painting .
 
Asserting that the production of the painting activity was carried out by Rembrandt.
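
A minimal sketch, in Python, of how the pattern above could be read back as the plain statement it asserts. The tuple representation, the `asserted_statement` helper and the abbreviated names are assumptions made for illustration only; they are not part of the CRM or of any RDF library.

```python
# Triples are plain (subject, predicate, object) tuples; all names are illustrative.
E13_TRIPLES = [
    ("_:aa", "rdf:type", "E13_Attribute_Assignment"),
    ("_:aa", "P2_has_type", "crm:P14_carried_out_by"),
    ("_:aa", "P141_assigned", "ulan:Rembrandt"),
    ("_:aa", "P140_assigned_attribute_to", "_:production_of_painting"),
]


def asserted_statement(triples, assignment):
    """Return the (subject, property, object) statement asserted by one E13 node.

    P140 supplies the subject, P2 the property type and P141 the object,
    mirroring rdf:subject, rdf:predicate and rdf:object in RDF reification.
    """
    values = {p: o for s, p, o in triples if s == assignment}
    return (
        values["P140_assigned_attribute_to"],
        values["P2_has_type"],
        values["P141_assigned"],
    )


statement = asserted_statement(E13_TRIPLES, "_:aa")
print(statement)
```

The returned tuple is the statement that the E13 instance attributes to someone; whether the knowledge base also holds it as a direct property is a separate decision.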

Posted by Martin on 19/3/2018

On 3/19/2018 7:34 PM, Robert Sanderson wrote:
> Thank you Martin for the addition to the scope note regarding P2.
>
> Just to clarify, the easiest way to refer to a relationship defined by the CRM is via the URI of that relationship.
>
> Thus I assume it is okay to do this:
>
> _:aa a E13_Attribute_Assignment ;
>   P2_has_type <crm:P14_carried_out_by> ;
>   P141_assigned <ulan:Rembrandt> ;
>   P140_assigned_attribute_to _:production_of_painting .
>
> Asserting that the production of the painting activity was carried out by Rembrandt.

Yes!

Posted by Fraeutli on 20/3/2018

Dear Martin,
 
many thanks for this! I would change, or remove, this part
 
"[...] marks the fact, that the maintaining team is in general neutral to the validity of the respective assertion [...]"
 
We see a good use-case for E13 in recording information that is wrong, or information that once used to be thought correct. For example, an artefact that was once thought to have been produced by Person A, but later it emerged that it was made by Person B. In such cases, we want to record the first piece of information using E13, along with its source, to indicate that we are aware of it and to allow people to find it even when they search based on outdated knowledge. We as the maintaining team are therefore not neutral to the validity of the assertion.
 

Posted by Martin on 23/3/2018

Dear Florian,
 
This is what I meant by "in general".
 
I propose to reformulate:
 
Therefore the use of E13 Attribute Assignment marks the fact, that the maintaining team is either neutral to the validity of the respective assertion or has another opinion about it, but registers another one's opinion and how it came about.
 

Posted by Øyvind on 24/3/2018

 
> On 23.03.2018 at 20:26, Martin Doerr <martin@ics.forth.gr> wrote:
>
> Dear Florian,
>
> This is what I meant by "in general".
>
> I propose to reformulate:
>
> Therefore the use of E13 Attribute Assignment marks the fact, that the maintaining team is either neutral to the validity of the respective assertion or has another opinion about it, but registers another one's opinion and how it came about.
 
Therefore the use of E13 Attribute Assignment makes the point that the maintaining team is either neutral to the validity of the respective assertion or has another opinion about it. What they register is somebody else's opinion and how it came about.
 

Posted by Maximilian Schich on 24/3/2018

 
Dear Florian and all,
 
Based on quantitative evidence, I'd object to the following part of your suggestion:
 
"This fact must not individually be registered for all instances of properties provided by the maintaining team, because it would result in an endless recursion of whose opinion was the description of an opinion."
 
=> This would only be correct if the maintaining team added further E13 Attribute Assignments to their own E13 statements. Otherwise, in practice, the data would (a) more or less double, plus (b) gain a non-exploding truncated tail of additional E13 correction statements, where the maintaining team corrects itself.
 
=> Example for (a): In large data sets such as the "Census of Antique Works of Art and Architecture" the "record history" approximately doubles the data set as a whole. Note: The Census "record history" is the place where the maintaining team records their own E13-like attribute assertions (aka assertions of database record authorship). It is important to point out that the record history, where an internal database curator implicitly claims authorship for, say, an artist attribution in the Census, is conceptually in no way different from an external author providing a differing opinion (both usually have PhDs in art history). Ergo there are two default cases: (1) The internal database curator claims authorship for a direct assertion via a single E13 Attribute Assignment in the record history; (2) The internal database curator claims authorship for a cited assertion via an E13 Attribute Assignment in the record history on top of the original assertion that connects the stated opinion to its external source via another E13 Attribute Assignment.
 
=> Example for (b): In large data sets where the multiplicity of opinion is recorded, the number of competing assertions including both record history and external opinions, is usually characterized by a tailed frequency distribution*. This usually means in practice that the data set stays in the same order of magnitude relative to the case where the maintaining team decides to follow one of the alternative assertions.**
* The frequency distributions would look similar to Schich 2010 "Revealing Matrices" Fig. 14-8. Indeed, my pre-publication version of this figure had a column for the record history, not included in the article, as the networks were too large for the preceding figure.
** Yes, we should expect some "assertion cascades" to be exceedingly large, but we can also expect the median cascade length to be very short, between 1 and 2 in cultural heritage databases based on personal experience, and still short in very large scale cases, such as spreading rumors on the Web (cf. Friggeri et al. 2014 "Rumour cascades" Fig. 5).
 
=> The recommendation, in my opinion, should be: By default, the maintaining team should establish authorship by adding an E13 Attribute Assignment to each assertion in the data set. Yet, the maintaining team should only add an E13 Attribute Assignment to their own E13 Attribute Assignments in the case of discernible modifications, updates, or corrections. To avoid comment cascades, such alternative E13 statements should be done in parallel(!) not recursively.*** This recommended procedure establishes a record history and granular ability to cite data set contributions by author, yet also avoids a recursive explosion of E13 statements.
*** Parallel means that E13 statements in the internal record history should never be about statements in the record history itself. This can easily be maintained with users being logged in or recorded via IP and timestamp. Working example: The Wikipedia edit history.
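
A minimal sketch, in Python, of how the "parallel, not recursive" rule could be checked on a record history. The `Assignment` record, its fields and the sample identifiers are hypothetical and only illustrate the rule; they do not describe the Census database or any existing system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """One E13-like record: who asserted something about which target."""
    id: str
    author: str
    target: str  # id of a data statement, or (if recursive) of another assignment


def recursive_assignments(history):
    """Return the assignments whose target is itself an entry in the record history."""
    ids = {a.id for a in history}
    return [a for a in history if a.target in ids]


history = [
    Assignment("aa1", "curator_a", "stmt1"),  # original attribution
    Assignment("aa2", "curator_b", "stmt1"),  # correction, made in parallel
    Assignment("aa3", "curator_c", "aa2"),    # statement about a correction: recursive
]

offenders = recursive_assignments(history)
print([a.id for a in offenders])  # ['aa3']
```

Under this reading, corrections such as `aa2` point at the same data statement as the original, so the history grows linearly with the number of edits.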
 
 
Hope this makes sense.

Posted by Martin on 24/3/2018

Perfect!

Posted by Martin on 24/3/2018

Dear Maximilian,
 
This makes sense to me, but I do not agree with your recommendation as a general rule.
 
There is a fundamental epistemological problem, which has nothing to do with quantitative evidence. The latter, by the way, cannot detect an endless recursion anyhow, because people would break it.
 
The ramifications of this breaking are huge, as can be seen by your answer.
 
Let us start with a more fundamental construct, a simple CRM-compatible "knowledge graph" with one attribute:
 
"Martin" has residence "Heraklion". 
 
Using an E13:

"Martin" performed "Attr.Ass.512"
    has type: "has residence"
    assigned: "Heraklion"
    assigned to: "Martin"

Now, reading it, I know who the knowledge graph wants me to believe said "has residence", but I do not know who introduced these three additional attributes.
So, I reify the three new attributes with 9 more, and I am still no wiser, nor will I be with any other iteration of it.
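
A minimal sketch, in Python, of the growth described above. It assumes, for illustration, that every statement is reified into exactly three attributes (has type, assigned, assigned to); the function name and parameters are hypothetical.

```python
def reification_growth(levels, attributes_per_statement=3):
    """Number of new statements added at each level of reification.

    Assumes each statement is reified into a fixed number of attributes,
    each of which is itself a statement whose author is again unknown.
    """
    counts = []
    pending = 1  # the single original statement
    for _ in range(levels):
        pending *= attributes_per_statement
        counts.append(pending)
    return counts


growth = reification_growth(3)
print(growth)       # [3, 9, 27]
print(sum(growth))  # 39
```

However many levels are added, the last level always consists of statements that are themselves unattributed.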
 
If I know that the knowledge graph was produced by Martin as a trusted source as a whole, I do not need the E13 in it.
 
Then, I can add metadata to the whole knowledge graph, e.g., as a Named Graph or "context" or on paper etc., but I am still in the same situation: who produced these metadata, and are they trusted?
 
Hence, I conclude three things:
 
a) There is no completely self-descriptive information. The trusted source ("sender of the message" in Claude Shannon's sense) lies outside the information unit. It must always be the default. In order to characterize the default, we need semantics different from E13.
 
b) It makes no sense to describe the default in the graph itself.
 
c) Any description within a set of information about its provenance pushes the level where the default applies up to the next source of source. Hence, if a team decides to register actions of their members, the team as a whole pushes the default up to the trust in the registration, rather than in the primarily registered. I see all your examples as practices of this kind. There may be many reasons to do this, but in other cases also reasons not to do it.
 
Such a rule cannot replace understanding the basic epistemology, which is always the same.
 
Does that make sense?
 

Posted by Maximilian Schich on 24/3/2018

Dear Martin,
 
My "recommendation" was just putting into question an aspect of Florian's suggestion, and not meant to replace it in a final way.
 
Regarding your points: The practical cases I am familiar with would use the E13 on the whole triple, i.e. the link/property-type including a specific source node and a particular target node. This means either the triple is stored as a quad, or the triple carries an ID or address, so one can refer to it. TEI standoff markup would be another practical example.
 
As an art historian/archaeologist and hopeless class-conceptualist, I do not believe in trusted sources. Everything comes with a probability. :)
 
a) Self-description is of course never perfect, yet depends on the density of information: A signature, as in "Martin performed Attr.Ass.512" or "A[lbrecht] D[ürer] fecit" is only one form of (self-)descriptive information, which is as good or bad as anything else, internal or external. Of course, it is better to see Dürer in detail or to hear Anne-Sophie Mutter actually play, rather than relying on a verbal statement of attribution.
 
b) I don't understand: Any graph-like description of a graph constitutes a forest of graphs with the original graph, i.e. a disconnected graph that contains the description of itself. If we generalize that statement to symbolic representation, you are in essence saying description is impossible.
 
c) I think in most cases "description within a set of information about its provenance" is the only thing we have. There is no default up the next source of source. Evolutionary biologists, material scientists, art historians working on renaissance drawings, and scholars of ancient manuscripts all rely on hysteresis, i.e. history of the object contained within the object. There never was a comprehensive DNS for organisms, manuscript fragments, or paintings, and there never will be. For the same reason we need to embed provenance in our data sets. Probably we should even block-chain it in with enough information, so we don't have to rely on simple signatures.
 
To make my case much more simple and short: "All of Wikipedia includes the full edit-history". This is how it is produced, and how it should be analyzed. The same standard should apply to any cultural heritage data set. Any other practice would be like citing monographs without pagination. This is why E13 is really central, particularly in multi-authored data sets. 

Posted by Martin on 27/3/2018

Dear Maximilian,
 
I had been a bit shorthand.
 
I referred to a knowledge graph theoretically, as a CRM instance, without a quad feature, and without looking at it as a Named Graph. Of course current systems have these features built in.
 
The reason is that only by separating the graph from the data about the graph, be it as a quad, as a Named Graph or on paper, can we understand the logical problem.
 
If we look at the "metagraph" that is about a graph, it is again a simple graph. It does not have a provenance. Its provenance could be another graph about this graph, and we are still at the same point.
 
This means that ultimately we have to rely on curators of information who mediate trust in it. By "trusted source", I do not mean a logical "TRUE"; I mean that someone gives me information he himself trusts because of further knowledge, and I trust that it is his best knowledge, regardless of whether it turns out to be wrong later.
 
We may talk about degrees of trust and plausibility, but those are only variants of trust. Trust is not a statistical probability.
If I get an image of a Duerer, I have no bl.. idea if it is a fake or not. I would not assume that it is a fake if I can trace its provenance to a museum. I would check the website to see whether it is suspicious or not.
 
The epistemological chain will go back to primary evidence. I assume that the museum knows how to connect the image to the original paper, that the paper has a credible chain of owners back to Duerer, or has been examined by analytical methods to be genuine.  Arguing about my image, I imply all this good practice was applied. Even if this is all described in the document, without the connection to the human curator there is no knowledge in it.
 
So, point b) means that the solution to the descriptive chain is that at the end there is a trusted source, i.e. a source I have no good reason to question, which may make errors, but follows a good practice of knowledge creation.
 
Your point c) below looks only at one level of metadata. If, e.g., I have an old physical book from the 15th century in a library, I would rely on the author information and date in it. But I would simultaneously rely on the physical book exhibiting features that are compatible with the age, and I would rely on the library curators not having smuggled in fakes over time. I would compare with other copies etc. Even if a fake is detected, I would rely on this human/material provenance to hypothesize how a fake could have come in. It will still be a form of (less) trusted source.
 
So, my point is, the trusted source can be directly responsible for the content. It may only be responsible for the metadata, or the metametadata, etc.
 
Each indirection is expected to rely on good practice and hence provide trust in the level below, i.e., the metametadescription about the metadescription, the metadescription about the description. Each level should describe where its confidence comes from.
 
At the end, there is always a (living) human being connecting data with the real world. In our information systems, we must keep that curation chain. No other information can save us from fake news, once they can be spread and multiplied without limits.
 
OR not?
 

In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG reviewed the proposed scope note of E13 Attribute Assignment and made changes to it. Then the SIG, combining the result of this discussion with Sanderson's proposal of 10/4/2017 in the frame of issue 340, reviewed the subclasses of E13 and decided to delete the classes E38 (because there is no distinction with E36), E40 (unnecessary leaf node), E44, E45, E46, E47, E48, E49, E50 and E51 (for these last 8 classes, we may just use E41 Appellation). The revised scope note of E13 is here.

Also, the SIG decided (1) to add a subproperty of P2 that will allow pointing to the CRM property list exclusively (thus leaving P2 to do its work of typing the activity itself), e.g. assigned relation / concerned property E55 (CRM Properties), and (2) to open a new issue about best practice on the epistemology of the knowledge base itself, i.e. where to stop documenting the provenance. This can begin from the email exchange (issue 382).

Lyon, May 2018

 

Meetings discussed: