Issue 383: 'has content' property

ID: 
383
Starting Date: 
2018-05-22
Working Group: 
3
Status: 
Done
Closing Date: 
2020-06-26
Background: 

In the 41st joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 34th FRBR - CIDOC CRM Harmonization meeting, the SIG, while resolving issue 363, decided to open a new issue about the definition of a new property of E90 for capturing the actual content of a symbolic object. This property should be modelled on the R33 property of FRBRoo. HW to MD for the formulation of this property.

Lyon, May 2018

Current Proposal: 

Posted by Martin on 6/11/2018

I had sent the text below as a new issue, but it is in fact the answer to Issue 383.

The question is how to deal with a file that is more specific in content, such as an MS Word document, but represents the character sequence that defines the content of the respective E90. Is it "is incorporated in", or a subproperty of it?

On 9/19/2018 11:09 PM, Martin Doerr wrote:
> Here my scope note:
>
> Pxxx has symbolic content
>
> Domain:             E90 Symbolic Object
>
> Range:                E62 String
>
> Quantification:    many to many (0,n:0,n) ??
> In CRM RDFS:      subproperty of rdfs:value
>
> Scope note:         This property associates an instance of E90 Symbolic Object with a complete, identifying representation of its content in the form of an instance of E62 String. This property only applies to instances of E90 Symbolic Object that can be represented completely in this form. The representation may be more specific than the symbolic level defining the identity condition of the represented object. This depends on the type of the symbolic object represented. For instance, if a name has type "Modern Greek character sequence", it may be represented in a loss-free Latin transcription, meaning however the sequence of Greek letters. As another example, if the represented object has type "English word sequence", American English or British English spelling variants may be chosen to represent the English word "colour" without defining a different symbolic object. If a name has type "European traditional name", no particular string may define its content.
>
> Examples:
>
> * The materials description (E33) of the painting (E22) _has symbolic content_ “Oil, French Watercolors on Paper, Graphite and Ink on Canvas, with an Oak frame.”
>
> * The title (E35) of Einstein’s 1915 text (E73) _has symbolic content_ “Relativity, the Special and the General Theory”
>
> * The story of Little Red Riding Hood (E33) _has symbolic content_ “Once upon a time there lived in a certain village …”
>
> * The inscription (E34) on Rijksmuseum object SK-A-1601 (E22) _has symbolic content_ “B”
>
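The draft definition above can be made concrete with a small sketch that models statements as plain Python triples and checks the two constraints the scope note states: the subject must be an instance of E90 Symbolic Object (or one of its subclasses) and the value must be a string. The `ex:` identifiers, the `HAS_SYMBOLIC_CONTENT` placeholder, the `Literal` wrapper and the partial subclass list are assumptions for illustration, not official CIDOC CRM URIs or tooling.

```python
from dataclasses import dataclass

# Hypothetical placeholder for the draft "Pxxx has symbolic content" property.
HAS_SYMBOLIC_CONTENT = "ex:has_symbolic_content"

# Partial, illustrative set: E90 Symbolic Object and some of its subclasses.
E90_FAMILY = {"E90", "E33", "E34", "E35", "E41", "E42", "E73"}


@dataclass(frozen=True)
class Literal:
    """Stand-in for an E62 String value: identified only by its characters."""
    value: str


# Class assertions for the example subjects (identifiers are hypothetical).
TYPES = {
    "ex:materials_description": "E33",
    "ex:einstein_title": "E35",
    "ex:inscription_SK-A-1601": "E34",
    "ex:painting": "E22",
}

# Three of the scope-note examples expressed as triples.
TRIPLES = [
    ("ex:materials_description", HAS_SYMBOLIC_CONTENT,
     Literal("Oil, French Watercolors on Paper, Graphite and Ink on Canvas, with an Oak frame.")),
    ("ex:einstein_title", HAS_SYMBOLIC_CONTENT,
     Literal("Relativity, the Special and the General Theory")),
    ("ex:inscription_SK-A-1601", HAS_SYMBOLIC_CONTENT, Literal("B")),
]


def violations(triples, types):
    """List statements that break the draft domain (E90) or range (string) constraint."""
    problems = []
    for subject, prop, obj in triples:
        if prop != HAS_SYMBOLIC_CONTENT:
            continue  # only the draft property is checked here
        if types.get(subject) not in E90_FAMILY:
            problems.append(f"{subject}: subject is not a known E90 instance")
        if not isinstance(obj, Literal):
            problems.append(f"{subject}: value is not an E62 String literal")
    return problems


print(violations(TRIPLES, TYPES))  # → []
# A physical object (E22) lies outside the domain, so this statement is flagged.
print(violations([("ex:painting", HAS_SYMBOLIC_CONTENT, Literal("x"))], TYPES))
```

The second call reports `ex:painting` because an E22 is not a symbolic object; its inscription, not the painting itself, is what has symbolic content.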

Posted by Robert Sanderson on 6/11/2018

Thank you for pushing this forward, Martin!

Quantification-wise, I would be in favor of 0,1 : 0,1.

If the structure of the set of symbols changed, then it would be a different symbolic object according to my understanding of E90:

>  … identifiable symbols and any aggregation of symbols … that have an objectively recognizable structure and that are documented as single units.

Similarly, if the same string was used by different Symbolic Objects, then it seems like they would actually be the same symbolic object (or you would instead use two strings with the same data).

(And in the RDF projection this makes no difference, as literal values do not have their own separate identity)
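In RDF, two literals with the same lexical form, datatype and language tag are the same term, and a graph is a set of triples. A minimal sketch of the consequence, assuming a toy triple store built from a Python set (not a real RDF library) and hypothetical `ex:` identifiers: the value "B" attached to two different subjects is one and the same value, so the two statements are distinguished only by their subjects.

```python
# Toy triple store: a set of (subject, property, literal) tuples.
# Identifiers are hypothetical; a plain str stands in for an RDF literal.
store = {
    ("ex:inscription_1", "ex:has_symbolic_content", "B"),
    ("ex:inscription_2", "ex:has_symbolic_content", "B"),
}

# Asserting an existing statement again changes nothing (set semantics).
store.add(("ex:inscription_1", "ex:has_symbolic_content", "B"))

# Only one distinct value exists; it carries no identity of its own.
distinct_values = {value for _, _, value in store}

# The statements remain distinguishable only through their subjects.
subjects_with_b = sorted(s for s, _, value in store if value == "B")

print(len(store))        # → 2
print(distinct_values)   # → {'B'}
print(subjects_with_b)   # → ['ex:inscription_1', 'ex:inscription_2']
```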

 

For the examples, I would replace the Little Red Riding Hood example with one that is complete, to avoid confusion with the scope note requirement of being represented completely.

How about:

>  The Accession Number (E42) of the J. Paul Getty Museum’s “Abduction of Europa” (E22) _has symbolic content_ “95.PB.7”

And for the file question, do you mean that the symbolic object is the MS Word file, which has a representable set of (binary) symbols, or that the symbolic object is text which is incorporated within the file, but not verbatim (as the characters in, e.g., a paragraph are likely to be represented in the file using a very different structure)?

Posted by Martin on 9/11/2018

Dear Robert,

On 11/6/2018 9:00 PM, Robert Sanderson wrote:

> Thank you for pushing this forward, Martin!

> Quantification wise, I would be in favor of 0,1 : 0,1.
I prefer 0,1:0,n or 0,n:0,n

> If the structure of the set of symbols changed, then it would be a different symbolic object according to my understanding of E90:

> >  … identifiable symbols and any aggregation of symbols … that have an objectively recognizable structure and that are documented as single units.
Correct. The question is what to do if we encounter different representations: for instance, one giving the text "hello world" in Latin-1 and another in ASCII, while the E90 instance is of type "Latin characters only"; or if you write my name as DOERR or DÖRR, both regarded by German authorities as identical variants representing the umlaut as OE or Ö. Of course, in that case, having both representations would be redundant, but 0,n is more tolerant of it.
The other opinion is that one string is enough to define the E90; then 0,1.
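The trade-off between 0,1 and 0,n can be checked mechanically. The sketch below reads the first pair of a quantification as the number of strings per symbolic object and the second as the number of symbolic objects per string, which is how the thread uses the notation; only maximum values are tested, since every proposal here has minimum 0. The function names and the `ex:` identifier are hypothetical.

```python
from collections import defaultdict


def observed_maxima(pairs):
    """Return (max strings per symbolic object, max symbolic objects per string)."""
    strings_per_object = defaultdict(set)
    objects_per_string = defaultdict(set)
    for symbolic_object, text in pairs:
        strings_per_object[symbolic_object].add(text)
        objects_per_string[text].add(symbolic_object)
    return (
        max((len(v) for v in strings_per_object.values()), default=0),
        max((len(v) for v in objects_per_string.values()), default=0),
    )


def satisfies(pairs, max_strings_per_object, max_objects_per_string):
    """Check the upper bounds of a (0,x:0,y) quantification; None stands for 'n'."""
    per_object, per_string = observed_maxima(pairs)
    ok_object = max_strings_per_object is None or per_object <= max_strings_per_object
    ok_string = max_objects_per_string is None or per_string <= max_objects_per_string
    return ok_object and ok_string


# One name recorded with two accepted spelling variants.
name_variants = [("ex:name_doerr", "DOERR"), ("ex:name_doerr", "DÖRR")]

print(observed_maxima(name_variants))        # → (2, 1)
print(satisfies(name_variants, 1, 1))        # 0,1:0,1 → False
print(satisfies(name_variants, 1, None))     # 0,1:0,n → False
print(satisfies(name_variants, None, None))  # 0,n:0,n → True
```

Under 0,1 on the first side, the two variants would have to be reduced to one preferred string; 0,n tolerates the redundancy.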

> Similarly, if the same string was used by different Symbolic Objects, then it seems like they would actually be the same symbolic object (or you would instead use two strings with the same data).
This is a long-debated question. In most cases this appears reasonable, but we do have cases in which the identity of the E90, seen as a message in the sense of Claude Shannon, is bound to the "sender". In discussing the meaning of E35 Title, it appeared that we cannot take the identity of a title detached from the thing it was given to. This creates a precedent for the latter interpretation.

As a general principle, a 1:1 dependency raises the suspicion of a hidden identity. To be on the safe side, I would rather not identify the E90 with the content model.

That two strings with the same data count as different is a (good) implementation choice of RDF, which assigns the identity to the link rather than to the string, precisely in order to distinguish where the message comes from. If two strings with the same data are regarded as different, then we actually have a 0,x:0,n model in the ontology.
>
> (And in the RDF projection this makes no difference, as literal values do not have their own separate identity)

> For the examples, I would replace the Little Red Riding Hood example with one that is complete, to avoid confusion with the scope note requirement of being represented completely.
>
> How about:

> >  The Accession Number (E42) of the J. Paul Getty Museum’s “Abduction of Europa” (E22) _has symbolic content_ “95.PB.7“
Good!

> And for the file question, do you mean that the symbolic object is the MS Word file, which has a representable set of (binary) symbols,
No
>
> or that the symbolic object is text which is incorporated within the file, but not verbatim (as the characters in, e.g., a paragraph are likely to be represented in the file using a very different structure)?

Posted by Martin on 15/11/2018

Dear All,

Continuing the question from my last message below:

One would normally store very large strings in a file and instantiate E90 Symbolic Object, or a subclass of it, with the file's URL. The question, however, is whether the URL is indeed a good persistent identifier, since a URL stands for a physical location, albeit one addressed indirectly. The Linked Open Data community has not yet given satisfactory answers regarding the long-term validity of resolvable URIs. If the URL is not a good identifier, another, primary URI should be chosen, and the content found at the URL should be related to the primary URI as a representative of the content of the symbolic object identified by the primary URI.

I would like to discuss a new property,

PXXX has content representation
domain: E90 Symbolic Object
range:     E90 Symbolic Object

Tentative scope note:
Scope note: This property associates an instance of E90 Symbolic Object with another instance of E90 Symbolic Object (or of any of its subclasses) that completely represents the content of the former, identically with respect to the symbol set in which the former is defined, and nothing more. For instance, a text of Aristotle may be defined in terms of the ancient Greek alphabet, paragraphs and section titles, but the representing object may use particular typefaces and page layout. Metadata in the range instance are not regarded as part of the content.

Problem:
What about introductions, footnotes, etc.?

Can someone work out a scenario with a real canonical instance of a text of Aristotle or Plato, with indexed phrases, and propose how the text itself should be identified, possibly independently of spelling variants?

Another case: I submit a paper to Springer in .doc format, and they create a PDF and a journal image. How do we define "my paper" regardless of these embodiments?

In the worst case, we would need yet another node in order to specify the part of the file that is the defining text.

Further, P165 incorporates runs from an information object to a symbolic object, and is hence not compatible.

Another argument is that an ontological link from E90 to E90 does not make sense. If the target should be a URL, we may regard this as an implementation-level question.

In the 42nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, the crm-sig discussed MD’s proposal for defining a new property of E90 Symbolic Object that captures the actual content of a symbolic object, and accepted it as is (Issue 395).

HW: The crm-sig has assigned GB, NC and RS to come up with solutions covering both the case of linking an instance of E90 Symbolic Object to the other instances of E90 Symbolic Object composing it, and the case where the same instance of E90 Symbolic Object is conveyed through different means or encodings, i.e. things that might be considered the equivalent of ‘spelling variants’.

Berlin, November 2018

Posted by Robert Sanderson on 23/2/2019

Fellows, shall we discuss next Friday when we continue the work on Dig?

My current feeling is close to Martin’s final point: that this doesn’t actually make sense for Symbolic Object directly. It would result in a 2^n-style relationship where every expression of the content is related to all of the others. It also comes dangerously close to FRBR and LRM.
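A quick count makes the concern about relating every expression to all of the others concrete. Assuming undirected equivalence links, n expressions of the same content need n(n-1)/2 pairwise links, whereas linking each expression once to a single shared node needs only n. The function names and the `ex:shared_content` node are hypothetical.

```python
from itertools import combinations


def pairwise_links(expressions):
    """All-to-all symmetric equivalence links between expressions."""
    return list(combinations(expressions, 2))


def hub_links(expressions, hub="ex:shared_content"):
    """One link from each expression to a single shared (hypothetical) node."""
    return [(e, hub) for e in expressions]


for n in (2, 5, 10, 50):
    exprs = [f"ex:expression_{i}" for i in range(n)]
    print(n, len(pairwise_links(exprs)), len(hub_links(exprs)))
# 2 1 2
# 5 10 5
# 10 45 10
# 50 1225 50
```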

 

Posted by George Bruseker on 20/10/2019

Dear all,

I have to admit that I no longer grasp the functional context of what we were trying to resolve here, now that the P190 has symbolic content issue has been addressed.

If I reconstruct it correctly, it had something to do with functional symbolic equivalents and how to indicate them? So I have the two versions of Martin's name, Doerr and Dörr, or I have an image file showing the same visual content but encoded as jpg or tif? Is this it? Both can and do act as the same symbols but are different encodings of them?

If that's it, are we in the right area of the existing model if we think in terms of a subproperty of P130 shows features of, with its domain and range specialized to Symbolic Object? We could call this property something like 'has representational equivalent' (the symbols are what is doing the representing here, right? Is this why we assert they are the same?). If this worked, we might also take advantage of a .1 property expressing a kind of functional equivalence, in which you could put 'alternate encoding' or whatever other modes of equivalence you might imagine. Or perhaps simply attaching a type to each instance of symbolic object would show its kind and therefore their relative equivalence?

Anyhow, here is the reference to my proposed potential property pattern/reference/super property:

http://www.cidoc-crm.org/Property/P130-shows-features-of/Version-6.2.1

Nicola, Rob, is this solving a problem or just making more?

Posted by Robert Sanderson on 21/10/2019

That also fits with my recollection.

It seems like a peer of P73 has_translation, which has P130 as its super-property … so I would agree with your conclusion. P73 expresses the same conceptual content in a different language; Pxxx would express that the symbols are intellectually equivalent but encoded differently.

Posted by Robert Sanderson on 12/2/2020

I think that we could close it.  Is anyone asking for it, or is it just a theoretical concern?

I would simply create new linguistic objects, each of which has a P190 has symbolic content value, and then relate those objects using existing properties.

Outcome: 

In the 47th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 40th FRBR - CIDOC CRM Harmonization meeting, the SIG decided to close this issue, since there are no loose ends and hence no point in keeping it open.

The issue closed

June 2020

Meetings discussed: