Issue 394: Solution for Dualism of E41 Appellation and rdfs:label

ID: 394
Starting Date: 2018-09-01
Working Group: 3
Status: Done
Closing Date: 2018-11-28
Background: 

Posted by Martin on 1/9/2018

Dear All,

Obviously, there are two ways in RDF to express what the CRM regards as an Appellation: either using a URI that is an instance of E41, plus another property specifying the symbolic content in whatever way (I am not concerned with this here), or using rdfs:label, which has exactly the meaning of those forms of Appellation that can be expressed exhaustively as a literal.
Interestingly enough, there seems to be no existing validation method that would exclude any instance of an XSD datatype from being used as the range of rdfs:label.
We have therefore tested with Virtuoso whether P1 can have two ranges, Literal and E41, and whether SPARQL gives the expected answers. It does:

see here the test

 

I propose this method for the RDFS implementation of the CRM: two ranges for P1, namely E41 and rdfs:Literal, and P1 as a superproperty of rdfs:label.

Current Proposal: 

Posted by Robert Sanderson on 4/9/2018

Dear all,

Please no!  This is called “punning” (when the same property can have both literals and resources as its range) and is widely recognized as a bad practice in RDF.

In particular, it makes it difficult in several serializations to distinguish between the string “http://example.museum.org/data/1” and the resource that has the URI http://example.museum.org/data/1 as its identifier.

posted by Detlev Balzer on 4/9/2018

Am 04.09.2018 um 19:19 schrieb Robert Sanderson:
> In particular, it makes it difficult in several serializations to distinguish between the string “http://example.museum.org/data/1” and the resource that has the URI http://example.museum.org/data/1 as its identifier.

Which ones do you mean? All the serializations I've seen make clear syntactic distinctions between literals and URIs.

While I agree that "punning" is bad practice, I don't see why it should confuse RDF software tools.

Posted by Robert Sanderson on 4/9/2018

Hi Detlev,

 

Apologies, I meant that the pattern makes it more complicated to understand, as opposed to it being ambiguous in the data (which would be much worse!). More difficult for a human rather than for the machine :)

 

For example, in JSON-LD, it would result in:

{
  "P1_is_identified_by": [
    "uri-as-string",
    {
      "@id": "uri-as-identifier"
    }
  ]
}

Which then makes developers cross, as there are mixed data types in the array, and the current specification doesn’t allow the string to be expressed in an object with only @value as a key.

Currently that would be the simpler compaction of:

{
  "P1_is_identified_by": [
    "uri-as-identifier"
  ]
}

Because P1 can only ever have a resource as its object.

Or (if you don’t care for the singleton array), the simplest possible form:

{
  "P1_is_identified_by": "uri-as-identifier"
}
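
To make the developer-side cost of the mixed array concrete, here is a minimal Python sketch of what a consumer would have to do with the first example above. It assumes a JSON-LD context that does not coerce P1_is_identified_by to "@id", so a bare string is a literal and an object with "@id" is a resource. The helper name classify_values is hypothetical and not part of any JSON-LD library.

```python
import json

def classify_values(doc, key="P1_is_identified_by"):
    """Split the values of `key` into literals and resource identifiers.

    Assumption: the context does not coerce `key` to "@id", so a bare
    string is a literal and {"@id": ...} is a resource.
    """
    values = doc.get(key, [])
    if not isinstance(values, list):  # compacted singleton form
        values = [values]
    literals, resources = [], []
    for value in values:
        if isinstance(value, dict) and "@id" in value:
            resources.append(value["@id"])
        elif isinstance(value, str):
            literals.append(value)
        else:
            raise ValueError(f"unexpected value: {value!r}")
    return literals, resources

mixed = json.loads(
    '{"P1_is_identified_by": ["uri-as-string", {"@id": "uri-as-identifier"}]}'
)
literals, resources = classify_values(mixed)
print(literals, resources)
```

Every consumer needs this kind of type branching once a property may carry both literals and resources; with a resource-only range the branch disappears.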

Posted by Mark Fichtner on 10/9/2018

Dear all,

The main question for me is: is the use of rdfs:label in this case really the way intended by the CIDOC CRM? In fact P1 currently has a valid range, and E41 is a valid class and not a primitive datatype. Why should we substitute this?

I agree with Martin that we should integrate old data that has a different model and therefore the proposal and the work is very nice to see. However I think we should have exactly one best practice. At the GNM we typically have regular instances of E41, which in my eyes follows the CIDOC CRM better, so I would love to see this in the best practice.

 

Posted by George Bruseker on 12/9/2018

Dear all,

I am a fan of the traditional solution:

1) E1 -> p1 -> E41

Here the encoding all the way down to a value would be rdfs:value VALUE, because we want to track the actual string used to represent the name (separate from the URI of the name).

We use this solution whenever we name something whose name we care about (which is much of the time).

2) rdfs:label Value

This should be used on all nodes to give a human readable label. This is often enough if we don’t study the names used.

Posted by Nicola Carboni on 11/9/2018

Dear all,
>>
>>         I propose this method for the RDFS implementation of the CRM: two ranges for P1, namely E41 and rdfs:Literal, and P1 as a superproperty of rdfs:label.

While punning could be a solution, I have to admit it is not my favourite either.

In our daily practice, for encoding appellation strings we defined as best practice[1]:

Approach 1)  E1 → P1 is identified by → E41 Appellation → rdfs:label → rdfs:Literal

The modelling above gives us the possibility of adding further statements about the Appellation. Nevertheless, I have to admit that there have been cases where using rdfs:label on the entity (without using the appellation) was very much desired, because it would be easier to display and would not create extra unnecessary triples (which could be a problem in big datasets).

We are, for now, sticking with the modelling above, which seems to be the most comprehensive and able to accommodate the most diverse needs.
If I were to vote, I would suggest Approach 1 as standard practice and the deprecation of approaches 2 and 3 below:

Approach 2) E1 → P1 is identified by → E41 Appellation → P3 has note → rdfs:Literal
and
Approach 3) E1 → P1 is identified by → E41 Appellation → rdf:value → rdfs:Literal
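
The point about extra triples can be sketched in a few lines of Python. This is only an illustration: triples are modelled as plain tuples, the ex: identifiers are made up, and Approach 1 is assumed to type each Appellation node explicitly with rdf:type.

```python
P1 = "crm:P1_is_identified_by"
E41 = "crm:E41_Appellation"

def with_appellation_node(entity, name, node_id):
    """Approach 1: entity -> P1 -> E41 node -> rdfs:label -> literal."""
    return [
        (entity, P1, node_id),
        (node_id, "rdf:type", E41),
        (node_id, "rdfs:label", name),
    ]

def with_direct_label(entity, name):
    """rdfs:label placed directly on the entity, no Appellation node."""
    return [(entity, "rdfs:label", name)]

names = ["Martin", "Martinus", "Martijn"]
full = [
    triple
    for i, name in enumerate(names)
    for triple in with_appellation_node("ex:person1", name, f"ex:appellation{i}")
]
short = [t for name in names for t in with_direct_label("ex:person1", name)]
print(len(full), len(short))  # 9 3
```

Under these assumptions the explicit form costs three triples per name against one, but only the explicit form gives each name a node to which further statements (for example a P2 has type for the preferred name) can be attached.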


If I may suggest, a specification for the preferred appellation, which does not have a corresponding property in the CRM, should also be discussed. Personally, we define each of the strings of a name as a different appellation (unless they are transliterations) and specify the preferred one using P2 has type (more details here [2]).

Posted by Richard Light on 11/9/2018

Hi,

Apologies for being so quiet on this front.

I'm puzzled by Martin's final declaration test: it says the intention is to see if P1 can be a superproperty of rdfs:label, yet the declaration (and consequent SPARQL query) asserts/tests that it is a subproperty of rdfs:label:
>
> The next question is, if P1 can be declared superproperty of rdfs:label, so that the query for P1 returns everything CRM regards as Appellation. It works:
>
> It was tested by altering the cidoc-crm rdfs file, importing it in virtuoso and asking for the subproperties of rdfs:label as follows:
>
> <rdf:Property rdf:about="P1_is_identified_by">
>
>      <rdfs:label xml:lang="en">is identified by</rdfs:label>
>
>      <rdfs:label xml:lang="ru">идентифицируется посредством</rdfs:label>
>
>      <rdfs:label xml:lang="fr">est identifiée par</rdfs:label>
>
>      <rdfs:label xml:lang="pt">é identificado por</rdfs:label>
>
>      <rdfs:domain rdf:resource="E1_CRM_Entity"/>
>
>       <rdfs:range rdf:resource="E41_Appellation"/>
>
> *     <rdfs:subPropertyOf rdf:resource="http://www.w3.org/2000/01/rdf-schema#label"/>*
>
> </rdf:Property>
>
> Query (Give me all the subproperties of rdfs:label) :
>
> select * where {
>
> ?p rdfs:subPropertyOf rdfs:label
>
> }
>
> Result from Virtuoso:
>
> p:
>
> http://www.cidoc-crm.org/cidoc-crm/P1_is_identified_by

I suspect that in practice this distinction doesn't particularly matter (though it does make my brain hurt). Presumably the thinking is that when you search for P1 relationships you will get all the rdfs:label shortcuts as well?  (And not the other way round ...?)

Anyway, my point is that a single SPARQL expression cannot pick up both fully-expressed P1's with an E41 on the end, and 'short cut' P1's with rdfs:label.  This is because the shape of the RDF is different in the two cases.  So you're going to have to query for both variant patterns explicitly, and declaring P1 as a sub- or super-property of rdfs:label isn't going to save you any time or effort.  Doing so might lead to unwanted consequences, especially if CRM data forms part of a larger RDF resource.
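
The difference in shape can be sketched with a toy triple set in Python. The ex: identifiers and the strings "Name A" and "Name B" are invented for illustration, and the fully expressed form is assumed to carry its string via rdfs:label on the Appellation node; any other value property would work the same way.

```python
P1 = "crm:P1_is_identified_by"
LABEL = "rdfs:label"

triples = {
    ("ex:obj1", P1, "ex:app1"),                       # fully expressed form
    ("ex:app1", "rdf:type", "crm:E41_Appellation"),
    ("ex:app1", LABEL, "Name A"),
    ("ex:obj2", LABEL, "Name B"),                     # short-cut form
}

def names_via_appellation(triples):
    """Two-hop pattern: ?s P1 ?a . ?a rdfs:label ?name"""
    return {
        (s, name)
        for (s, p, a) in triples if p == P1
        for (a2, p2, name) in triples if a2 == a and p2 == LABEL
    }

def names_via_label(triples):
    """One-hop pattern: ?s rdfs:label ?name, skipping Appellation nodes."""
    appellation_nodes = {o for (_, p, o) in triples if p == P1}
    return {
        (s, name)
        for (s, p, name) in triples
        if p == LABEL and s not in appellation_nodes
    }

# Both patterns are needed; the union plays the role of a SPARQL UNION.
all_names = names_via_appellation(triples) | names_via_label(triples)
```

Neither function alone returns both names, whatever subproperty relation is declared between P1 and rdfs:label, because one pattern has two hops and the other has one.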

As regards your proposed short-cuts (and bearing in mind that P1 is already a short-cut), I don't have strong views on which approach to take.  I think that your experiments with Virtuoso demonstrate the 'open world' nature of the RDFS framework; it allows multiple alternative possibilities.

If we look at the definition of rdfs:label, it says it represents "a human-readable version of a resource's name" (my italics).  As such, it is certainly semantically close to our P1 property, whose range is an E41 Appellation.  So that might be an argument for your second option: using rdfs:label at the point where P1 would otherwise occur.

Like George, I originally liked the idea of using rdfs:value to represent values (!), but on re-reading their definitions I now tend to agree with Martin that rdfs:label is a closer fit, semantically, for our purposes.  rdfs:value may still be useful for representing more complex values (e.g. those involving values and units), and we may want to consider deriving some useful CRM-specific subproperties of rdfs:value.  But that's a discussion for another day ...

Posted by Martin on 11/9/2018

Dear All,

Firstly, apologies, the RDF was wrong, it was intended to be P1 is superproperty of rdfs:label.
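
Why the direction matters can be shown with a small Python sketch of the RDFS subproperty rule (rdfs7). It assumes a store that actually applies this inference; the data and ex: identifiers are invented, and the helper names are hypothetical.

```python
P1 = "crm:P1_is_identified_by"
LABEL = "rdfs:label"

def apply_subproperty(triples, sub, sup):
    """rdfs7: if `sub` rdfs:subPropertyOf `sup`, then (s, sub, o) entails (s, sup, o)."""
    return triples | {(s, sup, o) for (s, p, o) in triples if p == sub}

def query(triples, prop):
    """All (subject, object) pairs connected by `prop`."""
    return {(s, o) for (s, p, o) in triples if p == prop}

data = {
    ("ex:obj1", P1, "ex:app1"),
    ("ex:obj2", LABEL, "Name B"),
}

# Intended: rdfs:label is a subproperty of P1 (P1 is the superproperty).
intended = apply_subproperty(data, sub=LABEL, sup=P1)
# As in the posted RDF: P1 is a subproperty of rdfs:label.
as_posted = apply_subproperty(data, sub=P1, sup=LABEL)

print(len(query(intended, P1)), len(query(as_posted, P1)))  # 2 1
```

With the intended direction a query on P1 returns both the Appellation node and the label literal; with the posted direction it is the query on rdfs:label that grows instead.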

Semantically, the range of rdfs:label, when used, is ontologically an Appellation in the sense of the CRM.

I agree with George that all RDF nodes should have a human-readable label. They name the thing, even if it is a technical node.
I would find it confusing to say that labels are not to be queried, only to be read, and that the "real" names must have a URI,
regardless of whether I have more to say about it.

I am really not a fan of punning, we definitely forbid it in the CRM.

The point with Appellations is that some, the simple ones, can directly be represented in the machine, or be outside. The solution to assign a URI in all cases, and then a value or label, does not make the world easier. It is extremely bad for performance. We are talking here about implementation, not about ontology.
You simply get a useless explosion of the graph for the sake of theoretical purity.

Those claiming confusion should be more precise. Has someone looked at query benchmarks? Has someone looked at graphical representations of RDF graphs? Do they really look better?

So either we ignore the issue and write queries that collect names either via P1, a URI and a value/label, or via a label, because this is where names appear in RDF. In that case we make no punning, but our queries implement exactly this meaning. So we are no better off, but act as if we did not know.

Or, we describe the fact by punning, have one superproperty for all cases, which we can query, and thereby end the discussion of whether labels are allowed or not and how they relate to appellations. The punning comes in because the range of the superproperty must comprise the ranges of the subproperties. We can play a bit more: make the punning with a superproperty of P1, and have both P1 and rdfs:label as subproperties of it, if this is preferred.
The solution I describe is just a logical representation of the situation, not creating a different situation. It just says that names can be complex objects or simple literals.

The problem is that RDF literals do have meaning beyond being symbol sequences.

The punning does not introduce the problem. With or without, the queries have to cope with names in either form.
This holds similarly for space primitives and large geometry files, for short texts and equivalent files etc.

Opinions? 

Posted by Richard Light on 12/9/2018


On 11/09/2018 20:02, Martin Doerr wrote:
> Dear All,
>
> Firstly, apologies, the RDF was wrong, it was intended to be P1 is superproperty of rdfs:label.

I'm not sure that this is something we need to state at all, and I worry that - if it is included in our RDFS Schema - it may bring unwanted side-effects.  Isn't this saying that any instance of rdfs:label is to be treated as an instance of P1?  Bear in mind that CRM data may co-exist in triple stores in company with other RDF data, which may well use rdfs:label for its own purposes.  This assertion that 'all rdfs:labels are P1 relationships' would then be applied to this other data as well.  This might well result in incorrect/spurious results when SPARQL queries are applied to the data.
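
A rough Python sketch of this side-effect, assuming a store with RDFS inference and the axiom "rdfs:label rdfs:subPropertyOf P1" loaded. The other: namespace and all data are invented. The sketch also applies the domain rule (rdfs2), since P1 is declared with domain E1 CRM Entity in the RDFS file quoted earlier.

```python
P1 = "crm:P1_is_identified_by"
LABEL = "rdfs:label"
TYPE = "rdf:type"

crm_data = {
    ("ex:obj1", TYPE, "crm:E1_CRM_Entity"),
    ("ex:obj1", LABEL, "Name A"),
}
other_data = {  # non-CRM data sharing the same triple store
    ("other:thing1", TYPE, "other:Widget"),
    ("other:thing1", LABEL, "some label"),
}
store = crm_data | other_data

# rdfs7 under the assumed axiom: every rdfs:label statement entails a P1 statement.
inferred_p1 = {(s, P1, o) for (s, p, o) in store if p == LABEL}
# rdfs2: P1 has domain E1, so every subject of P1 is inferred to be an E1.
inferred_types = {(s, TYPE, "crm:E1_CRM_Entity") for (s, _, _) in inferred_p1}

# P1 statements about subjects that were never asserted to be CRM instances.
crm_subjects = {s for (s, p, o) in store if p == TYPE and o.startswith("crm:")}
leaked = {t for t in inferred_p1 if t[0] not in crm_subjects}
print(len(leaked))  # 1
```

In this toy store the foreign resource acquires both a P1 statement and an inferred E1 CRM Entity type, which is the kind of effect on other data described above.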

In general, I suggest that we are ok to define sub-classes/properties of standard RDFS types, but we shouldn't define super-classes/properties of them.  (I would welcome comments on the validity of this suggestion from someone who understands RDF better than me.)

> Semantically, the range of rdfs:label, when used, is ontologically an Appellation in the sense of the CRM.
Agreed (see my reply from yesterday).  The conclusion I draw from this is that we can validly say:

E1 rdfs:label "string value" is a shortcut for the path 'E1 CRM Entity' 'P1 is identified by' 'E41 Appellation' ...

in exactly the same spirit as the similarly-worded note which we find in the definition of P1 itself. (Obviously, by using this shortcut, we lose the information that this string value is an Appellation, but that's the nature of short-cuts.)

> I agree with George that all RDF nodes should have a human-readable label. They name the thing, even if it is a technical node.
> I would find it confusing to say that labels are not to be queried, only to be read, and that the "real" names must have a URI,
> regardless of whether I have more to say about it.
>
> I am really not a fan of punning, we definitely forbid it in the CRM.
>
> The point with Appellations is that some, the simple ones, can directly be represented in the machine, or be outside. The solution to assign a URI in all cases, and then a value or label, does not make the world easier. It is extremely bad for performance. We are talking here about implementation, not about ontology.
> You simply get a useless explosion of the graph for the sake of theoretical purity.

Agreed. What we need to do is to propose a simple way of expressing simple Appellations in RDF.  That is why my shortcut definition above ends with '...': I don't think we have yet decided how to do this.

I've just been looking over the draft document we are trying to write, and it currently says that a fully-worked-out path will use 'P3 has note -> E62 string' to express the value of an E41 Appellation.  This (i.e. the suggestion to use P3) comes from the definition of the superclass E90 Symbolic Object.  A comment in our draft RDF document questions whether this is sufficiently precise, since P3 is simply "a container for all informal descriptions about an object that have not been expressed in terms of CRM constructs".  I suggest that we need either to use rdfs:value to hold the string value, or (better) to define a CRM-specific subproperty of rdfs:value and use that.  (This subproperty could be part of the published CRM, or it could just form part of the 'RDF implementation' guidelines.)  I don't think that we should use rdfs:label here.

I don't think we should concern ourselves with URLs in our RDF guidance document.  Any implementer of our RDF solutions can choose to assign a URL to represent any node in the structure, but it won't change the logic of the resulting RDF, or how it responds to SPARQL queries.

>
> Those claiming confusion should be more precise. Has someone looked at query benchmarks? Has someone looked at graphical representations of RDF graphs? Do they really look better?
>
> So either we ignore the issue and write queries that collect names either via P1, a URI and a value/label, or via a label, because this is where names appear in RDF. In that case we make no punning, but our queries implement exactly this meaning. So we are no better off, but act as if we did not know.
>
> Or, we describe the fact by punning, have one superproperty for all cases, which we can query, and thereby end the discussion of whether labels are allowed or not and how they relate to appellations. The punning comes in because the range of the superproperty must comprise the ranges of the subproperties. We can play a bit more: make the punning with a superproperty of P1, and have both P1 and rdfs:label as subproperties of it, if this is preferred.
> The solution I describe is just a logical representation of the situation, not creating a different situation. It just says that names can be complex objects or simple literals.

As I said yesterday, I don't see how any punning strategy can make differently-structured RDF equivalent for the purposes of querying. Therefore, I think we will have to accept that if we allow more than one way of representing a given statement in CRM RDF, we will have to construct queries which look explicitly for each of the possible patterns.

> The problem is that RDF literals do have meaning beyond being symbol sequences.
Insofar as they have such meaning, I would argue that we define it (i.e. that meaning) by the CRM context in which we place the string/literal value.  I think there is a danger that we could over-think this problem.

Posted  by Martin on 12/9/2018

Dear Richard,

I basically agree with your comments. Specifically however, I indeed wanted to say that the official definition of rdfs:label makes it exactly a subproperty of P1 (or shortcut of it) in any correct use of RDFS. If we want to mix RDFS models, we should have an opinion about their compatibility. Otherwise, we would have to regard them as alternative that cannot be compared with the CRM.

I am not happy with adding rdfs:label to instances of Appellation, because this would mean it is a name for a name and not the name. I would sympathize with George using rdfs:value, if it had the respective semantics.

What we need, in my opinion, is a property of Symbolic Object, which we may call "has symbolic content" or "has symbolic content inline" or anything better, which defines that the symbolic content is identical to the Literal, abstracted to the "level of symbolic specificity" that the Literal implies and that conforms to the identity condition of the Symbolic Object, i.e., characters of a certain script, or whatever. That would make the meaning of the "value" unambiguous.

We may need to add another property, such as "is contained in" or similar, pointing to a URL actually holding an instance of its content, again abstracted to the "level of symbolic specificity" that the file instance implies and that conforms to the identity condition of the Symbolic Object.

Whereas the shortcut interpretation is attractive, it is not exactly the same. Using a shortcut, we say that the intermediate node is of different, independent nature from the terminal node. Here, we do not say "Appellation" is related to something called "Literal". We say "this Appellation IS itself what is in this Literal". That may or may not be a reason to reject this interpretation.

We also have to distinguish Appellations and other Symbolic Objects which have multiple symbolic forms, i.e. spelling variants, versions etc., from those being one symbolic form. The rdfs:value has no means to express that. I believe we need yet another property "has symbolic content variant". In that case, the URI is necessary, in my opinion.

I think the polymorphism we describe here, well studied in object-oriented languages, is in the nature of Appellations. The problem for me is that the respective KR models have NOT THOUGHT of the case that such polymorphisms can occur. Nevertheless, RDFS is tolerant enough to accept the superproperty statement, but not to create a class which is either a URI or an inline expanded object.

This polymorphism occurs EXCLUSIVELY for Symbolic Objects with symbol sets a certain machine supports. Another reason not to use rdfs:value, because it does not give credit to the fact that only Symbolic Objects can have such a "value".

I agree that we may over-think the point. As I mentioned, the superproperty statement I propose has no other effect than that I can get E41's and labels back by querying P1 only.

Opinions?

Posted by Richard Light on 13/9/2018


On 12/09/2018 14:55, Martin Doerr wrote:
> Dear Richard,
>
> I basically agree with your comments. Specifically however, I indeed wanted to say that the official definition of rdfs:label makes it exactly a subproperty of P1 (or shortcut of it) in any correct use of RDFS. If we want to mix RDFS models, we should have an opinion about their compatibility. Otherwise, we would have to regard them as alternative that cannot be compared with the CRM.

OK: noted.  My concern is simply that we should not include assertions which mean that 'CRM RDF' fails to play nicely with other RDF frameworks.  I would welcome the thoughts of others on this issue.

> I am not happy with adding rdfs:label to instances of Appellation, because this would mean it is a name for a name and not the name. I would sympathize with George using rdfs:value, if it had the respective semantics.
Yes, we're in full agreement on this.

> What we need, in my opinion, is a property of Symbolic Object, which we may call "has symbolic content" or "has symbolic content inline" or anything better, which defines that the symbolic content is identical to the Literal, abstracted to the "level of symbolic specificity" that the Literal implies and that conforms to the identity condition of the Symbolic Object, i.e., characters of a certain script, or whatever. That would make the meaning of the "value" unambiguous.
Again, I'm in complete agreement with this line of thought.  One decision we should make is whether this property forms part of the generic CRM framework, or if it is to be an implementation-specific property which only appears in our RDF implementation of the CRM.  My instinct is for it to go into the CRM proper: the treatment of Symbolic Object and its subclasses would I think be made clearer by the addition of this property.

It's worth bearing in mind that RDF strings have a built-in mechanism for specifying the language of the string.  This would allow us to express, for example, a place name in multiple languages by simply having one 'has symbolic content' property per language, each with an associated string.
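
As a minimal Python sketch of this idea, language-tagged literals can be modelled as (text, language tag) pairs. The property name crm:has_symbolic_content is hypothetical (the thread only proposes such a property), and ex:place_name1 is an invented identifier.

```python
# Hypothetical property name; the thread only proposes "has symbolic content".
HAS_CONTENT = "crm:has_symbolic_content"

# Language-tagged literals modelled as (lexical form, language tag) pairs.
triples = [
    ("ex:place_name1", HAS_CONTENT, ("Wien", "de")),
    ("ex:place_name1", HAS_CONTENT, ("Vienna", "en")),
    ("ex:place_name1", HAS_CONTENT, ("Vienne", "fr")),
]

def content_in(triples, subject, lang):
    """Return the symbolic content of `subject` carrying language tag `lang`."""
    return [
        text
        for (s, p, (text, tag)) in triples
        if s == subject and p == HAS_CONTENT and tag == lang
    ]

print(content_in(triples, "ex:place_name1", "en"))  # ['Vienna']
```

The language tag travels with the string itself, so no extra node or property is needed to record which language a given form belongs to.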

> We may need to add another property, such as "is contained in" or similar, pointing to a URL actually holding an instance of its content, again abstracted to the "level of symbolic specificity" that the file instance implies and that conforms to the identity condition of the Symbolic Object.
I think that we would benefit from some use cases which demonstrate the practical need for this property.  My own instinct is that if we are really just recording a string value, then it is overkill to assign it a URL and put it somewhere else.  If it's more than just a string value, in what way is it more?  Is it an instance of some other class, which we should be defining (or have already identified)?

My suggestion is that we define the "has symbolic content" property, and then put our energy into agreeing one or more subproperties of rdf:value which meet the known recording requirements for cultural heritage information.  By doing this, I suggest that we will have solved the main problem which confronts implementors who want to express CRM in RDF.

> Whereas the shortcut interpretation is attractive, it is not exactly the same. Using a shortcut, we say that the intermediate node is of different, independent nature from the terminal node. Here, we do not say "Appellation" is related to something called "Literal". We say "this Appellation IS itself what is in this Literal". That may or may not be a reason to reject this interpretation.
True.  At least two respondents in this conversation have said that they prefer the fully-worked-out paths.  Let's sort out an initial strategy for RDF based on the current CRM; then we can form a view as to whether further shortcuts are still required.

> We also have to distinguish Appellations and other Symbolic Objects which have multiple symbolic forms, i.e. spelling variants, versions etc., from those being one symbolic form. The rdfs:value has no means to express that. I believe we need yet another property "has symbolic content variant". In that case, the URI is necessary, in my opinion.
There may be a need for such a property; an analogy would be in SKOS, which has skos:prefLabel (one per language) and skos:altLabel.  However, I wonder if there is value in being able to express, in an open world situation, that one symbolic form is the "right" one and the others are variants.  I would welcome some concrete examples to inform our discussion.

From your explanations, I am getting a mental picture of an Appellation which has been the subject of much study, where you want to record, in a condensed way, all the possible forms which that Appellation might take.  For example, the sort of entry you might find in an encyclopaedia or a biographical authority.  I think that a more typical scenario might be where the 'same' name (e.g. the name of a known individual) occurs in a number of sources, but varies between them.

Also, I don't see how introducing a URL helps with this problem.  If you have an Appellation node in your graph, there are various statements which you can make about it.  If instead you invent a URL to represent that Appellation, you are in exactly the same situation as before, in terms of the statements you can make.  In fact, you have taken one step backwards, because you now have to begin by declaring explicitly that this node represents an Appellation: <myURL> rdf:type crm:E41_Appellation.

> I think the polymorphism we describe here, well studied in object-oriented languages, is in the nature of Appellations. The problem for me is that the respective KR models have NOT THOUGHT of the case that such polymorphisms can occur. Nevertheless, RDFS is tolerant enough to accept the superproperty statement, but not to create a class which is either a URI or an inline expanded object.
>
> This polymorphism occurs EXCLUSIVELY for Symbolic Objects with symbol sets a certain machine supports. Another reason not to use rdfs:value, because it does not give credit to the fact that only Symbolic Objects can have such a "value".

I'm afraid you have lost me here. It would be very helpful to me (and might encourage others to join in the conversation) if you could post one or two concrete examples of what you mean.

Posted by Martin on 13/9/2018

Dear Richard,
>
> On 12/09/2018 14:55, Martin Doerr wrote:

>> Dear Richard,
>>
>> I basically agree with your comments. Specifically however, I indeed wanted to say that the official definition of rdfs:label makes it exactly a subproperty of P1 (or shortcut of it) in any correct use of RDFS. If we want to mix RDFS models, we should have an opinion about their compatibility. Otherwise, we would have to regard them as alternative that cannot be compared with the CRM.

> OK: noted.  My concern is simply that we should not include assertions which mean that 'CRM RDF' fails to play nicely with other RDF frameworks.  I would welcome the thoughts of others on this issue.
>

>> I am not happy with adding rdfs:label to instances of Appellation, because this would mean it is a name for a name and not the name. I would sympathize with George using rdfs:value, if it had the respective semantics.
> Yes, we're in full agreement on this.
>

>> What we need, in my opinion, is a property of Symbolic Object, which we may call "has symbolic content" or "has symbolic content inline" or anything better, which defines that the symbolic content is identical to the Literal, abstracted to the "level of symbolic specificity" that the Literal implies and that conforms to the identity condition of the Symbolic Object, i.e., characters of a certain script, or whatever. That would make the meaning of the "value" unambiguous.
> Again, I'm in complete agreement with this line of thought.  One decision we should make is whether this property forms part of the generic CRM framework, or if it is to be an implementation-specific property which only appears in our RDF implementation of the CRM.  My instinct is for it to go into the CRM proper: the treatment of Symbolic Object and its subclasses would I think be made clearer by the addition of this property.
For CRM proper!
>
> It's worth bearing in mind that RDF strings have a built-in mechanism for specifying the language of the string.  This would allow us to express, for example, a place name in multiple languages by simply having one 'has symbolic content' property per language, each with an associated string.
>

>> We may need to add another property, such as "is contained in" or similar, pointing to a URL actually holding an instance of its content, again abstracted to the "level of symbolic specificity" that the file instance implies and that conforms to the identity condition of the Symbolic Object.
> I think that we would benefit from some use cases which demonstrate the practical need for this property.  My own instinct is that if we are really just recording a string value, then it is overkill to assign it a URL and put it somewhere else.  If it's more than just a string value, in what way is it more?  Is it an instance of some other class, which we should be defining (or have already identified)?
I made a jump here. This is for things like a (standardized) text of Aristotle in an MS Word document and in an .html file. If I mean the text attributed to Aristotle, I obviously do not mean that the typeface in MS Word belongs to Aristotle's text, nor the HTML layout instructions. This means that both contain precisely the same text but are themselves different, because they are richer in information, being modern renderings. All three, the standardized text of Aristotle, the MS Word representation and the HTML representation, are different Symbolic Objects, but one is contained in the other two.
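
The Aristotle case can be illustrated with a small Python sketch: an HTML rendering is a different symbol sequence from the plain text, yet the plain text can be recovered from it. The sample sentence, the markup and the helper names are illustrative only, and the whitespace normalisation is an assumption about the chosen "level of symbolic specificity".

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects character data and ignores all markup."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def text_content(html):
    """Return the text carried by an HTML rendering, whitespace-normalised."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())

plain = "All men by nature desire to know."
html_rendering = (
    "<html><body><p><i>All men</i> by nature desire to know.</p></body></html>"
)

print(html_rendering == plain)                # False
print(text_content(html_rendering) == plain)  # True
```

The two objects differ as symbol sequences, but at the level of the bare characters one is contained in the other, which is the relation a property such as "is contained in" would record.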
>
> My suggestion is that we define the "has symbolic content" property, and then put our energy into agreeing one or more subproperties of rdf:value which meet the known recording requirements for cultural heritage information.  By doing this, I suggest that we will have solved the main problem which confronts implementors who want to express CRM in RDF.
Yep, subproperty of rdf:value is not bad.
>

>> Whereas the shortcut interpretation is attractive, it is not exactly the same. Using a shortcut, we say that the intermediate node is of different, independent nature from the terminal node. Here, we do not say "Appellation" is related to something called "Literal". We say "this Appellation IS itself what is in this Literal". That may or may not be a reason to reject this interpretation.
> True.  At least two respondents in this conversation have said that they prefer the fully-worked-out paths.  Let's sort out an initial strategy for RDF based on the current CRM; then we can form a view as to whether further shortcuts are still required.
>

>> We also have to distinguish Appellations and other Symbolic Objects which have multiple symbolic forms, i.e. spelling variants, versions etc., from those being one symbolic form. The rdfs:value has no means to express that. I believe we need yet another property "has symbolic content variant". In that case, the URI is necessary, in my opinion.
> There may be a need for such a property; an analogy would be in SKOS, which has skos:prefLabel (one per language) and skos:altLabel.  However, I wonder if there is value in being able to express, in an open world situation, that one symbolic form is the "right" one and the others are variants.  I would welcome some concrete examples to inform our discussion.
Well, I did not mean that there is a "right" form: "Martin", "Martinus", "Martijn", "Marty"... If you go back in history, there is often no standard for one language either.
>
> From your explanations, I am getting a mental picture of an Appellation which has been the subject of much study, where you want to record, in a condensed way, all the possible forms which that Appellation might take.  For example, the sort of entry you might find in an encyclopaedia or a biographical authority.  I think that a more typical scenario might be where the 'same' name (e.g. the name of a known individual) occurs in a number of sources, but varies between them.
>
> Also, I don't see how introducing a URL helps with this problem.  If you have an Appellation node in your graph, there are various statements which you can make about it. 
Sure. It does not make sense for E41. Names are small enough to keep them in a Literal. Other Symbolic Objects may not be.
> If instead you invent a URL to represent that Appellation, you are in exactly the same situation as before, in terms of the statements you can make.  In fact, you have taken one step backwards, because you now have to begin by declaring explicitly that this node represents an Appellation: <myURL> rdf:type crm:E41_Appellation.

>> I think the polymorphism we describe here, well studied in object-oriented languages, is in the nature of Appellations. The problem for me is that the respective KR models have NOT THOUGHT of the case that such polymorphisms can occur. Nevertheless, RDFS is tolerant enough to accept the Superproperty statement, but not to create a class which is either URI or inline expanded object.
>>
>> This polymorphism occurs EXCLUSIVELY for Symbolic Objects with symbol sets a certain machine supports. Another reason not to use rdf:value, because it does not give credit to the fact that only Symbolic Objects can have such a "value".

> I'm afraid you have lost me here. It would be very helpful to me (and might encourage others to join in the conversation) if you could post one or two concrete examples of what you mean.

OK, in simple words: there are names which have an identity based on a certain sequence of characters. There are others, historically interesting, which have a phonetic identity, and even that may vary. We collaborate with historians who deal with family names in the Aegean area around 1800, which have no standard spelling at all, not even a preferred one. The different spelling variants have later evolved into distinct family names. But in order to match instances in the documents, we need both concepts of identity.

Even my ancestors used "Derr" instead of "Dörr". Since the local dialect does not distinguish "e" and "ö", it is unclear if it is a spelling variant of the same phonetics or if the "ö" is an etymological misinterpretation, because "Dörr" has a linguistic meaning and the "e" in "Derr" may have another semantic root, but this is not widely accepted.

So, the names that are not identical to a Literal must be represented using a URI. That is what I mean by polymorphism.  Also, if we want to talk about the name itself as a historical fact, we need a distinct identity. All these cases are needed but rare for names. For texts, it is the opposite. They are more often in files than in literals.

On the other hand, only Symbolic Objects can "reside" on computers and outside. Therefore the "punning" problem occurs only in connection with Symbolic Objects. Only these can have a "value" in the machine, whereas rdf:value may be about anything.

posted by Richard Light on 14/9/2018

On 13/09/2018 20:57, Martin Doerr wrote:
> Dear Richard,
>>
>>> What we need, in my opinion, is a property of Symbolic Object (we may call it "has symbolic content" or "has symbolic content inline" or anything better) which defines that the symbolic content is identical to the Literal, abstracted to the "level of symbolic specificity" that the Literal implies and that conforms to the identity condition of the Symbolic Object, i.e., characters of a certain script, or whatever. That would make the meaning of the "value" unambiguous.
>> Again, I'm in complete agreement with this line of thought.  One decision we should make is whether this property forms part of the generic CRM framework, or if it is to be an implementation-specific property which only appears in our RDF implementation of the CRM.  My instinct is for it to go into the CRM proper: the treatment of Symbolic Object and its subclasses would I think be made clearer by the addition of this property.
> For CRM proper!
OK: perhaps we should start a new issue to address this?
>>
>> It's worth bearing in mind that RDF strings have a built-in mechanism for specifying the language of the string.  This would allow us to express, for example, a place name in multiple languages by simply having one 'has symbolic content' property per language, each with an associated string.
>>
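A minimal sketch of the "one property per language" idea above, written in plain Python rather than with any RDF library. Triples are modelled as tuples, and the small Literal class, the property name crm:Pxx_has_symbolic_content and the ex: identifiers are all assumptions made for illustration; the place-name strings are invented sample data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """A string value with an optional language tag, kept distinct from URIs."""
    value: str
    lang: Optional[str] = None


# Hypothetical property name; the thread only calls it "has symbolic content".
PXX = "crm:Pxx_has_symbolic_content"

# One statement per language for the same (hypothetical) place-name node.
triples = [
    ("ex:placename1", PXX, Literal("Vienna", "en")),
    ("ex:placename1", PXX, Literal("Wien", "de")),
    ("ex:placename1", PXX, Literal("Vienne", "fr")),
]


def content_by_language(triples, subject):
    """Collect {language tag: string} for one Symbolic Object."""
    return {
        o.lang: o.value
        for s, p, o in triples
        if s == subject and p == PXX and isinstance(o, Literal)
    }


names = content_by_language(triples, "ex:placename1")
```

Here the language tag travels with each string, so no extra node is needed per language.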

>>> We may need to add another property, such as "is contained in" or so, pointing to a URL actually holding an instance of its content, again abstracted to the "level of symbolic specificity" that the file instance implies and that conforms to the identity condition of the Symbolic Object.
>> I think that we would benefit from some use cases which demonstrate the practical need for this property.  My own instinct is that if we are really just recording a string value, then it is overkill to assign it a URL and put it somewhere
> I made a jump here. This is for things like a (standardized) text of Aristotle in an MS Word document and in an .html file. If I mean the text attributed to Aristotle, I obviously do not mean the typeface in MS Word to belong to Aristotle's text, nor the HTML layout instructions. This means that both contain precisely the same text, but are themselves different, because they are richer in information, namely the modern renderings. All three, the standardized text of Aristotle, the MS Word representation and the HTML representation, are different Symbolic Objects, but one is contained in the other two.
I see: thanks. Yes, this would indeed be a different property, and in fact the URL concerned will not be a 'Linked Data' URL, since it will address a non-RDF resource.  So that forms a different discussion, perhaps?  To me, the Word document and HTML page sound like "attestations" of the Aristotle text (as the Pelagios/Linked Pasts people would say).  Another example would be a photograph of a plaque containing a text of interest.  This would also be an attestation; the difference is that the text in question is not encoded within the digital resource.

I'm certainly interested in the interface between the Linked Data world and the digital humanities world of TEI and Word documents and the like.  Techniques like Open Annotation [1] could well have a useful role to play here.

>> My suggestion is that we define the "has symbolic content" property, and then put our energy into agreeing one or more subproperties of rdf:value which meet the known recording requirements for cultural heritage information.  By doing this, I suggest that we will have solved the main problem which confronts implementors who want to express CRM in RDF.
> Yep, subproperty of rdf:value is not bad.
>

>>> I think the polymorphism we describe here, well studied in object-oriented languages, is in the nature of Appellations. The problem for me is that the respective KR models have NOT THOUGHT of the case that such polymorphisms can occur. Nevertheless, RDFS is tolerant enough to accept the Superproperty statement, but not to create a class which is either URI or inline expanded object.
>>>

>>> This polymorphism occurs EXCLUSIVELY for Symbolic Objects with symbol sets a certain machine supports. Another reason not to use rdf:value, because it does not give credit to the fact that only Symbolic Objects can have such a "value".
>> I'm afraid you have lost me here. It would be very helpful to me (and might encourage others to join in the conversation) if you could post one or two concrete examples of what you mean.
> OK, in simple words: there are names which have an identity based on a certain sequence of characters. There are others, historically interesting, which have a phonetic identity, and even that may vary. We collaborate with historians who deal with family names in the Aegean area around 1800, which have no standard spelling at all, not even a preferred one. The different spelling variants have later evolved into distinct family names. But in order to match instances in the documents, we need both concepts of identity.
True, but any instance of the name in a document will only take one concrete form, not all of them.  (For handwritten sources it may be a matter of judgement what that form actually is.)  So you can record the form of name it exhibits (as a string), and then assert that it is (in your view) an attestation of the generic family name for which you have a URI.
>
> Even my ancestors used "Derr" instead of "Dörr". Since the local dialect does not distinguish "e" and "ö", it is unclear if it is a spelling variant of the same phonetics or if the "ö" is an etymological misinterpretation, because "Dörr" has a linguistic meaning and the "e" in "Derr" may have another semantic root, but this is not widely accepted.
>
> So, the names that are not identical to a Literal must be represented using a URI. That is what I mean by polymorphism.  Also, if we want to talk about the name itself as a historical fact, we need a distinct identity. All these cases are needed but rare for names.

There are perfectly good reasons for considering names to be worthy of study and recording in their own right.  I would argue that this is equally true whether the name in question has one, or many, possible forms.  So there is always an argument for minting a URI to represent the name as a Symbolic Object. Doing this allows you to make statements, for example, about its genesis, its meaning, its historical distribution, etc., and means you can record specific instances of the name as attestations of this Symbolic Object.

However, I would still argue that instances of the name should be recorded as strings - the actual value found in the resource in question.

> For texts, it is the opposite. They are more often in files than in literals.
>
> On the other hand, only Symbolic Objects can "reside" on computers and outside. Therefore the "punning" problem occurs only in connection with Symbolic Objects. Only these can have a "value" in the machine, whereas rdf:value may be about anything.

Posted by Martin on 14/9/2018

Dear Richard,

I'll shorten now:

On 9/14/2018 7:54 PM, Richard Light wrote:
>
>
>>> My suggestion is that we define the "has symbolic content" property, and then put our energy into agreeing one or more subproperties of rdf:value which meet the known recording requirements for cultural heritage information.  By doing this, I suggest that we will have solved the main problem which confronts implementors who want to express CRM in RDF.
>> Yep, subproperty of rdf:value is not bad.
>>

>>>> I think the polymorphism we describe here, well studied in object-oriented languages, is in the nature of Appellations. The problem for me is that the respective KR models have NOT THOUGHT of the case that such polymorphisms can occur. Nevertheless, RDFS is tolerant enough to accept the Superproperty statement, but not to create a class which is either URI or inline expanded object.
>>>>
>>>> This polymorphism occurs EXCLUSIVELY for Symbolic Objects with symbol sets a certain machine supports. Another reason not to use rdf:value, because it does not give credit to the fact that only Symbolic Objects can have such a "value".
>>> I'm afraid you have lost me here. It would be very helpful to me (and might encourage others to join in the conversation) if you could post one or two concrete examples of what you mean.

>> OK, in simple words: there are names which have an identity based on a certain sequence of characters. There are others, historically interesting, which have a phonetic identity, and even that may vary. We collaborate with historians who deal with family names in the Aegean area around 1800, which have no standard spelling at all, not even a preferred one. The different spelling variants have later evolved into distinct family names. But in order to match instances in the documents, we need both concepts of identity.
> True, but any instance of the name in a document will only take one concrete form, not all of them.  (For handwritten sources it may be a matter of judgement what that form actually is.)  So you can record the form of name it exhibits (as a string), and then assert that it is (in your view) an attestation of the generic family name for which you have a URI.
This is not true. We do have counterexamples. The name may take multiple forms in the same document.
>>
>> Even my ancestors used "Derr" instead of "Dörr". Since the local dialect does not distinguish "e" and "ö", it is unclear if it is a spelling variant of the same phonetics or if the "ö" is an etymological misinterpretation, because "Dörr" has a linguistic meaning and the "e" in "Derr" may have another semantic root, but this is not widely accepted.
>>
>> So, the names that are not identical to a Literal must be represented using a URI. That is what I mean by polymorphism.  Also, if we want to talk about the name itself as a historical fact, we need a distinct identity. All these cases are needed but rare for names.

> There are perfectly good reasons for considering names to be worthy of study and recording in their own right.  I would argue that this is equally true whether the name in question has one, or many, possible forms.  So there is always an argument for minting a URI to represent the name as a Symbolic Object. Doing this allows you to make statements, for example, about its genesis, its meaning, its historical distribution, etc., and means you can record specific instances of the name as attestations of this Symbolic Object.
>
> However, I would still argue that instances of the name should be recorded as strings - the actual value found in the resource in question.
Sure. This is another issue. And they can be multiple...
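
The two positions in this exchange (record each occurrence as the string actually found, link it to a URI for the name as a Symbolic Object, and allow several forms within one document) could be sketched roughly as below. This is plain Python over tuples, not an RDF API; the ex: identifiers and the sample occurrences are hypothetical and only reuse the "Derr"/"Dörr" spellings mentioned above.

```python
from collections import defaultdict

# Each record: (document, form as written in that document,
#               URI of the family name the form is judged to attest).
# All identifiers are invented for illustration.
occurrences = [
    ("ex:doc1", "Derr", "ex:name/doerr"),
    ("ex:doc1", "Dörr", "ex:name/doerr"),
    ("ex:doc2", "Dörr", "ex:name/doerr"),
]


def forms_per_document(occurrences):
    """Group the attested string forms by (name URI, document)."""
    forms = defaultdict(set)
    for doc, form, name_uri in occurrences:
        forms[(name_uri, doc)].add(form)
    return dict(forms)


forms = forms_per_document(occurrences)
```

Grouping by name URI and document keeps the strings as found, while the URI carries the identity of the name across spelling variants.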

Outcome: 

In the 42nd joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 35th FRBR - CIDOC CRM Harmonization meeting, the crm-sig discussed MD’s proposal that in the RDFS implementation of the CRM, P1 can have two ranges, Literal and E41, and that P1_is_identified_by is a superproperty of rdfs:label, bearing in mind that any decision reached would carry over to other types of primitive values. Some objections were raised against allowing two distinct datatypes as possible ranges for P1; however, the crm-sig voted in favor of MD’s proposal, noting that

(i) it should be explicitly stated that it forms a technical solution that is part of the implementation of the CRM in RDF (CEO), and
(ii) it is a solution best understood as an implicit way to describe instances of E41 Appellation. An alternative path for P1 is given as best practice, namely one going through E41 Appellation – Pxx has symbolic content – Literal.
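
As a rough illustration of point (ii), the sketch below rewrites the implicit form (P1 pointing directly at a Literal) into the best-practice path through an E41 Appellation node. It is plain Python over tuples, not an RDF library; the Literal class, the blank-node labels, the ex: identifiers and the spelled-out property name crm:Pxx_has_symbolic_content are assumptions, since the outcome only names the property "Pxx has symbolic content".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Literal:
    """A string value, kept distinct from URI strings to avoid ambiguity."""
    value: str
    lang: Optional[str] = None


P1 = "crm:P1_is_identified_by"
E41 = "crm:E41_Appellation"
PXX = "crm:Pxx_has_symbolic_content"  # placeholder name, see lead-in
RDF_TYPE = "rdf:type"


def expand_p1_shortcuts(triples):
    """Replace each 'P1 -> Literal' triple by 'P1 -> E41 node -> Pxx -> Literal'."""
    expanded = []
    counter = 0
    for s, p, o in triples:
        if p == P1 and isinstance(o, Literal):
            counter += 1
            node = f"_:appellation{counter}"  # hypothetical blank-node label
            expanded.append((s, P1, node))
            expanded.append((node, RDF_TYPE, E41))
            expanded.append((node, PXX, o))
        else:
            # Triples already pointing at a URI are left untouched.
            expanded.append((s, p, o))
    return expanded


data = [
    ("ex:person1", P1, Literal("Martin")),
    ("ex:person1", P1, "ex:appellation7"),
]
result = expand_p1_shortcuts(data)
```

Only the triple whose object is a Literal is expanded; the one already pointing at an Appellation URI passes through unchanged.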

The issue is closed

Berlin, November 2018

Meetings discussed: