Issue 460: URI Management

ID: 460
Starting Date: 2020-01-16
Working Group: 2
Status: Open
Background: 

Posted by Francesco Beretta on 16/1/2020

Dear all,

I have a question about CIDOC CRM URI management.

The last published version of CRMbase is 6.2.1. If I take the RDF serialization, I find this base URI:

http://www.cidoc-crm.org/cidoc-crm/

If I send this URI to the web:

http://www.cidoc-crm.org/cidoc-crm/E92_Spacetime_Volume

I have an error message.

If I send this URI to the web:

http://www.cidoc-crm.org/cidoc-crm/E5_Event

I am redirected to version 5.0.4.

The machine cannot know which version of CRM is considered.

I have then the Erlangen URI:

http://erlangen-crm.org/current/E92_Spacetime_Volume

which dereferences to a document containing the whole version.

There are additional, earlier specific versions.

I have an issue in OntoME: which URI is to be used ?

We have a provisional, not dereferenced URI:

https://dataforhistory.org/external-ontology/cidoc-crm-base-6-2/E92_Spac...

It is there to avoid confusion but it's bad practice.

I'm asking myself what to do, and people adopting the CRM are asking me these kinds of questions, being unhappy with this situation.

I think there was already a discussion about this point in the SIG.

Shouldn't we find, and implement, a solution that meets current requirements?

The same issue of course arises for the family of extensions.

 

Posted by George on 16/1/2020

Dear all,

I agree that this is an ongoing issue that creates barriers to uptake because of confusion. It is an oft repeated question and deserves a clear answer. We need a solution based on community wide best practice. Suggestions?

Posted by martin on 16/1/2020

Dear Francesco,

At FORTH we will implement anything that is regarded as good practice, and does not create a manual overhead we cannot manage. Volunteers to design whatever is needed? 

Posted by Robert Sanderson on 16/1/2020

Dear all,

I have a python script that already does this for CRM and the Linked Art extension.

The results of that script for Linked Art can be seen here:

https://linked.art/ns/terms/   -- the entire ontology is returned when dereferencing the namespace
https://linked.art/ns/terms/paid_amount.xml  -- an individual term is returned when dereferencing its URI

The script simply goes through the ontology files and cuts out each property and class in turn. Then a very simple redirect handler adds the mapping to the .xml files.
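The splitting step described above can be sketched roughly as follows. This is not the actual Linked Art script: the function names `split_terms` and `redirect_map`, the inline `SAMPLE` document, and the assumption that every term is a top-level `rdfs:Class` or `rdf:Property` element with an `rdf:about` attribute are all illustrative.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Tiny stand-in for an ontology file (assumption: real files are much larger).
SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">
  <rdfs:Class rdf:about="http://www.cidoc-crm.org/cidoc-crm/E5_Event"/>
  <rdf:Property rdf:about="http://www.cidoc-crm.org/cidoc-crm/P9_consists_of"/>
</rdf:RDF>"""


def split_terms(rdf_xml):
    """Return {local_name: standalone RDF/XML string} for each class/property."""
    ET.register_namespace("rdf", RDF)
    ET.register_namespace("rdfs", RDFS)
    root = ET.fromstring(rdf_xml)
    wanted = {f"{{{RDFS}}}Class", f"{{{RDF}}}Property"}
    terms = {}
    for element in root:
        about = element.get(f"{{{RDF}}}about")
        if element.tag in wanted and about:
            local_name = about.rstrip("/").rsplit("/", 1)[-1]
            # Wrap the single declaration in its own rdf:RDF document.
            wrapper = ET.Element(f"{{{RDF}}}RDF")
            wrapper.append(element)
            terms[local_name] = ET.tostring(wrapper, encoding="unicode")
    return terms


def redirect_map(terms):
    """Map each term name to the per-term file a redirect handler would serve."""
    return {name: name + ".xml" for name in terms}
```

Each value returned by `split_terms` could be written to its own `.xml` file, with `redirect_map` supplying the term-to-file mapping for the redirect handler.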

You can see the results for CRM in a temporary branch:

https://prov-updates--linked-art.netlify.com/ns/crm/P9_consists_of  -- P9 (but the rest of the data is there too)

Posted by Richard Light on 16/1/2020

On 16/01/2020 12:09, George Bruseker wrote:
> Dear all,
>
> I agree that this is an ongoing issue that creates barriers to uptake because of confusion. It is an oft repeated question and deserves a clear answer. We need a solution based on community wide best practice. Suggestions?

George,

It sounds as though there are a number of issues here:

    what is returned (HTML or RDF)
    which version of the CRM is returned (how to know which you have; how to specify which one you want)
    how much of the CRM is returned (one concept or the whole thing). If it's the whole thing, where are you placed within it
    whether to return a set of RDFS statements or an OWL ontology

As things stand, the Erlangen implementation returns an OWL ontology. This includes version information. By default (in my Firefox browser) Erlangen returns an RDF/XML response.  You are placed at the start of this document.  So you get the same response, no matter which CRM class or property you specified in the URL.

Our implementation returns a set of RDFS class and property statements. By default it redirects to an HTML response, and uses the '#' notation (supported in native HTML) to place you at the correct place within that web page. The version number is given, but only as a human-readable heading.  If you specify RDF/XML in your HTTP request (e.g. using curl), it redirects to RDF/XML and gives you the whole thing, again starting from the beginning.
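For anyone wishing to repeat this kind of check from a script rather than from curl, a minimal sketch using only the Python standard library might look like this. The helper names `build_request` and `dereference` are illustrative, and what a given server actually returns depends on its configuration at the time of the request.

```python
import urllib.request


def build_request(url, accept="application/rdf+xml"):
    """Build a GET request that asks for a specific media type."""
    return urllib.request.Request(url, headers={"Accept": accept}, method="GET")


def dereference(url, accept="application/rdf+xml", timeout=10):
    """Fetch url, following redirects; return (status, final URL, Content-Type).

    Note: urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(build_request(url, accept), timeout=timeout) as resp:
        return resp.status, resp.geturl(), resp.headers.get("Content-Type")
```

Calling `dereference` once with `accept="text/html"` and once with the default shows whether the server redirects to different representations for the same URI.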

I think our approach (human-readable response by default) is the better one, especially as you end up reading about the concept you expressed an interest in. It would be nice if the RDF response could also take you to the correct declaration - this would require the addition of IDs to each declaration, plus the addition of a '#' to the redirected URL. (? does this work for XML docs?)  The RDF response could also be improved by the addition of a machine-processible header, including version info and possibly links to the various versions that are available.

Which brings us to the question of supporting different versions. Erlangen have the concept of 'current', which at 6.2.1 is rather more current than we manage. However, I don't see any way of getting at earlier versions.  We could support a URL pattern:

http://www.cidoc-crm.org/cidoc-crm/[version]/E5_Event

which would allow users to explicitly state which CRM version they are conforming to.

If we can agree on a spec for improved RDF delivery, I would be happy to help implement it.

Posted by Detlev Balzer on 16/1/2020

> Martin Doerr <martin@ics.forth.gr> hat am 16. Januar 2020 um 13:27 geschrieben:
>
> (...)
> At FORTH we will implement anything that is regarded good practice, and 
> does not create a manual overhead we cannot manage. 

For formal specifications such as ontologies, there is a widely adopted pattern for change management which goes like this:

http://www.cidoc-crm.org/cidoc-crm/ always resolves to the latest version, while

http://www.cidoc-crm.org/cidoc-crm/{version}/ always resolves to the particular {version} given in the URI.

There can be any number of versions, and the latest one is both referenced through the un-versioned namespace and through the one with the most recent version number (or publication date, if that is used for versioning).

Alternatively, the most recent version could be labelled explicitly as the current one, e.g. http://www.cidoc-crm.org/cidoc-crm/current/

Application developers must then decide what kind of stability they prefer: stability of the namespace URI, or stability of the content retrieved from a URI. Evidently, one cannot have both.

Maintenance effort for this pattern is minimal: Just publish each new version under its versioned namespace and then, any time another version comes out, adjust the non-versioned namespace so that it will resolve to the most recent version. Most modern Web frameworks have a URL routing facility which makes this fairly easy.
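A minimal sketch of such a routing rule, assuming a hypothetical registry `VERSIONS` listed oldest first and an illustrative function name `resolve`, could look like this. It accepts the un-versioned namespace, an explicit `current` segment, or a version number, and reports which version and term a URI refers to.

```python
import re

BASE = "http://www.cidoc-crm.org/cidoc-crm/"
VERSIONS = ["5.0.4", "6.2.1"]  # hypothetical registry, oldest first
VERSION_RE = re.compile(r"\d+(\.\d+)*")


def resolve(uri, versions=VERSIONS):
    """Return (version, term) for a namespace or term URI, or None if unknown."""
    if not uri.startswith(BASE):
        return None
    segments = [s for s in uri[len(BASE):].split("/") if s]
    version = versions[-1]  # the un-versioned namespace means "latest"
    if segments and (segments[0] == "current" or VERSION_RE.fullmatch(segments[0])):
        head = segments.pop(0)
        if head != "current":
            if head not in versions:
                return None  # looks like a version, but was never published
            version = head
    if len(segments) > 1:
        return None
    term = segments[0] if segments else ""
    return version, term
```

With this rule, publishing a new version only requires appending it to the registry; the un-versioned namespace then follows automatically.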

I should not forget to say that LOD best practice also demands that URIs support content negotiation, as assumed throughout all recommendations in the http://linkeddatabook.com/
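Content negotiation on the server side amounts to comparing the client's Accept header with the representations on offer. The sketch below is a simplified illustration: the names `parse_accept`, `quality` and `negotiate` are hypothetical, only two representations are assumed to exist, and media type parameters other than `q` are ignored.

```python
SUPPORTED = ["text/html", "application/rdf+xml"]  # server default first


def parse_accept(header):
    """Parse an Accept header into a list of (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        fields = [f.strip() for f in item.split(";")]
        media = fields[0].lower()
        if not media:
            continue
        q = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    q = float(field[2:])
                except ValueError:
                    q = 0.0
        ranges.append((media, q))
    return ranges


def quality(candidate, ranges):
    """Return the q-value of the most specific range matching candidate."""
    best_specificity, best_q = -1, 0.0
    for media, q in ranges:
        if media == candidate:
            specificity = 2
        elif media == "*/*":
            specificity = 0
        elif media.endswith("/*") and candidate.startswith(media[:-1]):
            specificity = 1
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def negotiate(accept, supported=SUPPORTED):
    """Pick the representation to serve, or None if nothing is acceptable."""
    if not accept or not accept.strip():
        return supported[0]
    ranges = parse_accept(accept)
    best = max(supported, key=lambda c: quality(c, ranges))  # ties: first wins
    return best if quality(best, ranges) > 0 else None
```

A browser sending `text/html,application/xhtml+xml,*/*;q=0.8` would be served HTML, while a client sending `application/rdf+xml` would be served RDF from the same URI.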

Posted by Velios on 16/1/2020

I agree with Detlev's proposal. Also, I believe that versions should not be included in the class URIs. These are not normally used to retrieve reasoning rules but only to identify classes, right? Resolving the class URI should return all versions of the class.
 

Posted by George on 17/1/2020

Dear all,

It seems a very fruitful discussion. May I add some other 'complications' to it?

Starting from what Detlev proposes:

    > For formal specifications such as ontologies, there is a widely adopted pattern for change management which goes like this:
    >
    > http://www.cidoc-crm.org/cidoc-crm/ always resolves to the latest version, while
    >
    > http://www.cidoc-crm.org/cidoc-crm/{version}/ always resolves to the particular {version} given in the URI.

This seems sensible. Here is a twist. 

If we click the first link, it brings us to CIDOC CRM 5.0.4 which is the last official ISO version.  In the meantime, we have a last official community version which is 6.2.1. Which one should this be pointing to? Second, it points to the text version of the ontology in an html representation. 

For the appearance/presentation of the whole ontology, it is an html representation of the main document that we create. This seems fine. Would it be useful to be able to provide links explicitly at the top of this document to click over to encodings? This way somehow we can better consolidate and direct people to the RDF and the Erlangen OWL?

To me doing it this way, the Erlangen way, makes sense. So current always points to what current is (once we define what current is). It would also be good to be able to use the versioned edition (not currently supported but presumably easy).

Up to here we talk about pointing to the whole ontology representation.

Then there comes the question of resolving to an individual concept: http://www.cidoc-crm.org/cidoc-crm/E5_Event

As Richard points out, if you click it, it uses # and puts you to the right anchor point in the overall html document. Is this the best practice?

I will point out that on the CRM site, there is also an entire architecture wherein each version has its own overall presentation: e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

and then you can click on an individual concept, eg: http://www.cidoc-crm.org/Entity/E5-Event/Version-6.2.1

The above follows a different URI pattern than suggested above, but is doing the same work. This is run on a database that also calculates incoming and outgoing properties, making the representation more complete than one gets from the flat html version of our word doc. Functionally, it can be argued it is more useful. Would it be possible to use this as the dereferencing point and stay within best practices? If the URI pattern were changed, could we then provide an easy way to click over to the particular representation of the element in OWL, RDFS or other representations that exist for that version?

Finally to Thanasis' point.

"Resolving the class URI should return all versions of the class."

Currently we certainly don't do that. It definitely would not / could not happen based on our doc/html presentation of the ontology. With the database version I pointed to above, I suppose it would be relatively straightforward to have the older versions of a class you are looking at listed below as links. I guess it would be a specialist user who would care about this (not to put the idea down, just to say).

I hope these questions are a useful contribution to the conversation. 

Posted by Thanasis on 17/1/2020

> For the appearance/presentation of the whole ontology, it is an html representation of the main document that we create. This seems fine. Would it be useful to be able to provide links explicitly at the top of this document to click over to encodings? This way somehow we can better consolidate and direct people to the RDF and the Erlangen OWL?

Links would certainly be useful but the web server's content negotiation mechanism should be enough to deliver the right format to the client, is this what you mean?

> I will point out that on the CRM site, there is also an entire architecture wherein each version has its own overall presentation: e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

I think this should be maintained but not used as URIs for classes.

> Finally to Thanasis' point.
>
> "Resolving the class URI should return all versions of the class."
>
> Currently we certainly don't do that. It definitely would not / could not happen based on our doc/html presentation of the ontology. With the database version I pointed to above, I suppose it would be relatively straightforward to have the older versions of a class you are looking at listed below as links. I guess it would be a specialist user who would care about this (not to put the idea down, just to say).

Yes I thought it should be relatively easy to do through a Drupal View. The point is that if there are no versions on the class URI, the user should be able to read about any version of the class given that they may be coming from a database using an earlier version than the current one. 

Posted by Robert Casties on 17/1/2020

Hi George,

On 17.01.20 10:47, George Bruseker wrote:
> I will point out that on the CRM site, there is also an entire architecture
> wherein each version has its own overall presentation: e.g.:
> http://www.cidoc-crm.org/Version/version-6.2.1

Wow, that is a really useful format, I didn't know it existed 

Especially having a concise list of all Classes and Properties and then
having all inherited Properties also listed with each class! That is
really useful when working on an implementation.

Sadly this format seems to exist only up to 6.2.2 

This is not exactly to the point of what is "right" to resolve the
default URIs to but as a documentation this is much more useful to me
than the reference PDF which is the only thing linked on
http://www.cidoc-crm.org/versions-of-the-cidoc-crm. Would it be possible
to have this format also for at least the latest version?

 

Posted by George on 17/1/2020

    Links would certainly be useful but the web server's content negotiation
    mechanism should be enough to deliver the right format to the client, is
    this what you mean?

My underlying assumption would be that the default thing served up would be html, but you could reach the other representation consistently through adding an appropriate ending or whatever would be most suitable... but that people looking at the html should have a shiny red button type clue that there is another way to retrieve the info which is for example as owl.
 

    > I will point out that on the CRM site, there is also an entire
    > architecture wherein each version has its own overall presentation:
    > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1

    I think this should be maintained but not used as URIs for classes.

Why would you argue against using it as the resolving point for individual classes?  
 

    > Finally to Thanasis' point.
    >
    > "Resolving the class URI should return all versions of the class."
    >
    > Currently we certainly don't do that. It definitely would not / could
    > not happen based on our doc/html presentation of the ontology. With the
    > database version I pointed to above, I suppose it would be relatively
    > straightforward to have the older versions of a class you are looking at
    > listed below as links. I guess it would be a specialist user who would
    > care about this (not to put the idea down, just to say).

    Yes I thought it should be relatively easy to do through a Drupal View.
    The point is that if there are no versions on the class URI, the user
    should be able to read about any version of the class given that they
    may be coming from a database using an earlier version than the current one.

Currently this is not supported at all, correct? I mean you always point at a version. So you would suggest that 'current' should be 'versionless'? 

How I understood Erlangen to work is that it just makes the versionless URI redirect to the current. So I thought the idea would be that 'current' resolves to the present official (whatever the present official means). If a class has been deprecated then I guess it would have to revert to the last official in which it had existed?
 

Posted by George on 17/1/2020

Hi Robert,

Yes it is really quite nice actually. A hidden gem as it were.

About why it doesn't exist past 6.2.2, it's a bit odd. I would have said it is because it is only made for official release versions (like 6.2.1) but I see that it has been made for other non official versions. Perhaps it points to the need for a simple flowchart for understanding the steps that are taken in order to produce each version since there are many products (the word doc, the pdf, the html version in simple format, the database version drupal, the rdf, the owl etc.). 

We are aiming anyhow to make a new official release with the next SIG (fingers crossed) which I guess would probably entail updating the drupal resource to the latest state.

Posted by Thanasis on 17/1/2020

> My underlying assumption would be that the default thing served up would be html, but you could reach the other representation consistently through adding an appropriate ending or whatever would be most suitable... but that people looking at the html should have a shiny red button type clue that there is another way to retrieve the info which is for example as owl.

Yes, I agree.

>      > I will point out that on the CRM site, there is also an entire
>      > architecture wherein each version has its own overall presentation:
>      > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1
>
>     I think this should be maintained but not used as URIs for classes.
>
>
> Why would you argue against using it as the resolving point for individual classes?

Because it includes versions. These are necessary when working across different versions but I do not think versions are needed for classes.

> Currently this is not supported at all, correct? I mean you always point at a version. So you would suggest that 'current' should be 'versionless'?

I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes.

>
> How I understood Erlangen to work is that it just makes the versionless URI redirect to the current. So I thought the idea would be that 'current' resolves to the present official (whatever the present official means). If a class has been deprecated then I guess it would have to revert to the last official in which it had existed? 

Posted by Francesco Beretta on 17/1/2020

Dear all,

This very interesting conversation has so far focused on CRMbase. But what about the family of extensions, which often point from one extension to another?

One major point for having machine actionable, consistent ontologies is to have a mechanism to point to the versions of each module (and base) to which a certain module version refers. This, as you know, to provide consistency.

One of the reasons for developing OntoME was to provide a way of easily integrating different modules and extensions. We added recently the possibility of having a rdf-owl export of a namespace and more will follow soon, I hope, to export profiles in OWL and probably soon SHACL.

The general vision for OntoME is to go from beta to MVP in summer, and at the same time go opensource so that the community can help improve the platform. And also integrate it, if desirable and desired, with the tooling at FORTH or any other platform.

I think we should discuss a vision and rules for providing a robust, machine actionable integration of CRMbase and modules in general (i.e. platform independent), and develop a common platform providing version integration and easy-to-use tooling for the community.

I raise this issue because I have heard this need expressed in the user community multiple times, and I wonder in which direction we should move. I know that developing such platforms is time-consuming, expensive and causes headaches...

Posted by George Bruseker on 17/1/2020

    >      > I will point out that on the CRM site, there is also an entire
    >      > architecture wherein each version has its own overall presentation:
    >      > e.g.: http://www.cidoc-crm.org/Version/version-6.2.1
    >
    >     I think this should be maintained but not used as URIs for classes.
    >
    >
    > Why would you argue against using it as the resolving point for
    > individual classes?

    Because it includes versions. These are necessary when working across
    different versions but I do not think versions are needed for classes.

But is your objection to showing the data in the form that you see when you click this link (ie not a large html text and a pointer to the anchor) or to showing a version? 

I like the way that the link above displays an individual class and the functionality it gives to actually use the ontology. I don't know if it breaks good practice though. 

Re displaying a version, don't you always have to display a version? Even if you are displaying current, it is actually just the last official version.
 

    > Currently this is not supported at all, correct? I mean you always point
    > at a version. So you would suggest that 'current' should be 'versionless'?

    I am suggesting that classes do not need versions at all. Doing
    reasoning on a per class and per version basis would be bad practice,
    no? One would expect that the whole RDF/OWL representation would be used
    for reasoning. I think class URIs are only used as identifiers. This
    also avoids the problem of ensuring correct older versions for
    deprecated classes.

I think from a provenance point of view, given that the ontology is changing if one knew the version it could help one interpret the information in the future. I mean that if you made your data under version 4 when the intension of class x was of a certain size and now we widened it, then perhaps it affects how you used the ontology. I imagine this is a pretty sci fi scenario right now and nobody has this use case, but thinking of how things could shape up in a future world, I think it would be relevant. Actually even thinking about conversations in LinkedArt people get confused between versions. Why didn't you use property x? Well I was looking at version x and in that version class y doesn't have property x.

Anyhow if we had a workflow in which the structured data for classes and properties were edited first and from that the different products (doc, rdf, owl etc.) were generated then generating the versioned version would not be more overhead. I think it's a question of the order of production of the documents.

 

Current Proposal: 

Posted by Pavlos Fafalios on 27/9/2021

Dear all,

We (at FORTH) have started working on the URIs management issue, i.e. on how to provide resolvable URIs for the different versions of CIDOC-CRM and its compatible models. We would like to hear your opinion about the following: 

(A) HAVING BOTH UNVERSIONED AND VERSIONED ONTOLOGY URIS  

The URI http://www.cidoc-crm.org/cidoc-crm/ will always resolve to the last official version of CIDOC-CRM ('official' according to the definition here). 
A question (also raised by George) is if we want to point to the last 'published' version (which is "a stable version of the standard and can be used for implementation, referencing and any other official purpose").

In parallel, each version will have its own versioned URI, e.g., 
http://cidoc-crm.org/cidoc-crm/7.1.1/ for version 7.1.1, 
http://cidoc-crm.org/cidoc-crm/6.2.9/ for version  6.2.9, etc. etc.

(B) SERVING HTML OR RDF (BASED ON HTTP REQUEST TYPE)

Different content will be served depending on the media type requested in the HTTP Accept header. So, if one asks for 
   http://www.cidoc-crm.org/cidoc-crm/ 
one will get either the HTML content of the last official version (when requesting text/html), 
or the RDFS of the last official version (when requesting RDF/XML, i.e. application/rdf+xml). 

We will also do the same for the versioned URIs. 

Now, if one requests a specific class or property, e.g.: 
   http://www.cidoc-crm.org/cidoc-crm/E5_Event
one will either be taken to the part of the HTML content of the last official version which describes this particular class (text/html request),
or (in the case of RDF/XML) get the entire RDFS of the last official version OR the star-view of that particular class (i.e., subclasses, superclasses, incoming properties, outgoing properties). 
So, a question here is if we want to provide the star-view or the entire RDFS. 
In our opinion, it's much better to provide the star view. Otherwise it makes no sense to request the URI (using rdf/xml) including the class/property name since the result is the same for any class/property name. So, in this case, one should request 
    http://www.cidoc-crm.org/cidoc-crm/ 
for the complete rdfs file, and
    http://www.cidoc-crm.org/cidoc-crm/E5_Event 
for the rdfs star view of E5_Event.

We will also do the same for the versioned URIs. 
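To make the star-view idea concrete, here is a minimal sketch that computes it from a list of triples. It is only an illustration: the function name `star_view` is hypothetical, the `TRIPLES` list is a tiny hand-picked subset rather than the real RDFS, and only directly declared (not inherited) relations are returned.

```python
CRM = "http://www.cidoc-crm.org/cidoc-crm/"
SUBCLASS, DOMAIN, RANGE = "rdfs:subClassOf", "rdfs:domain", "rdfs:range"

# Illustrative subset of the ontology, written as (subject, predicate, object).
TRIPLES = [
    (CRM + "E5_Event", SUBCLASS, CRM + "E4_Period"),
    (CRM + "E7_Activity", SUBCLASS, CRM + "E5_Event"),
    (CRM + "P11_had_participant", DOMAIN, CRM + "E5_Event"),
    (CRM + "P11_had_participant", RANGE, CRM + "E39_Actor"),
    (CRM + "P20_had_specific_purpose", DOMAIN, CRM + "E7_Activity"),
    (CRM + "P20_had_specific_purpose", RANGE, CRM + "E5_Event"),
]


def star_view(term, triples):
    """Collect direct super/subclasses and incoming/outgoing properties of term."""
    view = {"superclasses": [], "subclasses": [], "outgoing": [], "incoming": []}
    for subject, predicate, obj in triples:
        if predicate == SUBCLASS and subject == term:
            view["superclasses"].append(obj)
        elif predicate == SUBCLASS and obj == term:
            view["subclasses"].append(subject)
        elif predicate == DOMAIN and obj == term:
            view["outgoing"].append(subject)  # property starts at this class
        elif predicate == RANGE and obj == term:
            view["incoming"].append(subject)  # property points to this class
    return view
```

Whether inherited properties should also appear in the star view is a separate design decision that this sketch does not address.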

(C) BASE URI (NAMESPACE) FOR CLASSES AND PROPERTIES 

Now the controversial issue :) 

What base URI should we use for the classes and properties of each version when serving RDF content? There are three options:

Option B1. Always use an unversioned base URI, i.e., http://www.cidoc-crm.org/cidoc-crm/ for all ontology versions. This means that the class/property URIs are unversioned (e.g., http://www.cidoc-crm.org/cidoc-crm/E5_Event). 
Then, we have to use 'owl:versionInfo' for providing information about the underlying cidoc-crm version (then we also expect that a KB will contain RDF data using only one particular cidoc-crm version).
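Under Option B1 the version would live in the ontology header rather than in the URIs. As a rough illustration, the hypothetical helper `ontology_header` below emits such a header in Turtle; the exact form of the version string is an assumption.

```python
def ontology_header(base, version):
    """Return a Turtle ontology header that records the version as a literal."""
    return (
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .\n"
        f"<{base}> a owl:Ontology ;\n"
        f'    owl:versionInfo "{version}" .\n'
    )


header = ontology_header("http://www.cidoc-crm.org/cidoc-crm/", "7.1.1")
```

Note that this statement describes the ontology document only; the class and property URIs themselves carry no trace of the version.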

Option B2. Always use a versioned base URI, e.g., 
http://www.cidoc-crm.org/cidoc-crm/7.1.1/ for the last official ontology version, 
http://www.cidoc-crm.org/cidoc-crm/6.2.1/ for ontology version 6.2.1, 
etc. etc.  
This means that the class/property URIs are versioned (e.g., http://www.cidoc-crm.org/cidoc-crm/7.1.1/E5_Event).
(This does not affect the fact that the unversioned ontology URI is resolvable)

Option B3. Provide both, one RDFS file having unversioned base URI and one RDFS file having versioned base URI (similar to the approach followed by Erlangen). 
In this case, the RDFS file having the unversioned base URI will be updated after each new ontology (official/published) release. 

We (Pavlos and Elias) support B2 for the following reason: 
When one builds a knowledge base (RDF dataset) using cidoc-crm, he/she considers a particular version of the model. There is no mix of model versions (or, at least, there should be no mix of model versions---unless there is a particular use case we are not aware of). Without considering a versioned base URI, it is very difficult (maybe impossible in some cases) to know which version of cidoc-crm was used for creating an RDF dataset. Also, option B3 does not always solve the problem. 
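The argument can be illustrated with a small sketch that tries to recover the CRM version from the URIs found in a dataset. The function name `crm_versions` and the pattern for version numbers are assumptions made for the example.

```python
import re

# Matches versioned term URIs such as .../cidoc-crm/7.1.1/E5_Event
VERSIONED = re.compile(
    r"http://www\.cidoc-crm\.org/cidoc-crm/(\d+(?:\.\d+)*)/[A-Za-z0-9_]+"
)


def crm_versions(uris):
    """Return the set of CRM versions that can be read off the given URIs."""
    found = set()
    for uri in uris:
        match = VERSIONED.fullmatch(uri)
        if match:
            found.add(match.group(1))
    return found


versioned = [
    "http://www.cidoc-crm.org/cidoc-crm/7.1.1/E5_Event",
    "http://www.cidoc-crm.org/cidoc-crm/7.1.1/P9_consists_of",
]
unversioned = ["http://www.cidoc-crm.org/cidoc-crm/E5_Event"]
```

For the versioned list the version is recoverable from the data alone; for the unversioned list the result is empty, so the version would have to be recorded somewhere else.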

Thanasis has already made a point about not using versioned base URI:
"I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes."

Thanasi, could you please elaborate more on this? It's not clear to us why/how reasoning considering a particular ontology version is affected when versioned URIs are used for the classes and properties.

(D) WHAT WE DO WITH RENAMED AND DEPRECATED CLASSES

1/ Renamed: When resolving a class/property (of a specific version) which has been renamed, we can point the user to the information about the renamed class (since semantics stay the same). For example:
when asking for http://www.cidoc-crm.org/cidoc-crm/E78_Collection,
users will get information about http://www.cidoc-crm.org/cidoc-crm/E78_Curated_Holding 
(once URI resolving has been implemented)

2/ Deprecated: When resolving a class/property (of a specific version) which has been deprecated, we (Pavlos and Elias) suggest not returning anything (404 response code).  In our opinion, this makes sense since the ontology version no longer contains the requested class/property. In the case of HTML content type, we can also point the user to the Migration Instructions (page 229). Any comments?
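The two cases above could be handled by a lookup along the following lines. This is a sketch under assumptions: the function name `resolve_term` is hypothetical, the use of a 301 redirect for renamed terms is one possible choice that the proposal does not specify, and `E999_Hypothetical_Deprecated_Class` is an invented placeholder, not a real CRM class.

```python
# Terms present in the version being served (illustrative subset).
TERMS = {"E5_Event", "E78_Curated_Holding"}

# Old name -> new name, for terms whose semantics stayed the same.
RENAMED = {"E78_Collection": "E78_Curated_Holding"}


def resolve_term(name, terms=TERMS, renamed=RENAMED):
    """Return (status, term): redirect if renamed, 404 if no longer present."""
    if name in renamed:
        return 301, renamed[name]  # assumption: permanent redirect to new name
    if name in terms:
        return 200, name
    return 404, None  # deprecated or never defined in this version


renamed_result = resolve_term("E78_Collection")
deprecated_result = resolve_term("E999_Hypothetical_Deprecated_Class")
```

For HTML requests, the 404 branch is where a pointer to the Migration Instructions could be added.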

(E) COMPATIBLE MODELS  

The plan is to follow the same approach for the compatible models. Here, it seems that having versioned URIs for the ontology and the extension models solves the problem of how to point to specific versions (as mentioned by Francesco). We just need to include the versioned namespaces of the considered models in the RDFS.   

Looking forward to your comments and feedback! 

Posted by Richard Light on 27/9/2021

On 27/09/2021 11:34, Pavlos Fafalios via Crm-sig wrote:
> Dear all,
>
> We (at FORTH) have started working on the URIs management issue, i.e. on how to provide resolvable URIs for the different versions of CIDOC-CRM and its compatible models. We would like to hear your opinion about the following: 
>
> (A) HAVING BOTH UNVERSIONED AND VERSIONED ONTOLOGY URIS  
>
> The URI http://www.cidoc-crm.org/cidoc-crm/ will always resolve to the last official version of CIDOC-CRM ('official' according to the definition here). 
> A question (also raised by George) is if we want to point to the last 'published' version (which is "a stable version of the standard and can be used for implementation, referencing and any other official purpose").

I'm happy with the idea of 'latest published' being supported. However, we will need to think about what gets returned.

Currently, the latest official version (7.1.1) is later than the latest 'published' version. Am I right in assuming that this will not normally be the case, i.e. that there will usually be a 'published' version which is more recent than the last official version, which has been both published and made to correspond to an official ISO release of the CRM?

If so, what do we do in the current situation?  Does a request for the latest 'published' version return 7.1.1?
>
> In parallel, each version will have its own versioned URI, e.g., 
> http://cidoc-crm.org/cidoc-crm/7.1.1/ for version 7.1.1, 
> http://cidoc-crm.org/cidoc-crm/6.2.9/ for version  6.2.9, etc. etc.
>
>
> (B) SERVING HTML OR RDF (BASED ON HTTP REQUEST TYPE)
>
> Different content will be served based on the type of the HTTP request. So, if one asks for 
>    http://www.cidoc-crm.org/cidoc-crm/ 
> will get either the HTML content of the last official version (using text/html content type), 
> or the RDFS of the last official version (using rdf/xml content type). 
>
> We will do the same for also the versioned URIs. 
>
> Now, if one requests a specific class or property, e.g.: 
>    http://www.cidoc-crm.org/cidoc-crm/E5_Event
> will either navigate to the part of HTML content of the last official version which describes this particular class (text/html request),
> or (for the case of rdf/xml) will get the entire RDFS of the last official version OR the star-view of that particular class (i.e., subclasses, superclasses, incoming properties, outgoing properties). 
> So, a question here is if we want to provide the star-view or the entire RDFS. 
> In our opinion, it's much better to provide the star view. Otherwise it makes no sense to request the URI (using rdf/xml) including the class/property name since the result is the same for any class/property name. So, in this case, one should request 
>     http://www.cidoc-crm.org/cidoc-crm/ 
> for the complete rdfs file, and
>     http://www.cidoc-crm.org/cidoc-crm/E5_Event 
> for the rdfs star view of E5_Event.
Yes, I support returning the star-view, for the reasons you give.
>
> We will do the same for also the versioned URIs. 
>
> (C) BASE URI (NAMESPACE) FOR CLASSES AND PROPERTIES 
>
> Now the controversial issue :) 
>
> What base URI should we use for the classes and properties of each version when serving RDF content? There are three options:
>
> Option B1. Always use an unversioned base URI, i.e., http://www.cidoc-crm.org/cidoc-crm/ for all ontology versions. This means that the class/property URIs are unversioned (e.g., http://www.cidoc-crm.org/cidoc-crm/E5_Event). 
> Then, we have to use 'owl:versionInfo' for providing information about the underlying cidoc-crm version (then we also expect that a KB will contain RDF data using only one particular cidoc-crm version).
>
> Option B2. Always use a versioned base URI, e.g., 
> http://www.cidoc-crm.org/cidoc-crm/7.1.1/ for the last official ontology version, 
> http://www.cidoc-crm.org/cidoc-crm/6.2.1/ for ontology version 6.2.1, 
> etc. etc.  
> This means that the class/property URIs are versioned (e.g., http://www.cidoc-crm.org/cidoc-crm/7.1.1/E5_Event).
> (This does not affect the fact that the unversioned ontology URI is resolvable)
>
> Option B3. Provide both, one RDFS file having unversioned base URI and one RDFS file having versioned base URI (similar to the approach followed by Erlangen). 
> In this case, the RDFS file having the unversioned base URI will be updated after each new ontology (official/published) release. 
>
> We (Pavlos and Elias) support B2 for the following reason: 
> When one builds a knowledge base (RDF dataset) using cidoc-crm, he/she considers a particular version of the model. There is no mix of model versions (or, at least, there should be no mix of model versions---unless there is a particular use case we are not aware of). Without considering a versioned base URI, it is very difficult (maybe impossible in some cases) to know which version of cidoc-crm was used for creating an RDF dataset. Also, option B3 does not always solve the problem. 
>
> Thanasis has already made a point about not using versioned base URI:
> "I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes."
>
> Thanasi, could you please elaborate more on this? It's not clear to us why/how reasoning considering a particular ontology version is affected when versioned URIs are used for the classes and properties.

I'm a simple soul (in this area, at least), and don't have an informed view on how versioned URIs would work as regards reasoning.  However, my instinct is to favour unversioned URIs. I'm not sure I would support versioned URIs at all.  Just imagine the situation where 12 institutions pool their CRM-encoded RDF resources to create a combined knowledge base.  Now imagine that between them they use 8 different versions of the CRM, and this is reflected in their URIs. How on earth could anyone perform reasoning on the resulting rat's nest?

Putting it another way, if we are properly maintaining the ontological commitments we make when defining a particular CRM class or property, then its semantics should remain sufficiently stable over time for 'old' assertions to remain valid/meaningful when seen through the lens of the 'current' CRM.

Thanks for working on this.

Posted by Pavlos Fafalios on 27/9/2021

Dear Richard,

Thank you for the feedback and comments. Answers inline below: 

    On 27/09/2021 11:34, Pavlos Fafalios via Crm-sig wrote:
>     Dear all,
>
>     We (at FORTH) have started working on the URIs management issue, i.e. on how to provide resolvable URIs for the different versions of CIDOC-CRM and its compatible models. We would like to hear your opinion about the following: 
>
>     (A) HAVING BOTH UNVERSIONED AND VERSIONED ONTOLOGY URIS  
>
>     The URI http://www.cidoc-crm.org/cidoc-crm/ will always resolve to the last official version of CIDOC-CRM ('official' according to the definition here). 
>     A question (also raised by George) is if we want to point to the last 'published' version (which is "a stable version of the standard and can be used for implementation, referencing and any other official purpose").

    I'm happy with the idea of 'latest published' being supported. However, we will need to think about what gets returned.

    Currently, the latest official version (7.1.1) is later than the latest 'published' version. Am I right in assuming that this will not normally be the case, i.e. that there will usually be a 'published' version which is more recent than the last official version, which has been both published and made to correspond to an official ISO release of the CRM?

    If so, what do we do in the current situation?  Does a request for the latest 'published' version return 7.1.1?

We consider the 'official' version as also being 'published'. 
So, let's suppose that there is a next cidoc-crm version 7.3.1 which is marked as 'published', and another one 7.4 marked as 'draft'. 
If the SIG decides that the URI http://www.cidoc-crm.org/cidoc-crm/ should point to the last 'official' (ISO) version, then it will point to 7.1.1.
If the SIG decides that the URI http://www.cidoc-crm.org/cidoc-crm/ should point to the last 'published' version, then it will point to 7.3.1.

As it is now (last official or published version is 7.1.1), the URI will point to 7.1.1. 
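To make the two policies concrete, here is a minimal Python sketch of how the target of the unversioned URI could be chosen. The release list is the hypothetical one from the example above (7.3.1 and 7.4 do not exist as of this message), and the function and variable names are illustrative only, not part of any existing implementation.

```python
# Hypothetical release list taken from the example in this message.
RELEASES = [
    ("7.1.1", "official"),
    ("7.3.1", "published"),
    ("7.4", "draft"),
]


def version_key(version):
    """Turn '7.1.1' into (7, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def latest(releases, policy):
    """Return the newest version the unversioned URI would point to.

    policy='official'  -> only 'official' releases are considered
    policy='published' -> 'official' releases count as 'published' too
    """
    if policy == "official":
        allowed = {"official"}
    else:
        allowed = {"official", "published"}
    candidates = [v for v, status in releases if status in allowed]
    return max(candidates, key=version_key)


print(latest(RELEASES, "official"))   # 7.1.1
print(latest(RELEASES, "published"))  # 7.3.1
```

With only 7.1.1 in the list (the situation today), both policies give the same answer.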

>
>     In parallel, each version will have its own versioned URI, e.g., 
>     http://cidoc-crm.org/cidoc-crm/7.1.1/ for version 7.1.1, 
>     http://cidoc-crm.org/cidoc-crm/6.2.9/ for version  6.2.9, etc. etc.
>
>
>     (B) SERVING HTML OR RDF (BASED ON HTTP REQUEST TYPE)
>
>     Different content will be served based on the type of the HTTP request. So, if one asks for 
>        http://www.cidoc-crm.org/cidoc-crm/ 
>     will get either the HTML content of the last official version (using text/html content type), 
>     or the RDFS of the last official version (using rdf/xml content type). 
>
>     We will also do the same for the versioned URIs. 
>
>     Now, if one requests a specific class or property, e.g.: 
>        http://www.cidoc-crm.org/cidoc-crm/E5_Event
>     will either navigate to the part of HTML content of the last official version which describes this particular class (text/html request),
>     or (for the case of rdf/xml) will get the entire RDFS of the last official version OR the star-view of that particular class (i.e., subclasses, superclasses, incoming properties, outgoing properties). 
>     So, a question here is if we want to provide the star-view or the entire RDFS. 
>     In our opinion, it's much better to provide the star view. Otherwise it makes no sense to request the URI (using rdf/xml) including the class/property name since the result is the same for any class/property name. So, in this case, one should request 
>         http://www.cidoc-crm.org/cidoc-crm/ 
>     for the complete rdfs file, and
>         http://www.cidoc-crm.org/cidoc-crm/E5_Event 
>     for the rdfs star view of E5_Event.
    Yes, I support returning the star-view, for the reasons you give.
>
>     We will also do the same for the versioned URIs. 
>
>     (C) BASE URI (NAMESPACE) FOR CLASSES AND PROPERTIES 
>
>     Now the controversial issue :) 
>
>     What base URI should we use for the classes and properties of each version when serving RDF content? There are three options:
>
>     Option B1. Always use an unversioned base URI, i.e., http://www.cidoc-crm.org/cidoc-crm/ for all ontology versions. This means that the class/property URIs are unversioned (e.g., http://www.cidoc-crm.org/cidoc-crm/E5_Event). 
>     Then, we have to use 'owl:versionInfo' for providing information about the underlying cidoc-crm version (then we also expect that a KB will contain RDF data using only one particular cidoc-crm version).
>
>     Option B2. Always use a versioned base URI, e.g., 
>     http://www.cidoc-crm.org/cidoc-crm/7.1.1/ for the last official ontology version, 
>     http://www.cidoc-crm.org/cidoc-crm/6.2.1/ for ontology version 6.2.1, 
>     etc. etc.  
>     This means that the class/property URIs are versioned (e.g., http://www.cidoc-crm.org/cidoc-crm/7.1.1/E5_Event).
>     (This does not affect the fact that the unversioned ontology URI is resolvable)
>
>     Option B3. Provide both, one RDFS file having unversioned base URI and one RDFS file having versioned base URI (similar to the approach followed by Erlangen). 
>     In this case, the RDFS file having the unversioned base URI will be updated after each new ontology (official/published) release. 
>
>     We (Pavlos and Elias) support B2 for the following reason: 
>     When one builds a knowledge base (RDF dataset) using cidoc-crm, he/she considers a particular version of the model. There is no mix of model versions (or, at least, there should be no mix of model versions---unless there is a particular use case we are not aware of). Without considering a versioned base URI, it is very difficult (maybe impossible in some cases) to know which version of cidoc-crm was used for creating an RDF dataset. Also, option B3 does not always solve the problem. 
>
>     Thanasis has already made a point about not using versioned base URI:
>     "I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes."
>
>     Thanasi, could you please elaborate more on this? It's not clear to us why/how reasoning considering a particular ontology version is affected when versioned URIs are used for the classes and properties.

    I'm a simple soul (in this area, at least), and don't have an informed view on how versioned URIs would work as regards reasoning.  However, my instinct is to favour unversioned URIs. I'm not sure I would support versioned URIs at all.  Just imagine the situation where 12 institutions pool their CRM-encoded RDF resources to create a combined knowledge base.  Now imagine that between them they use 8 different versions of the CRM, and this is reflected in their URIs. How on earth could anyone perform reasoning on the resulting rat's nest?

    Putting it another way, if we are properly maintaining the ontological commitments we make when defining a particular CRM class or property, then its semantics should remain sufficiently stable over time for 'old' assertions to remain valid/meaningful when seen through the lens of the 'current' CRM.

What you say makes sense to me and justifies a decision to avoid having versioned base URIs (considering that the semantics of the classes do not change in different versions ---which seems to be the case). 
As you say, direct data integration will be a problem, even for simple tasks like querying (one will need to apply some preprocessing/consolidation before integration). 
I guess that this is also what Thanasis means in issue 460. 
So, now I would also go for Option B1 where we include versioning information in the RDFS.

Any other thoughts? Any reason for selecting Option B3? 

Thanks again for the feedback! 

Posted by Richard on 27/9/2021

On 27/09/2021 13:51, Pavlos Fafalios wrote:
> So, now I would also go for Option B1 where we include versioning information in the RDFS.
>
> Any other thoughts? Any reason for selecting Option B3?

My concern is that if we offer versioned URIs, people choosing to use that option will be storing up problems for any subsequent data pooling exercise.

Posted by Robert Sanderson on 27/9/2021

Reordering to most important first..

    (C) BASE URI (NAMESPACE) FOR CLASSES AND PROPERTIES 
    What base URI should we use for the classes and properties of each version when serving RDF content? There are three options:
    Option B1. Always use an unversioned base URI, i.e., http://www.cidoc-crm.org/cidoc-crm/ for all ontology versions. 

This is the correct answer, according to 2 decades of RDF / Semantic Web  experience.
In particular, FOAF, one of the earliest RDF ontologies, written by Dan Brickley (one of the original authors of RDF), warns us in its specification:

    Much of FOAF now is considered stable. Each release of this specification document has an incrementally increased version number, even while the technical namespace ID remains fixed and includes the original value of "0.1". It long ago became impractical to update the namespace URI without causing huge disruption to both producers and consumers of FOAF data. We are left with the digits "0.1" in our URI. This stands as a warning to all those who might embed metadata in their vocabulary identifiers.

(emphasis added). http://xmlns.com/foaf/spec/ 

Please, do NOT put a version number into the URIs. It makes everyone's lives worse, and breaks interoperability between systems. It also makes it much harder for people to upgrade systems and retract/republish data, meaning we will leave folks behind in previous versions. It also makes it harder to aggregate data, as the same property (say, P2) has different URIs in different systems.
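A small Python sketch of the aggregation problem, using plain tuples as triples. The `ex:` subjects and values are made up for illustration, and the versioned bases are the ones proposed under Option B2, not URIs that resolve today.

```python
BASE = "http://www.cidoc-crm.org/cidoc-crm/"


def make_dataset(base, subject, type_value):
    """One triple saying that `subject` has type `type_value` via P2."""
    return [(subject, base + "P2_has_type", type_value)]


def query(triples, predicate):
    """Return every triple that uses exactly this predicate URI."""
    return [t for t in triples if t[1] == predicate]


# Two institutions pooling data, each built against a different CRM version.
pooled_versioned = (
    make_dataset(BASE + "6.2.1/", "ex:object1", "ex:painting")
    + make_dataset(BASE + "7.1.1/", "ex:object2", "ex:sculpture")
)
pooled_unversioned = (
    make_dataset(BASE, "ex:object1", "ex:painting")
    + make_dataset(BASE, "ex:object2", "ex:sculpture")
)

# With versioned URIs a query for P2 silently misses half of the data.
print(len(query(pooled_versioned, BASE + "7.1.1/P2_has_type")))  # 1
print(len(query(pooled_unversioned, BASE + "P2_has_type")))      # 2
```

The versioned pool can of course be repaired by rewriting URIs before loading, but that is exactly the extra consolidation step every consumer would have to implement.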

I would go so far as to say that, given we already have different RDFS and OWL namespaces, any further fragmentation would further harm adoption, and most users would simply pick whichever namespace was easiest for them given the already incompatible URIs. 

In looking at similar topics (XML namespaces, API versions) the results are the same -- URIs should be persistent, and versions / dates make them either less persistent or appear out of date, both of which are harmful.

    Thanasis has already made a point about not using versioned base URI:
    "I am suggesting that classes do not need versions at all. Doing reasoning on a per class and per version basis would be bad practice, no? One would expect that the whole RDF/OWL representation would be used for reasoning. I think class URIs are only used as identifiers. This also avoids the problem of ensuring correct older versions for deprecated classes."
    Thanasi, could you please elaborate more on this? It's not clear to us why/how reasoning considering a particular ontology version is affected when versioned URIs are used for the classes and properties.

As above, but Thanasis is 100% correct - URIs are used as identifiers. We wouldn't change the numbers in the ontology (E22, P2 etc) ... in RDF the URI has the same function.

On Mon, Sep 27, 2021 at 6:41 AM Pavlos Fafalios via Crm-sig <crm-sig@ics.forth.gr> wrote:

    Dear all,

    We (at FORTH) have started working on the URIs management issue, i.e. on how to provide resolvable URIs for the different versions of CIDOC-CRM and its compatible models. We would like to hear your opinion about the following: 

    (A) HAVING BOTH UNVERSIONED AND VERSIONED ONTOLOGY URIS  

    The URI http://www.cidoc-crm.org/cidoc-crm/ will always resolve to the last official version of CIDOC-CRM ('official' according to the definition here). 
    A question (also raised by George) is if we want to point to the last 'published' version (which is "a stable version of the standard and can be used for implementation, referencing and any other official purpose").

    In parallel, each version will have its own versioned URI, e.g., 
    http://cidoc-crm.org/cidoc-crm/7.1.1/ for version 7.1.1, 
    http://cidoc-crm.org/cidoc-crm/6.2.9/ for version  6.2.9, etc. etc.

Yes. Best practice would be that the documentation for each version has a separate URI, and a common URI can be used to always refer to the latest version.
See: https://www.w3.org/2005/05/tr-versions 

This is less important than (C) (people are better at concluding identity than machines!) but still important !

 

    (B) SERVING HTML OR RDF (BASED ON HTTP REQUEST TYPE)

    Different content will be served based on the type of the HTTP request. So, if one asks for 
       http://www.cidoc-crm.org/cidoc-crm/ 
    will get either the HTML content of the last official version (using text/html content type), 
    or the RDFS of the last official version (using rdf/xml content type).  We will also do the same for the versioned URIs. 

Excellent.
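For illustration, a minimal Python sketch of the content negotiation step. It assumes the server supports only `text/html` and `application/rdf+xml` (the registered media type for what the thread calls "rdf/xml") and falls back to HTML when nothing matches; the function name and the fallback are assumptions, not a description of the FORTH implementation.

```python
# Media types this sketch knows how to serve, mapped to a representation.
SUPPORTED = {
    "text/html": "html",
    "application/rdf+xml": "rdf",
}


def pick_representation(accept_header):
    """Return 'rdf' or 'html' for the value of an HTTP Accept header.

    The supported type with the highest q-value wins; HTML is the
    default when no supported type is listed (for example '*/*').
    """
    best_type, best_q = "html", 0.0
    for item in accept_header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        q = 1.0  # a media type without an explicit q parameter has q=1
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if media_type in SUPPORTED and q > best_q:
            best_type, best_q = SUPPORTED[media_type], q
    return best_type


print(pick_representation("application/rdf+xml"))
print(pick_representation("text/html,application/xhtml+xml;q=0.9"))
```

The same function can be applied unchanged to the namespace URI, to a class or property URI, and to the versioned documentation URIs.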

 

    Now, if one requests a specific class or property, e.g.: 
       http://www.cidoc-crm.org/cidoc-crm/E5_Event
    will either navigate to the part of HTML content of the last official version which describes this particular class (text/html request),
    or (for the case of rdf/xml) will get the entire RDFS of the last official version OR the star-view of that particular class (i.e., subclasses, superclasses, incoming properties, outgoing properties). 

Star view, or just the term itself. You can always get the entire RDFS by going to the namespace.

 

    (D) WHAT WE DO WITH RENAMED AND DEPRECATED CLASSES

    1/ Renamed: When resolving a class/property (of a specific version) which has been renamed, we can point the user to the information about the renamed class (since semantics stay the same). For example:
    when asking for http://www.cidoc-crm.org/cidoc-crm/E78_Collection,
    users will get information about http://www.cidoc-crm.org/cidoc-crm/E78_Curated_Holding 
    (once URI resolving has been implemented)

Yes, and ... via an HTTP redirect to the new name for the class/property please.

 

    2/ Deprecated: When resolving a class/property (of a specific version) which has been deprecated, we (Pavlos and Elias) suggest not returning anything (404 response code).  In our opinion, this makes sense since the ontology version does not anymore contain the requested class/property. In the case of HTML content type, we can also point the user to the Migration Instructions (page 229). Any comments?

Yes, agreed.

 

    (E) COMPATIBLE MODELS  

    The plan is to follow the same approach for the compatible models. Here, it seems that having versioned URIs for the ontology and the extension models solves the problem of how to point to specific versions (as mentioned by Francesco). We just need to include the versioned namespaces of the considered models in the RDFS.

Yes, agreed.

Posted by Ethan Gruber on 27/9/2021

I agree with Rob.

In the case of Part D, I suggest the HTTP 303 See Other for a replacement/renaming. I agree there shouldn't be a 404 for a deprecation, but there should be a 3xx code for redirecting to the Migration Instructions. 301? Not sure.

Posted by Pavlos Fafalios on 27/9/2021

Dear Robert,

Thank you for your reply (and the interesting pointer to the FOAF's experience with URIs :)

I fully agree. It seems that Option B1 ("always use an unversioned base URI in all RDFS versions") is the better way to go.

Let's see if there are other opinions on this. 

Posted by Pavlos on 27/9/2021

Thank you Ethan. 

Redirecting to Migration Instructions for a deprecated class seems reasonable when the request is of type HTML. However, I think this is impossible when the request type is RDF/XML; we either point to 404 or (?) redirect to some other class, e.g., the one mentioned in the migration instructions (e.g., using 303). Not sure if the latter is a good option (it also does not seem straightforward to implement in some cases, e.g. for P88, P115 and others).

Posted by Ethan Gruber on 27/9/2021

What about 410 Gone:

"Indicates that the resource requested is no longer available and will not be available again. This should be used when a resource has been intentionally removed and the resource should be purged. Upon receiving a 410 status code, the client should not request the resource in the future. Clients such as search engines should remove the resource from their indices. Most use cases do not require clients and search engines to purge the resource, and a "404 Not Found" may be used instead."

So if you request a deprecated property/class with content negotiation for application/rdf+xml, the response you get tells you it no longer exists. I think the semantics of 410 are more clear than 404.

There are probably multiple scenarios for redirections.

Deprecated and replaced with a single new URI: 303 See Other
Deprecated and replaced with more than one URI: 300 Multiple Choices (in this case, you can use the Link HTTP header to provide multiple links, but the user will have to make a choice).
Deprecated and not replaced at all: 404 or 410.
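
The three scenarios above can be sketched as a small Python function. This is only an illustration: the function name is made up, the `rel="alternate"` value in the Link header is an assumption (no particular relation type has been agreed in this thread), and 410 is chosen over 404 for the last case.

```python
NS = "http://www.cidoc-crm.org/cidoc-crm/"


def deprecation_response(replacements):
    """Map the replacement URIs of an old term to (status, headers)."""
    if len(replacements) == 1:
        # Exactly one successor: redirect the client to it.
        return 303, {"Location": replacements[0]}
    if len(replacements) > 1:
        # Several successors: list them all and let the client choose.
        links = ", ".join('<%s>; rel="alternate"' % uri for uri in replacements)
        return 300, {"Link": links}
    # No successor at all: the term is gone for good.
    return 410, {}


# The rename mentioned earlier in the thread (E78_Collection).
print(deprecation_response([NS + "E78_Curated_Holding"]))
# Placeholder URIs standing in for a one-to-many replacement.
print(deprecation_response(["http://example.org/a", "http://example.org/b"]))
print(deprecation_response([]))
```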

We have similar scenarios in dealing with new editions of Roman coin typologies. A new volume might have decided that one typology from an old volume is now actually two typologies. So we implemented 300 Multiple Choices when people request the URI from the old volume. Otherwise, in 1:1 relations between the old and new volume, it's a 303.

Posted by Detlev Balzer on 27/9/2021

Dear all,

good to see that the idea of publishing some of the CIDOC CRM as a Linked Data resource is finally gaining traction.

Concerning versioning, I'm also in favour of a single, un-versioned URI namespace, as long as there is a way of knowing which version is currently being served (e.g. by including statements from the W3C VoID vocabulary). Nevertheless, having previous versions accessible via additional, versioned namespaces can be useful for tracing conflicts that may arise from non-backward-compatible changes. 

Regarding the question of "star view" vs. complete schema, I'd suggest following the common practice of serving just the direct statements where the class or property URI occurs as the subject. This is basically what you get from SPARQL using a DESCRIBE <uri> query. In an HTML view, you probably want to see more context such as inferred statements. Pure RDF, however, is intended for machine processing and if you make all of the RDFS available via SPARQL (in addition to resolving URIs directly), then anyone is free to construct their own "star view" as required.
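
To show the difference between the two options, here is a rough Python sketch over a tiny hand-written sample of CRM statements, with triples as plain tuples and prefixed names instead of full URIs. The "star view" here is approximated as every statement that mentions the term as subject or object, which is a simplification of what was proposed earlier in the thread.

```python
# A tiny hand-written sample, not the full RDFS.
TRIPLES = [
    ("crm:E5_Event", "rdfs:subClassOf", "crm:E4_Period"),
    ("crm:E7_Activity", "rdfs:subClassOf", "crm:E5_Event"),
    ("crm:P11_had_participant", "rdfs:domain", "crm:E5_Event"),
    ("crm:P11_had_participant", "rdfs:range", "crm:E39_Actor"),
    ("crm:E39_Actor", "rdfs:subClassOf", "crm:E77_Persistent_Item"),
]


def direct_statements(triples, term):
    """Only the statements where the term is the subject."""
    return [t for t in triples if t[0] == term]


def star_view(triples, term):
    """Statements where the term is the subject or the object."""
    return [t for t in triples if t[0] == term or t[2] == term]


print(direct_statements(TRIPLES, "crm:E5_Event"))  # superclass only
print(star_view(TRIPLES, "crm:E5_Event"))          # plus subclass and P11 domain
```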

Concerning HTTP 3xx redirections, I would not take this too far. It may make some sense for HTML pages, but less so on the level of RDF processing.

Just a few thoughts from the depths of the engine room ...

Posted by Gordon Dunsire on 28/9/2021

All

I do not think it is good practice to limit access to deprecated resources by intercepting requests. A general coded response may trigger an exception in a consuming application, but no appropriate action can be taken until the specific reason is known. In the case of deprecation, appropriate action might include replacing the deprecated resource in the local application, removing the deprecated resource, or leaving it untouched (in which case the application remains out of synch with the CRM - but that is their choice).

I think it is better to return deprecated resources on request, but clearly mark them as deprecated. A consuming application can set up an exception when the mark is encountered (or not, if synchronization is not an issue).

I don't think there is any commonly-agreed method of marking the status of resources. For what it's worth, the RDA Registry uses a local status property and coded value (see http://www.rdaregistry.info/rgAbout/updates/deprecation.html).

Redirection to a 'better' or replacement non-deprecated resource can be handled by publishing a mapping from the deprecated resource. Again, this is more flexible for consuming applications, and is also easier to maintain and manage. Human redirection can be handled by a comment attached to the resource.
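
A minimal Python sketch of this approach, assuming a registry in which deprecated terms stay retrievable, carry a status mark, and have a published mapping to their replacements. The term names, the `status` and `replaced_by` keys, and the function names are all hypothetical.

```python
# Hypothetical registry: deprecated terms are kept, not removed.
REGISTRY = {
    "Old_Term": {"status": "deprecated", "replaced_by": ["New_Term"]},
    "New_Term": {"status": "current", "replaced_by": []},
}


def lookup(name):
    """Always return the stored record, deprecated or not; None if unknown."""
    return REGISTRY.get(name)


def preferred_terms(name, follow_mapping=True):
    """Let the consuming application decide what to do with a deprecation.

    follow_mapping=True  -> switch to the mapped replacement(s), if any
    follow_mapping=False -> keep using the term as it is
    """
    record = lookup(name)
    if record is None:
        raise KeyError(name)
    is_deprecated = record["status"] == "deprecated"
    if is_deprecated and follow_mapping and record["replaced_by"]:
        return list(record["replaced_by"])
    return [name]


print(lookup("Old_Term")["status"])                      # deprecated
print(preferred_terms("Old_Term"))                       # ['New_Term']
print(preferred_terms("Old_Term", follow_mapping=False)) # ['Old_Term']
```

The point of the design is that the server never decides for the client: the request succeeds, and the mark plus the mapping give the application enough information to replace, remove, or keep the term.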