Issue 286: Why #TEI P5 as format-of-record for #cidocCRM Definition document

Starting Date: 
2015-05-15
Working Group: 
4
Status: 
Proposed
Background: 

Posted by Jim Salmons  on 15/5/2015

#cidocCRM SIG Members,

I have proposed that the #cidocCRM SIG adopt TEI P5 (or something equally capable/expressive) as the 'format of record' for the official #cidocCRM Definition document that is currently supplied in MS-Word and PDF formats.

In summarizing my dense posting of initial self-introduction, Dominic Oldman (a valued member of my #cidocCRM/#TEI Personal Learning Network) parenthetically asked "Why TEI?" to which this is a reply.

First, IANAMIP -- I am not a museum informatics professional -- so my opinions and rationale are shaped, first, by _exposure_ (what I've heard and read, which is limited), and second, by what would be useful to me in terms of my anticipated FactMiners Fact Cloud design and development.

A quick run-down of the problem and proposed rationale for a solution is as follows:

PROBLEM: The proprietary formats of MS-Word and PDF are an obstacle to access and maintenance of the #cidocCRM Definition document itself, and these existing formats provide no leverage in maintaining the official cidoc-crm.org website. In addition, neither of the existing official Definition document formats is directly machine-readable, which is desirable for developer tool and resource support.

Why TEI P5 as the "format of record" for the #cidocCRM Definition document?

1. There is an existing, active harmonization and cross-pollination effort between the #cidocCRM and #TEI communities, as evidenced by the work of two of the five members of my #cidocCRM/#TEI Personal Learning Network (Oyvind Eide and C.E.S. Ore), among others. Cf. Oyvind and Christian's papers detailing this relationship: http://www.edd.uio.no/artiklar/tekstkoding/poster_156_eide.html and http://llc.oxfordjournals.org/content/24/2/161.abstract

2. For legacy support purposes, a clean #cidocCRM Definition document in TEI P5 format could be used to quickly and painlessly generate the prior MS-Word and PDF formats via OxGarage: http://www.tei-c.org/oxgarage/# (Note, too, that given a TEI P5 official format-of-record for the #cidocCRM Definition document, OxGarage could provide convenient generation of formats beyond the two legacy formats.)

3. A TEI P5 #cidocCRM Definition document could serve as the all-important first step in a new cidoc-crm.org website maintenance pipeline. Indeed, it may be that the new website will simply be a "view and controller" on the #TEI-encoded Definition document -- the "model" in other words of a model-view-controller style website design.

4. As #cidocCRM applied research delves into the use of the #cidocCRM as an executable metamodel as well as it being the basis for Domain-Specific Language (DSL) extensions in support of fine-grained document structure and content depiction modeling, the TEI P5 Header will be an appropriate and convenient place to open up what Michael Witmore has called "Text: A Massively Addressable Object" (http://winedarksea.org/?p=926).

For example and getting to the "scratching my itch" aspect of why I recommend #TEI, we'll be investigating the use of the TEI Header to "freeze-dry" FactMiners' Fact Cloud 'facts' within the page-segment files generated by our version of Softalk magazine as a "massively addressable object." As a Content Partner of the Internet Archive, we have the interest and cooperation of the Archive.org folks in exploring this approach to a human- and machine-readable version of the magazine offering unparalleled richness in document structure and content depiction modeling. A "deeper dive" into this aspect of my applied research interests is found here: https://goo.gl/3Vb0lO.

I believe these points provide reasonable justification for the recommendation to adopt TEI P5 as the "format of record" for the #cidocCRM Definition document. This said, if there is a better or preferred format, I am eager and willing to consider alternatives that provide advantages similar to those I have outlined.

 

Posted by Richard Light on 16/5/2015

As a point of information, it's interesting to note that this is exactly the way in which SPECTRUM is maintained and published.  There is a single TEI source, within which we have conventions for encoding things like Information Units.  The published PDF is generated directly from this source, and so is the XML Schema; a good example of Literate Programming at work.

Left to our own devices, Alex Dawson and I would expect to enhance this TEI source with a structured breakdown of each Procedure and its associated information sources and sinks. This could include the mapping of SPECTRUM to the CRM which is currently being worked on.  We could then additionally generate e.g. a BPMN expression of SPECTRUM procedures from this source.


Posted by Jim Salmons on 16/5/2015

Richard,

Nice to meet and hear from you. I can tell from your insightful comments that we will have much in common from a Kindred Spirit perspective, Alex too it seems. A few thoughts in reply…

Through my work at FactMiners.org I am working two interrelated agendas:

- #cidocCRMgraph – a “full graph” version of the #cidocCRM to make it accessible for computational analytics, including its use in metamodel-driven software designs, and

- #cidocCRMdev – the use of the #cidocCRM in executable models, e.g. #cidocCRM-compliant microservice workflows (as within the FactMiners’ platform, where LAM-based social games will be the “gentle on-ramp” for Citizen Scholarship, etc.)

To pursue #cidocCRMgraph, I wanted to get the #cidocCRM into a Neo4j graph database so I could express it within the Reference Models partition of a metamodel subgraph of the FactMiners Fact Cloud of Softalk magazine. To this end, I developed a one-off Python script to parse the 5.x edition of the Definition document as here on GitHub: https://goo.gl/5eJ2Kr (using Py2Neo to get the Definition into Neo4j).

Just getting the Entities as nodes and (lightweight) Properties as relationships into Neo4j, based on regex parsing of a text-only copy of the Definition, was a burden. I need the flexibility to, for example, “node-ify” Properties and make other adjustments (de-hypergraphing – edge-to-edge linkage, etc.) that render the #cidocCRM tractable for my purposes. With the release of 6.0 and a growing wish list of things I want to explore, I felt there had to be a better solution. This was the genesis of my recommendation of #TEI P5 as the “format of record” for the Definition, as it would make new releases accessible to researchers and developers in a vendor- and technology-neutral way.
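For concreteness, the kind of one-off regex extraction described above might look like the minimal sketch below. The line formats (entity declarations starting "E<n>", property declarations starting "P<n>") are assumptions for illustration only, and the Py2Neo/Neo4j loading step is merely noted in a comment:

```python
import re

# Hypothetical line formats, loosely modeled on the printed Definition:
#   "E5 Event"                            -> an Entity (class)
#   "P46 is composed of (forms part of)"  -> a Property
ENTITY_RE = re.compile(r"^(E\d+)\s+(.+)$")
PROPERTY_RE = re.compile(r"^(P\d+)\s+(.+)$")

def parse_definition(lines):
    """Collect entity and property declarations from a text-only Definition."""
    entities, properties = {}, {}
    for line in lines:
        line = line.strip()
        m = ENTITY_RE.match(line)
        if m:
            entities[m.group(1)] = m.group(2)
            continue
        m = PROPERTY_RE.match(line)
        if m:
            properties[m.group(1)] = m.group(2)
    return entities, properties

sample = [
    "E5 Event",
    "E4 Period",
    "P46 is composed of (forms part of)",
]
entities, properties = parse_definition(sample)
# From here, each entity would become a Neo4j node and each property a
# relationship type, e.g. via Py2Neo's Graph.create() -- omitted here.
```

A TEI P5 source would make this step unnecessary: the entity and property declarations would be explicit markup rather than conventions to be guessed at with regexes.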

To that end, I expanded my #cidocCRM Personal Learning Network to be my #cidocCRM/#TEI #PLN and began a learning-by-doing exercise, which is this GitHub project: https://goo.gl/bRSviu. Basically, I took the 6.0 MS-Word document and ran it through OxGarage, and renewed my non-commercial license to the AWESOME Oxygen XML IDE (which is a GREAT tool for TEI authoring). I haven’t gotten much further than a preliminary “heads up” and quick advice from my PLN mentors, and my working through the also awesome http://www.teibyexample.org/. So this activity is very much in its infancy. What I know best is what I want, not how to get there. :-) But this is the Joy of Life-long Learning (especially if you are animated by the experience of sitting down with the Reaper during a cancer battle and getting a chance to walk away for some Bonus Rounds!)

But, Richard, what is most encouraging about your insightful comments – besides it being obvious that you will have ALL KINDS of good insight and experience to contribute to this potential “#cidocCRM Definition in #TEI P5” initiative – is your mention of BPMN workflows. For those without a tech-acronym cheatsheet at hand, that is the OMG’s Business Process Model and Notation (http://www.bpmn.org/) methodology.

While there are some good things to like in the BPMN standard, there are many points where my philosophical and software design interests differ from those of this well-meaning group. More than 10 years before the OMG decided even to think about business process modeling, I was an Executive Consultant in the Object Technology Practice of IBM Global Services. During that time, in the early-to-mid 1990s, I was a lead in a “skunkworks” doing applied research inspired by David Gelernter’s brilliant book, ‘Mirror Worlds’ (https://goo.gl/lnhnuW). At the time, our practice was 100% hardcore Smalltalk designers/developers. We created a pair of “Executable Business Model” frameworks based on explicit adherence to an actor-role metamodel. (The closest thing philosophically to what we were working on is http://www.SOCIAM.org, but in many ways they are “barking up the wrong tree” from my experience/beliefs.)

The details of #cidocCRMdev go well beyond the scope of this discussion, but this post based on a Schema.org #cidocCRM-related modeling conversation will shed more light on this topic: http://goo.gl/x1DSAB. I have an additional substantive comment to contribute to this conversation which I hope to make within a couple days. There is a link in my post to the actual Schema.org conversation over at GitHub. You’ll find additional thoughts on the Temporal Entities branch of the #cidocCRM Entity hierarchy in this post: http://goo.gl/bA1rzk.

As is too often the case due to my isolation from others to talk to about these things, I have said far too much for a mailing list post. But I don’t want to miss an opportunity to connect with Kindred Spirits simply because I hid on the sidelines.

I will close by saying to Richard – in hope that we’ll have an opportunity to collaborate within the CRM SIG community – that what I am working on is an “agent-based software exoskeleton for the Empowered Individual,” exploring/evolving this #cognitivecomputing agenda via my #cidocCRMgraph and #cidocCRMdev initiatives in the museum informatics (Digital Humanities) domain. #cidocCRM-compliant microservice workflows are a big part of my agenda. I am combining this role-actor “#cidocCRM as executable metamodel” design with a #SmartData design pattern of a metamodel subgraph.

If I had to explain my motivation for working on this to a prospective funder who is inclined to discount the direct contribution of “Ivory Tower” Digital Humanities research to the betterment of our daily lives, I would ask them to read my Medium article (https://goo.gl/3Vb0lO) that ends with this sentence:

“We grind the Lens to see the Future by first turning it on the Past.”


 Posted by Martin on 16/5/2015

Dear All,

We'll discuss this issue at the next meeting. Thank you Jim for the suggestion. In the meanwhile, volunteers who understand the implications are welcome to comment.


Posted by Sebastian Rahtz     on 17/5/2015

It seems to me self-evident that the CRM should be managed as an information resource (i.e. as a structured XML file) instead of a word-processor presentational document. And if it's going to be in XML, I'd agree that TEI is the obvious format to choose, since it is already used for literate programming, and has a lot of the necessary elements, semantic constructs, and tools all in place. It is not insignificant that the TEI is aligned with the CRM in that it is interested in digital cultural heritage, and that the TEI is consciously interested in the relationship between its world and the CRM's.

However, I would like to hear more from Martin or whoever on how the CRM is authored at the moment. What is the master copy, how is it managed, and what are the tools used and/or needed by the editors? Is the appearance of being a large Word document actually right?

As the author/maintainer of much of the TEI processing software (including OxGarage, which Jim mentioned) and the chief designer of its current meta-language, I can fairly lay claim to being someone who understands the implications :-}
Nothing would make me happier [1] than helping transform the CRM master into a beautiful TEI XML file, if there is sufficient interest. [1] That's not actually true. I have a long list.


Posted by Dominic Oldman on 17/5/2015

I am glad I asked. :-)

I agree with Sebastian.

What I meant was: don't you consider the requirements first before implementing the solution – a key principle of the CIDOC CRM and any other project that might utilise a technology? In other words, the usual approach is to understand the process and requirements first (and that is not just the requirements and processes of CRM SIG members) and then choose the appropriate implementation – not to choose the technical implementation first and think about the requirements and user needs afterwards.

Requirement 1. We need to structure the document to make it easier to manage and maintain?
Requirement 2. We need interchange with other TEI documents?
Requirement 3. We need accessible interfaces for non-technologists to find CRM entities and relationships, and also useful patterns of entities and relationships?
Requirement 4. ...etc., etc.


Posted by Martin on 17/5/2015

Dear All,

The CIDOC CRM master, in a logical sense, is an SIS-TELOS knowledge base, a product of ICS-FORTH since 1990, which guarantees the formal logical consistency of IsA relations of classes and properties. It provides much better views than Protege. TELOS can be transformed into RDF/RDFS, except for properties of properties. Otherwise, there would never have been a consistent version, not to speak of a bunch of extensions.

From this, the Word document is manually maintained, and the Cross-Reference Manual and RDFS are semiautomatically generated. This is not satisfactory, because we have to create "in sync" the Word document, an LoD version, the RDFS versions, OWL versions, translations and their versions in several languages, amendment lists, and preferably pointers to issues and e-mail discussions. All this times the extensions, and now even with an FOL notation. Tomorrow RDFS may be abandoned for another model. We may like to have 3 RDFS syntax forms.

I do not see any means to maintain all this in XML only. In the past we tried TMX for the translations, but it turned out to be complex and the tools were not tuned to it. Multilingual manual-writing tools could do part of the job, but not the logic.

It appears we are the first in the world to take formal ontologies this seriously; otherwise, all this would already exist on the market. The complexity of creating user documents seems not to be common knowledge.

Currently, we are about to finalize the alpha version of the new CRM Website. It has become a complex piece of S/W, with a database at the center managing the logical structure of the CRM master document and all by-products as tables and fields. We hope that it will help collaborative work on these products: think of updating a Chinese version of a scope note, etc.

A TEI-encoded version could be an intermediate product. We need to understand what we gain. It could be interesting for interfacing with other apps. Word is very handy for making edit changes on the screen during discussions. Impossible without a WYSIWYG editor.

We'll present details this week.


Posted by Sebastian Rahtz on 17/5/2015

I think the #1 requirement is that the CRM be a machine-readable spec, so that one can derive other forms of output than a human-readable document, and do schema-type checks on the document. My cursory browse of the source suggests that only simple tagging is used (i.e. headings) in the spec – but I may well simply be not understanding how the editors work!

For example, when I see something like this in the Word source [1]:

<w:t xml:space="preserve"> assumptions about the scale of the associated phenomena. In particular all events are seen as synthetic processes consisting of coherent phenomena. Therefore E4 Period is a superclass of E5 Event. For example, a modern clinical E67 Birth can be seen as both an atomic E5 Event and as an E4 Period that consists of multiple activities performed by multiple instances of E39 Actor. </w:t>

my immediate worry is that there doesn’t _appear_ to be any validation of the names of entities. When I see “E67 Birth” here, I wonder what checks are in place to stop the editor typing “E67 Brith” :-}

[1] the distributed version is .doc, so I converted it to .docx in order to read it.
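The check Sebastian is asking about becomes trivial once the entity names live in a machine-readable table. A minimal sketch of such a consistency check follows; the canonical table here is a tiny hand-typed excerpt for illustration, not something generated from any actual format-of-record:

```python
import re

# Tiny hypothetical excerpt of a canonical entity table; with a TEI-encoded
# Definition this table would be generated from the markup itself.
CANONICAL = {
    "E4": "Period",
    "E5": "Event",
    "E39": "Actor",
    "E67": "Birth",
}

# Match references of the form "E<number> <Name>" in running prose.
REF_RE = re.compile(r"\b(E\d+)\s+([A-Z][\w-]*)")

def check_references(text):
    """Return the (id, name) references that contradict the canonical table."""
    bad = []
    for eid, name in REF_RE.findall(text):
        if eid in CANONICAL and CANONICAL[eid] != name:
            bad.append((eid, name))
    return bad

scope_note = ("a modern clinical E67 Brith can be seen as both an atomic "
              "E5 Event and as an E4 Period")
errors = check_references(scope_note)
```

Run over the scope note above, this flags `("E67", "Brith")` while letting the correct E5 and E4 references pass.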


 Posted by Sebastian Rahtz on 17/5/2015


> On 17 May 2015, at 13:37, martin <martin@ics.forth.gr> wrote:

> The CIDOC CRM master in a logical sense is an SIS-TELOS knowledge base, product of
> ICS-FORTH since 1990, which guarantees the formal logical consistency of IsA relations of classes and properties. It provides much better views than Protege.

How does one get access to the data behind this SIS-TELOS system? Is it not a slight concern that the master is held in what is effectively a proprietary system?

>
> I do not see any means to maintain all this in XML only.

I don’t see why it would be a problem. I mean, what aspects of the master cannot be represented in XML? I am not saying it would necessarily be easy or convenient to work on, but it must be possible.

> We have tried in the past TMX for the translations, but it turned out to be complex and tools were not tuned to that. Multilingual manual writing tools could do part of the job, but not the logics.

Translations are a pain, I agree. For the TEI, we have managed to create a system for maintaining multilingual versions of the reference materials, but managing the human-readable prose is hard.

> ….
> A TEI encoded version could be an intermediate product. Need to undestand what we gain. Could be interesting for interfacing with other apps. Word is very handy to do edit changes during discussions on the screen. Impossible without a WYSIWYG editor.

forgive me, but this looks like a tail wagging a dog :-}


Posted by Martin on 17/5/2015

Dear Sebastian,

On 17/5/2015 4:23 μμ, Sebastian Rahtz wrote:
>> On 17 May 2015, at 13:37, martin <martin@ics.forth.gr> wrote:
>> The CIDOC CRM master in a logical sense is an SIS-TELOS knowledge base, product of
>> ICS-FORTH since 1990, which guarantees the formal logical consistency of IsA relations of classes and properties. It provides much better views than Protege.
> how does one get access to the data behind this SIS-TELOS system? is it not a slight concern that the master is held
> in what is effectively a proprietary system?
Please do understand me right. We do all these services for free. None of this is because we would defend jobs or technology. Everything we do is a big concern, not a slight one ;-). Everything better costs again. We need funding for each migration :-(.

We have distributed SIS-TELOS to the partners in the past. You can have it for free. The data are exported in a human- and machine-readable syntax (TELOS), like RDFS. It's just a format of the nineties. We have the transformer to RDFS.
>
>> I do not see any means to maintain all this in XML only.
> I don’t see why it would be a problem? i mean, what aspects of the master cannot be represented in XML?
> I am not  saying it would be easy or convenient to work on necessarily, but it must be possible
The processes and operations are complex and need control. XML is not a database; it does not have functions. Of course you can put it in XML, and then use an XML database on top to do the operations. That does not help you run IsA constraints on classes and properties, or trigger sequences of changes. That needs code.
>
>> We have tried in the past TMX for the translations, but it turned out to be complex and tools were not tuned to that. Multilingual manual writing tools could do part of the job, but not the logics.
> translations  are a pain, I agree. for the TEI, we have managed to create a system for maintaining multi-lingual versions of the reference materials, but
> managing the human-readable prose is hard
Our problem is the dependency management. The meaning of CRM constructs is highly cross-correlated. The S/W we have designed seems to solve the problem. We'll see. When we showed the design, other CRM-SIG members did not even look at the logical issues.
>
>> ….
>> A TEI encoded version could be an intermediate product. Need to undestand what we gain. Could be interesting for interfacing with other apps. Word is very handy to do edit changes during discussions on the screen. Impossible without a WYSIWYG editor.
> forgive me, but this looks like a tail wagging a dog :-}
Well, forgive me also: what is the dog, the application S/W or the data structure? If the search and update operations are simple, the data structure may be the dog.
Otherwise... ;-)

If anybody is seriously interested, we can send the complete requirements analysis. Then we can make together checklists what the benefits are.

The CRM maintenance costs us some 25,000 Euros per year. Any serious simplification and increase of efficiency is money back in our pocket. We get 7% of our budget from the government.

We need to be able to distribute the maintenance process and automatically keep control of the consistency.

If we go out of this business some day, nobody will be able to continue the process as it stands now.


Posted by Sebastian Rahtz on 17/5/2015

Sorry, Martin, I was not meaning to criticize at all. I hope it didn’t come over like that.

My reflections on this come from an interest in how software and similar communities work; I have spent the last few decades hanging out in the complex mini-worlds of TeX and of the TEI, and it's fascinating to see the similarities between them and the CRM.

...
> Please do understand me right. We do all these services for free. None of this is because we would defend
> jobs or technology.  Everything we do is a big concern, not a slight one;-) . Everything better costs again. We need funding for each migration:-( .
>
Absolutely understood. I was not using "proprietary" in a critical sense. My concern is that the future of the CRM is compromised if there are any technology barriers to people working on it, and (for example) taking over the maintenance if FORTH loses interest.

...
> The process, operations are complex and need control. XML is not a database, it does not have functions. Of course you can put in in XML, and then use an XML database on top to do the operations.
> It does not help you running IsA constraints on classes and properties, and trigger sequences of changes.
> That needs code.

It does. But I wouldn’t distinguish so much between the XML representation of the source and the XML representation of the functions. Not that implementing the functions doesn’t need code itself, of course.

I am thinking of the way an XML representation of a schema can embed all sorts of rules and constraints in the form of Schematron. In another world, you’d call that code; I call it part of the spec.
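As one illustration of the kind of rule such a Schematron layer could carry, here is the same constraint ("every class identifier must match the E-number naming pattern") expressed as a plain-Python check over a parsed XML fragment. The element and attribute names (classSpec, ident) are only loosely modeled on TEI ODD and are assumptions, not the actual encoding a CRM-in-TEI would use:

```python
import re
import xml.etree.ElementTree as ET

# A tiny hypothetical TEI-ODD-style fragment; illustrative names only.
doc = """
<schemaSpec>
  <classSpec ident="E5_Event" type="class"/>
  <classSpec ident="E67_Birth" type="class"/>
  <classSpec ident="Brith" type="class"/>
</schemaSpec>
"""

# The Schematron-style rule "every classSpec/@ident matches E<number>_<name>"
# expressed as a plain-Python assertion over the parsed tree.
IDENT_RE = re.compile(r"^E\d+_\w+$")

def violations(xml_text):
    """Return the ident values that break the naming rule."""
    root = ET.fromstring(xml_text)
    return [el.get("ident") for el in root.iter("classSpec")
            if not IDENT_RE.match(el.get("ident", ""))]

bad = violations(doc)
```

In a real pipeline this would be a Schematron assert validated alongside the schema, so the "E67 Brith" class of typo is caught at commit time rather than in print.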


>> forgive me, but this looks like a tail wagging a dog :-}
> Well, forgive me also, what is the dog: The application S/W or the data structure  ?
> If the search and update operations are simple, a data structure may be the dog.
> Otherwise....;-)

true :-}

> If anybody is seriously interested, we can send the complete requirements analysis.
> Then we can make together checklists what the benefits are.
Can you expand? The requirements analysis for what?

> …..
> If we may go out of this business some day, nobody will be able to continue the
> process as it stands now.
>
Precisely; that’s what interests me: how you get to a setup which any reasonably IT-literate person around the world can replicate, taking over the CRM in the (highly unlikely!) event that a Grexit causes FORTH to go bankrupt.


Posted by Dominic Oldman

It seems the right thing to do to spend some time on this to reduce FORTH's costs here.

That should be a core requirement.


Posted by  Karl Grossner on 17/5/2015

----- Original Message (M. Doerr) -----

> Appears we are first on the world to take formal ontologies seriously,
> otherwise, all this should exist on the market. The complexity of creating
> user documents seems not to be common knowledge.

I would say, rather, that CIDOC-CRM is the first formal ontology to be taken seriously by ordinary mortals outside of the bio-medical domain.

It's exciting to hear CIDOC-CRM referred to as a formal ontology, and that its canonical representation is in a format for which there is s/w to check its consistency. My work with it has been with RDFS versions and the handy Word doc. RDFS is today's lingua franca, but re-making an elaborate s/w system and related processes (yes, I was previously unaware of their extent) is non-trivial, as they say. ICOM (w/ Mellon, Getty?) should fund that and ensure it lives and evolves in perpetuity -- or whatever passes for perpetuity going forward :^)


Posted by Jim Salmons on 17/5/2015

I love this discussion yet appreciate the time pressures many of you are under in preparation for the many meetings and agendas to be addressed in the coming week in Germany.

What I can say is, based on this excellent and on-going discussion, the way forward will be as "seriously fun" as it will be innovative and constructive to the evolution of the #cidocCRM and its community/ecosystem.

I am especially awed by the SME (subject matter expertise – the most appropriate term I can muster from my prior years in corporate consulting) that Sebastian brings to the table. Much of my Wolf Child Blindness can be complemented by his experience and current activities.

I love Sebastian's reference to Literate Programming because it reminds me of my 20+ years in hardcore Smalltalk design/development. The "Golden Years" of Smalltalk -- before C++ failures morphed into face-saving Java Worship -- were years spent in programming Nirvana. Smalltalk is total immersion Literate Programming.

But what is more relevant here is Smalltalk's "eating its own dogfood" nature. Beyond its minimalist bootstrap to get going (and as much of that as could be replaced by Smalltalk once it was "self-sustainable"), Smalltalk was written and executed "literally" in an image-based simulation engine (to give it a "Mirror Worlds" connection). And that is the "self-improvement" leverage I think we have here in bundling a #TEI "format of record" initiative with the new website and its ongoing maintenance initiative.

I propose that it would be extraordinarily helpful to the #cidocCRM itself, and to our community/ecosystem, if we were to "eat our own dogfood" and officially declare the #cidocCRM Definition document an 'E73 Information Object', and if we encouraged a "community skunkworks" to use the #cidocCRM as an executable metamodel with the goal of developing an Open Source #cidocCRM-compliant microservice pipeline that will, when credibly able to, be used to maintain, extend, support, and promote #cidocCRM use.

As I write this, I see that Karl Grossner has weighed in with helpful relevant comments, and I concur with his comments, especially with regard to "salable" funding rationale.

As I was working to "birth" my #cidocCRM/#TEI Personal Learning Network, one of the hopefully helpful points I made in our email conversations was that there would seem to be great synergy between the #cidocCRM-based agenda of @ResearchSpace and that of SOCIAM, the well-funded and "well-endowed" team of researchers pursuing the "Social Machines" agenda (@project_sociam, http://www.SOCIAM.org). That synergy would come not only from valuable cross-fertilization of the domains of study themselves; for the #cidocCRM community, we also gain "salable" vectors into funding sources. When we connect the #cidocCRM as an executable metamodel for microservice systems that are Citizen Science, Citizen History, and Citizen Scholarship "Social Machines", I honestly think effective altruists' ears will perk up.

I have to run... we have our first "Brewers, Brewpubs, and Beer Loving Friends of the DPLA" organizing meeting at our "home pub" @LionBridgeBrew in a half-hour. :D 


Posted by Martin on 17/5/2015

Dear Sebastian, All,

Thank you for your interest. Good to learn from your experience!

I think it's a good idea to place the requirements analysis for the new CRM Website, i.e., a complete analysis of data products and processes, on the CRM site so that partners can contribute to the solutions. In any case, we'll very soon have a functional prototype much richer than what exists now. The S/W will in any case be Open Source.


Posted by Sebastian Rahtz on 17/5/2015

I am with Jim 100% on the dogfood eating, and the literate programming attitude. I would note, however, that the CRM is an abstract ontology which has only relatively recently concerned itself with “implementation”, so let’s not get _too_ carried away with the CRM executing itself – it’s not a bit of software, after all.

> As I was working to "birth" my #cidocCRM/#TEI Personal Learning Network, one

> of the hopefully helpful points I made in our email conversations was that
> there would seem to be great synergy between the #cidocCRM-based agenda of
> @ResearchSpace and that of SOCIAM, the well-funded and "well-endowed" team
> of researchers pursuing the "Social Machines” agenda

I would note that the CRM interest in my institution, Oxford, is effectively based around our e-Research Centre, whose head (Dave De Roure) is a dyed-in-the-wool social machinist. So expect this sort of talk to go down well there :-}


Posted by Sebastian Rahtz on 17/5/2015


> On 17 May 2015, at 19:09, martin <martin@ics.forth.gr> wrote:
>
> I think it's a good idea to place the requirements analysis for the new CRM Website,
> i.e., a complete analysis of data products and processes, on the CRM site so that
> partners can contribute to the solutions. Anyway, we'll have very soon a functional
> prototype much richer than what exists now.

That will be very interesting to see, and I am especially keen to hear how the user requirements were created, because that raises the question of who the CRM web site is actually _for_ ….

These are fun topics, and I thank Jim for stirring them up so entertainingly. But don’t let it distract anyone from the meeting this week.


Posted by Martin on 17/5/2015

On 17/5/2015 9:22 μμ, Sebastian Rahtz wrote:
> I would note, however, that the CRM is an abstract ontology which has only relatively
> recently concerned itself with “implementation”, so let’s not get _too_ carried away with the
> CRM executing itself – it’s not a bit of software, after all.
Well, I'd regard that as a misunderstanding. The S/W we need is to manage highly cross-correlated editing with logical constraints, not to "execute the CRM".

However, the CRM was always meant for implementation. The immediacy or not of RDFS/OWL versions is, for me, not fundamental to implementation; and yes, the CRM is a data structure, not S/W – equal to data dictionary definitions and UML and all the stuff of the past. Indeed, the CRM did run as a database on SIS-TELOS from its conception. I wrote the core code myself in 1990-1994.


Posted by Jim Salmons on 18/5/2015

All,

I think a few comments and a link to a diagram can provide some clarifying context.

This is a very specific situation where my prior early-to-mid 1990s experience developing Smalltalk-based “Executable Business Model” (EBM) frameworks based on an actor-role-task metamodel is helpful (and almost certainly unique within this group).

It is not the case that the #cidocCRM is to be executable _itself_. Rather, as a metamodel it is used as a constraint on the design of metamodel-compliant “Social Machine parts” from which INSTANCES of compliant executable models are buildable. The compliant instances are executable, not the metamodel from which each instance is derived.

In other words, the #cidocCRM tells us the _model elements_ from which we can build an executable instance, but it is our _selection_ of, and _recipe_ for putting together, these selected elements which creates an executable instance.

For example, we might say, “To create the Sagging Walrus Museum collection management system we take 5 of ‘these’ piece parts, 2 of ‘those’, and create new ones called ‘this’ and ‘that’ which descend from ‘some other’ metamodel element.” Because these new domain-specific extensions descend from metamodel elements, they are guaranteed to comply with the required interfaces while internally performing some new, uncharted, instance-specific activity that is the Sagging Walrus way of doing things. Once we have this collection of model elements and put them together in a certain configuration, that _instance_ is executable and traceably compliant with its metamodel, although the metamodel itself is not directly executable. The execution comes from the S/W frameworks we write that are compliant with the metamodel.
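To make the metamodel/instance distinction concrete, here is a minimal sketch (all class and task names are hypothetical illustrations, not actual #cidocCRM or FactMiners elements). The abstract classes play the role of metamodel elements: they constrain what a compliant part must implement, but nothing executes until compliant parts are composed into an instance.

```python
from abc import ABC, abstractmethod

# Hypothetical metamodel elements: abstract interfaces that constrain what a
# compliant "Social Machine part" must implement.  The metamodel itself does
# not execute -- it only defines required interfaces.
class Actor(ABC):
    @abstractmethod
    def perform(self, task: "Task") -> str: ...

class Task(ABC):
    @abstractmethod
    def describe(self) -> str: ...

# Domain-specific extensions descend from metamodel elements, so interface
# compliance is guaranteed while the internals stay instance-specific.
class Registrar(Actor):
    def perform(self, task: "Task") -> str:
        return f"Registrar performs: {task.describe()}"

class AccessionObject(Task):
    def describe(self) -> str:
        return "accession a new object into the collection"

# The *instance* -- a particular configuration of compliant parts -- is what
# executes, not the metamodel from which it is derived.
actor, task = Registrar(), AccessionObject()
print(actor.perform(task))
```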

I cite, for example, this draft model, which is part of the @FactMiners design, wherein I have “roughed in” #cidocCRM model elements into a UML-based metamodel that says a lot about how to acceptably put these #cidocCRM elements together: http://goo.gl/1jhh1Q.

The #cidocCRM elements themselves don’t tell us much about their execution. We get that clarity by putting these elements in the context of a recommended actor-role-task executable “harness” which, I might add,  is implied by these elements’ names and descriptions. The actor-role-task “harness” is what gives us the ability to move from ontological/descriptive use of the #cidocCRM model, to using it as a “Social Machine” blueprint for generating #cidocCRM-compliant system instances.

I believe that #cidocCRM microservices frameworks could be an ideal “easy on ramp” for widespread adoption of the #cidocCRM as LAMs large and small wrestle with life in the emerging Linked Open Data world. For some, these frameworks will be used to generate whole-system replacement through system migrations. Others will simply wrap or extend existing systems using these frameworks.

All this said, the “unknown new territory” nature of #cidocCRM microservice frameworks is also why I recommended the “community skunkworks” approach to this exploration so as not to derail or detract from the progress being made on the current new website project at FORTH. Done in parallel, we could get some real synergy as one project informs/reacts to the other through shared experience and communication.

I believe the most useful thing we could do is simply have Martin’s new website team put its requirements doc and any other helpful design documents, together with new system code to date, in a GitHub or similar repository where we can fork, explore, and give feedback to each other.


Posted by Richard Light on 18/5/2015

On 17/05/2015 13:37, martin wrote:
> Dear All,
>
> The CIDOC CRM master in a logical sense is an SIS-TELOS knowledge base, product of
> ICS-FORTH since 1990, which guarantees the formal logical consistency of IsA relations of classes and properties. It provides much better views than Protege.
> TELOS can be transformed into RDF/RDFS, except for property-properties.
> Otherwise, there was never a consistent version, not to speak of a bunch of extensions.
>
> From this, the Word document is manually maintained, and the Cross-Reference Manual and RDFS are semiautomatically generated.
>
> This is not satisfactory, because we have to create "in sync" the Word document, an LoD version, the RDFS
> versions, OWL versions, Translations and their versions in several languages, amendment lists and
> preferably pointers to issues and e-mail discussions. All this times the extensions, and now even with an FOL notation. Tomorrow RDFS may be abandoned for another model. We may like to have 3 RDFS syntax forms.
>
> I do not see any means to maintain all this in XML only. We have tried in the past TMX for the translations, but it turned out to be complex and tools were not tuned to that. Multilingual manual writing tools could do part of the job, but not the logics.
My experience (from working with SPECTRUM) is that you can use XML as the single source of a range of documents, each of which is produced by an appropriate XSLT transform.  If you need another delivery format, you just write a new transform.

These products can be textual in nature (e.g. HTML pages - which are BTW something which is needed for the CRM; PDF documents) or machine-processible logical documents (e.g. RDFS, OWL, XML Schema).
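Richard's single-source pattern might be sketched as follows. This is purely an illustration (the toy element names are invented, not actual TEI or SPECTRUM markup, and plain Python functions stand in for what would really be XSLT stylesheets): one XML source, with each delivery format produced by its own transform.

```python
import xml.etree.ElementTree as ET

# A toy single-source class definition.  In a real pipeline each output
# format would be an XSLT stylesheet applied to the TEI source.
SOURCE = """
<classDef id="E22">
  <label>Human-Made Object</label>
  <subClassOf>E19</subClassOf>
</classDef>
"""

def to_html(root):
    """Transform for a human-readable delivery format (HTML page)."""
    return (f"<h2>{root.get('id')} {root.findtext('label')}</h2>"
            f"<p>Subclass of: {root.findtext('subClassOf')}</p>")

def to_rdfs(root):
    """Transform for a machine-processible delivery format (Turtle-like RDFS)."""
    return (f"crm:{root.get('id')} rdfs:subClassOf crm:{root.findtext('subClassOf')} ; "
            f'rdfs:label "{root.findtext("label")}" .')

root = ET.fromstring(SOURCE)
print(to_html(root))
print(to_rdfs(root))
```

Adding a new delivery format then means adding one new transform, with the single source untouched.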

If your SIS-TELOS system had an import function, data could be output from the XML representation in a format which could be loaded into TELOS.  This would allow TELOS to continue its role of cross-checking the formal logical consistency of the CRM as it develops.


Posted by Martin on 18/5/2015

Dear Jim,

Very nice thoughts indeed! Let me put another perspective to that:

In a way, you describe the CRM as a sort of metamodel we have seen in the past for workflow systems. These systems, like many other IT solutions, have an orientation to the future:
Control, constrain, and thus make the future predictable. If this pertains to human activities and machinery controlled by humans, it results in an analysis of the relevant relationships humans may have to things via their activities. An application would then narrow down the constraints to even more specific subsets of interactions. A particular process instance would then add all the unlimited details of reality we can never completely record.

The CRM however was explicitly developed to describe the past, and not even the museum processes. It was designed to describe possible pasts in terms of relevant relationships we can associate with a sort of weak causality, i.e. how states of affairs have influenced each other in the evolution of time.
The actual past has all the unlimited details of reality we can never completely record. Our interest is not in what we wanted to achieve, but in what actually happened. Constraints are not interesting, only the global ones we cannot avoid. On the contrary, the deviation from the idle plans is often more enlightening. Intention is seen more as a historical fact than as a deterministic cause.

If we see both views together, the past and the future, we can indeed expect that the metamodel to control the future has a strong overlap with the one to describe possible pasts and to explain the flow of influence. Even the human attempts to control the future play a role of influence.

In the last meeting we decided to map and model SPECTRUM under the CRM. That will be the first application of the CRM to model processes intended to be followed. If such applications emerge, it will indeed be exciting to see, with the due scientific rigor, how CRM concepts cover and extend into an intentional process world.

Altogether, what you describe here is building support for the application discourse. We have so far targeted at that with the mailing list and mapping descriptions. It deserves more, as you say. Another, fairly different discourse is the evolution of the CRM and its translations and derivatives.

It continuously questions the CRM concepts themselves. The complexity emerges because any local change has consequences up and down the IsA hierarchy of properties and classes, in scope notes and examples, possibly in introductions, and along "shortcut" paths and other reasoning constructs related to the behavior of reality, such as the space-time models we discuss now. This creates "avalanches" of changes. Most of the time goes into spotting the non-obvious consequences, and then writing the prose. The automatic creation of derivatives is also not trivial, due to ever-changing syntactic conventions.

This is the current target of the new Website.

Luckily, our syntax-neutral KR format of the CRM definition in MS-Word - I mean the terms "class", "subclass" etc., not the Word binary - has helped us not to change format with each KR fashion. Many people seeing this text think there is no logical model in the CRM, and then are surprised that it is consistent when they translate it into a KR language. These logical foundations of KR models have hardly changed over the last decades, some change of flavors notwithstanding. Indeed, we hide the TELOS encoding behind, because it is not standard, and encoded KR is not suited for our target audience. (Important KR fashions: KL-One, KIF, DAML, DAML+OIL, RDF(S), OWL.)

It may also be worthwhile to remind the younger among us that Knowledge Representation is one of the oldest disciplines of IT, and not an invention of the Semantic Web (and implementation of the CRM did not start with RDF/OWL!!), and that there is literature in the IT journals that the Google-searching scientist has no idea about.


Posted by Jim Salmons on 18/5/2015

Martin,

Thank you for such an informative post! I will enjoy more deeply reading it for the valuable context and insights that will more fully inform my appreciation for the great work that the CRM SIG has performed to date, and is forging ahead upon… especially the SPECTRUM collaboration which I believe will be very beneficial to both parties.

But I wanted to reply with some enthusiasm based on my initial quick read of your post. And that is to reinforce and note my appreciation for the “dual nature” of the #cidocCRM; that is, its attempt to both model its “objects of study” (the looking to the ‘past’) as well as provide modeling concepts to address that “act of study” (preservation, transfer, ownership, measurement, etc. including associated scholarship linkage/reference, etc., that is the ‘now’ of object/collection management, etc.).

In response to the first Neo4j GraphGist exploring the use of a metamodel subgraph for modeling the Softalk magazine ‘Fact Cloud’ (http://goo.gl/2b6934), a Neo4j community member and now my Kindred Spirit buddy, Laurence Cook (@metacirque), tweeted that my gist reminded him of the #cidocCRM. To which I responded, “What?!” and he replied “..the Conceptual Reference Model for Museums,” to which I said, “Oh, THAT sounds interesting!” and, long story short… museum informatics, digital humanities… who knew!? And here we are today.

The thing that struck me upon my first exposure to the #cidocCRM was that “Déjà vu” feeling of “Oh man, this is not just another ontology, this is a metamodel!” based on my “executable business model” experiences. And the thing that clicked for me was closer inspection of the CRM Entity hierarchy, where I saw the two big partitions of Persistent Objects and Temporal Entities. Mixed in there on the Temporal Entity side were things obviously intended to cover both the “description of the past” and the “stuff we do when using this model to do work on/with the objects of the past” – that is, there were these ‘process-looking’ Entities that evoked the “looking forward” aspects of the #cidocCRM.

My goal for the @FactMiners Fact Cloud platform requires both dimensions. On the one hand, I need the #cidocCRM’s expressive “looking at the past – documenting the object itself” aspect for fine-grained _document structure_ and _content depiction_ modeling capabilities. But we also want our #SmartData to be _self-descriptive_ in the sense of providing explicit information about “who can do what, when, where, and how,” which is the #cidocCRM’s “forward-looking” dimension.

My belief is that “smart programs work best with smart data” which is reflected in this diagram that puts the actor-role-task “process-harness” of my prior note in context: http://goo.gl/OYMESR.


What this diagram shows is that the META partition – the “self-descriptive” aspect of my #SmartData graph database design – has three primary sub-partitions: META:Structure, META:Process, and META:ReferenceModels. The #cidocCRM “past/object” dimension will be used in the META:Structure partition through a DSL (domain-specific language/extension) to do fine-grained document structure and content depiction modeling. The META:Process partition is where the #cidocCRM “looking forward/workflow” dimension comes into play. In both cases, I plan to use the #cidocCRM in as much of a “pure graph” form as possible so as to work at the “live whiteboard” level of human-conceptual model-understanding.
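This three-partition layout might be sketched, purely as an illustration with hypothetical node and relationship labels (not the actual FactMiners schema), as a small property-graph-style structure:

```python
# A minimal sketch of the three META sub-partitions as labeled edges.
# All identifiers here are hypothetical, chosen only to mirror the
# "metamodel subgraph" idea described above.
meta = {
    "META:Structure": [          # fine-grained document-structure DSL
        ("Page", "CONTAINS", "Article"),
        ("Article", "DEPICTS", "E22_Human-Made_Object"),
    ],
    "META:Process": [            # actor-role-task "forward-looking" harness
        ("Actor", "PLAYS", "Role"),
        ("Role", "PERFORMS", "Task"),
    ],
    "META:ReferenceModels": [    # explicit mappings to external models
        ("DSL:Depiction", "MAPS_TO", "cidocCRM:P62_depicts"),
    ],
}

for partition, edges in meta.items():
    for src, rel, dst in edges:
        print(f"({partition}) {src} -[{rel}]-> {dst}")
```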


And this is where the third partition of my #SmartData design comes into play. I want to push META:ReferenceModel mapping to an explicit level where we can do two things: 1) thoughtfully map our “private garden” (human-tractable) conceptual models to existing and future Reference Models such that, 2) we can perform “on demand” graph transformations of our DSL conceptual model to whatever format is needed to appropriately respond to requests coming in via RESTful #LOD queries.


I want to stay away from the necessary “niggling detail” of such things as RDF expression until I have “on-demand harmonization” to perform. In other words, push the complexity of Reference Model expression to the “view/controller” levels while maintaining the #SmartData aspect of an inner “private garden.” (BTW, you’ll notice a slight “plug/shout-out” in this diagram to the amazing Karma project at ISI/USC: http://www.isi.edu/integration/karma/#. I believe Karma will/should figure into the “to-be” platform of the new cidoc-crm.org website as a means to provide downloadable, pre-configured model-harmonizing “#cidocCRM adapters” for #cidocCRM developers.)


The “A-ha! Moment” I had when I saw this dual nature of the #cidocCRM – its ability to serve both in the META:Structure and in the META:Process partitions of my proposed design – is what clearly sets the #cidocCRM apart from its peers in the ontological systems space. (Again, I am speaking from a Wolf Child perspective of only knowing what I have bumped into on my increasingly non-random walk through the museum informatics domain.)


It was the thrill of this insight about the dual nature of the #cidocCRM that led me to create what is probably one of the first known instances of a #cidocCRM-based cartoon! :D


             http://goo.gl/gQUDXw


Thank you again, Martin, for taking the time to contribute such a thoughtful post to this list while preparing to head into a “buzzsaw” of back-to-back meetings in Nuremberg addressing many gnarly issues that inevitably complicate the best intentions of model harmonization.