Issue 545: Differentiate TX8 Grapheme definition

Starting Date: 
Working Group: 

Post by Martin Doerr (17 June 2021)

Dear all, 

I think we need to distinguish the set of all possible atomic graphemes of a writing system from the atomic grapheme itself, from a grapheme sequence (or arrangement) in an actual text, and from the set of graphemes appearing in an actual text.

Post by Achille Felicetti (10 October 2021)

Dear Martin,

Our HW for issue 545.

A grapheme is atomic by definition as it represents the minimum unit (i.e., a unit that cannot be further decomposed) of a writing system.

Furthermore, the grapheme is a conceptual and non-concrete unit and is made manifest by the individual act (performance) of writing. For this reason, a grapheme of an actual text cannot exist. Instead, in actual texts we find glyphs that are precisely the physical manifestation of graphemes.


We hope this clarifies :-)



Achille & Francesca

Post by Martin Doerr (10 October 2021) 

Dear Achille, dear Francesca,

My apologies for using "grapheme" for both "glyph" and "grapheme". I understand the difference.

You write: "A grapheme is atomic by definition as it represents the minimum unit (i.e., a unit that cannot be further decomposed) of a writing system."


I agree, but you define:

TX8 Grapheme

Subclass of:         E90 Symbolic Object

Superclass of:

Scope Note:         Subclass E90 Symbolic Object used to represent the abstract units with distinctive value in a given writing system. A grapheme is a character or sequence of characters [MD1] that functions as a distinct unit within an orthography. It

I wonder about "sequence of characters".  I think this should be differentiated.

You define:

TXP11 transcribed (was transcribed by)
Domain:              TX6 Transcription
Range:                TX8 Grapheme
Subproperty of:   P16 used specific object (was used for)
Quantification:   many to many (0,n:0,n)
Scope note:         This property highlights the specific way in which an activity of TX6 Transcription results in the rendering of the specific TX8 Grapheme(s) of which an instance of TX1 Written Text is composed.

I think we are confusing glyphs, graphemes, and symbolic occurrences of graphemes here. I understand the following:

Reading, in the sense of observation, understands each glyph as a materialization of a grapheme. (As I pointed out in an earlier message, the candidate graphemes must be limited in some way.) A sequence of glyphs does NOT correspond to a sequence of graphemes, but to a sequence of grapheme occurrences.
"EEEEE" is a sequence of 5 occurrences of one grapheme in one symbolic object.

Since TXP11 transcribed is the only output property of TX6 Transcription, you cannot use TXP11 to describe the result of transcribing a text. The sequence "EEEEE" uses only the grapheme "E", a single instance.
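
The difference between a grapheme and its occurrences can be sketched in a few lines of Python. This is only an illustration and not part of the model: the names Grapheme, GraphemeOccurrence and occurrence_sequence, as well as the toy inventory, are hypothetical, and every grapheme is assumed to be a single character.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grapheme:
    """Abstract unit of a writing system (hypothetical stand-in for TX8 Grapheme)."""
    symbol: str


@dataclass(frozen=True)
class GraphemeOccurrence:
    """One position in a symbolic text at which a grapheme occurs (hypothetical)."""
    position: int
    grapheme: Grapheme


def occurrence_sequence(text, inventory):
    """Map each character of `text` to an occurrence of a grapheme in `inventory`.

    Simplifying assumption: every grapheme is exactly one character.
    """
    return [GraphemeOccurrence(i, inventory[ch]) for i, ch in enumerate(text)]


# Toy writing system: the set of all possible graphemes (an assumption).
inventory = {ch: Grapheme(ch) for ch in "ABCDE"}

occurrences = occurrence_sequence("EEEEE", inventory)
graphemes_used = {occ.grapheme for occ in occurrences}

print(len(occurrences))     # number of grapheme occurrences in the text
print(len(graphemes_used))  # number of distinct graphemes the text uses
```

In this sketch the text has five occurrences but uses one grapheme, so a property pointing only at graphemes loses the order and the repetition of the transcribed text.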

TX5 Reading does not allow one to describe the glyph-grapheme association of a single glyph.

I am missing the output of the reading process. Is it meant to be a transcription? Always? Either TX5 or TX6 should result in a Symbolic Object representing the written text, which is at least the sequence of grapheme occurrences corresponding to the sequence of glyphs, and possibly further structural features.

I propose not to mix observation and understanding because, as far as I understand, interpreting the intended meaning of a written text is only relevant for reading when the glyphs and their arrangements are ambiguous.

I think the scope note of TX5 is confusing, because it talks about "decoding signs" without relating it to the glyph-grapheme association:

"The reading activity, thus, is intended as a specific observation (S4) in which the decoding of the signs is performed, i.e. the linguistic value is recognised and the message is understood. Cases in which decoding does not happen (e.g., the observer is able to describe the signs but not to assign a specific linguist value to them), the S4 class could be used as it is..."

I further miss a property implementing the glyph-grapheme association.

If we interpret a text in symbolic form as a sequence of grapheme occurrences in the symbolic text, we may need another property stating which graphemes occurred. I would like your expert opinion on whether a Writing system used for a Written Text is different from that used for a symbolic text (IsA Expression).

I hope this makes my concerns clearer.

All the best,


Post by Achille Felicetti (12 October 2021)

Dear Martin,

We understand your doubts and we believe they make perfect sense.


In fact, re-reading the scope notes of the classes that you mention, we realised that their definitions are somewhat confused and inconsistent with each other and with the concepts we have in mind, probably because of the successive revisions made to the model across its versions. They should certainly be revised and, where possible, better harmonised.

It will be a pleasure to discuss it with you tomorrow and in the next meetings, virtual or preferably face to face :-)



Achille & Francesca

At the 51st CIDOC CRM & 44th FRBRoo SIG meeting, it was decided that the scope notes of TX5 Reading, TX6 Transcription, TX8 Grapheme and TXP11 transcribed (was transcribed by) need to be redrafted so that they:

  1. clearly distinguish between atomic units of writing and combinations thereof, by introducing a concept of Txx Grapheme Occurrence Sequence and associating it with an instance of E73 Information Object (or a specialization)
  2. showcase the relation between glyphs(/monographs…) and graphemes
  3. explain the correspondences between transcriptions from one type of writing system to another and the parts of written text that get transcribed

HW: AF, FM, PR, MD. TV to proofread 

5 September 2022

Link to the HW by Achille, Francesca and Martin for the issue.  




By decision of the Editors' group, the issue will be merged with 549. All discussions will carry over there. 

12 October 2022