Issue 274: Archetypical sounds

Starting Date: 
Working Group: 

Posted by Steve Stead on 10/2/2015

We have no way of documenting sounds that behave like instances of E37 Mark.

I would like to propose a new class for this.

In the 34th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9 and the 27th FRBR - CIDOC CRM Harmonization meeting, it was decided that Thanasis Velios will work on this.

Heraklion, October 2015

In the 35th joint meeting of the CIDOC CRM SIG and 28th FRBR - CIDOC CRM Harmonization meeting, the crm-sig discussed the proposal made by Thanasis Velios (see below) about an Audio/Sonic Item class, but it was decided to postpone the incorporation of this class in the CRM until there is evidence and it is established that the evidence conforms to the proposed definition. Also we should examine whether such a concept implies traditional melodies and whether there is any community of use.

EXX Audio Item/Sonic Item
Subclass of:    E73 Information Object
Superclass of:  EXX Audio Mark/Sonic Mark

Scope Note: This class comprises the intellectual or conceptual aspects of recognisable sounds and compositions.

This class does not intend to describe the idiosyncratic characteristics of an individual performance or playback of sound, but the underlying prototype. For example, a sound such as Walter Werzowa's Intel sonic logo is generally considered to be the same logo when played in any number of adverts or media. The tone may change, but the logo remains uniquely identifiable. The same is true of music which is performed many times. This means that audio items are independent of their performance.

The class EXX Audio Item provides a means of identifying and linking together instances of E7 Activity that deliver a performance of the same sounds, compositions or soundtracks etc., as follows: E7 Activity, through PXX delivers audio item (is delivered by), to EXX Audio Item. [P138 represents (has representation) to E1 CRM Entity – not sure how representing anything else apart from a conceptual object is possible with sound, but I suspect blind people would find it a relatively simple task]

- Walter Werzowa's Intel sonic logo (EXX)
- Francisco Tárrega's Nokia tune (Grande Valse) (EXX)
- Beethoven’s “Ode an die Freude” (Ode to Joy) (E73)

In First Order Logic:
  EXX(x) ⊃ E73(x)

[P138 represents (has representation): E1 CRM Entity
(P138.1 mode of representation: E55 Type) – again not sure about these.]
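
The linking role described in this proposal can be illustrated with a minimal sketch. It assumes statements are held as simple (subject, property, object) tuples; the identifier `PXX_delivers_audio_item` stands in for the unnumbered property named above, and the activity names are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical identifier for the unnumbered property in the proposal.
DELIVERS = "PXX_delivers_audio_item"

# (E7 Activity, property, EXX Audio Item) statements; all data is illustrative.
statements = [
    ("tv_advert_broadcast", DELIVERS, "intel_sonic_logo"),
    ("radio_advert_broadcast", DELIVERS, "intel_sonic_logo"),
    ("guitar_recital", DELIVERS, "grande_valse"),
]


def activities_by_audio_item(triples):
    """Group activities under the audio item they deliver."""
    grouped = defaultdict(list)
    for activity, prop, audio_item in triples:
        if prop == DELIVERS:
            grouped[audio_item].append(activity)
    return dict(grouped)


linked = activities_by_audio_item(statements)
print(linked["intel_sonic_logo"])
```

Under these assumptions, two separate broadcasts are linked only through the shared audio item, not through any property of the individual performances.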

Prato, February 2016

Current Proposal: 

Posted by George Bruseker on 18/8/2019

Dear all,

In the course of recent modelling exercises, I often encounter the need to have a way of modelling audio information or sounds. While this initially seems an easy request, on further thought, it seems complicated. Picking up from previous work at the SIG by Steven and Thanasis, I have been trying to formulate something reasonable and consistent. Below I present the present state of thought on my part and a potential scope note and properties list proposal. This continues the old issue: building on it and expanding it somewhat in scope.


Sound is an object of collection, curation and preservation. It is something that falls within the wider scope of cultural heritage and the more specific scope of museum information. Sound is related to the sense of hearing and forms a basic meso-scopic aspect of human experience of interest in the study of the past.

Background Problems:

The closest existing model we have to the question of modelling an object of the senses in CIDOC CRM is the ‘Visual Item’. In a way, the obvious solution is to consider whether an ‘Audio Item’ forms a useful parallel, as has been explored in the past.

The potential parallel has the advantages of symmetry and being well known, but comes with ontological issues.

‘Images’ as they are modelled in CRM have the features of a) being the result of intentional action by a human being and b) being representational or potentially representational.

So the visual item scope note begins, “This class comprises the intellectual or conceptual aspects of recognisable marks and images.”

The seeming problems in trying to create a modelling parallel between ‘image’ and ‘sound’ then include that a) the basic case of sound is not to represent anything (but of course there is onomatopoeia, and then language, which as a small subset of all sounds is representational in the basic sense) and b) not all sound is created intentionally as part of an intellectual project (indeed a vanishingly small subset of sounds compared to the total would seem to be so).

The problem, to my mind, arises because the image has a particular nature (and is a particular focus of Western thought and creative production) because although all physical objects give off ‘images’ in the sense of appearance, if we wish to fix an image, a human being must intervene and, through the power of their creativity/mind and under certain cultural codes, translate the perceived or imagined or speculated image onto a surface. Images are also often used for representing the world and as a means of communication and, with photography and other methods, documentation. Thus a visual item is a sensible ontological category for Western thinking and is definitely a product of the human mind.

As with images, many things give off sounds. Unlike the image, there is - arguably - no need to translate sound via the human mind and add intellectual content in order for sound to… sound. So an inanimate event like the smashing of a pane of glass that accidentally falls from a table makes a sound with no volition involved (not even willed), and a frog seeking its mate or warning off predators makes its croaking sound and need never consult a human being as to whether it is croaking its song correctly. Human beings develop at least two major systems of sound: music and speech. These two subsets of sounds are intentional. The former is, usually, non-representational (except for Peter and the Wolf, or perhaps the representation of inner emotions?), while the latter is fundamentally representational.

Within the domain of cultural heritage, the question is what is documented:

    collection of folk songs
    collection of speech samples
    collection of interviews
    collection of bird songs
    collection of frog calls
    collection of machine sounds
    collection of musical performances

Argument for Modelling something around sound:

There is an immaterial, repeatable pattern of sound recognizable by human beings that is not linked to a particular episode of performing a sound sequence, nor to a particular recording or recording medium. It is of interest for researchers to be able to track these different instances where they are present. Sound has different means of propagation and repetition. An image is marked on a surface and it is tracked across different media which are able to act as a surface or simulate a surface. Sound is a ‘time based’ entity which until recently could only be performed and reperformed. Now that we have recordings (the last 200 or so years), it is still the case that the carriers of sound and their means of carrying are different from those of an image (we need a player for a recording and not for an image unless it is digital).

Thus there is both a research need and there seem to be different properties that are required to describe the relation of sound to other entities.

Potential solutions:

Given the above, we need an approach to modelling how ‘sound’ is present in CH documentation systems and what questions are asked of it. The following solutions seem possible: 

    1. Limit the sounds that can be modelled to those that are planned (concerts, speeches, etc.): this solves the intentionality problem if not the representationality problem. Since the representation problem is likely also an issue in Visual Item (since much more art is non-representational for example), whatever reasoning permits the notion of the property of ‘represents’ on visual item, also could be applied to the audio item. This would mean, however, that it is not obvious how we model the croak of a frog. Or rather, we are able to model the croak of a frog, since the recording will have the intellectual input of the sound engineer, but the frog itself, not being on the level of us humans, won’t be able to perform its own croak. [also how do we model the appearance of Jesus on toast as occurs often enough around the world? Can we say that it is an image? Certainly man did not make it.]
    2. Decide that sounds indicate a new branch of immaterial item that is NOT human made. Is there something above E28 Conceptual Object, which is ‘pattern’, under which E28 resides alongside ‘sound’? The notion of this ‘pattern’ would be recognizable immaterial items which are objects of discussion but are not human made. For the moment, the only example I would put under this would be sound sequence patterns, like the croak of a frog, which are not human made and yet certainly sound sequences. This could be used as a class together with symbol in order to indicate some sort of ‘audio item’ which encodes the croaking of a frog and the like. We could then also have an audio item which follows the ‘visual item’ pattern separately, which has the notion of human creation as well.
    3. Decide the above is heterodox and that all immaterial items are the product of the human mind. Then we can follow the simple pattern of having an ‘audio item’ parallel to ‘visual item’, and the argument then goes that all sounds are actually humanly perceived/able and their identifiability is based on a process of their being made an object of discourse and then recoded/encoded in some way such that they can be recognized again. So when the frog croaks, it would croak an instance of the human made object ‘audio item’, unbeknownst to it.

There are potentially better ways to look at it than the above, but these are the possibilities that come to mind. It seems to me that the 3rd solution is the closest to existing CRM approaches and on that basis, I have crafted a first attempt at a scope note, building on what was already done in this issue.

Proposed Scope Note:

Features: instances can be recordings or acts of musical performances or speech but also recordings or acts of inanimate and biological objects. 

Causes a relation to E5, which is the possibility of an event to ‘sound’ the audio item (it is not necessary to know what you are doing; it can be a frog or a robot or an AI, whatever). Arguably there should also be a new property, a sub-property of ‘carries’, which deals with the fact that the object ‘bears’ the audio item but not in a way that is immediate (I need a playback device). If this were accepted, then we would need a subclass like information carrier to come back (I am not at all sure this is a good idea, just putting out a thought). Finally, there could potentially be a property for an audio item providing a typical case for a class of sounds.

EXX Audio Item
Subclass of:    E73 Information Object

Scope Note: This class comprises the intellectual or conceptual aspects of recognisable sounds and compositions.

The substance of an audio item is a recognizable pattern of vibration in a medium as perceivable by an auditory system. Sounds in and of themselves are not human constructs; instances of audio item, however, are. Specifically, they are the identifiable and recognizable vibratory patterns which have become objects of discourse within given cultures and societies and act as symbolic markers and can be the basis for contemplation, discourse and reasoning inter alia.

This class does not intend to describe the idiosyncratic characteristics of an individual occurrence of a particular sound, performance or playback of sound, but rather the underlying prototype. For example, a sound such as Walter Werzowa's Intel sonic logo is generally considered to be the same logo when played in any number of adverts or media. The tone may change, but the logo remains uniquely identifiable. The same is true of music or speeches which are performed many times. While individual characteristics of the performance or speech may incidentally change, a basic, identical form can be recognized across performance instances. This means that an instance of audio item is independent of performance. 

Aside from sounds following a particular composition, sounds captured from the environment (natural or human) and recognizable within a certain society or culture can be instances of audio item. Examples would include the sound of a tuned Porsche Carrera engine revving at 3000 rpm, the warble of the common Loon, or the David Frost Interviews with Nixon. 

The class EXX Audio Item provides a means of identifying and linking together instances of E5 Event in which the same sounds, compositions or utterances etc. can be identified to have occurred, using PXX sounded (was sounded by), EXX Audio Item. Further, an instance of EXX Audio Item may be recorded and then can be indicated as PXX is recorded on (bears recording of) E24 Physical Man Made Thing.

- Walter Werzowa's Intel sonic logo (EXX)
- Francisco Tárrega's Nokia tune (Grande Valse) (EXX)
- Beethoven’s “Ode an die Freude” (Ode to Joy) (E73) 
- a recording of the Greater Horned Toad
- the sound of the Porsche 911 engine revved at 3000 rpm

Pxx sounded (was sounded by) D: E5 R: Exx Audio Item
Pxx bears recording of (is recorded by) D: E24 R: Exx Audio Item
Pxx sounds typical for D: Exx Audio Item R: E55 Type
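
As a rough illustration of how the three proposed properties constrain statements, the following sketch checks a statement against the domains and ranges listed above. It is only a sketch under stated assumptions: the identifiers such as `PXX_sounded` are hypothetical stand-ins for the unnumbered properties, only a small part of the CRM class hierarchy is included, and placing EXX Audio Item under E73 follows the proposal rather than any adopted definition.

```python
# Partial class hierarchy. The E-numbered links follow CIDOC CRM; placing
# EXX Audio Item under E73 is the proposal's assumption, not adopted CRM.
SUPERCLASSES = {
    "EXX_Audio_Item": ["E73_Information_Object"],
    "E7_Activity": ["E5_Event"],
    "E22_Man-Made_Object": ["E24_Physical_Man-Made_Thing"],
}

# Domain and range of each proposed property (identifiers are hypothetical).
PROPERTIES = {
    "PXX_sounded": ("E5_Event", "EXX_Audio_Item"),
    "PXX_bears_recording_of": ("E24_Physical_Man-Made_Thing", "EXX_Audio_Item"),
    "PXX_sounds_typical_for": ("EXX_Audio_Item", "E55_Type"),
}


def is_a(cls, target):
    """Return True if cls equals target or is a (transitive) subclass of it."""
    if cls == target:
        return True
    return any(is_a(parent, target) for parent in SUPERCLASSES.get(cls, []))


def is_valid(subject_class, prop, object_class):
    """Check a statement's classes against the property's domain and range."""
    domain, range_ = PROPERTIES[prop]
    return is_a(subject_class, domain) and is_a(object_class, range_)


# A performance (E7, hence E5) may sound an audio item; the reverse is rejected.
print(is_valid("E7_Activity", "PXX_sounded", "EXX_Audio_Item"))
print(is_valid("EXX_Audio_Item", "PXX_sounded", "E5_Event"))
```

Because the domain of the first property is E5 Event rather than E7 Activity, a non-intentional event such as a frog croaking would also pass this check, which matches the intent stated in the features paragraph.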

I am sure there is much to be discussed/improved. Look forward to your thoughts.

In the 47th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 40th FRBR - CIDOC CRM Harmonization meeting; GB presented a classification of sound recordings instantiating a set of prototypical sounds and commented on the similarity of these sounds to visual images (instances of E36) from a conceptual point of view, in that they create identifiable patterns and have intellectual/conceptual aspects.
However, given that the scope note of E36 starts by defining visual items as *intellectual or conceptual aspects of recognizable marks and images*, it could not possibly be expanded to refer to the sounds animals produce. The proposed solution to that: only consider such sounds inasmuch as they represent the outcome of an activity performed by a human agent (collection), which is what grants them an intellectual/conceptual aspect.

Following the discussion, the SIG decided to reconsider the HW and continue working on it. TV will ask sound art colleagues to point him in the right direction with regard to sound integration. MD will rework the scope notes and examples for E90 Symbolic Object and E73 Information Object. GB and OE to contribute to that.

June 2020

Post by Thanasis Velios (12 June 2021)


Dear all,

My homework for this issue was to check for cases of integration for sound in sound arts. I spoke with a professor of sound art at UAL and from the examples we discussed it appears that there are very few collections systematically cataloguing sound art and they hardly document anything else apart from title and artist. We could not find any projects around integration of such collections. This does not mean that there is no desire for integration from the academic perspective, but I could not identify any domain experts working in that direction.

All the best,


Post by Daria Hookk (13 June 2021)


Dear colleagues,

Examples of cataloguing audiovisual information exist, and an expert gave me one:


Details: The Chinese standard includes 25 positions of metadata (code of archive, its category, level of inventorisation, unique identifier, file number, name of the scan creator, date of digital copying, date of the next conversion/migration, permission, notes, address of real location, original carrier, mode of digital copying, copying device, software and OS, name of file, size of file, format, video parameters, audio parameters etc.)

Something similar was proposed for the Russian archives.

With kind regards,
Daria Hookk

Post by George Bruseker (15 June 2021)

I imagine I'm running fast at windmills here but I already prepared a homework for this several sigs ago in which I list dozens of collections of sounds. There is documentation and research on sound in CH.


What exactly are we searching for in this issue?

Post by Thanasis Velios (15 June 2021)

Maybe we have failed to document the issue in its development but my understanding was that we were looking for use cases from sound arts where a sound piece with distinct identity (for example) has been used as part of another sound piece. The integration would have been required to identify these as related separate entities thus providing an additional argument for the new class. This is different to the typical preservation metadata documented for audio recordings or performances.

All the best,


Post by Martin Doerr (15 June 2021)

This is also my understanding. Basically, we would need properties substantially different from other information objects, and different from audiovisual recordings for a new class, that would express relations not covered by others in the CRM, and essential in these applications. To my understanding, these have not been identified so far.

The question of bird songs that came up is more complex, because it is a type-type relation: species A typically sings sound type B.

Another aspect is sound as intangible heritage, very different again.

All the best,


In the 57th CIDOC CRM & 50th FRBR/LRMoo SIG Meeting, the SIG assigned GB & SdS to review the empirical evidence gathered until now, review the class and property definitions by GB dating back to 2019, and bring them up for a vote in the next meeting, in Paris. 

HW: GB, SdS to summarize the proposal and bring to a decidable form in time for Paris 2024

Marseille, October 2023