Issue 530: Bias in data structure

ID: 
530
Starting Date: 
2021-03-03
Working Group: 
4
Status: 
Open
Background: 

Posted by Thanasis on 23/10/2020

As part of an initiative on decolonising teaching in the University of the Arts I was asked to recommend bibliography on bias in data structures, classification systems, etc. This is not about content, but about the information science side of things. Could anyone suggest any relevant pointers?

Many thanks in advance.

All the best,

Thanasis

P.S. I felt that the meeting went well this week and that we are a significant step closer to 7.1. 

 

Current Proposal: 

Posted by Martin on 23/10/2020

On 10/23/2020 8:33 PM, Athanasios Velios wrote:
> Dear all,
>
> As part of an initiative on decolonising teaching in the University of the Arts I was asked to recommend bibliography on bias in data structures, classification systems, etc. This is not about content, but about the information science side of things. Could anyone suggest any relevant pointers?

I think it was a very good meeting. Thank you all!

Thanasi, you may check Alison Wylie, feminist archaeologist, philosopher and very bright spirit. She worked on gender bias and many other things in archaeology.

Posted by George on 26/10/2020

Hi Thanasis,

On the topic of bias in data structures, there is: 

a thread by Erin Canning which might go somewhere: https://twitter.com/eecanning/status/1297552611873300480?s=19

data feminism site:  https://datafeminism.io/ 

this indigitization group on twitter: https://twitter.com/Indigitization/status/1318637928008929280?s=19

we had a nice session in cidoc 2018, session 3.8 decolonizing knowledge http://www.cidoc2018.com/sessions-by-theme with presentations by Mike Jones and Faye Belsey

I really like the topic. Let me know if you hear of new things in this regard!
 

Posted by Martin on 24/02/2021

Dear All,

Here is my and Thanasis' attempt at a bias statement:
 

Possible title:

Bias Awareness

The CIDOC CRM provides a deliberately small selection of relationships and categories which logically encode specific types of facts about the past that can unambiguously be compared and matched across applications, domains and cultures. Even though this is highly effective for relating and accessing relevant content across the globe, it inevitably enforces a certain point of view and misses others, i.e., it constitutes an information bias. The maintainers of the CIDOC CRM are committed to keeping this point of view as unconstrained as possible and at a relatively basic material level, in the hope that it does not interfere with the description of diverse cultural-historical interpretations related to the facts thus encoded. In particular, the CIDOC CRM does not include any classification beyond what is necessary for the definition of properties; it supports multiple instantiation and it does not model causation. The CIDOC CRM is the result of examining documentation practice in institutions based in specific regions and cultures. The diversity thereby represented may still miss the possibility to express other relevant points of view appropriate for the intended functionality of the CIDOC CRM. Readers and users of the CIDOC CRM are therefore invited to bring potential interference with relevant points of view to the attention of the maintainers, ideally with concrete examples, so that the CIDOC CRM, as a tool for relating and integrating information about the past, can be continuously improved in embracing the diversity of human culture and its interpretation.

Comments?

Posted by George on 25/02/2021

Dear all,

I think the first one was better. This one makes it sound as if we say that we are innocent of bias until otherwise proven guilty and that you had better bring good evidence if you want to challenge that claim. Especially the clause about bringing evidence sounds defensive. Regardless, the whole question seems substantial enough that it shouldn't be decided editorially but by open discussion in the community. We have not had a statement or policy on bias (a broader position on bias would take into account structural issues of inequality practically) up to now, so I don't see why we need this for 7.1. Let's have it as an open issue in the SIG and see what the community thinks.

Posted by Thanasis on 25/02/2021

Hi George,

The reason I thought it would be useful to have a statement in this
version is because the version will be a point of reference for a while
and like I said in the original email it would seem like an oversight
not to include it when this is one of the most widely discussed topics
in heritage at the moment.

I support a broader discussion of the issue in the community.

Perhaps all we need to include is a sentence like:

"Discussions on the types of bias present in the CIDOC CRM are in
progress within the CIDOC CRM community".

 

Posted by George on 25/02/2021

Dear all,

A very minimal statement like this could be a good way to go. That way the possibility is acknowledged, and a fuller discussion of the nature of bias is left open for the future. The present formulation by Martin and Thanasis could then be put forward as an issue to be discussed, since it is a solid proposal to work from. That would make sense to me.

 

Posted by Martin on 25/02/2021

Dear George, all,
The short statement is not bad! I would put it at the end of the section Scope of the CIDOC CRM.
I agree with discussing this in a wider area, after 7.1

What I wanted to express is that there is inevitably a bias in an FOL-based system and in an empirically derived one. An empirical base is better than a bright theory, but it represents cultural realities. I did not want to say that we are not guilty, just the opposite, and that this is not a moral question but a question of research and awareness. I want to be explicit about what bias such a system must contain, what it may contain, and what the measures to reduce it are. "Especially the clause about bringing evidence sounds defensive" .... easy to change

Posted by Athanasios Velios on 4/03/2021

Dear all,

In version 7.1 a short but important sentence has been added at the end
of the scope section:

"Discussions on the types of bias present in the CIDOC CRM are in
progress within the CIDOC CRM community."

Issue 530 is used to track the discussions here:

http://cidoc-crm.org/Issue/ID-530-bias-in-data-structure

It is important to engage in this discussion so that we first understand
the issues around bias and privileged positions and then how these may
or may not impact the development of the model.

We will then be more confident in making a more complete statement in
future versions. Issue 530 is scheduled to be discussed at the community
session of the forthcoming meeting.

Looking forward to it.

Posted by Anais Guillem on 8/03/2021

Dear Thanasis, all, 

Some digital humanists work and publish on this question of bias in digital humanities. Here is an example of a very apropos publication:

https://journals.dartmouth.edu/cgi-bin/WebObjects/Journals.woa/xmlpage/4/article/425

I have gathered a bibliography myself on decolonizing knowledge and methodology, especially in digital projects. I could join the discussion of your working group if you want.

Posted by Thanasis on 08/03/2021

Fantastic! Thank you for sharing, and you are first on the list.

For the rest of the list, in case you did not attend today's sessions: following the discussion of issue 530, a working group is being formed to discuss bias in the CRM. Please let me know if you wish to contribute to the discussion.

Posted by Erin Canning on 08/03/2021

I would also like to be involved in this discussion, please! I too have a reading list on the subject that I would be happy to share; I have been meaning to pull everything into a Zotero library and this is a good excuse to do so. For a single article to start things off, I would recommend Miriam Posner's "What's Next: The Radical, Unrealized Potential of the Digital Humanities" as an interesting read.

Posted by Robert Sanderson on 08/03/2021

Happy to join as well. I'm co-chair for the Bias Awareness and Responsibility Committee for Cultural Heritage at Yale University, and happy to share our experiences in that work. This is especially relevant to our work as we move to adopt CIDOC-CRM (via Linked Art) as our baseline ontology.

Some readings that we found useful:

https://doi.org/10.1080/0270319X.2019.1696069 -- "Aliens" vs Catalogers: Bias in the Library of Congress Subject Headings

https://journals.litwinbooks.com/index.php/jclis/article/view/120 -- Cultural Humility as a Framework for Anti-Oppressive Archival Description

https://doi.org/10.1111/cura.12191 -- Coming Together to Address Systemic Racism in Museums

https://www.youtube.com/watch?v=MbrC0yvBCNo&ab_channel=CollectionsTrust -- Decolonizing the Database by Dr Errol Francis

And, in print media: Algorithms of Oppression: How Search Engines Reinforce Racism by Safiya Umoja Noble of UCLA

A colleague and I presented about our work at EuroMed2020:  Libraries, Archives and Museums are not Neutral: Working Toward Eliminating Systemic Bias and Racism in Cultural Heritage Information Systems 

Youtube capture of the zoom: https://youtu.be/V9-IHQQv-LY?t=26661 

From a CIDOC-CRM perspective, I think there are several issues to grapple with, including those that were brought up today. 

Some differentiation I would try to draw, and without presumption that the answer for any of them is positive or negative:

* Ontology Features

  -- does the data structure described by the ontology introduce, require or reinforce biases (especially harmful ones)?

  -- does the ontology preclude use or engagement with different communities - is it accessible or are there barriers to entry that limit usage to certain communities, thereby introducing bias through exclusion?

* Documentation of the Ontology

  -- does the documentation about the ontology introduce, require or reinforce biases?

  -- is the documentation accessible to broad and diverse communities?

  -- is the documentation transparent about issues that are known or presumed to exist?

* Methodology of determining the Ontology

  -- does the way we produce the ontology, from ideation to standardization, introduce, require or reinforce biases?

  -- is the methodology accessible to broad and diverse communities for participation?

  -- is the methodology transparent as to how it works, and accountable when it doesn't?

* Implementations and Instances of the Ontology

  -- I think these are useful as second-order evidence, but that we should not be too involved or prescriptive.

 

And some micro-topics and thoughts, which are more opinionated:

* P48 Has Preferred Identifier -- this breaks the very beneficial "neutral standpoint" design decision. We should deprecate it for this reason, quite apart from the issue on the docket that it should be deprecated as an outmoded design pattern.

* E31 Document, E32 Authority Document vs E73 Information Object -- The need to distinguish "propositions about reality" and "terminology or conceptual systems" from other information seems to introduce subjectivity and the potential therein for harmful biases as to what constitutes "truth" or "reality", and what is a "terminology" versus what is just a word document. 

 

Posted by Thanasis on 08/03/2021

Thank you Erin. We are using Zotero already for the CRM so this is a good idea. I can check if a new folder can be created for issue 530.

Posted by Nicola Carboni on 9/3/2021

Dear Thanasis, all,

I would be happy to join the discussion. Another useful reading, besides those already cited, is "Cataloguing Culture: Legacies of Colonialism in Museum Documentation" by Hannah Turner.

Regarding the name and the scope of the issue: should we focus on data structure (I see the title of the issue is "bias in data structure") or specifically on ontologies and CRM?
While I do very much believe that data structure is an enormously important topic to discuss, it is an extremely large subject and entails a larger series of problems which derive from the informational foundation, the concept of structure itself, the recorded information, as well as disciplinary inheritance in the chosen subject matter.

I second Rob's proposal to focus on the problem of the ontology and the process of documentation/development. I would add that we should include some point about the CRM as a system of thought as well as the problem of formalisation.

>>* Implementations and Instances of the Ontology
>>-- I think these are useful as second-order evidence, but that we should not be too involved or prescriptive.

I would include the topic, as to make clear the diversity in implementation (use of terminological systems as well as the use of the concept of controlled terminology itself), avoiding indeed the prescriptive stance.

Posted by Thanasis on 9/03/2021

Indeed, the intention is to focus on the ontological level for the CRM and not to expand to data structures, schemas etc. The issue label does not represent the issue exactly, but it can act as a reminder. I will add the reference to the library.

Posted by George on 10/03/2021

Dear all,

I, too, fully support this important initiative and hope to learn much from colleagues in the discussion. The opening discussion was already very fruitful in getting us started on the fundamental issues to take into account in the method of creating ontologies.

The shared Zotero library suggestion is a great one.

 

In the 49th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 42nd FRBR – CIDOC CRM Harmonization meeting, TV gave an outline of the issue, the question of bias in data structures (link to presentation), and introduced Prof. P. Goodwin, who brought the SIG up to date on the Worlding Public Cultures Project.

The discussion concluded in a concise proposal to move forward with the issue of bias by forming a working group in which to discuss bias-related concerns (forms of bias in data structures which interfere with cultural points of view; empirical or theoretical means we have to detect them; whether documenting concepts of one’s culture as an empirical fact can be regarded as bias), with the aim of:

  • finding a common denominator and maximising diversity;
  • producing a statement on bias for the CRM specification document;
  • establishing criteria for examining classes and properties;
  • creating new issues for improving the model.

The SIG agreed to the above and decided to take action as indicated: (a) inform the sig-list of the decision to start a WG on the discourse around bias (ask for participation), (b) assign TV to lead the initiative/discussion. Prof. P. Goodwin and Dr M. Hidalgo Urbaneja will support the initiative.

 

Details of the discussion can be found here

 

March, 2021

In the 50th joint meeting of the CIDOC CRM SIG and ISO/TC46/SC4/WG9; 43rd FRBR – CIDOC CRM Harmonization meeting, EC reported on the progress of the WG dealing with Bias. Progress report here

Discussion points: 
MD: understanding bias in data structures is a misrepresentation of the actual problem, and the scope of the CRM is clearly stated and does not encourage bias.
What they should be aiming for is to identify the bias that may be introduced by the intended use of one particular data construct. It is not the data construct as such that gets identified as a source of bias.
GB: besides investigating whether constructs in an ontology further entrench bias, they could also review the process of ontology building and dialogue, and see whether bias manifests there too.
EC: we all come from particular perspectives and that translates into our understanding of the world. However, bias comes into play when it comes to existing power structures. In which case, you cannot just undo bias, because it’s a symptom of some sort of inequity. Identifying sources of bias serves to raise the issue, and for their part they are interested in identifying sources of bias in ontology/data structures. 

June 2021