Monitoring the Gender Gap with Wikidata Human Gender Indicators

Authors: Maximilian Klein (GroupLens Research), Harsh Gupta, Vivek Rai (Indian Institute of Technology, Kharagpur), Piotr Konieczny (Hanyang University) and Haiyi Zhu (GroupLens Research)

Abstract: The gender gap in Wikipedia’s content, specifically in the representation of women in biographies, is well known but has been difficult to measure. Furthermore, the impacts of efforts to address this gender gap have received little attention. To investigate, we utilise Wikidata, the database that feeds Wikipedia, and introduce the “Wikidata Human Gender Indicators” (WHGI), a free and open source, longitudinal, biographical dataset monitoring gender disparities across time, space, culture, occupation and language. Through these lenses we show how the representation of women is changing along 11 dimensions. Validations of WHGI are presented against three exogenous datasets: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics. Finally, to demonstrate its general use in research, we revisit previously published findings on Wikipedia’s gender bias that can be strengthened by WHGI.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

