Category Archives: Full Research Papers

Monitoring the Gender Gap with Wikidata Human Gender Indicators

Title: Monitoring the Gender Gap with Wikidata Human Gender Indicators

Authors: Maximilian Klein (GroupLens Research), Harsh Gupta, Vivek Rai (Indian Institute of Technology, Kharagpur), Piotr Konieczny (Hanyang University) and Haiyi Zhu (GroupLens Research)

Abstract: The gender gap in Wikipedia’s content, specifically in the representation of women in biographies, is well known but has been difficult to measure. Furthermore, the impacts of efforts to address this gender gap have received little attention. To investigate, we utilise Wikidata, the database that feeds Wikipedia, and introduce the “Wikidata Human Gender Indicators” (WHGI), a free and open source, longitudinal, biographical dataset monitoring gender disparities across time, space, culture, occupation and language. Through these lenses we show how the representation of women is changing along 11 dimensions. Validations of WHGI are presented against three exogenous datasets: the world’s historical population, “traditional” gender-disparity indices (GDI, GEI, GGGI and SIGI), and occupational gender according to the US Bureau of Labor Statistics. Furthermore, to demonstrate its general use in research, we revisit previously published findings on Wikipedia’s gender bias that can be strengthened by WHGI.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Supporting Cyber Resilience with Semantic Wiki

Title: Supporting Cyber Resilience with Semantic Wiki

Authors: Riku Nykänen and Tommi Kärkkäinen (University of Jyväskylä)

Abstract: Cyber resilient organizations, their functions and computing infrastructures, should be tolerant towards rapid and unexpected changes in the environment. Information security is an organization-wide common mission, whose success strongly depends on efficient knowledge sharing. For this purpose, semantic wikis have proved their strength as flexible collaboration and knowledge sharing platforms. However, there has not been notable academic research on how semantic wikis could be used as an information security management platform in organizations for improved cyber resilience. In this paper, we propose to use a semantic wiki as an agile information security management platform. More precisely, the wiki contents are based on the structured model of the NIST Special Publication 800-53 information security control catalogue, which is extended in this research with additional properties that support information security management and especially security control implementation. We present common use cases for managing information security in organizations and show how the use cases can be implemented using the semantic wiki platform. As organizations seek cyber resilience, where the focus is on the availability of cyber related assets and services, we extend the control selection with an option to focus on availability. The results of the study show that a semantic wiki based information security management and collaboration platform can provide a cost-efficient solution for improved cyber resilience, especially for small and medium sized organizations that struggle to develop information security with limited resources.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Enabling team collaboration with task management tools

Title: Enabling team collaboration with task management tools

Authors: Dimitra Chasanidou, Brian Elvesæter, and Arne-Jørgen Berre (SINTEF ICT)

Abstract: Project and task management tools aim to support remote or face-to-face collaboration. Despite the growing need for these tools, little is known about how they are utilized in practice. This paper presents the results of an exploratory study using UpWave, a task management tool, and the ways that it enables team collaboration. The group interviewees utilize UpWave for their collaborations and report on its features in terms of use, best practices, and the motivations and rewards that encourage users to collaborate. This paper concludes that project and task management tools offer new possibilities for collaboration; it also makes suggestions for using such tools in teams. Future work will include a mixed-methods approach to gain a greater understanding of the tools’ effects in various collaboration settings.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Mining team characteristics to predict Wikipedia article quality

Title: Mining team characteristics to predict Wikipedia article quality

Authors: Grace Gimon Betancourt, Armando Segnini, Carlos Trabuco, Amira Rezgui and Nicolas Jullien (Télécom Bretagne)

Abstract: In this study, we investigate which characteristics of virtual teams are good predictors of the quality of their production. The experiment involved obtaining the Spanish Wikipedia database dump and applying different data mining techniques suitable for large data sets to label the whole set of articles according to their quality (comparing them with the Featured/Good Articles, or FA/GA). We then created the attributes that describe the characteristics of the teams that produced the articles and, using decision tree methods, obtained the most relevant characteristics of the teams that produced FA/GA. The team’s maximum efficiency and the total length of contribution are the most important predictors. This article contributes to the literature on virtual team organization.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

A Sector-Selection Methodology for Living Labs Implementation

Title: A Sector-Selection Methodology for Living Labs Implementation

Author: Dr Ir Robert VISEUR (CETIC)

Abstract: Creative Wallonia is a framework program that puts creativity and innovation at the heart of the redevelopment of Wallonia. In the context of Creative Wallonia, the Walloon government has decided to study the implementation of Living Lab pilot projects in Wallonia. The initiators required the identification of two sectors in which the pilot phase could be addressed and conducted. This paper is dedicated to the sector selection methodology that was developed for the implementation of the Walloon Living Lab pilot projects. The paper is organized in three sections. In the first section we search for the criteria that could be used to select appropriate sectors. In the second section we present the developed methodology and the selection grid based on these criteria. In the third section we discuss the grid and the results after application to the Walloon call for pilot projects. The contribution of the research consists in a methodology that makes it possible to objectify the choice of sectors for future Living Lab projects. Finally, preliminary feedback on the Living Lab implementation is discussed.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Operation Digital Chameleon – Towards an Open Cybersecurity Method

Title: Operation Digital Chameleon – Towards an Open Cybersecurity Method

Authors: Andreas Rieb and Ulrike Lechner (Universität der Bundeswehr München, Germany)

Abstract: In the Serious Game Operation Digital Chameleon, red and blue teams develop attack and defense strategies to explore IT-Security of Critical Infrastructures as part of an IT-Security training. Operation Digital Chameleon is the training game of the IT-Security Matchplay series in the IT-Security for Critical Infrastructure research program funded by BMBF. We present the design of Operation Digital Chameleon in its current form as well as results from game #3. We analyze the potential and innovation capability of Operation Digital Chameleon as an Open Innovation method for the domain of IT-Security of Critical Infrastructures. We find that Operation Digital Chameleon facilitates creativity, opens the process of IT-Security strategy development and – despite being designed for training purposes – opens the process to explore innovative attack vectors.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

An Empirical Evaluation of Property Recommender Systems for Wikidata and Collaborative Knowledge Bases

Title: An Empirical Evaluation of Property Recommender Systems for Wikidata and Collaborative Knowledge Bases

Authors: Eva Zangerle, Wolfgang Gassler, Martin Pichl, Stefan Steinhauser, Günther Specht (University of Innsbruck)

Abstract: The Wikidata platform is a crowdsourced, structured knowledge base aiming to provide integrated, free and language-agnostic facts which are, amongst others, used by Wikipedias. Users who actively enter, review and revise data on Wikidata are assisted by a property suggesting system which provides users with properties that might also be applicable to a given item. We argue that evaluating and subsequently improving this recommendation mechanism, and hence assisting users, can directly contribute to an even more integrated, consistent and extensive knowledge base serving a huge variety of applications. However, the quality and usefulness of such recommendations have not been evaluated yet. In this work, we provide the first evaluation of different approaches aiming to provide users with property recommendations in the process of curating information on Wikidata. We compare the approach currently facilitated on Wikidata with two state-of-the-art recommendation approaches stemming from the field of RDF recommender systems and collaborative information systems. Further, we also evaluate hybrid recommender systems combining these approaches. Our evaluations show that the current recommendation algorithm works well with regard to recall and precision, reaching a recall@7 of 79.71% and a precision@7 of 27.97%. We also find that, generally, incorporating contextual as well as classifying information into the computation of property recommendations can further improve its performance significantly.
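
For readers unfamiliar with the recall@7 and precision@7 figures quoted in the abstract, the following is a minimal sketch of how such metrics are commonly computed for a single ranked recommendation list. It uses the standard textbook definitions; the abstract does not describe the authors' exact evaluation protocol, so the function name, the choice to divide precision by k, and the sample property identifiers are all assumptions made for illustration.

```python
def precision_recall_at_k(recommended, relevant, k):
    """Return (precision@k, recall@k) for one ranked recommendation list.

    recommended: ranked list of suggested properties (best first)
    relevant:    properties that actually apply to the item
    k:           cut-off rank

    Assumption: precision is divided by k (not by the number of
    suggestions actually returned); other conventions exist.
    """
    top_k = recommended[:k]
    relevant_set = set(relevant)
    hits = sum(1 for prop in top_k if prop in relevant_set)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data only: identifiers are placeholders, not results from the paper.
recommended = ["P21", "P569", "P19", "P106", "P27", "P570", "P735"]
relevant = ["P569", "P27", "P18"]

p_at_7, r_at_7 = precision_recall_at_k(recommended, relevant, k=7)
print(f"precision@7 = {p_at_7:.2%}, recall@7 = {r_at_7:.2%}")
```

In an evaluation over many items, the per-item values would typically be averaged. Under these definitions, a high recall@7 combined with a much lower precision@7 is expected whenever items have fewer than seven relevant properties left to suggest.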

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Evaluating and Improving Navigability of Wikipedia: A Comparative Study of Eight Language Editions

Title: Evaluating and Improving Navigability of Wikipedia: A Comparative Study of Eight Language Editions

Authors: Daniel Lamprecht (KTI, Graz University of Technology), Dimitar Dimitrov (GESIS – Leibniz Institute for the Social Sciences), Denis Helic (KTI, Graz University of Technology) and Markus Strohmaier (GESIS – Leibniz Institute for the Social Sciences and University of Koblenz-Landau)

Abstract: Wikipedia supports its users in reaching a wide variety of goals: looking up facts, researching a topic, making an edit or simply browsing to pass time. Some of these goals, such as the lookup of facts, can be effectively supported by search functions. However, for other use cases such as researching an unfamiliar topic, users need to rely on the links to connect articles. In this paper, we investigate the state of navigability in the article networks of eight language versions of Wikipedia. We find that, when taking all links of articles into account, all language versions enable mutual reachability for almost all articles. However, previous research has shown that visitors to Wikipedia focus most of their attention on the areas located close to the top. We therefore investigate different restricted navigational views that users could have when looking at articles. We find that restricting the view of articles strongly limits the navigability of the resulting networks and impedes navigation. Based on this analysis we then propose a link recommendation method to augment the link network to improve navigability in the network. Our approach selects links from a less restricted view of the article and proposes to move these links into more visible sections. The recommended links are therefore relevant for the article. Our results are relevant for researchers interested in the navigability of Wikipedia and open up new avenues for link recommendations in Wikipedia editing.
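
The idea of comparing reachability under the full link set with reachability under a restricted view can be illustrated with a small sketch. This is not the authors' method: the reachability measure (share of ordered article pairs connected by a directed path), the "first n links" notion of a restricted view, and the toy graph are all assumptions chosen to keep the example short.

```python
from collections import deque


def reachable_from(graph, start):
    """Return the set of articles reachable from start via breadth-first search."""
    seen = {start}
    queue = deque([start])
    while queue:
        article = queue.popleft()
        for target in graph.get(article, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


def pairwise_reachability(graph):
    """Fraction of ordered pairs (u, v), u != v, where v is reachable from u."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    if n < 2:
        return 1.0
    reachable_pairs = sum(len(reachable_from(graph, u)) - 1 for u in nodes)
    return reachable_pairs / (n * (n - 1))


def restrict_view(graph, max_links):
    """Keep only the first max_links links of each article.

    Assumption: each link list is ordered by position on the page,
    so the first entries stand in for links near the top.
    """
    return {article: links[:max_links] for article, links in graph.items()}


# Hypothetical four-article network; link lists are in page order.
toy = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["A"],
}

full_score = pairwise_reachability(toy)
restricted_score = pairwise_reachability(restrict_view(toy, 1))
print(full_score, restricted_score)
```

In this toy network every article can reach every other when all links are considered, but only half of the ordered pairs remain connected when each article exposes just its first link, which mirrors in miniature the effect the abstract describes.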

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Predicting the quality of user contributions via LSTMs

Title: Predicting the quality of user contributions via LSTMs

Authors: Rakshit Agrawal and Luca de Alfaro (University of California, Santa Cruz)

Abstract: In many collaborative systems it is useful to automatically estimate the quality of new contributions; the estimates can be used for instance to flag contributions for review. To predict the quality of a contribution by a user, it is useful to take into account both the characteristics of the revision itself, and the past history of contributions by that user. In several approaches, the user’s history is first summarized into a number of features, such as number of contributions, user reputation, time from previous revision, and so forth. These features are then passed along with features of the current revision to a machine-learning classifier, which outputs a prediction for the user contribution. The summarization step is used because the usual machine learning models, such as neural nets, SVMs, etc. rely on a fixed number of input features. We show in this paper that this manual selection of summarization features can be avoided by adopting machine-learning approaches that are able to cope with temporal sequences of input.

In particular, we show that Long Short-Term Memory (LSTM) neural nets are able to process directly the variable length history of a user’s activity in the system, and produce an output that is highly predictive of the quality of the next contribution by the user. Our approach does not eliminate the process of feature selection, which is present in all machine learning. Rather, it eliminates the need for deciding which features from a user’s past are most useful for predicting the future: we can simply pass to the machine-learning apparatus all the past, and let it come up with an estimate for the quality of the next contribution.

We present models combining LSTM and NN for predicting revision quality and show that the prediction accuracy attained is far superior to the one obtained using the NN alone. More interestingly, we also show that the prediction attained is superior to the one obtained using user reputation as a feature summarizing the quality of a user’s past work. This can be explained by noting that the primary function of user reputation is to provide an incentive towards performing useful contributions, rather than to be a feature optimized for prediction of future contribution quality.

We also show that the LSTM output changes in a natural way in response to user behavior, increasing when the user performs a sequence of good quality contributions, and decreasing when the user performs a sequence of low-quality work. The LSTM output for a user could thus be usefully shown to other users, alongside the user’s reputation and other information.
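
To make concrete how a recurrent cell can consume a variable-length history and still yield a fixed-size output, here is a minimal pure-Python sketch of a standard LSTM forward pass. It is not the authors' model: the weights are random and untrained, the three per-revision features are hypothetical, and all function names are invented for this illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_lstm(input_size, hidden_size, seed=0):
    """Create random (untrained) parameters for the four LSTM gates."""
    rng = random.Random(seed)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    # i = input gate, f = forget gate, o = output gate, g = candidate state
    return {
        gate: {"W": matrix(hidden_size, input_size + hidden_size),
               "b": [0.0] * hidden_size}
        for gate in ("i", "f", "o", "g")
    }


def affine(params, vec):
    """Compute W @ vec + b with plain lists."""
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(params["W"], params["b"])]


def lstm_encode(lstm, sequence, hidden_size):
    """Run the LSTM over a sequence of any length; return the final hidden state."""
    h = [0.0] * hidden_size  # hidden state
    c = [0.0] * hidden_size  # cell state
    for x in sequence:
        z = list(x) + h  # concatenate current input and previous hidden state
        i = [sigmoid(a) for a in affine(lstm["i"], z)]
        f = [sigmoid(a) for a in affine(lstm["f"], z)]
        o = [sigmoid(a) for a in affine(lstm["o"], z)]
        g = [math.tanh(a) for a in affine(lstm["g"], z)]
        c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
        h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h


# Hypothetical per-revision features: (edit size, hours since last edit, was reverted)
short_history = [(0.2, 0.5, 0.0), (0.1, 0.3, 1.0)]
long_history = [(0.4, 0.1, 0.0)] * 5

HIDDEN = 4
lstm = make_lstm(input_size=3, hidden_size=HIDDEN)
h_short = lstm_encode(lstm, short_history, HIDDEN)
h_long = lstm_encode(lstm, long_history, HIDDEN)
print(len(h_short), len(h_long))
```

Both histories, although of different lengths, are mapped to a vector of the same size. In a trained system such a vector could be passed to a downstream classifier in place of hand-picked summary features; with the random weights used here the values themselves carry no meaning.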

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.

Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Title: Differentiating Communication Styles of Leaders on the Linux Kernel Mailing List

Authors: Daniel Schneider, Scott Spurlock and Megan Squire (Elon University)

Abstract: Much communication between developers of free, libre, and open source software (FLOSS) projects happens on email mailing lists. Geographically and temporally dispersed development teams use email as an asynchronous, centralized, persistently stored institutional memory for sharing code samples, discussing bugs, and making decisions. Email is especially important to large, mature projects, such as the Linux kernel, which has thousands of developers and a multilayered leadership structure. In this paper, we collect and analyze data to understand the communication patterns in such a community. How do the leaders of the Linux Kernel project write in email? What are the salient features of their writing, and can we discern one leader from another? We find that there are clear written markers for two leaders who have been particularly important to recent discussions of leadership style on the Linux Kernel Mailing List (LKML): Linus Torvalds and Greg Kroah-Hartman. Furthermore, we show that it is straightforward to use a machine learning strategy to automatically differentiate these two leaders based on their writing. Our findings will help researchers understand how this community works, and why there is occasional controversy regarding differences in communication styles on the LKML.

This contribution to OpenSym 2016 will be made available as part of the OpenSym 2016 proceedings on or after August 17, 2016.