Background and Rationale
In our paper at the Intelligent Virtual Agents conference (2019), we opened a discussion about methodological issues in human-computer interaction (HCI), and specifically in the evaluation of Artificial Social Agents (ASAs). ASAs, such as intelligent virtual agents (IVAs) and social robots, are computer-controlled entities that can autonomously interact with humans following the social rules of human-human interaction.
This work is motivated by the methodological crisis facing the social and life sciences: the results of many scientific studies are difficult or impossible to replicate in subsequent investigations (e.g. Pashler & Wagenmakers, 2012). The Open Science Collaboration (2015) observed, for example, that the effect sizes of replications were about half of the originally reported effect sizes, and that whereas 97% of the original studies had significant results, only 39% of the replication studies did. It has even been suggested that more than 50% of psychological research results might be false, i.e. that the underlying theories hold no or very low verisimilitude (Ioannidis, 2005). Many of the methods employed by HCI researchers come from the very fields that are currently in a replication crisis. Hence, we ask: do our studies (in this paper we focus on user evaluations of intelligent virtual agents) have similar issues?
A variety of ideas to improve research practices have been proposed, and these ideas are likely to benefit the methods used in the field of HCI. Actionable steps towards open and reproducible science include pre-registration of experiments, replication of findings, collaboration, and education of researchers. It is clear that the replication crisis needs our attention.
Development
In the Open Science Framework (OSF) work group "Artificial Social Agent Evaluation Instrument", scientists from the Intelligent Virtual Agent (IVA) community are collaborating to create a validated, community-driven, standardized questionnaire instrument for evaluating human interaction with intelligent virtual agents. The instrument will help researchers make claims about people's perceptions, attitudes, and beliefs towards their agents. It will allow agents to be compared across user studies and, importantly, it will help in replicating scientific findings. This is essential if the community wants to make valid claims about the impact that our social agents can have in domains such as health, entertainment, and education.
Plan
The plan, as preregistered on the OSF platform, will be updated to reflect progress. Where possible, a link to the result of each step will be added. For all resources, visit the publication on this site and/or the OSF webpage.
- Determine the process and get people involved
- Determine the model
  - Examine existing questionnaires
  - Discussion among experts
- Determine the constructs and dimensions
  - Face validity among experts
  - Grouping of existing constructs
- Determine the initial set of construct items
  - Content validity analysis: study into experts' agreement on which items measure which constructs (see the analysis sketch after this list)
  - Reformulating into easy-to-understand item questions
- Determine the final item set, with the provision to create a long and a short questionnaire version (i.e., construct validity, convergent and discriminant validity analysis: select items that both converge and discriminate)
- Determine the generalization performance of the long and short questionnaire versions (i.e., cross-validation: fit the model on a data set from a new set of ASAs)
- Criterion validity
  - Predictive validity: agreement with a future observation
  - Concurrent validity: agreement with another 'valid' measure collected at the same time (Ongoing)
- Translate the questionnaire (forward/backward translation)
  - Mandarin Chinese
  - Dutch
  - German
  - French (Ongoing)
  - Italian (Ongoing)
- Developing a normative data set (Ongoing)
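For illustration only, below is a minimal sketch in Python (using numpy and pandas) of two analyses of the kind named in the plan: Lawshe's content validity ratio for the expert-agreement step, and Cronbach's alpha with corrected item-total correlations for item selection. The expert counts, item names, and simulated responses are placeholders of our own invention, not the actual instrument data or the work group's analysis scripts.

```python
# Hypothetical sketch: content validity ratio and internal-consistency checks.
# All data below are simulated placeholders for illustration.
import numpy as np
import pandas as pd

# --- Content validity: 10 experts rate whether each item is "essential" (1) or not (0) ---
essential = pd.DataFrame(
    np.random.default_rng(0).integers(0, 2, size=(10, 4)),
    columns=["item_1", "item_2", "item_3", "item_4"],
)
n_experts = len(essential)
# Lawshe's CVR = (n_essential - N/2) / (N/2); ranges from -1 to 1, higher = stronger agreement.
cvr = (essential.sum() - n_experts / 2) / (n_experts / 2)
print("Content validity ratio per item:\n", cvr)

# --- Internal consistency: simulated 5-point Likert responses to one construct's items ---
responses = pd.DataFrame(
    np.random.default_rng(1).integers(1, 6, size=(200, 4)),
    columns=["item_1", "item_2", "item_3", "item_4"],
)
k = responses.shape[1]
item_var = responses.var(axis=0, ddof=1).sum()
total_var = responses.sum(axis=1).var(ddof=1)
# Cronbach's alpha from item and total-score variances.
alpha = (k / (k - 1)) * (1 - item_var / total_var)
print(f"Cronbach's alpha: {alpha:.2f}")

# Corrected item-total correlation: each item against the sum of the remaining items.
for item in responses.columns:
    rest = responses.drop(columns=item).sum(axis=1)
    print(f"{item}: corrected item-total r = {responses[item].corr(rest):.2f}")
```

In practice, items with low content validity ratios or low corrected item-total correlations would be candidates for removal or reformulation before the construct and criterion validity steps.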
References
Ioannidis, J. P. (2005). Why most published research findings are false. PLoS Medicine, 2(8), e124.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. Science, 349(6251), aac4716.
Pashler, H., & Wagenmakers, E. J. (2012). Editors' introduction to the special section on replicability in psychological science: A crisis of confidence? Perspectives on Psychological Science, 7(6), 528-530.