The Difference of Preference versus Performance Can Differ for Concurrent versus Retrospective Ratings
Several studies have found differences between subjective preference ratings and objective performance measures. Bailey (HFES'93, pp.282-286) summarizes several and argues for separate treatment of these concepts. Our results in a multifactor multivariate experiment support Bailey's contention, but add a new dimension of concern: the use of concurrent versus retrospective subjective ratings. The presentation here will focus on the relationship between performance and concurrent versus retrospective preference ratings (not on the overall agenda of the research on model-based linking).
An experiment evaluated the usefulness of entity-relationship-based links in accessing online versus print information. Subjects sought answers to questions in each of four comparison conditions: Linked-Online, Unlinked-Online, Linked-Paper, and Unlinked-Paper.
Predictions of performance were based on "designer's intuition":
    PS:  Linked < Unlinked < Linked < Unlinked   (time)
         Online   Online     Paper    Paper
but the experiment determined actual performance:
    AS:  83.3     106.6      121.3    104.6      (seconds)
which correlated well with concurrent confidence (of accuracy) ratings (1-10):
    CA:  9.1      8.9        8.3      8.9        (Confidence)
but not retrospective ratings (all of which were correlated):
    RS:  8.2      7.9        4.8      4.3        (Speed)
    RA:  8.8      8.3        7.1      6.9        (Accuracy)
         7.7      6.9        4.9      4.4        (Usability)
The actual accuracy varied little across conditions despite significantly lower confidence for Linked-Paper and significantly lower retrospective ratings for Paper versus Online conditions.
    AA:  77.3     78.1       75.8     76.6       (% correct)
Designer-intuition predicted-speed (PS) did not correlate well with actual-speed (AS). AS correlated well with CA, which might make one want to generalize that subjective confidence ratings about accuracy are good predictors of performance time, but one would be less eager given that actual-accuracy (AA) did not differ across conditions. The use of isolated measures becomes even more tenuous when we look at the retrospective ratings, which all correlated well with predicted-speed (PS). If only subjective retrospective ratings were collected as data, one might conclude that the designer's intuition was perfect. But the RS and RA scores seem to have lost the poor performance (AS) and confidence (CA) of the Linked-Paper condition (all conditions were counter-balanced in a Latin square).
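The pattern described above can be illustrated with a minimal sketch that computes Pearson correlations over the four reported condition means. This is not the analysis used in the study, which presumably worked from per-subject data; with only four points per measure the coefficients are descriptive at best. Coding the predicted order PS as ranks 1 to 4 (1 = predicted fastest) is an assumption made for illustration, as are the function and variable names.

```python
from math import sqrt

# Condition order: Linked-Online, Unlinked-Online, Linked-Paper, Unlinked-Paper.
# Values are the condition means reported in the abstract.
PS = [1, 2, 3, 4]                  # ASSUMPTION: predicted order coded as ranks (1 = fastest)
AS = [83.3, 106.6, 121.3, 104.6]   # actual time, seconds
CA = [9.1, 8.9, 8.3, 8.9]          # concurrent confidence, 1-10
RS = [8.2, 7.9, 4.8, 4.3]          # retrospective speed rating
RA = [8.8, 8.3, 7.1, 6.9]          # retrospective accuracy rating
AA = [77.3, 78.1, 75.8, 76.6]      # actual accuracy, % correct


def pearson(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


# Negative values mean that higher ratings go with shorter times.
r_as_ca = pearson(AS, CA)   # actual time vs. concurrent confidence
r_as_rs = pearson(AS, RS)   # actual time vs. retrospective speed rating
r_ps_as = pearson(PS, AS)   # predicted order vs. actual time
r_ps_rs = pearson(PS, RS)   # predicted order vs. retrospective speed rating
r_ps_ra = pearson(PS, RA)   # predicted order vs. retrospective accuracy rating

if __name__ == "__main__":
    for label, r in [("AS-CA", r_as_ca), ("AS-RS", r_as_rs), ("PS-AS", r_ps_as),
                     ("PS-RS", r_ps_rs), ("PS-RA", r_ps_ra)]:
        print(f"{label}: r = {r:+.2f}")
```

On these means, actual time tracks concurrent confidence more closely than it tracks the retrospective speed rating, while the retrospective ratings track the predicted order more closely than actual time does, which is the dissociation the text describes.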
In summary, in this experiment, the gathering of retrospective usability ratings has helped to demonstrate that they may not serve well as measures of true performance and, had they been collected as the only dependent measure, could have been used to confirm incorrect predictions. On the other hand, if retrospective ratings are to be used to measure an overall impression that a user takes away from an experience, then the uncorrelated actual data (e.g., the objective time measure) may be less useful in predicting future purchase/use behavior (see F. D. Davis (1989), Perceived Usefulness, Perceived Ease of Use, and User Acceptance of Information Technology. MIS Quarterly, 13:3, 319-340).
Tracking Number: IEA00-178-390 Submitted 1998-06-28 01:06 Computer Systems Technical Program (Arnold Lund; alund@acm.org)