Usability Evaluation – Summative Techniques
Jenny Le Peuple, 2006; revised by Pen Lister, 2009
Formative evaluation - review
- Carried out early in the design process, when changes are cheaper and easier to implement
- Should be iterative
- Provides informal, usually qualitative indications of usability
- Often employs “quick & dirty” techniques
- Relatively easy to carry out
- Results are relatively quick to analyse and interpret
Summative evaluation - overview
- Carried out in the final stages of the design process
- Usually provides objective, often quantitative measures of usability
- Generally employs more “scientific” techniques
- Requires expertise to apply
- Can be costly and complex to analyse, though not necessarily
Selecting techniques
At the very least, you need to establish the extent to which the design of the product meets specific usability goals, preferably using representative users. You may also wish to establish how far the design meets a specific set of design principles (a design “profile”), using experts or other “judges”.
Summative techniques
Include:
- Performance measurement
- Field trials
- Focus groups
- Controlled product tests (classical experimental design)
- Usability reviews/audits, using design profiles
Performance measurement - overview
Useful for:
- Obtaining quantitative data
- Comparative testing
- Testing against predefined benchmarks (e.g. usability goals connected with efficiency)
Ideally conducted in a laboratory. Can measure, e.g.:
- time taken to complete a task
- number of errors made
- number of features used/not used, etc.
Can include a pre-/post-test questionnaire or interview to elicit more qualitative data, e.g. the user's subjective attitude.
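Testing against predefined benchmarks can be sketched in a few lines. This is a minimal illustration, assuming each participant's task time and error count have already been recorded; the names `TaskRecord` and `meets_benchmark`, the goal thresholds and all the numbers are hypothetical, not taken from the slides.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class TaskRecord:
    participant: str
    seconds: float  # time taken to complete the task
    errors: int     # number of errors made

def meets_benchmark(records, max_mean_seconds, max_mean_errors):
    """Compare mean task time and mean error count against predefined usability goals."""
    mean_time = mean(r.seconds for r in records)
    mean_errors = mean(r.errors for r in records)
    return {
        "mean_seconds": mean_time,
        "mean_errors": mean_errors,
        "time_goal_met": mean_time <= max_mean_seconds,
        "error_goal_met": mean_errors <= max_mean_errors,
    }

# Hypothetical measurements for three participants (illustrative only).
records = [
    TaskRecord("P1", 42.0, 1),
    TaskRecord("P2", 55.5, 0),
    TaskRecord("P3", 38.5, 2),
]
summary = meets_benchmark(records, max_mean_seconds=50.0, max_mean_errors=1.5)
print(summary)
```

Comparing means against a threshold is only one way to state a goal; a goal could equally be phrased as "90% of users finish within 60 seconds", which would need a different check.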
Field trials - overview
Conducted in the users' environment, so contextual: the evaluator can view the system as part of the total environment. Methods can include many of the “laboratory” techniques.
Problems:
- interruptions are difficult to control
- considered “unscientific”
Useful for:
- interacting with users
- getting ideas for later versions
Focus groups - characteristics
A discussion-based group interview. Originated during the Second World War to test the effectiveness of propaganda; often used nowadays for market research.
Features:
- comprises people with a particular set of characteristics
- moderated (often by the researcher)
- tends to be relatively informal
- centres on open questions designed to generate dynamic discussion
- participants may also be required to complete a questionnaire
Focus groups - when used
Can be used throughout the development process:
- Early (as part of requirements analysis): to explore possibilities and reveal areas of importance; understand the system users' domain and language; reveal users' subconscious motivations.
- Middle: as part of rapid prototyping; formative evaluation.
- Late: summative evaluation; users' and/or client's reaction to the completed product.
Focus groups - general procedure
1. Identify appropriate participants, possibly from an earlier stakeholder analysis. Sometimes several different groups with different characteristics are needed. Ideally 6-8 participants (evidence suggests that group size is inversely related to degree of participation).
2. Prepare a list of appropriate questions/topics, depending on the purpose of the session (but the moderator must be flexible in light of interesting topics that may arise).
3. Arrange facilities: accommodation, catering, hardware/software/video equipment, etc.
Focus groups - during the session
The moderator should ensure that the tone is relaxed, open and dynamic. Full participation is needed: particular individuals should not dominate, and shy people should be coaxed into speaking.
Moderator skill set:
- someone participants can identify with and trust
- skilled in interpersonal communication
All raw data should be recorded verbatim, so note takers and/or audio or video recording are needed.
Focus groups - data analysis
- Qualitative (relatively subjective): emphasises meaning over quantification.
- Quantitative (relatively objective): generates numeric values, e.g. the frequency of certain words, phrases or gestures; classification into groups (of words, phrases, gestures).
- Structural: takes the qualitative and quantitative data analysis as inputs. Can be used to construct a representation of people's belief systems.
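The quantitative route (counting words or phrases, then classifying them into groups) can be illustrated on a transcript. This is a sketch under stated assumptions: the coding scheme `CATEGORIES`, the helper names and the transcript are hypothetical, and real analysis would use a scheme the researcher derives from the data. Gestures would have to be coded by hand from video before anything like this could count them.

```python
import re

# Hypothetical coding scheme: category -> words/phrases the analyst has chosen to count.
CATEGORIES = {
    "negative": ["confusing", "slow", "annoying"],
    "positive": ["easy", "quick", "clear"],
}

def phrase_frequencies(transcript, phrases):
    """Count whole-word, case-insensitive occurrences of each word or phrase."""
    text = transcript.lower()
    return {
        phrase: len(re.findall(r"\b" + re.escape(phrase.lower()) + r"\b", text))
        for phrase in phrases
    }

def category_totals(transcript, categories):
    """Classify counted phrases into groups and total the frequency per group."""
    return {
        name: sum(phrase_frequencies(transcript, phrases).values())
        for name, phrases in categories.items()
    }

transcript = (
    "The menu is confusing. I found the menu slow, "
    "but the icons were clear and easy to spot."
)
print(phrase_frequencies(transcript, ["menu", "confusing", "search box"]))
print(category_totals(transcript, CATEGORIES))
```

The word-boundary pattern means "menu" does not also match "menus"; whether plurals should be merged is a coding decision the analyst has to make explicitly.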
Focus groups - utility
- Potentially rich set of data.
- Naturalistic: ecological and construct validity.
- Scientific?
- Useful as part of evaluation, although there is an argument that usability is not being tested, as the moderator usually “drives” the product.
See also: http://en.wikipedia.org/wiki/Construct_validity http://en.wikipedia.org/wiki/Ecological_validity
Empirical research - experimental design
Experimental design - controlled product tests
Follows the tradition of classic experimental design.
Purpose: to obtain quantitative data about factors relating to the product and/or the users.
Advantages:
- the experimenter is more in control of variables
- can be replicated to establish the reliability of results
Disadvantages:
- “artificial”?
- lacks ecological validity
Empirical research - general form
Hypotheses: the ‘experimental hypothesis’
Explicit statements of what the experiment sets out to test, e.g.:
(Hypothesis example a) “it is quicker to select menu items from a radial menu than from a linear menu”: an example of a one-tailed hypothesis, since the direction of change is specified.
(Hypothesis example b) “the time taken to select a menu item from a radial menu differs from the time taken to select from a linear menu”: an example of a two-tailed hypothesis, since the direction of change is not specified.
Hypothesis testing
It is not possible to prove that a hypothesis is true. Instead, formulate a ‘null hypothesis’ and look for evidence that supports its rejection, e.g. “there is no relationship between the time taken to select a menu item and the structure of the menu”. If the experimental evidence allows rejection of the null hypothesis, this lends support to (but does not, per se, prove) the ‘experimental hypothesis’.
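The logic of rejecting a null hypothesis can be made concrete with a permutation test, swapped in here because it needs no distributional tables; the slides do not name a specific test, and courses often use a t-test instead. The idea: if menu structure makes no difference (the null hypothesis), the "radial" and "linear" labels are interchangeable, so we shuffle them many times and ask how often a difference at least as extreme as the observed one turns up by chance. The function name, its parameters and the timing data are hypothetical.

```python
import random
from statistics import mean

def permutation_test(a, b, n_resamples=10_000, alternative="two-sided", seed=0):
    """Approximate p-value under the null hypothesis that condition labels are exchangeable.

    alternative="less":      one-tailed, experimental hypothesis mean(a) < mean(b)
    alternative="two-sided": two-tailed, experimental hypothesis mean(a) != mean(b)
    """
    if alternative not in ("less", "two-sided"):
        raise ValueError("alternative must be 'less' or 'two-sided'")
    rng = random.Random(seed)
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign observations to conditions at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if alternative == "less":
            as_extreme += diff <= observed
        else:
            as_extreme += abs(diff) >= abs(observed)
    # Count the observed split itself so the estimate is never exactly zero.
    return (as_extreme + 1) / (n_resamples + 1)

# Hypothetical selection times in seconds (illustrative, not real data).
radial = [1.9, 2.1, 2.0, 1.8, 2.2, 1.7]
linear = [2.6, 2.4, 2.8, 2.5, 2.3, 2.7]

p_one = permutation_test(radial, linear, alternative="less")
p_two = permutation_test(radial, linear, alternative="two-sided")
print(f"one-tailed p ~ {p_one:.4f}, two-tailed p ~ {p_two:.4f}")
```

A small p-value (conventionally below 0.05) would allow rejection of the null hypothesis; as the slide stresses, that supports but does not prove the experimental hypothesis. The one-tailed p-value is smaller because it only counts differences in the predicted direction.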
Variables
Any factor that can vary (i.e. can take two or more values).
- Independent variables (IVs): those variables which the experimenter controls, i.e. by manipulating their values.
- Dependent variables (DVs): those variables which the experimenter is interested in observing, but which are not controlled by the experimenter.
Independent variable: the one we play around with – the variable we expect is causal. Dependent variable: the one we measure – the one that shows us whether changing the values of the IV has any effect on it (Harris, 2002, p105).
Variables - examples
In controlled product testing, IVs are usually concerned with some feature of the system, such as:
- structure of menu
- number of menu items
- size of icons
- colour of text, etc.
DVs are usually concerned with some aspect of user behaviour, such as:
- number of errors made
- time taken to perform task
- physiological measures (e.g. heart rate)
- recall, etc.
Experimental design
- Between subjects: subjects are randomly assigned to one or other condition, e.g. one group of subjects (Group 1) is tested with a linear menu (Condition A) and the other group, made up of different individuals (Group 2), is tested with a radial menu (Condition B).
- Within subjects: each subject is tested on both conditions, e.g. Subject 1 is tested with the linear menu (Condition A) and with the radial menu (Condition B), and so on.
There are many other designs. Care is needed to select an appropriate design; it can become very complex, especially where variables can take more than two conditions.
Conditions: the levels of a causal variable, e.g. its presence (experimental condition) and absence (control condition) (Harris, 2002, p107 onwards).
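The two designs differ in who sees which condition, which a short sketch can make explicit. This assumes a simple two-condition study; the function names and subject IDs are hypothetical. Random assignment here shuffles the participant list and then deals participants round-robin into groups, one common way of getting random groups of (near-)equal size.

```python
import random

def assign_between_subjects(participants, conditions, seed=None):
    """Between subjects: shuffle, then deal participants into one group per condition."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    groups = {condition: [] for condition in conditions}
    for i, person in enumerate(shuffled):
        # Dealing round-robin keeps group sizes as equal as possible.
        groups[conditions[i % len(conditions)]].append(person)
    return groups

def within_subjects_plan(participants, conditions):
    """Within subjects: every participant is tested on every condition."""
    return {person: list(conditions) for person in participants}

conditions = ["A: linear menu", "B: radial menu"]
participants = [f"S{n}" for n in range(1, 9)]  # hypothetical subject IDs
print(assign_between_subjects(participants, conditions, seed=42))
print(within_subjects_plan(participants[:2], conditions))
```

The within-subjects plan lists conditions in the same order for everyone, which is exactly what invites the order effects discussed next; in practice the order would be varied across subjects.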
Order & experimenter effects
- Order of presentation can bias results, e.g. there could be a learning effect if subjects are tested on both conditions. It is desirable to randomise the order of experimental material, e.g. each subject selects the menu items in a different order.
- The experimenter can influence a subject's behaviour by giving verbal instructions that vary across subjects, or by giving involuntary verbal or physical clues.
NB Also check for ‘confounding variables’: variables whose different levels coincide with the different levels of the IV (Harris, 2002, p108-10).
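Two standard ways of tackling order effects are sketched below: counterbalancing the order of conditions (a common remedy, not named on the slide) and giving each subject the experimental material in a randomised order, as the slide suggests. The function names and menu items are hypothetical. Seeding each subject's shuffle with their ID makes the order reproducible, so the experimenter can re-create exactly what each subject saw.

```python
import random
from itertools import permutations

def counterbalanced_orders(participants, conditions):
    """Rotate participants through every possible order of the conditions."""
    orders = list(permutations(conditions))
    return {p: list(orders[i % len(orders)]) for i, p in enumerate(participants)}

def randomised_items(items, participant, seed=0):
    """Give each participant their own reproducible random order of menu items."""
    rng = random.Random(f"{seed}-{participant}")
    order = list(items)
    rng.shuffle(order)
    return order

menu_items = ["Open", "Save", "Print", "Close", "Undo"]  # hypothetical targets
plan = counterbalanced_orders(["S1", "S2", "S3", "S4"], ["A", "B"])
print(plan)  # S1 and S3 get A then B; S2 and S4 get B then A
print(randomised_items(menu_items, "S1"))
```

Counterbalancing does not remove a learning effect; it spreads it evenly across conditions so that it cannot masquerade as a difference between them.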
Experimental results - summary
Results can be subjected to visual inspection, e.g. to spot “outliers” or to form an impression of the “result”.
Statistics - example (chart not reproduced)
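Visual inspection can be complemented by a simple numeric screen for outliers. The sketch below uses Tukey's fences (values more than 1.5 interquartile ranges beyond the quartiles), a common convention that the slide does not itself name; the function name and the timing data are hypothetical. Flagged values are candidates to investigate, not values to delete automatically.

```python
from statistics import quantiles

def tukey_outliers(values, k=1.5):
    """Flag values beyond Tukey's fences: below Q1 - k*IQR or above Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles (default 'exclusive' method)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical task times in seconds; one participant was interrupted.
times = [2.1, 2.3, 1.9, 2.0, 2.2, 2.4, 9.8]
print(tukey_outliers(times))  # [9.8]
```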
Statistics – mean, median, mode, standard deviation
- Mean: for a data set, the mean is the sum of the observations divided by the number of observations.
- Mode: the value that occurs most frequently in a data set.
- Median: the number separating the higher half of a sample, a population, or a probability distribution from the lower half.
- Standard deviation: the standard deviation of a statistical population, a data set, or a probability distribution is the square root of its variance (for a population, the variance is the mean of the squared deviations from the mean).
See also: http://en.wikipedia.org/wiki/Arithmetic_mean http://www.gcseguide.co.uk/standard_deviation.htm
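These definitions map directly onto Python's standard `statistics` module. The sketch below also computes the mean and population standard deviation straight from the definitions, to show they agree; the data set is hypothetical. Note the distinction between `pstdev` (population, divides by n) and `stdev` (sample estimate, divides by n - 1).

```python
import math
from statistics import mean, median, mode, pstdev, stdev

def mean_from_definition(xs):
    """Sum of the observations divided by the number of observations."""
    return sum(xs) / len(xs)

def pstdev_from_definition(xs):
    """Population standard deviation: square root of the mean squared deviation."""
    m = mean_from_definition(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

data = [4, 5, 5, 6, 10]  # hypothetical scores
print(mean(data), median(data), mode(data))  # 6 5 5
print(pstdev(data), pstdev_from_definition(data))  # both sqrt(4.4), about 2.098
print(stdev(data))  # sample SD divides by n - 1: sqrt(5.5), about 2.345
```

The single large value (10) pulls the mean above the median and mode, which is one reason to report more than one measure of central tendency.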
Experimental results - analysis/interpretation/conclusions
Results are subjected to appropriate statistical analysis to determine whether they could have occurred by chance. Overall, you need to consider:
- validity: does the experiment really “measure” what you claim it measures? (It is important to formulate adequate operational definitions.)
- reliability: the consistency of the experimental effect, which can be tested by replicating the experiment.
Experimental results can be valid but not reliable, and vice versa (or neither).
Usability audits - aka heuristic evaluation
Usability audits - general form
Aim: to obtain a measure of the product's accordance with pre-defined, specific design principles/guidelines (a design profile).
Method:
1. Produce design profile and questionnaire
2. Administer questionnaire
3. Summarise results
4. Analyse results
5. Interpret results
Usability audits - design profile
1. Identify appropriate design principles, e.g. consistency, transparency, etc. (In this context, principles are usually referred to as “dimensions” or “constructs”.)
2. “Operationalise” the constructs, i.e. formulate a checklist that might “measure” adherence to these principles. This is essentially a design profile comprising a set of guidelines.
3. Incorporate the design profile into a questionnaire to measure compliance with it. The questionnaire can include notes to assist with interpretation of questions and to aid consistency of replies.
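One way to picture an operationalised design profile is as a mapping from each construct to the checklist items that measure it, with a judge's questionnaire answers rolled up into a score per construct. This is a sketch under stated assumptions: the profile `DESIGN_PROFILE`, its checklist wording, the 1-5 rating scale and the function `construct_scores` are all hypothetical, not the questionnaire used in the module.

```python
from statistics import mean

# Hypothetical design profile: each construct is operationalised as checklist items.
DESIGN_PROFILE = {
    "consistency": [
        "Navigation controls appear in the same place on every screen",
        "The same term is always used for the same action",
    ],
    "transparency": [
        "The current state of the system is always visible",
        "It is clear what each control does before it is used",
    ],
}

def construct_scores(profile, responses):
    """Average one judge's ratings (assumed 1-5 scale) over each construct's items."""
    missing = [item for items in profile.values() for item in items if item not in responses]
    if missing:
        raise ValueError(f"unanswered items: {missing}")
    return {
        construct: mean(responses[item] for item in items)
        for construct, items in profile.items()
    }

# One hypothetical judge's completed questionnaire.
judge_a = dict(zip(
    [item for items in DESIGN_PROFILE.values() for item in items],
    [5, 4, 2, 3],
))
print(construct_scores(DESIGN_PROFILE, judge_a))  # {'consistency': 4.5, 'transparency': 2.5}
```

Averaging treats rating-scale answers as interval data, which is a simplifying assumption; reporting the spread of ratings per construct alongside the average is a cautious alternative.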
Usability audit - design profile questionnaire, example
Usability audit - questionnaire notes example
Usability audits - method
1. Use two “judges”, preferably usability experts (in which case it is referred to as heuristic evaluation), though colleagues would suffice
2. Ask them both to use the application and complete the questionnaire
3. Summarise results
4. Analyse results
5. Interpret the results
Usability audit - results summary, example
We can see in this particular example (even without doing any complex statistical analysis) that there is a measure of disagreement between the judges. This may indicate problems with the reliability and/or validity of the instrument, which is beyond the scope of this module.
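For readers who want to go slightly beyond eyeballing disagreement, agreement between two judges can be quantified. The sketch below computes simple percentage agreement and Cohen's kappa, a standard chance-corrected agreement statistic swapped in here for illustration (it is not the analysis used in the module). It assumes each judge gives a categorical rating per checklist item; the ratings and function names are hypothetical.

```python
from collections import Counter

def percent_agreement(r1, r2):
    """Proportion of items on which the two judges gave the same rating."""
    if len(r1) != len(r2) or not r1:
        raise ValueError("need two equal-length, non-empty rating lists")
    return sum(a == b for a, b in zip(r1, r2)) / len(r1)

def cohens_kappa(r1, r2):
    """Agreement corrected for the agreement expected by chance from each judge's marginals."""
    p_observed = percent_agreement(r1, r2)
    n = len(r1)
    c1, c2 = Counter(r1), Counter(r2)
    p_expected = sum(c1[k] * c2[k] for k in c1.keys() | c2.keys()) / n**2
    if p_expected == 1:
        raise ValueError("kappa is undefined when both judges use a single category")
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical compliance ratings for eight checklist items.
judge_1 = ["yes", "yes", "no", "yes", "no", "partly", "yes", "no"]
judge_2 = ["yes", "no", "no", "yes", "yes", "partly", "yes", "no"]
print(percent_agreement(judge_1, judge_2))  # 0.75
print(round(cohens_kappa(judge_1, judge_2), 3))  # 0.579
```

Kappa is lower than raw agreement because some matches would be expected even if both judges answered at random with their own overall proportions of "yes", "no" and "partly".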
Usability audit - interpretation, example
May well be useful for your coursework: it maps quite well to the “Five Es”, even though this example uses different constructs/dimensions.
Usability audit – heuristics
Jakob Nielsen: “…Heuristic evaluation - in which you evaluate user interface designs by inspecting them relative to established usability guidelines”
http://www.useit.com/alertbox/discount-usability.html
http://www.useit.com/papers/heuristic/
http://www.useit.com/papers/heuristic/heuristic_list.html
Activities
READ Chapter 7, “Usability Evaluation: summative techniques”, in Le Peuple, J. & Scane, R. (2003). User Interface Design. Crucial.
VISIT http://www.infodesign.com.au/usabilityresources/evaluation/conductingusabilityreviews.asp for information and downloads on usability reviews/audits
References & further reading
Calder, B.J. (1977) Focus groups and the nature of qualitative marketing research. Journal of Marketing Research, XIV, 353-363.
Callahan, J., Hopkins, D., Weiser, M. and Shneiderman, B. (1988) An empirical comparison of pie versus linear menus. Proceedings of CHI'88 Human Factors in Computer Systems. ACM, New York. http://www.donhopkins.com/drupal/node/100 (last viewed 22-11-09)
Forsythe, C., Grose, E. and Ratner, J. (Eds.) (1998) Human Factors and Web Development. Lawrence Erlbaum Associates.
Greene, J. and D'Oliveira, M. (1990) Learning to Use Statistical Tests in Psychology. Open University Press.
Harris, P. (2002) Designing and Reporting Experiments in Psychology, 2nd edn. Open University Press, UK.
Millward, L.J. (1995) in G.M. Breakwell, S. Hammond and C. Fife-Schaw (Eds.) Research Methods in Psychology. Sage.
Nielsen, J. (1993) Usability Engineering. AP Professional.
Nielsen, J. AlertBox, useit.com, various (last viewed 22-11-09).
End
Summary: Slides to accompany a lecture on summative evaluation techniques in a usability context.